
International Journal of Information Technology & Decision Making


(2023)
© World Scientific Publishing Company
DOI: 10.1142/S0219622023500347

Fake News Detection Using Feature Extraction,


Natural Language Processing, Curriculum Learning,
and Deep Learning

Mirmorsal Madani
Department of Computer Engineering
Gorgan Branch, Islamic Azad University, Gorgan, Iran

Homayun Motameni∗
Department of Computer Engineering
Sari Branch, Islamic Azad University, Sari, Iran
homayun [email protected]

Reza Roshani
Department of Computer Engineering
Technical and Vocational University (TVU)
Tehran, Iran

Received 20 December 2021


Revised 21 February 2023
Accepted 23 February 2023
Published 6 April 2023

Following the advancement of the internet, social media gradually replaced the tradi-
tional media; consequently, the overwhelming and ever-growing process of fake news
generation and propagation has now become a widespread concern. It is undoubtedly
necessary to detect such news; however, detection faces certain challenges (e.g., events
and verification), and the reference datasets in this area suffer from various issues
such as the lack of sufficient information about news samples, the absence of subject
diversity, etc. To mitigate these issues, this paper proposes a two-phase model using
natural language processing and machine learning algorithms. In the first phase, two
new structural features, along with other key features, are extracted from news samples.
In the second phase, a hybrid method based on a curriculum learning strategy, consisting
of statistical data and a k-nearest neighbor algorithm, is introduced to improve the performance
of deep learning models. The obtained results indicated the higher performance of the
proposed model in detecting fake news, compared to benchmark models.

Keywords: Deep learning; feature extraction; fake news; curriculum learning.

∗ Corresponding author.

2350034-1
2nd Reading
April 5, 2023 19:10 WSPC/S0219-6220 173-IJITDM 2350034

M. Madani, H. Motameni & R. Roshani

1. Introduction
Fake news refers to false news or incorrect information that follows a specific
agenda and is propagated in a society with the purpose of deceiving the audience.
Accordingly, false information is generated and spread among people in skillful and
attractive ways so that it appears to be true. Regrettably, insufficient knowledge
of people about the media1 leads them to believe fake news which, in turn, sig-
nificantly influences their overall approach to the presented issue. Notably, fake
news generation is not a new phenomenon and dates back to a time prior to the
emergence of the internet.2 Nonetheless, as the internet was further developed and
the traditional media gave way to social media, the increasing and overwhelming
process of generating and spreading this type of news has become a widespread con-
cern. Fake news can be classified into different categories including religion-based
news, politics-based news, and news related to important figures.3 Fake news cov-
ers a variety of topics such as racism, humor, conspiracy, and economy, and often
serves the purpose of creating fear in the society. Ultimately, the detection of fake
news is a necessity as it can potentially lead to serious problems in the society.4
Social media have been shown to be the primary platform for fake news.5 Given the
fact that approximately 47% of Americans have pointed to social media as their
dominant news source, the prevalence of fake news on social media and its implica-
tions appear to be considerably clear.6 Consequently, a suitable safeguard system is
definitely required. However, implementing such a system faces several challenges
and different issues including events, consumption, verification, diversion, and ref-
erence datasets.1 Despite the introduction of numerous methods by different studies
to solve these issues, several problems still remain regarding fake news detection
with high accuracy; subsequently, researchers are attempting to present methods
that involve higher performance levels.7,8 Considering the significance of fake news
detection, it remains an open issue among scientific communities and researchers
continue to conduct studies in this area. This study was also carried out to serve
the same purpose. Today, given the considerable capabilities of machine learning
models, strategies for employing these models in text classification are prevalent.
On the other hand, the issues present in reference datasets in the area of fake news
detection reduce the performance of said models. As a result, this study seeks to
examine these issues and offer an advanced model to mitigate them. After review-
ing many related common datasets, four reference datasets were selected, which are
described in the “Datasets” section; each dataset exhibits specific problems regard-
ing fake news detection. Certain problems found in the reference datasets are listed as
follows:

• Lack of subject diversity; the news samples present in the datasets are mostly
about a specific subject such as politics (e.g., Fake or Real News Dataset);9
• The short length of the news samples (e.g., Liar Dataset);10

2350034-2
2nd Reading
April 5, 2023 19:10 WSPC/S0219-6220 173-IJITDM 2350034

Fake News Detection Using FE, NLP, CL, and Deep Learning

• Lack of access to sufficient information and features required for the news samples
such as headlines and news sources (e.g., FakeNewsNet and ISOT Dataset);1
• Imbalanced datasets (e.g., the FakeNewsNet Dataset).
All of these problems can raise the difficulty of the relevant datasets.11 For instance,
the difficulties (Eq. (1)) of the Liar and the FakeNewsNet datasets are 0.68% and
49.3%, respectively.
difficulty = 1 − (# matched samples in dataset)/(# dataset samples). (1)
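For illustration, Eq. (1) can be computed directly once a matching criterion is fixed;
the following minimal Python sketch assumes a Boolean matched flag per news sample
(the matching criterion itself is left abstract here):

    # Minimal sketch of Eq. (1); "matched" is assumed to be a per-sample
    # Boolean flag produced by whatever matching criterion is adopted.
    def dataset_difficulty(matched_flags):
        n_matched = sum(1 for m in matched_flags if m)
        return 1 - n_matched / len(matched_flags)

    # Example: 4 matched samples out of 5 give a difficulty of 0.2.
    print(dataset_difficulty([True, True, False, True, True]))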
First, to discover the causes behind the problems listed above, a number of the
referenced datasets were investigated and the following results were obtained:
• Several researchers have collected news samples and produced datasets related to
their subject of research. For example, the “Fake or Real News” dataset pertains
to the 2016 US elections. Therefore, a number of the datasets produced in this area
are limited to a specific subject or subjects.
• One of the methods of fake news generation is to adopt only a part of the news
body. Moreover, a large number of the news samples present in the datasets are
tweets written on Twitter or materials posted on Facebook. As a result, certain
datasets such as the “Liar” dataset contain news samples with short lengths.
• In many cases, the privacy policies as well as the rules governing social media
do not allow the propagation of further information about news samples. Conse-
quently, the information available on certain datasets such as the “FakeNewsNet”
and “ISOT” is insufficient.
• In real-life scenarios, such as fraud detection, reject inference, and credit evalua-
tion,12 the number of real news items about a specific subject (e.g., only political news
or only economic news) is greater than the number of fake news items. Therefore, col-
lecting news samples about a specific subject can lead to imbalance in the dataset.
To mitigate these problems, researchers have employed a variety of strategies
which, along with their shortcomings, are elaborated in the following. In many
studies, described in detail in Sec. 2, the authors have employed data-level strate-
gies including (i) using only one part of a dataset such as selecting news samples
related to a single subject,13 (ii) combining two datasets to solve the imbalance
problem in datasets,14 (iii) combining the Liar and the Fake or Real datasets to
mitigate the problem of short news samples, increasing the average length of the
news samples from 18 to 644 words,15 and (iv) eliminating news samples that can
reduce the performance of classification algorithms.13 Though data-level methods
can help to enhance the performance of machine learning algorithms, particularly
with regards to the dataset imbalance problem, they face their own challenges. For
example, the elimination or duplication of news samples in the dataset may result
in the removal of important information or the generation of artificial data. On the
other hand, in a real-world situation, access to other data to be combined with

2350034-3
2nd Reading
April 5, 2023 19:10 WSPC/S0219-6220 173-IJITDM 2350034

M. Madani, H. Motameni & R. Roshani

the main dataset may not be available. In addition, the manipulation of the main
dataset eliminates the possibility of assessing the ability of machine learning algo-
rithms under real conditions. There are certain researchers who have attempted
to increase the performance of machine learning models in fake news detection by
changing the architecture of these models or integrating several models together.
A number of algorithm-level methods can be found in Refs. 16–20. For instance,
Kaliyar et al.20 proposed a model called FakeBERT which was a combination of
Convolutional Neural Network (CNN) and BERT algorithms in line with using the
advantages of both algorithms to raise the accuracy of their model. “Curriculum
Learning” (CL) is another algorithm-level technique inspired by human learning
principles that is employed nowadays to improve the performance of deep learning
models.21 Notably, it was first introduced by Bengio et al.22 as a training strategy
for machine learning models. In fact, this model is considered a proper substitute
for conventional training based on random mini-batch sampling. Certain benefits of
this method include enhanced training process performance in terms of convergence
speed and accuracy.23 Considering the deep learning components, this technique
can be employed at various levels including the task, model, data, and performance
metrics. The idea behind this technique involves indicating the difficulty level of
training data and arranging them based on a specific order (mostly from easy to
hard58 in terms of being used in deep learning models). Techniques based on the CL
have been proposed in numerous studies such as in Refs. 24–26, 28, and 29. Notably,
the majority of studies were conducted in the area of image classification and com-
puter vision, while less attention has been paid to the application of this strategy in
text classification. In many studies, the authors used feature-based methods to over-
come problems including the lack of sufficient information in datasets and the short
length of news samples (e.g., Refs. 17 and 31–33). They extracted the key features
from the news body and headlines in datasets. Results show that the extraction of
key features from texts can improve the overall performance of machine learning
models. However, in the majority of related studies, authors have solely focused
on extracting style, surface and polarity features using Natural Language Process-
ing (NLP) tools and have overlooked the structural features. Meanwhile, special
attention is paid to these features in news articles written by humans.34 Today, a
significant portion of fake news is generated by machines, based on altering real
news texts. Machines tend to overlook these features, as opposed to humans who
write news articles. Consequently, the extraction of these features can enhance the
performance of machine learning models in fake news detection. Examinations into
studies on fake news detection and their findings (addressed in Sec. 2) demonstrate
that techniques that employed a combination of the aforementioned strategies per-
formed better than those techniques in which a single strategy was employed. In
general, the shortcomings mentioned above were the motivations behind this study
to propose an advanced two-phase process model based on a combination of Fea-
ture Extraction (FE) and CL strategies. This resulted in a superior model than
those in benchmark studies, in terms of performance metrics such as accuracy and

2350034-4
2nd Reading
April 5, 2023 19:10 WSPC/S0219-6220 173-IJITDM 2350034

Fake News Detection Using FE, NLP, CL, and Deep Learning

area under the Receiver Operating Characteristic (ROC) curve (AUC). In the first
phase of the proposed model, important features were extracted from the body and
titles of the news samples. In the second phase, a method based on the CL strategy
was employed to reduce dimension and sort news samples. In the FE phase carried
out using the NLP, a number of common key features along with two new structural
features (i.e., called coherence and cohesion) were extracted from the news sam-
ples and were assumed as metadata. Other studies on examining text readability34
and producing believable fake texts35 have employed these features as well, while
they have remained unused in research on fake news detection. The reasons behind
the extraction of said features in this paper are summarized below: As previously
mentioned, a significant portion of fake news is generated by machines through
altering real news texts in news articles written by actual humans. Machines tend
to overlook a set of features as opposed to humans who pay special attention to
them, when writing news articles. For example, Singh et al.34 calculated the coher-
ence in two ISOT and HWB datasets and concluded that the degree of coherence
in real news is higher than that of fake news. Therefore, extracting this feature
can improve the performance of machine learning models in fake news detection.
In addition, in real news articles, there are correlation and coherence between the
news title and the news body. This may not be the case in fake news as it is gen-
erated through procedures such as manipulating real news or adopting only a part
of the real news, etc.; in turn, these procedures might disarray the structure of
sentences, distort the correlation between the news, body and news title, disrupt
the correlation and coherence between the sentences in news body, and so on. For
instance, Karuna et al.35 examined the generation of believable fake news and con-
cluded that the believability of fake news can be increased by applying different
methods such as raising the level of coherence between news title and news body
and increasing the cohesion between the news sentences through creating synthetic
data. The proposed model addresses issues including the lack of subject diversity
in fake news detection datasets, the short length of sentences, and the insufficient
information on news samples. In this study, four different datasets with diverse sub-
jects were used. Each employed reference dataset is focused on a specific subject
or encompasses different subjects. For example, the Liar dataset entails political
news, while the news samples in the ISOT dataset include a variety of subjects
such as the government, left news, the world, etc. As a result, the proposed model
showed an adequate performance in addressing news articles with different topics.
Moreover, in the majority of studies in this area (e.g., Refs. 15, 17, 31, 32, 36,
and 46), researchers have addressed issues including short length of sentences and
the lack of sufficient information on news samples, using strategies such as FE and
algorithm-level techniques. The proposed model in this study also addresses these
issues through FE and CL strategies. The remainder of this paper is structured
as follows. The related works are discussed in Sec. 2. The datasets used in this
paper (which are notably the most popular datasets) are introduced in Sec. 3. The

2350034-5
2nd Reading
April 5, 2023 19:10 WSPC/S0219-6220 173-IJITDM 2350034

M. Madani, H. Motameni & R. Roshani

methodology used in the study is presented in Sec. 4 which includes the following
sections: text preprocessing, FE, a hybrid method of statistical data and k-nearest
neighbor (KNN) named SDKNN (Statistical Data and KNN), and machine learn-
ing models. The experimental results along with a comparative analysis involving
several state-of-the-art methods used for fake news detection based on deep learn-
ing and classical machine learning are elaborated in Sec. 5. Finally, the conclusion
of the study and suggestions for future research are provided in Sec. 6.

2. Related Work
The proposed model in this study is based on FE from textual datasets and a
method aimed at enhancing the performance of the classical machine learning and
deep learning algorithms. In addition, four common datasets in the area of fake news
detection were used to assess the proposed model. This section entails a summary
of studies related to FE and deep learning, and FE and classical machine learning.
In addition, the related works were classified based on the datasets used, which are
presented in Table 1.

2.1. FE and classical machine learning


In a study by Goseva et al.,40 texts were preprocessed and then converted into a
term–document matrix. Next, the features were extracted using Term Frequency–
Inverse Document Frequency (TF-IDF) and information gain techniques. The
authors used both supervised and unsupervised methods for classification and
adopted an anomaly detection-based learning method described in another work
by Chandola et al.,41 to find abnormal data patterns in unsupervised learning.
This method uses the technique of Ref. 42 to convert the problem into a binary classification problem,
separate noise words from useful words, and create the validation feature vector.43
Finally, the test feature vector was compared to the validation feature vectors using
cosine similarity. The test feature vector was chosen as a secure feature vector if it
exceeded a threshold limit determined by the Gscore-based metric presented by
Manevitz and Yousef.44 Another method was presented by
Yukari and Ichiro36 to increase classification accuracy based on the most recent
topics. They found frequent words using the TF–IDF technique and ranked them
using the PageRank algorithm. They then used the KNN method to classify news
based on the most up-to-date information in news texts. They also used the k-
means algorithm to compare their method’s accuracy to that of other techniques.
The authors of Ref. 45 also attempted to detect fake news on social media. They detected and filtered
misinformation-spreading websites based on users’ activities and performance. The
authors also attempted to detect fake news by examining the title and news text
and looking for the critical features. The logistic classifier was used as well, for
detection and classification purposes. Among the essential features extracted were
the number of capital letters, the number of digits in the news texts and titles, and
the number of keywords.

Table 1. Summary of related studies on fake news detection.

Study   | Dataset                         | Extracted features                   | Classifier                               | Acc   | Prec  | F1    | Recall | AUC
Ref. 22 | Liar                            | Lexical/polarity                     | SVM/LR/DT/AdaBoost/NB/KNN                | 0.6   | 0.6   | 0.59  | 0.6    |
Ref. 69 | Liar                            | Clues/News contents                  | CNN                                      | 0.536 | 0.489 | 0.352 | 0.375  | 0.539
Ref. 69 | Liar                            | Clues/News contents                  | Conv-HAN                                 | 0.557 | 0.514 | 0.565 | 0.628  | 0.574
Ref. 69 | Liar                            | Clues/News contents                  | Bi-LSTM                                  | 0.586 | 0.554 | 0.523 | 0.495  | 0.607
Ref. 41 | Liar                            | News contents/metadata               | Capsule NN                               | 0.395 |       |       |        |
Ref. 39 | Liar                            | News contents/metadata               | Bagging/AdaBoost/RF/Extra Trees/XGBoost  | 0.70  | 0.70  | 0.70  | 0.70   |
Ref. 69 | Liar                            | Clues/News contents                  | BERT                                     | 0.619 | 0.583 | 0.628 | 0.596  | 0.617
Ref. 22 | Fake or Real                    | Lexical/polarity                     | SVM/LR/DT/AdaBoost/KNN                   | 0.71  | 0.72  | 0.72  | 0.71   |
Ref. 22 | Fake or Real                    | Lexical/polarity                     | CNN/LSTM/Bi-LSTM/C-LSTM/Conv-HAN         | 0.95  | 0.95  | 0.89  | 0.95   |
Ref. 22 | Fake or Real                    | Lexical/polarity/n-grams             | NB                                       | 0.90  | 0.91  | 0.90  | 0.90   |
Ref. 69 | Fake or Real                    | Clues/News contents                  | CNN                                      | 0.54  | 0.498 | 0.447 | 0.405  | 0.52
Ref. 69 | Fake or Real                    | Clues/News contents                  | Conv-HAN                                 | 0.448 | 0.443 | 0.569 | 0.795  | 0.436
Ref. 69 | Fake or Real                    | Clues/News contents                  | Bi-LSTM                                  | 0.586 | 0.554 | 0.523 | 0.495  | 0.607
Ref. 81 | ISOT                            | News contents                        | CNN                                      | 0.99  | 0.99  | 0.99  | 0.99   |
Ref. 81 | ISOT                            | News contents                        | RNN                                      | 0.98  | 0.98  | 0.98  | 0.98   |
Ref. 81 | ISOT                            | News contents                        | CNN+RNN                                  | 0.99  | 0.99  | 0.99  | 0.99   |
Ref. 32 | ISOT                            | Linguistic Inquiry/Word contents     | Bagging, Boosting                        | 0.99  | 0.99  | 0.99  | 1      |
Ref. 41 | ISOT                            | News contents                        | Capsule NN                               | 0.998 |       |       |        |
Ref. 18 | ISOT                            | News contents                        | DT                                       | 0.962 |       |       |        |
Ref. 18 | ISOT                            | News contents                        | SVM                                      | 0.801 |       |       |        |
Ref. 18 | ISOT                            | News contents                        | NB                                       | 0.802 |       |       |        |
Ref. 18 | ISOT                            | News contents                        | BERT                                     | 0.969 |       |       |        |
Ref. 81 | ISOT                            | News contents                        | LR                                       | 0.52  | 0.52  | 0.42  | 0.5    |
Ref. 81 | ISOT                            | News contents                        | RF                                       | 0.92  | 0.92  | 0.92  | 0.92   |
Ref. 81 | ISOT                            | News contents                        | KNN                                      | 0.6   | 0.67  | 0.61  | 0.56   |
Ref. 37 | ISOT                            | News contents                        | AdaBoost                                 | 0.92  | 0.91  | 0.99  | 0.99   |
Ref. 17 | FakeNewsNet                     | Stylometric (three feature sets)     | RF                                       | 0.84  | 0.87  | 0.82  | 0.79   |
Ref. 17 | FakeNewsNet                     | Stylometric                          | Bagging/Boosting                         | 0.86  | 0.86  | 0.86  | 0.85   |
Ref. 17 | Hybrid FakeNewsNet and McIntire | (three feature sets)+CBOW (Word2Vec) | Bagging                                  | 0.912 | 0.93  | 0.91  | 0.89   |
Ref. 17 | Hybrid FakeNewsNet and McIntire | Stylometric+CBOW (Word2Vec)          | Boosting                                 | 0.954 | 0.95  | 0.95  | 0.95   |
Ref. 17 | Hybrid FakeNewsNet and McIntire | (three feature sets)                 | LR/RF/Ada                                | 0.914 | 0.91  | 0.92  | 0.93   |
Ref. 13 | FakeNewsNet                     |                                      | CNN+RNN                                  | 0.82  |       |       |        |
Ref. 38 | FakeNewsNet                     |                                      | Stochastic Gradient LSTM+CNN             | 0.772 |       |       |        |
Ref. 69 | FakeNewsNet                     | Clues/News contents                  | BERT                                     | 0.588 | 0.563 | 0.628 | 0.449  | 0.578
Ref. 69 | FakeNewsNet                     | Clues/News contents                  | CNN                                      | 0.54  | 0.498 | 0.447 | 0.405  | 0.52
Ref. 69 | FakeNewsNet                     | Clues/News contents                  | Conv-HAN                                 | 0.448 | 0.443 | 0.569 | 0.795  | 0.436
Ref. 69 | FakeNewsNet                     | Clues/News contents                  | Bi-LSTM                                  | 0.586 | 0.554 | 0.523 | 0.495  | 0.607
Ref. 48 | Reuters, Kaggle                 |                                      | SVM/DT/KNN/LSTM/SGD                      | 0.92  |       |       |        |

Notes: Acc: accuracy, Prec: precision, F1: F1-score, AUC: area under the ROC curve, BERT: bidirectional encoder representations from transformers, CNN: convolutional neural network, DT: decision tree, KNN: k-nearest neighbor, LR: logistic regression, LSTM: long short-term memory, NB: Naive Bayes, RF: random forest, SVM: support vector machine, Bi-LSTM: bi-directional long short-term memory, SGD: stochastic gradient descent.

In Ref. 46, a supervised learning approach was used to detect fake news by extracting
specific features from news sources, news texts, etc. Ultimately, a new set of
grammatical and semantic features was introduced for
fake news detection. Younus Khan et al.15 detected fake news using n-gram analysis
and machine learning techniques. They used Term Frequency (TF) and TF-IDF
to extract features, after preprocessing the text. To classify real and fake news,
they used traditional methods such as Support Vector Machine (SVM) and KNN,
as a well-known instance-based learning algorithm47 as well as a five-fold cross-
validation method for evaluation. A specialized team gathered and prepared the
used dataset, as well as the dataset in the study by Horne and Adali,49 and applied
the proposed techniques to both datasets. According to the results, the linear SVM
and TF–IDF FE techniques performed better than other methods. Younus Khan
et al.15 employed traditional algorithms and deep learning to detect fake news using
the Liar and Fake or Real datasets. Using the Sentiment Intensity Analyzer func-
tion78 from the NLTK library in Python, they first extracted lexical features such
as word count, article length, and sentiment analysis. Next, they used the TF–IDF
to extract bigrams and unigrams. They also utilized the Empath tool to extract
important topics and data. They implemented GloVe for word embedding and KNN
for traditional algorithms on the two datasets above. In the Liar dataset, the best
result was obtained from Naive Bayes (NB) with an accuracy of 60%. The NB algo-
rithm showed its best performance on the Fake or Real dataset as well, achieving
an accuracy of 90%. The authors of Ref. 14 detected fake news using text features
such as stylometric features and a text-based word vector representation suggested in their own research.
Due to the imbalance in the FakeNewsNet, they combined it with the McIntire
dataset to form a single dataset. This allowed them to create a single dataset with
49% real news and 50.15% fake news. Then, three groups of stylometric features
were created during the FE phase: Group 1 entailed the number of unique words,
complexity and Gunning Fox index; Group 2 encompassed the number of words,
sentences, syllables, and capital letters; and Group 3 consisted of the number of
characters, figures, short words, etc. For fake news detection, they used a variety of
classification methods such as Random Forest, SVM, KNN, and Bagging. The
authors of Ref. 50 used the Liar dataset and embedding techniques suggested in their own study to
detect fake news. For simplicity purposes, they assumed the target class as binary,
that is, in the form of either real or fake news. They used the Bag of Words (BoW),
TF–IDF, and n-gram methods for embedding and FE. Subsequently,
they used Min–Max scaling to normalize the numerical features they had generated.
To classify and detect fake news, they used AdaBoost, Extra Trees, Random Forest,
XGBoost, and Bagging methods. After 150 tests, the Bagging method produced the
best results with 70% accuracy.

2.2. FE and deep learning


Pierri and Ceri51 conducted an in-depth investigation into features and datasets for
fake news detection on social media. They discussed the techniques recently used


in the literature to detect and classify fake news. According to Pierri and Ceri,51
there are certain challenges in fake news detection which include: (i) difficulty in
distinguishing fake news from real news, as the former is created with high
similarity to real news broadcast in traditional media, (ii) the fast-paced fake
news propagation, (iii) the inability of experts in early detection of fake news, and
(iv) the limitations of social media platforms in authorizing access to information.
The algorithms were then classified based on content, context, or a combination
of both. A number of examined methods included Linguistic Inquiry and Word
Count (LIWC),52 Gated Recurrent Unit (GRU),53 and Long Short-Term Memory
(LSTM).54 Bajaj55 developed a classifier to determine the fake parts of a news
article, using only the title and the text. The datasets used in that study were Kag-
gle56 and Signal Media News.57 The author then converted the text into a feature
vector using GloVe58 with 300 dimensions. Subsequently, he attempted to detect
the fake segments of a piece of news using two-layer feed-forward neural networks,
Recurrent Neural Network (RNN), LSTM, and GRU. Notably, Goldani et al.31 used
capsule neural networks to detect fake news; they also used static and nonstatic
word embedding models for short and medium-to-long news samples. Moreover,
n-grams were applied on the Liar and ISOT datasets for FE. To evaluate the pro-
posed method and compare it to traditional methods such as SVM, the authors
only used the accuracy metric. Ultimately, the best result was obtained using the
ISOT dataset and the nonstatic capsule network with an accuracy of 99.8%. Due to
the problems with the dataset and the short length of the text, the authors concen-
trated on metadata for the Liar dataset. The best result was obtained on “history as
metadata” with an accuracy of 39.5%. The authors of Ref. 59 investigated how closely news
texts corresponded to their respective headlines. To this aim, they created a dataset
by extracting millions of news articles and separating their texts and titles, and then
used a deep hierarchical encoder to detect fake news. A method was also proposed
for summarizing news articles. Younus Khan et al.15 used deep learning methods
to detect fake news. They employed the Liar and Fake or Real datasets. First, they
extracted lexical features such as word count, article length, and sentiment anal-
ysis. Then, they used the TF–IDF to extract bigrams and unigrams. Notably, the
authors used CNN, LSTM, Conv-HAN, etc. models at the character level in their
deep learning models. The Conv-HAN model yielded the best results in the Liar
dataset with an accuracy of 59%. In Ref. 17, a hybrid RNN–CNN deep learning
model was presented for fake news detection. The ISOT and FA-KES reference
datasets were then used to test the model. After preprocessing the news texts, they
divided the dataset into two sets including training (80%) and test (20%). Next,
they used GloVe to embed the data. The proposed method was implemented on the
ISOT dataset with 99% accuracy. The authors of Ref. 32 utilized machine learning and ensemble
techniques and extracted a set of LIWC features to classify news articles into two
categories of true and fake. They used the ISOT dataset as well as two Kaggle
datasets.60 Using the LIWC2015 tool, they extracted a total of 93 features, includ-
ing words related to positive and negative emotions in the text, the number of verbs,


adverbs, etc. They also divided the data into two groups including training (70%)
and testing (30%). Subsequently, they used different hyperparameters to train var-
ious learning algorithms to achieve the highest classification accuracy possible. To
this end, they applied various ensemble techniques such as Bagging and Boosting
and created two voting classifiers based on these algorithms. They employed perfor-
mance metrics such as accuracy for evaluation purposes. Finally, using the Random
Forest and Perez-LSVM algorithms, they achieved an accuracy of 99% (on average
95.25%) on the ISOT dataset. McIntire9 developed a fake news detection method
based on deep learning and the BERT algorithm. They concentrated on searching for
clues in the news contents. After text preprocessing, three classification techniques
were used, which included classical classification algorithms, a deep learning approach,
and a multimedia approach. Following data ingestion and preprocessing, fake news
was classified and detected using the two modules of NLP processing and multi-
media processing. Masciari et al.13 extracted statistical data (e.g., mean, variance,
etc.) from the Liar and FakeNewsNet datasets. The authors omitted news samples
with short lengths (less than ten words) from the Liar dataset; subsequently, this
dataset was reduced by 1675 news samples. In the FakeNewsNet dataset, they only
used political news and eliminated the news collected from the GossipCop website
to mitigate the imbalance in the dataset. Ultimately, the Google BERT model on
the Liar dataset produced the best results with an accuracy of 61.9%. In addition,
the Google BERT model performed best on the FakeNewsNet dataset. Amer et al.18
used classical machine learning models such as SVM, NB, and Decision Tree (DT) as
well as deep learning algorithms including LSTM and GRU to detect fake news on
the ISOT dataset. Also, a set of tests were carried out to compare the performance
of word embedding methods with BERT. The results showed that the performance
of deep learning models using word embedding methods was higher than that of the
BERT model. Ennejjai et al.16 implemented various operations on three datasets, includ-
ing the Fake or Real dataset, for fake news detection. These operations involved
preprocessing text features, converting them into numerical vectors via
different methods such as GloVe, Word2Vec, and TF–IDF, and using different deep
learning methods including LSTM and CNN. Agarwal et al.19 attempted to detect fake
news without taking the author’s name, news source, etc. into account. After pre-
processing the text, the authors used GloVe for word embedding and then sent the
numerical vectors to the hybrid model of CNN and LSTM for classification. Kaliyar
et al.20 employed a model called FakeBERT to detect fake news. The architecture
of the model was a combination of CNN and BERT algorithms. The authors tuned
the hyperparameters of CNN to achieve higher performance and managed to reach
99.8% accuracy in detecting fake news on the ISOT dataset.

3. Datasets
In this study, four popular datasets61 were used to test the performance of the
proposed model under a variety of conditions, including news articles with different
subjects, insufficient information about news samples, and news samples with various
lengths. These datasets included the Liar,10 ISOT Fake News,48 FakeNewsNet,62 and
Fake or Real News9 datasets, which are discussed in the following. The structural
information of the datasets is listed in Table 2.

Table 2. The structural information of the datasets.

Dataset      | # Fake news | # Real news | Subject
Liar         | 5657        | 7134        | Politics
FakeNewsNet  | 5755        | 17,441      | Social context/Spatiotemporal info/News
Fake or Real | 3164        | 3171        | Politics
ISOT         | 23,481      | 21,417      | World/Politics/Government/Middle East/US/Left news

3.1. Liar dataset


Obtained from POLITIFACT.com, the Liar dataset consists of 12.8k news samples
based on 13 types of political features, manually labeled into six different classes by
humans. In this study, the problem was reduced to binary classification by decreas-
ing these six classes to only two which included real or fake. Certain issues present
in this dataset which could reduce the performance of classification algorithms are
listed as follows:
— Many items have null values in certain features.
— Short news samples (on average 18 words) cause a variety of issues, including
overfitting in machine learning algorithms.
— News samples are classified into different categories due to the lack of a single
subject.
Consequently, extracting more features can help boost performance. In this paper,
the TF–IDF was used in this dataset to extract the keywords from news samples.
Then, Doc2Vec and cosine similarity were utilized to assign a subject to each item
based on the semantic similarity between the keywords in each news sample and
the subjects presented for that item. These subject classes were then numerically
replaced and normalized in a [0, 1] interval. Finally, as described in the “Feature
Extraction” Section, the key features of each news sample were extracted similar
to other datasets.
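A minimal sketch of this subject-assignment step is given below; it assumes a trained
Gensim Doc2Vec model and a small illustrative list of candidate subjects, and is not
the exact pipeline used in this paper:

    # Sketch: assign a subject to each Liar sample by comparing its top
    # TF-IDF keywords against candidate subjects in Doc2Vec space.
    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    SUBJECTS = ["politics", "economy", "health", "elections"]  # assumed list

    def assign_subjects(texts, d2v_model, top_k=5):
        vec = TfidfVectorizer(stop_words="english")
        tfidf = vec.fit_transform(texts)
        terms = np.array(vec.get_feature_names_out())
        subj_vecs = [d2v_model.infer_vector([s]) for s in SUBJECTS]
        labels = []
        for row in tfidf:
            idx = np.argsort(row.toarray().ravel())[-top_k:]   # top-k keywords
            kw_vec = d2v_model.infer_vector(list(terms[idx]))
            sims = cosine_similarity([kw_vec], subj_vecs)[0]
            labels.append(int(np.argmax(sims)))                # numeric subject class
        labels = np.asarray(labels, dtype=float)
        return labels / max(labels.max(), 1.0)                 # normalize to [0, 1]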

3.2. FakeNewsNet dataset


This dataset was also compiled and prepared at Arizona State University using
popular fake news detection datasets.62 Fact-checking websites such as PolitiFact
and GossipCop were used to create the labels. This dataset contains news content,
social context, and spatiotemporal information, and involves an imbalanced struc-
ture which intensifies performance reduction in fake news classification algorithms.


3.3. Fake or real news dataset


This dataset was first introduced by McIntire,9 in which real news samples about
the 2016 US Election from organizations such as the New York Times and NPR were
combined with fake news samples from the Kaggle Fake News dataset. There are 7800
news samples in this dataset and the average number of words in each news sample
is 765 words.

3.4. ISOT dataset


The ISOT dataset was first presented by Ahmed et al.48 Reuters.com and Kag-
gle.com were used to create balanced datasets, which included medium- and long-
length news articles. This dataset entailed political news, world news, government
news, US news, left news, and other news articles.

4. Methodology
The model proposed in this study is illustrated in Fig. 1 and the relevant pseudo-
code is provided in this section. In the first phase of the proposed model, each news
sample from the datasets is partitioned into two parts including textual features
(named “Text”) and relevant features (named “Primitive Metadata”). The prepro-
cessing operation is carried out on “Text”. Then, “FE” is done on the preprocessed
text, and the extracted features would be added to the “Final Metadata”. To create
the same conditions as those of other studies, the dataset was split into two sets
including training (80%) and test (20%). Next, to prevent overfitting during model
training, 20% of the training data was considered as validation. The Doc2Vec para-
graph embedding was used to convert the preprocessed data into numerical vectors
with a fixed length of 400, which was considered as input for the second phase.
In the second phase, the statistical data were used to reduce feature dimension.
Subsequently, the “Input layer” was created through outer-joining the outputs of
the “Dimension reduction”, and “Final Metadata” parts. Then, the news samples
of the “Final Training Set” were sorted in easy-to-hard (E2H) order using the SDKNN method, and
were sent to deep learning models. Additionally, the “Final Training Set” was con-
sidered as input for the classical machine learning models. The components of the proposed
model, explained below, include text preprocessing, FE, the SDKNN method, and
the classifiers (classical machine learning and deep learning models used for fake
news detection).
The dataset ds is vertically partitioned into Text (which includes only textual features)
and Primitive metadata (which includes only categorical or numerical features);
only the Liar dataset includes primitive metadata. We denote the main
dataset by ds, the number of news samples by N, the number of extracted fea-
tures by M, the scoring function by f, the data pacing function by F, and the loss function
by g_i(w).


Fig. 1. Proposed model for fake news detection.

4.1. Text preprocessing


Text preprocessing is a critical step, as it provides the tools required to convert
natural language text into machine-readable text. This process involves several
operations such as the removal of irrelevant data, stop words, and sparse terms,
and turning the text into a usable format for machine learning algorithms.
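A minimal sketch of such a pipeline, using the Beautiful Soup and NLTK libraries
named in Algorithm 4.1, might look as follows; the concrete cleaning operations are
illustrative assumptions:

    # Sketch of the preprocessing step (cleaning choices are assumptions).
    import re
    from bs4 import BeautifulSoup
    import nltk
    from nltk.corpus import stopwords
    from nltk.tokenize import word_tokenize

    nltk.download("punkt", quiet=True)
    nltk.download("stopwords", quiet=True)

    def preprocess(raw_text):
        text = BeautifulSoup(raw_text, "html.parser").get_text()  # strip HTML remnants
        text = re.sub(r"[^a-zA-Z\s]", " ", text).lower()          # drop digits/punctuation
        tokens = word_tokenize(text)
        stops = set(stopwords.words("english"))
        return " ".join(t for t in tokens if t not in stops and len(t) > 1)

    print(preprocess("<p>Breaking: 47% of users share FAKE stories!</p>"))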

4.2. Feature extraction


There are several challenges in fake news detection such as the lack of a complete
dataset or issues present within the existing datasets, including short-length texts,
and the absence of key metadata entailing titles and topics. Accordingly, these
difficulties can be partially addressed by extracting key, useful features from the
contents of the datasets in question.


Algorithm 4.1. The proposed model, summarized in the form of pseudo-code.

preprocess Text using Beautiful Soup and NLTK  // (the libraries of Python)
in the FE phase, extract key features from Text and include them in the Final metadata
split Text into a training set (80%) and a testing set (20%)
convert the training and testing sets to numeric vectors using Doc2Vec embedding

// First step of SDKNN (transformer technique)
for each sample l in doc2vec_training and doc2vec_testing:
    calculate Moms[j] = E[(X − E[X])^j] / σ^j for j in range(2, 6)
        // standardized central moments, via the moment-generating function
input_layer = outer_join(Moms, Final metadata)
// the Final Training and Testing Sets are created
use the Final Training Set as input for the classical machine learning models

// Second step of SDKNN
Input: Training Set, Testing Set, scoring function f, pacing function F
Output: sorted Training Set
// Scoring function (f)
for each item l in the Training Set:
    neigh = [n][K+1], X = [n][2]
    determine the K nearest neighbors of l and add them to neigh[1:K+1]
    determine the rank of l using the target classes of l and its K nearest
        neighbors, and add it to X[l][1]
sort X in ascending order (E2H order)

// Pacing function F(X, W, batch_size, K)
split X into K distinct parts P based on the ranks in X
t = 0
while training has not terminated:
    p = P(M)
    if F(t, p) is true:
        E* = []
        for k in range(K):
            n = int(|P_k| / batch_size)  // batches per epoch
            for i in range(n):
                take a sample S_Bi of size batch_size from P_k
                E* = E* ∪ S_Bi
        train(M, E*, P)
        p = P(M)
        if l_i < l_{i−1}:
            update W* (by Eq. (3.1)), E*, M
use E* as input for the deep learning models
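To make the scoring step concrete, the sketch below ranks each training sample by how
many of its K nearest neighbors share its class label (a sample whose neighborhood
agrees with it is treated as easy) and then sorts the training set in E2H order; this
is one plausible reading of the scoring function f, not the exact implementation:

    # Sketch of the KNN-based scoring function f and the E2H sort.
    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    def e2h_order(X, y, k=5):
        # A sample is "easier" when more of its k nearest neighbors share
        # its target class, i.e., it lies far from the class boundary.
        nn = NearestNeighbors(n_neighbors=k + 1).fit(X)         # +1: the sample itself
        _, idx = nn.kneighbors(X)
        agreement = (y[idx[:, 1:]] == y[:, None]).mean(axis=1)  # agreeing neighbors
        difficulty = 1.0 - agreement                            # high disagreement = hard
        return np.argsort(difficulty)                           # ascending = easy first

    # Usage: feed mini-batches to the deep model following this order.
    X = np.random.rand(100, 4); y = np.random.randint(0, 2, 100)
    order = e2h_order(X, y)
    X_sorted, y_sorted = X[order], y[order]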

The researchers in the majority of studies in this field (e.g., Refs. 17, 31, 32, and 36) have attempted to extract lexical, linguistic,
stylometric, and text polarity features, among others, as the best features that affect
the performance of the classifier. In this study, after preprocessing the datasets, two
operations were carried out during the FE phase; first, the textual features of the
datasets were transformed into numerical vectors with fixed lengths, using Doc2Vec
paragraph embedding, to prepare them to be used in machine learning algorithms.
Second, as the employed datasets entailed the textual features of news body or news


title while providing no other key information, a number of important features were
extracted and taken into account as the “Final metadata”.

4.2.1. Paragraph embedding


There are two reasons behind transforming texts into numerical vectors with a
fixed length; first, the datasets implemented in this study included textual fea-
tures, while machine learning methods can only be used on numerical data. Sec-
ond, when extracting semantic and structural features such as unique subjects for
each news sample in the Liar dataset, converting the text into numerical vectors
is essential due to the similarity between sentences and the semantic relationships
between the news body and the news title. In NLP, there are various methods
for mapping words, phrases, or documents onto numerical vectors including BoW
and n-grams (Bag of n-grams), word vector methods such as Word2Vec, FastText,
and GloVe, along with paragraph vector methods such as Doc2Vec and BERT.
Notably, BoW and n-grams are two different types of bags; using the BoW method
breaks the order of words. Due to the presence of similar words, different sentences
will have similar representations. The Bag of n-grams method is not suitable for
high-dimensional data, despite the fact that the word order is preserved in short
texts. Overall, these methods disregard the meaning of the words and the distance
between them. The paragraph vector refers to unsupervised framework for learning
continuous distributed representations of the text. Texts can be of various lengths.
Each paragraph is mapped onto a unique vector.63 The word vector technique is
used in this method. Word vector techniques such as Scikit-learn64 vectorization,
Gensim Word2Vec,65 and FastText are used to convert words into numerical vec-
tors. Notably, the Word2Vec method is unable to represent the words that are not
present in the training dataset. This problem can be partially solved by increasing
the size of the training dataset, hence increasing the dictionary size. In the FastText
method, which is an extension of the Word2Vec method, the words are broken down
into n-grams;28 this would facilitate the representation of infrequently-used words.
Also, several embedding methods have been proposed for multiplex networks. For example,
the RMNE66 uses the structural role information of nodes to preserve the structural
similarity between nodes in the entire multiplex network. In this paper, Doc2Vec
paragraph embedding was employed as the paragraph vector method to take advan-
tage of Word2Vec, in which similar words are mapped onto a similar location in
the vector space, and to maintain the order of words in a given paragraph (in this
case, the news samples contained in datasets)14,63 so as to convert texts to a set of
400-length numeric vectors.

4.2.2. Extraction of key features


In this phase, some important features are extracted and deemed as “Final meta-
data”. The related algorithm is presented below. We denote the main dataset by ds,
the number of news samples by N, the number of extracted features by M, the number
of true anaphoric references by nat, the number of true cataphoric references by nct,
the number of null references by nrt, the number of poorly structured sentences by nrf,
the number of well-structured sentences by nst, and the ring queue by Rqueue.


Algorithm 4.2. Extraction of key features.

Input: Text (preprocessed text)
Output: Final metadata
Final metadata = [N][M]
TS = [], PS = [], TW = []

// Surface feature extraction
for each news sample ns_i in Text:
    TS = tokenize ns_i into sentences using NLTK
    TW = tokenize ns_i into words using NLTK
    PS = part-of-speech (POS) tags of ns_i using NLTK
    SF = the numbers of sentences, words, characters, adjectives, adverbs,
        specific nouns, verbs, proper nouns, etc.
for each feature j in SF:
    calculate the dependency between SF[j] and the target class using
        Spearman's rank correlation coefficient

// Polarity feature extraction
for the news body and the news title (if it exists):
    calculate the polarity score of PS using SentiWordNet
    body_score = polarity score of the news body
    title_score = polarity score of the news title
    if body_score == title_score then matched = 1 else matched = 0

// Grammatical cohesion calculation
for each news sample ns_i in Text:
    n = #TS (the number of sentences)
    Rqueue = [], FS[K+3] = [N], FS[K+4] = [N]
    Queue1 = [n], Queue2 = [n]
    calculate the sums of nat (s_nat) and nct (s_nct) in ns_i using the
        NeuralCoref extension of Python's spaCy library
    add s_nat to FS[K+3][i]
    add s_nct to FS[K+4][i]
// after completion of the FE:
use min-max normalization for FS[K+2]

// Lexical cohesion calculation
for each sentence j in TS:
    convert j to a numeric vector using Doc2Vec embedding
    add j to Queue1
    add j to Rqueue
ss = 0
for k in Rqueue:
    ss = ss + cosine_similarity(Rqueue[k], Rqueue[k+1])
FS[K+5] = ss / n

// Coherence calculation
FS[K+6] = cosine_similarity(Queue1[0], Queue1[n−1])
TTS = tokenize the title of each ns_i (if it exists in ds) into sentences using NLTK
n = #TTS (the number of title sentences)
for each sentence j in TTS:
    convert j to a numeric vector using Doc2Vec embedding
    add j to Queue2
sum1 = sum2 = 0
for k in Queue2:
    sum1 = sum1 + cosine_similarity(Queue1[0], Queue2[k])
    sum2 = sum2 + cosine_similarity(Queue1[n−1], Queue2[k])
FS[K+7] = (sum1 + sum2) / (2n)
add FS[K+6] and FS[K+7] to Final metadata


Surface features
Numerous features were extracted from the news samples in the applied datasets
(with the exception of the Liar dataset in which, due to the short length of the news
samples, the extraction of these features was impossible) at the levels of paragraph,
sentence, and character. These extracted features included the number of sentences
and words in each paragraph, the number of words in each sentence, the number
of adjectives, adverbs, and specific nouns, the number of verbs in each paragraph
and each sentence, and the average number of characters in words. Finally, those
features in the text that had the highest dependence on the target output (class
label) based on Spearman’s rank correlation coefficient,67 e.g., the number of words,
adjectives, adverbs, specific nouns, and verbs, were maintained and the rest were
removed.
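This selection step can be sketched with SciPy's Spearman correlation, keeping only
the surface features whose absolute correlation with the class label clears a
threshold (the cutoff value here is an assumption):

    # Sketch: keep surface features correlated with the target class
    # under Spearman's rank correlation (0.1 is an assumed cutoff).
    import numpy as np
    from scipy.stats import spearmanr

    def select_surface_features(F, y, names, min_abs_rho=0.1):
        kept = []
        for j, name in enumerate(names):
            rho, _ = spearmanr(F[:, j], y)
            if abs(rho) >= min_abs_rho:
                kept.append((name, rho))
        return kept

    F = np.random.rand(200, 3)                   # e.g., word/adjective/verb counts
    y = np.random.randint(0, 2, 200)
    print(select_surface_features(F, y, ["words", "adjectives", "verbs"]))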

Text polarity features


The political and economic status quo of a community can significantly impact
how a piece of news with a positive or negative polarity is received. Given the
current state of affairs in the community, fake news producers take advantage of
such opportunities and spread believable fake news on that topic. This paper used
SentiWordNet from the WordNet-based Python NLTK library for sentiment
analysis, because text sentiment analysis68 can make the text more structured and
thus contribute significantly to detection and classification processes. Two types of
features were used to calculate sentiment analysis, i.e., polarity scores and polarity
adaptability. For calculating polarity scores, the words in news body and news title
are divided into categories such as adjectives, verbs, nouns, etc. Finally, polarity
scores are calculated as positive, negative, or neutral scores based on the dictionary
for news samples, which we added to the final metadata. The sentiment analysis
of the news title is critical because it can have a greater impact on attracting the
audience’s attention compared to the news text. Since in real news, there must be
a consistency between the sentiments in the body and title of a news sample and


since the majority of fake news nowadays is generated by machines, with the news
body and title misaligned, this paper determines if the sentiment analysis of the
news body and title (if present in the dataset) are matched and adds them to the
final metadata.
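A minimal NLTK-based sketch of the polarity scoring and the body/title match flag is
given below; aggregating the positive-minus-negative score of each word's first
SentiWordNet synset is a simplifying assumption:

    # Sketch: SentiWordNet polarity score and the body/title match flag.
    import nltk
    from nltk.corpus import sentiwordnet as swn

    nltk.download("sentiwordnet", quiet=True)
    nltk.download("wordnet", quiet=True)

    def polarity(text):
        score = 0.0
        for word in text.lower().split():
            synsets = list(swn.senti_synsets(word))
            if synsets:                                  # take the first sense
                score += synsets[0].pos_score() - synsets[0].neg_score()
        return 1 if score > 0 else (-1 if score < 0 else 0)  # pos/neg/neutral

    body, title = "a wonderful honest achievement", "terrible awful scandal"
    matched = 1 if polarity(body) == polarity(title) else 0
    print(polarity(body), polarity(title), matched)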

Structural features
In this work, two features, which are considered structural features and have been
somewhat overlooked in previous studies on fake news detection, were extracted
from news body and news title. These features include cohesion and coherence and
they were calculated based on semantic similarity. Cohesion refers to the relation-
ship between the components of a given text, and it is measured by considering
factors such as reference pronouns, grammar, and linking words including conjunc-
tive adverbs, coordinates, subordinates, and correlative conjunctions. Coherence
refers to the semantic relationship and consistency between ideas in a text.
Additionally, the sentences in a text should be semantically linked to its title. In real
news, correlation and coherence exist between the news title and the news body.
For example, Singh et al.34 calculated the coherence in ISOT and HWB datasets
and concluded that the degree of coherence in real news is higher than that in fake
news. Therefore, these features can possibly be used as measures for distinguishing
real news from fake news. On the other hand, coherence may be nonexistent in fake
news because it is produced by operations such as manipulating the real news or
adopting only a part of the real news which, in turn, might disarray the structure
of sentences, distort the correlation between the news body and news title, dis-
rupt the correlation and coherence between the sentences in the news body, and so on. As an example, Karuna
et al.35 addressed believable fake news generation and concluded that the believ-
ability of the fake news could be increased by applying different methods such as
increasing the coherence between the news title and the news body and increasing
the cohesion between the news sentences through artificial data generation. Given
the above-mentioned arguments, these structural features were extracted from the
news samples in this study and then added to the “Final metadata” using the NLP
method as well as the appropriate tools and techniques.

Calculation of cohesion
In this paper, to determine the cohesion in each news sample, the distinguish-
able lexical and grammatical cohesion devices were extracted. Ampa and Basri71
reviewed 809 handwritten articles and reported that more than 69% of cases of
grammatical cohesion devices appear to be related to reference pronouns and con-
junctions. They also explored 1224 student papers and discovered that reiterations
(synonyms, repetition, and antonyms) were employed to create lexical cohesion,
82.84% of the time. To simplify the process in this study, the window size of
each news sample was set to three consecutive sentences. The number of reference


pronouns and conjunctions are calculated in the Grammatical Cohesion Devices


section. Synonyms, antonyms, and repetition were addressed based on the results
of the study by Ampa and Basri71 and elaborated in the Lexical Cohesion Devices
section.
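As an illustration of the grammatical side, the sketch below counts reference pronouns
and conjunctions per three-sentence window using NLTK POS tags; it is a simple
stand-in for the coreference-based counting (via spaCy's NeuralCoref) named in
Algorithm 4.2, and the windowing details are assumptions:

    # Sketch: count grammatical cohesion devices (reference pronouns,
    # conjunctions) over three-sentence windows.
    import nltk
    nltk.download("punkt", quiet=True)
    nltk.download("averaged_perceptron_tagger", quiet=True)

    COHESION_TAGS = {"PRP", "PRP$", "CC", "IN"}   # pronoun and conjunction tags

    def grammatical_cohesion(text, window=3):
        sents = nltk.sent_tokenize(text)
        counts = []
        for i in range(0, len(sents), window):
            chunk = " ".join(sents[i:i + window])
            tags = nltk.pos_tag(nltk.word_tokenize(chunk))
            counts.append(sum(1 for _, t in tags if t in COHESION_TAGS))
        return counts

    text = ("The senator proposed a bill. She argued that it would cut costs. "
            "However, critics disagreed with her claim.")
    print(grammatical_cohesion(text))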

Calculation of coherence
In this study, coherence is calculated by measuring the semantic relationship
between the sentences at the beginning and the end of each news sample, rather
than checking the entire sentences in the sample. This strategy was adopted for
three reasons; first, for the sake of simplicity; second, due to the fact that in many
cases, the first and last sentences of each paragraph are referred to as the topic
sentence and conclusion, respectively; and third, because the semantic relationship
between the other sentences had already been calculated in cohesion. To this end,
similar to the case of lexical cohesion, the Doc2Vec paragraph embedding and cosine
similarity were combined. Notably, since the inconsistency between keywords in the
news text and the title can significantly impact fake news detection, the semantic
relationship between the two sentences at the beginning and the end of the body
of each news sample was also computed, along with the sentences that formed the
title of news samples.
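A sketch of this computation is shown below; it assumes a trained Doc2Vec model d2v
as in Sec. 4.2.1 and mirrors the FS[K+6] and FS[K+7] formulas of Algorithm 4.2:

    # Sketch: coherence as the cosine similarity between the first and last
    # body sentences, plus the average body-title similarity.
    import numpy as np
    import nltk

    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    def coherence_features(d2v, body, title=None):
        sents = nltk.sent_tokenize(body)
        first = d2v.infer_vector(sents[0].split())
        last = d2v.infer_vector(sents[-1].split())
        feats = {"body_coherence": cos(first, last)}          # FS[K+6]
        if title:
            t_vecs = [d2v.infer_vector(s.split()) for s in nltk.sent_tokenize(title)]
            sims = [cos(first, t) + cos(last, t) for t in t_vecs]
            feats["title_coherence"] = sum(sims) / (2 * len(t_vecs))  # FS[K+7]
        return feats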

4.3. Statistical data + KNN-based method


The SDKNN method was offered based on the CL strategy, to increase the per-
formance of deep learning models in fake news detection. The CL strategy and its
benefits were elaborated in the introduction section. The following includes more
related details as well as more information on the SDKNN architecture. The rel-
evant pseudo-code is provided in Sec. 4. The CL strategy has been employed in
numerous studies in line with proposing different models. This strategy can be
applied at various levels, such as the task, the model, the data, and the performance
metric, based on the machine learning components. There are various methods for implementing the
CL strategy such as the simple CL, self-paced learning,30 and balanced CL. The
SDKNN technique can be classified as simple CL. The main challenge in employing
CL involves the data scoring process; and to resolve this challenge, two primary
components including “data difficulty level” and “data pacing function” should
be specified. Data difficulty level can be determined by indicating the difficulty
metric and performance metric, given the nature of the problem and the type of
data. For instance, Vijjini et al.29 conducted sentiment analysis on textual datasets
and considered the length of sentences and accuracy as the difficulty metric and
performance metric, respectively. They believed that the longer the length of the
sentence, the harder it is for machine learning models to learn it. The data pacing function indicates which part of the scored data should be employed at each training step; in other words, it determines the suitable time for updating the training loss function based on factors such as the number of training steps.
The other problem with the use of CL is making a decision on how the training
set data should be organized. Data can be organized in a specific order such as
E2H, hard to easy, or random, in terms of learning by deep learning models. Hav-
ing conducted numerous experiments, Wisniewski et al.27 concluded that sorting
the data according to the E2H order yields better results. Other studies have also
achieved similar results; including the model offered by Hacohen and Weinshall.25
Consequently, the E2H order was employed in this study to arrange the data. Schol-
ars have presented numerous algorithms for data arrangement. For instance, in the
model proposed by Vijjini et al.,29 the BERT algorithm and sentence length were
employed as data pace function and the difficulty metric, respectively, to arrange
textual data. The authors believed that sentences with shorter lengths are easier to
learn by deep learning models such as CNN and LSTM. Hacohen and Weinshall25 used bootstrapping and transfer learning to arrange training data. Liu et al.72 used
embedding vector norm to calculate data difficulty level. In this case, the authors
believed that longer sentences or sentences containing less-repetitive words are more
difficult to learn; as a result, they have a higher effect on changes in the loss func-
tion. Therefore, they counted the number of rare or significant words to indicate
the difficulty level of each sentence. In this study, the SDKNN method was used as
a simple CL for the data scoring process. As mentioned in the introduction section,
the employed datasets were textual and involved high difficulty levels in terms of
learning. As a result, SDKNN entails two basic steps in line with increasing the
performance of deep learning models. In the first step, a transformation combining Doc2Vec embedding (dim = 400) with higher-order statistical data is applied to convert the textual data into numerical vectors, extract high-level statistical features, and reduce the dimensionality. In the second step, the scoring function
(f) and data pacing function (g) are indicated based on CL strategy. These steps
are explained in detail in the following. In the first step, the statistical data were
applied to the datasets; subsequently, the numerical vectors created by the Doc2Vec
method with a constant length of 400 were converted into numerical vectors with a
constant length of 4, which amounted to a dimension reduction by 99% (Eq. (2)).
The complexity of the probability distributions of the used datasets, especially the Liar and FakeNewsNet datasets, indicated their high difficulty, which manifests as news samples overlapping between the two classes present in these datasets.
Therefore, it was necessary to extract the features that could be utilized to separate
the news samples (i.e., to provide separable samples). Notably, given the presence
of overlapping samples in these two datasets, the distance metric was not suitable
for separating the samples; in turn, this necessitated more detailed features of the
news samples, which were extracted using high-order statistical data.

Reduction Rate = 1 − sf/tf,  (2)
where sf and tf stand for the selected features and total features, respectively. For
each news sample, Variance (Eq. (3)) is a measure of spread, Skewness (Eq. (4))
is a measure of distribution asymmetry, Kurtosis (Eq. (5)) is a measure of the outliers (i.e., measures how heavy the tail values are when vectors with normal and identical distributions are considered), and the fifth moment (Eq. (6)) helps separate the data by providing distinct characteristics of vectors.

σ² = E[(X − E[X])²],  (3)
Skew[X] = E[(X − E[X])³/σ³],  (4)
Kurt[X] = E[(X − E[X])⁴/σ⁴],  (5)
Mom[X] = E[(X − E[X])⁵/σ⁵].  (6)

To calculate these data, the Moment-Generating Function (MGF) facilities of SciPy were used; the moment function in Python was also employed for this purpose,46 as in the sketch below.
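The sketch below shows one way to realize this step under our assumptions (SciPy's scipy.stats helpers; the fifth moment is standardized by σ⁵ as in Eq. (6)):

    import numpy as np
    from scipy.stats import skew, kurtosis, moment

    def statistical_features(vec: np.ndarray) -> np.ndarray:
        """Collapse a 400-dim Doc2Vec vector into four higher-order statistics."""
        sigma = vec.std()
        return np.array([
            vec.var(),                     # Eq. (3): variance (spread)
            skew(vec),                     # Eq. (4): skewness (asymmetry)
            kurtosis(vec, fisher=False),   # Eq. (5): kurtosis (tail heaviness)
            moment(vec, 5) / sigma ** 5,   # Eq. (6): standardized fifth moment
        ])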
In the second step of the SDKNN method, the scoring function (f) and the data pacing function (g) are specified. The difficulty metric for news samples can be determined based on the nearest neighbors; in fact, a technique based upon the KNN algorithm
was utilized to determine sample rankings. Accordingly, the lower the number of
the nearest neighbors of a sample with the same target class label, the higher
the difficulty level of that sample in terms of learning by deep learning models.
Subsequently, such a sample would receive a higher rank. It is clear that based on
this strategy, samples with noise and the outliers have a higher difficulty level in
terms of the CL; therefore, they would receive higher ranks. Notably, considering
how the transformer technique (the first step in SDKNN) has a vector output, the
cosine similarity with a numerical result in the range of [−1, 1], which is suitable for
multi-dimensional vectors, was used instead of the Euclidean distance to calculate
the nearest neighbors (the angle between vectors). Therefore, the samples were
ranked according to the target class labels of their nearest neighbors. Then, the
samples were arranged from E2H based on their ranking. The pacing function g then performs the segmentation and determines when the network should be updated. The purpose of this function is to minimize the objective function presented in Eq. (7), where W represents the weights, L is the learning objective (i.e., the goal is to minimize the model error over the entire training set), yi is the output of the neural network, N is the size of the training dataset (the batch size), M represents the model, P is the performance metric, and E is the training set. A regularizer is used to prevent overfitting; dropout serves this role here. According to the SDKNN algorithm, if the error of model M on batch i during training is lower than on batch i − 1, then the pacing function updates the weights and the model.
W∗ = min_w E(W), where E(W) = Σ_{i=1}^{N} L(ŷ_i, y_i) = Σ_{i=1}^{N} L_i + regularizer(W).  (7)
In the function f, the samples are sorted in ascending order from E2H based on their difficulty level, i.e., the rank obtained from the number of nearest neighbors sharing the same target class as the sample. This sorted collection is then partitioned into K separate sections (based on the specified ranks, in E2H order), numbered from 1 to K. During training, in each iteration the samples within a section are sorted based on the error value and the decreasing gradient, and this process continues until the entire training dataset is sorted. Specifically, in each iteration a section is selected according to its numbered priority, a mini-batch of samples from that section is selected for training the network, and the difficulty level is determined based on the gradient and error values. If the training error rate, denoted by the symbol l in the algorithm, is decreasing, the weights and the model are updated, and that mini-batch is appended to the sorted training set. This operation is performed for every section and all samples as batch processing, and the batches within each section are sorted according to the loss function. After performing this operation for all sections, the training set samples are sorted based on the performance of the model; this sorted training set is finally sent as input to the machine learning models.
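The following sketch summarizes the neighbor-based scoring described above under our assumptions (scikit-learn's NearestNeighbors with the cosine metric; the neighbor count and number of sections are illustrative):

    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    def sdknn_order(X: np.ndarray, y: np.ndarray, n_neighbors: int = 10, k: int = 5):
        """Rank samples E2H by neighbor label agreement and split into K sections."""
        nn = NearestNeighbors(n_neighbors=n_neighbors + 1, metric="cosine").fit(X)
        _, idx = nn.kneighbors(X)                      # column 0 is the sample itself
        agreement = (y[idx[:, 1:]] == y[:, None]).mean(axis=1)
        easy_to_hard = np.argsort(-agreement)          # high agreement = easy to learn
        return np.array_split(easy_to_hard, k)         # sections numbered 1..K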

4.4. Classical and deep learning models


In this paper, the classical machine learning and deep learning models are used to
classify and detect fake news.

4.4.1. Classical machine learning models


The employed classical machine learning models included Logistic Regression (LR), NB, KNN, SVM, and DT, which were implemented using the Scikit-learn library in Python. It should be noted that the classical machine learning models were trained on the “Final Training Set”; a minimal sketch of this setup follows, and the results are provided in Sec. 5.
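An illustrative version of this setup (the synthetic matrices below merely stand in for the “Final Training Set”):

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.naive_bayes import GaussianNB
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.svm import SVC
    from sklearn.tree import DecisionTreeClassifier

    # Placeholder data standing in for the "Final Training Set".
    X, y = make_classification(n_samples=1000, n_features=4, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

    models = {
        "LR": LogisticRegression(max_iter=1000),
        "NB": GaussianNB(),
        "KNN": KNeighborsClassifier(),
        "SVM": SVC(),
        "DT": DecisionTreeClassifier(),
    }
    for name, model in models.items():
        model.fit(X_tr, y_tr)
        print(name, "accuracy:", model.score(X_te, y_te))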

4.4.2. Deep learning models


The following presents a brief explanation of the deep learning models used in this paper. Due to space limitations, the theories and equations related to these models are omitted. TensorFlow and Keras were used to implement the deep learning algorithms. As the results of deep learning models depend on their hyperparameters, it was necessary to tune them; for this purpose, Random Search74 was used, which is one of the widely used hyperparameter optimization techniques. KerasClassifier, which acts as a wrapper exposing the Scikit-learn API, was used to implement the Random Search; with this wrapper, the user can utilize the various tools available in Scikit-learn. The required class was RandomizedSearchCV, which implements the Random Search, as sketched below.
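A hedged sketch of this tuning loop (in recent versions KerasClassifier ships in the separate scikeras package; build_model and the search space below are illustrative, not the authors' exact configuration):

    from scikeras.wrappers import KerasClassifier
    from sklearn.model_selection import RandomizedSearchCV
    from tensorflow import keras

    def build_model(dropout=0.2, learning_rate=2e-4):
        """Illustrative factory for a small binary classifier."""
        model = keras.Sequential([
            keras.layers.Dense(64, activation="relu"),
            keras.layers.Dropout(dropout),
            keras.layers.Dense(1, activation="sigmoid"),
        ])
        model.compile(optimizer=keras.optimizers.Adam(learning_rate),
                      loss="binary_crossentropy", metrics=["accuracy"])
        return model

    param_dist = {
        "model__dropout": [0.2, 0.3, 0.5],
        "model__learning_rate": [1e-4, 2e-4, 3e-4],
        "batch_size": [32, 50],
        "epochs": [40, 50],
    }
    search = RandomizedSearchCV(KerasClassifier(model=build_model, verbose=0),
                                param_dist, n_iter=10, cv=3)
    # search.fit(X_train, y_train); search.best_params_ then gives the winner.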
Recurrent neural network


The main architecture of RNNs consists of input units, output units, and hidden
units. The hidden units perform the entire calculations by adjusting the weights in
order to produce the outputs.75 The RNN model has a one-way flow of informa-
tion from the input units to the hidden units. It also has a directional loop that
compares the error of this hidden layer with the previous hidden layer and adjusts
the weights between the hidden layers. RNNs can be used for supervised classifi-
cation learning.76,77 However, RNNs are difficult to train because of the vanishing
and exploding gradients. The problems with vanishing and exploding gradients are
caused by the improperly assigned weights (assigned excessively high or excessively
low). To address these issues, there are several types of typical RNN architectures
including LSTM networks and GRUs. In this paper, the utilized RNN included
an embedding layer, a bidirectional layer, a dense layer with Rectified Linear Unit
(ReLU) activation (the activation function converts an input signal of a node into
an output signal, which is then used as the input for the next hidden layer, and
so on), the output layer with sigmoid activation, Binary Crossentropy (BCE) loss
function (which is a common function for binary classification problems), Adam
optimizer (due to its simplicity in implementation, less memory consumption, and
computational performance), Learning rate is the proportion of the weights that
are updated during the training of the LSTM model. It can be chosen from the
range of [0.0–1.0]. and dropout. Dropout is a regularization technique, in which the
randomly selected neurons are ignored during training, and used to prevent over-
fitting. The best results were obtained in Table 7 with the hyperparameters listed
in Table 3, and after repeating the experiments several times.

Convolutional neural networks


CNNs process data with a grid-like topology using an operation known as convolution. CNNs are multi-layered and are widely used for local FE in NLP applications.79 Convolution operations in these networks are performed through linear filters applied to the input features.80 In this study, the CNN used included an embedding layer, a Conv1D layer (due to
its simplicity and considering that the model input entailed numerical vectors) with
ReLU activation, GlobalMaxPooling1D layer, a dense layer with ReLU activation,
and an output layer with sigmoid activation. The hyperparameters presented in Table 4 are based on the best results obtained from experiments when the CNN was used for fake news detection.

Table 4. Best hyperparameters of CNN.

Hyperparameter        CNN
Activation function   Sigmoid
Optimizer             Adam
Epoch                 50
Batch size            50
Dropout               0.2
Loss function         BCE
Learning rate         0.0002
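A sketch of this architecture under our assumptions (vocabulary size, embedding width, and filter settings are illustrative):

    from tensorflow import keras

    cnn = keras.Sequential([
        keras.layers.Embedding(input_dim=20000, output_dim=128),  # assumed sizes
        keras.layers.Conv1D(filters=128, kernel_size=5, activation="relu"),
        keras.layers.GlobalMaxPooling1D(),
        keras.layers.Dense(64, activation="relu"),
        keras.layers.Dropout(0.2),                                # Table 4
        keras.layers.Dense(1, activation="sigmoid"),
    ])
    cnn.compile(optimizer=keras.optimizers.Adam(2e-4),            # Table 4 learning rate
                loss="binary_crossentropy", metrics=["accuracy"])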

Gated recurrent unit


First introduced by Chung et al.,53 the GRU is a special type of optimized LSTM-based RNN.82 The internal unit of the GRU is similar to that of the LSTM,53 except that the GRU combines the input gate and the forget gate of the LSTM into a single update gate. In this study, the GRU model included an embedding layer, a GRU layer with 32 nodes, and a dense layer, trained with the Adam optimizer and the BCE loss function. The hyperparameters presented in Table 5 are based on the best results obtained from experiments involving the use of GRU.

Table 5. Best hyperparameters of GRU.

Hyperparameter        GRU
Activation function   Sigmoid
Optimizer             Adam
Epoch                 40
Batch size            32
Dropout               0.3
Loss function         BCE
Learning rate         0.0001

Long short-term memory


The ability of the LSTM network to model long-term dependencies has led
researchers to use it in NLP and text classification.83,84 LSTM is an RNN-based
deep neural network architecture that has solved the vanishing error problem in
RNNs.13,85–87 The employed LSTM in this research entailed one input layer with
the same size as the number of inputs, two dense layers with ReLU activation for
processing the complex relations between the inputs, and one output layer with
sigmoid activation. The logarithmic loss function (BCE) was used during training.
The model also uses the efficient Adam optimization algorithm for gradient descent.
Table 6. Best hyperparameters of LSTM.

Hyperparameter        LSTM
Activation function   Sigmoid
Optimizer             Adam
Epoch                 50
Batch size            32
Dropout               0.2
Loss function         BCE
Learning rate         0.0003

The best results of the LSTM model, reported in Table 7, were obtained via the random search with the hyperparameters listed in Table 6. A minimal sketch of this architecture follows.
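The layer widths and input length below are our assumptions; the text above fixes only the overall layout (input layer, two ReLU dense layers, sigmoid output) and the Table 6 hyperparameters:

    from tensorflow import keras

    n_features = 400  # assumed input width (e.g., Doc2Vec vectors); illustrative only

    lstm = keras.Sequential([
        keras.layers.Input(shape=(n_features,)),
        keras.layers.Reshape((n_features, 1)),        # treat features as a sequence
        keras.layers.LSTM(64),                        # assumed width
        keras.layers.Dense(64, activation="relu"),
        keras.layers.Dense(32, activation="relu"),
        keras.layers.Dropout(0.2),                    # Table 6
        keras.layers.Dense(1, activation="sigmoid"),
    ])
    lstm.compile(optimizer=keras.optimizers.Adam(3e-4),   # Table 6 learning rate
                 loss="binary_crossentropy", metrics=["accuracy"])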

5. Results and Discussion


5.1. Evaluation metrics
Classification algorithms are typically evaluated using performance metrics such as accuracy, precision, specificity, sensitivity (recall), F1-score, and AUC, which can be calculated using the following equations.
Accuracy = (TP + TN)/(TP + FP + FN + TN),  (8)
Precision = TP/(TP + FP),  (9)
Specificity = TN/(TN + FP),  (10)
Recall = TP/(TP + FN),  (11)
F1-Score = 2 × (Recall × Precision)/(Recall + Precision).  (12)
In the Confusion Matrix, True Positives (TP) and True Negatives (TN) repre-
sent the correctly predicted positive and negative values, respectively; meanwhile,
False Positives (FP) and False Negatives (FN) refer to the incorrectly predicted
positive and negative values, respectively. Since the fake news detection problem
was assumed to be a binary classification, the ROC curve can be used to specify
the separation capability of the model or its performance in distinguishing between
two classes. Accordingly, the AUC can be used to assess the model’s performance.
The AUC value approaches 1 as the model becomes more accurate, and approaches
0 as the model’s class detection performance degrades. This metric is especially
important when using imbalanced datasets, because the detection rate and proper
classification of existing samples in the minority class are the main challenges for
classification models. The minority class in binary classification refers to a group of samples with a significantly lower number of samples than the other groups; this leads classifiers to incorrectly detect and classify samples from the minority class as the majority class.
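For reference, Eqs. (8)–(12) and the AUC map directly onto scikit-learn; the toy labels below are illustrative:

    from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                                 precision_score, recall_score, roc_auc_score)

    y_true = [1, 0, 1, 1, 0, 0, 1, 0]                  # 1 = fake, 0 = real (toy data)
    y_score = [0.9, 0.2, 0.6, 0.8, 0.4, 0.1, 0.3, 0.7]
    y_pred = [int(s >= 0.5) for s in y_score]

    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    print("Accuracy   :", accuracy_score(y_true, y_pred))   # Eq. (8)
    print("Precision  :", precision_score(y_true, y_pred))  # Eq. (9)
    print("Specificity:", tn / (tn + fp))                   # Eq. (10)
    print("Recall     :", recall_score(y_true, y_pred))     # Eq. (11)
    print("F1-Score   :", f1_score(y_true, y_pred))         # Eq. (12)
    print("AUC        :", roc_auc_score(y_true, y_score))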

5.2. Classifiers result


All experiments were conducted on a computer with a 2.2 GHz Intel Core i7 CPU,
and 8GB of RAM. The algorithms were implemented using the Python program-
ming language, and libraries related to NLP and machine learning models, includ-
ing TensorFlow, NLTK, etc. The deep learning and the classical machine learning
models were utilized in the proposed model to classify news samples into one of the
two fake or real categories. Notably, the arranged training set using the SDKNN
technique was used as the input for the deep learning models. The results of the
proposed model in the four employed datasets are presented in Table 7. The perfor-
mance of the classical machine learning and deep learning models were compared in
terms of the “accuracy” metric. Notably, the performance of the DT algorithm is higher than that of the other classical algorithms on both the ISOT and the Liar datasets; on the ISOT dataset, its performance even exceeded that of the deep learning algorithms.
According to Table 7, the deep learning algorithms have been more efficient
compared to the classical machine learning algorithms, in both the Liar, and Fake
or Real datasets. Such a superiority results from the considerable capability of the
deep learning models in discerning complex relations between data and the use of
the SDKNN technique as a secondary step. According to Table 7, the LSTM model
is more efficient in detecting fake news compared to the RNN, CNN, and GRU
models. Consequently, it was considered as the optimal model in this research to
compare against the other benchmark studies. In the proposed model, FE and the
SDKNN method were used to improve the performance of deep learning models. To
perform the ablation analysis to measure the contribution of each individual part

Table 7. Results of the proposed model (Acc %).

Method                        Liar   Fake or Real   ISOT   FakeNewsNet
Classical machine learning
  LR                          65     94             98.6   78
  LDA                         64     93             98.3   79
  DT                          69     85             99.9   74
  SVM                         66     95             98.5   79.5
  KNN                         68     91.5           99.5   76
  NB                          62     90             97     79
Deep learning
  RNN                         73     94             97.5   75
  CNN                         75     96             98.2   76
  GRU                         82     96             99.3   79
  LSTM                        82     95.8           99.8   81
Table 8. Results of the proposed model (using LSTM) (Acc %).

Method Liar Fake or Real ISOT FakeNewsNet

Before FE 67 91 96 75
After FE 78 94.5 99 80
After FE & SDKNN 82 95.8 99.8 81

Table 9. Results of the proposed model (using LSTM) (AUC %).

Method Liar Fake or Real ISOT FakeNewsNet

Before FE 71 90 97 68
After FE 80 94 99.2 72
After FE & SDKNN 85 96 99.8 78.5

of the whole result, Tables 8 and 9, respectively, present the accuracy and AUC
of the proposed model (using the LSTM model) on preprocessed datasets after
the following operations: paragraph embedding (called “before FE”), extraction of
key features such as polarity scores and primitive metadata in the Liar database,
or polarity scores and35 in the other datasets (called “after FE”), and FE and
SDKNN phases (called “after FE & SDKNN”). In the Liar dataset, for instance,
the extraction of key features during the FE phase and the SDKNN technique
increased model performance rate to 80% (AUC = 80%) with 9% growth and 85%
(AUC = 85%) with 14% growth, respectively. Moreover, 10.5% growth was observed
following the implementation of the FE and SDKNN phases in the FakeNewsNet
dataset (AUC = 78.5%).
Figure 2 illustrates the effectiveness of the extracted features (from textual and
primitive features in the used datasets) on the performance of the proposed model

(AUC metric).

Fig. 2. The effectiveness of extracted features on performance (AUC %).

The structural features have the greatest impact on method performance, due to the medium- or long-length news samples in the ISOT, FakeNewsNet,
and Fake or Real datasets, as well as the availability of news body or title in each
news sample. Given the short length of sentences in the Liar dataset, these fea-
tures showed no acceptable effect on the performance of the classification methods.
Instead, in this dataset, the extracted features from primitive metadata and also
the surface features significantly improved the performance rate in these methods.
For example, extracting the surface features in the Liar dataset increased the AUC
of the model by 3%. Extracting the structural features in the FakeNewsNet dataset
raised the AUC of the model by 2.5%.
Considering the application of classical machine learning methods and the deep
learning methods in fake news detection in this study, the results of these meth-
ods were compared to those of the methods in other studies; the results of this
comparison are presented in two parts. The results of using the classical methods
on Fake or Real, ISOT, and Liar datasets in terms of “accuracy” are provided in
Table 10. As can be seen in this table, despite using the same classical methods
and the same datasets, the performance of these methods in this study appears to
have been more efficient than those in other studies; this superiority results from the extraction of important features in the proposed model. For example, using the DT method on the ISOT and Fake or Real datasets in this study offered 99.9% and 85% accuracy, respectively; these values are higher than the results presented in the studies by Younus Khan et al.15 and Nasir et al.17
Tables 11–14 present a comparison of the results of the proposed model (with
LSTM) with those of other deep learning models on the four datasets used in the
study. The datasets used in benchmark studies according to Tables 11–14 were
the same as those used in this paper, with the only difference being the presented
models. Notably, the models with higher performance metrics than other methods in
all the tables are highlighted to become more recognizable. According to the results,
the proposed model performed better than the other benchmark models, in terms of

Table 10. Comparing the classical machine learning models (Acc %).

Method            Dataset        LR     DT     SVM    KNN    NB
Ref. 15           Liar           56     52     56     56.5   60
Ref. 37           Liar           62     57     62     58
Proposed model    Liar           65     69     66     68     62
Ref. 15           Fake or Real   67     63     66     73     90
Ref. 16           Fake or Real   90
Proposed model    Fake or Real   94     85     95     91.5   90
Ref. 17           ISOT           52     96     60     60     92
Ref. 18           ISOT           96.2   80.1   80.2
Proposed model    ISOT           98.6   99.9   98.5   99.5   97
Table 11. Comparing the deep learning models for the Fake or Real
dataset.

Model/Study Acc (%) Prec F 1-Score Recall

CNN+GloVe22 86 86 86 86
Conv-HAN22 86 86 86 86
LSTM22 76 78 76 76
C-LSTM22 86 87 86 86
Bi-LSTM+GloVe22 85 86 85 85
LSTM+Word2Vec16 86 86 86 86
Bi-LSTM+GloVe16 84 84 84 84
Conv-HAN+GloVe16 80 82 80 80
Proposed model (with LSTM) 95.8 96 95.5 96

Table 12. Comparing the deep learning models for the ISOT dataset.

Model/Study Acc (%) Prec F 1-Score Recall AUC

Boosting/Bagging32 99 99 99 1 95.2
CNN81 99 99 99 99
RNN81 98 98 98 98
(CNN+RNN)81 99 99 99 99
Capsule NN41 99.8
LSTM-single layer18 98.9
GRU-stacked-2-Layers18 99.1
BERT18 96.9
CNN+LSTM19 94.7
Fake BERT20 98.9
Proposed model (with LSTM) 99.8 99.8 99.7 99.9 99.8

Table 13. Comparing the deep learning models for the Liar dataset.

Model/Study Acc (%) Prec F 1-Score Recall AUC

Bi-LSTM69 58.6 55.4 52.3 49.5 60.7


Bi-LSTM+GloVe22 58 58 58 57
Capsule NN41 39.5
CNN69 53.6 48.9 35.2 37.5 53.9
CNN+GloVe22 58 58 58 58
BERT69 61.9 58.3 62.8 59.6 61.7
Conv-HAN69 55.7 51.4 56.5 62.8 57.4
Conv-HAN22 59 59 59 59
Bagg Boost39 70 70 70 70
LSVM37 60 60 60 60
Proposed model (with LSTM) 82 86 84 82 85

performance metrics. For example, the highest performance belongs to the method
presented in Ref. 13 in the FakeNewsNet dataset among the methods provided by
other authors, with an AUC value of 60.7%. However, a 17.8% difference can be
observed when comparing this finding against the results obtained from performing
Table 14. Comparing the deep learning methods for the FakeNewsNet dataset.

Model/Study Acc (%) Prec F 1-Score Recall AUC

BERT69 58.8 56.3 62.8 44.9 57.8


CNN69 54 49.8 44.7 40.5 52
Bi-LSTM69 58.6 55.4 52.3 49.5 60.7
Conv-HAN69 44.8 44.3 56.9 79.5 43.6
Proposed model (with LSTM) 81 82.8 83 82 78.5

the proposed model (with LSTM) on the same dataset, with an AUC of 78.5%. In
addition, the method presented by Waikhom and Goswami50 had an accuracy of
70% in the Liar database, which was the highest performance among the methods
offered by other authors. However, there still exists a 12% difference compared to
the performance of the proposed model (with LSTM) in this dataset.

6. Conclusion and Future Work


This study focused on fake news detection, which has grown into a widespread concern in recent years. Due to the lack of a complete and suitable dataset in
this area, the majority of studies attempted to collect balanced data on a spe-
cific topic or alter the existing datasets; notable instances can be selecting news
samples with a similar topic, or combining multiple datasets to create a suitable
dataset. The current research concentrated on discovering more information from
the existing datasets, rather than merging datasets or removing news samples with
different topics; indeed, this involved the assumption that suitable datasets for
classification algorithms are not always present. Therefore, in addition to extract-
ing surface features, the structure of the news samples was evaluated, qualitative
text features such as cohesion and coherence35 were extracted, and sentiment analysis was performed
on the title and text of the news samples. The improvement in performance metrics
such as AUC when using classification algorithms highlights the significant effects
of these features. In addition, with the use of higher-order statistical data and a
KNN-based algorithm, a method based on the curriculum learning strategy was proposed in this study. In this paper, this method was used as a secondary step before train-
ing various classical machine learning and deep learning models. The idea behind
this technique involves indicating the difficulty level of training data and arrang-
ing them based on a specific order (from E2H in terms of learning) to be used in
deep learning models. The obtained results demonstrated its positive effect on the
performance of deep learning models in fake news detection. In future studies, researchers can address the problem of imbalanced datasets, an issue that reduces the performance of machine learning models; for instance, the low performance of the models used in this paper on the FakeNewsNet dataset stems from the imbalanced structure of that dataset.
References
1. T. Khan, A. Michalas and A. Akhunzada, Fake news outbreak 2021: Can we stop the viral spread? Journal of Network and Computer Applications 190 (2021) 103112.
2. Y. Chen, N. Conory and V. Rubin, News in an online world: The need for an “auto-
matic crap detector”, Proceedings of the Association for Information Science and
Technology 52 (2015) 1–4.
3. A. Giachanou, P. Rosso and F. Crestani, The impact of emotional signals on credibility
assessment, Journal of the Association for Information Science and Technology 72(9)
(2021) 1117–1132.
4. X. Zhang and A. A. Ghorbani, An overview of online fake news: Characteriza-
tion, detection, and discussion, Information Processing & Management 57(2) (2020)
102025.
5. Z. Ziegler, Michael Polányi’s fiduciary program against fake news and deepfake in the
digital age, AI & Soc. (2021), https://doi.org/10.1007/s00146-021-01217-w.
6. E. Shearer and J. Gottfried, News use across social media platforms 2017, Report,
Pew Research Center (2017).
7. Á. Figueira and L. Oliveira, The current state of fake news: Challenges and opportu-
nities, Procedia Computer Science 121 (2017) 817–825.
8. F. Li, X. Zhang, X. Zhang, C. Du, Y. Xu and Y. C. Tian, Cost-sensitive and hybrid-
attribute measure multi-decision tree over imbalanced data sets, Information Sciences
422 (2018) 242–256.
9. G. McIntire, Fake real news dataset, GeorgeMcIntire’s Github (2018).
10. W. Y. Wang, “Liar, liar pants on fire”: A new benchmark dataset for fake news
detection, preprint (2017), arXiv:1705.00648.
11. M. Koziarski, M. Woźniak and B. Krawczyk, Combined cleaning and resampling algo-
rithm for multi-class imbalanced data with label noise, Knowledge-Based Systems 204
(2020) 106223.
12. T. Li, G. Kou, Y. Peng and P. S. Yu, An integrated cluster detection, optimization,
and interpretation approach for financial data, IEEE Transactions on Cybernetics
52(12) (2022) 13848–13861, doi:10.1109/TCYB.2021.3109066.
13. E. Masciari, V. Moscato, A. Picariello and G. Sperli, A deep learning approach to
fake news detection, in Int. Symp. Methodologies for Intelligent Systems (Springer,
Cham, 2020), pp. 113–122.
14. H. Reddy, N. Raj, M. Gala and A. Basava, Text-mining-based fake news detection
using ensemble methods, International Journal of Automation and Computing 17(2)
(2020) 210–221.
15. J. Younus Khan et al., A benchmark study of machine learning models for
online fake news detection, Machine Learning with Applications 4 (2021) 100032,
doi:10.1016/j.mlwa.2021.100032.
16. I. Ennejjai, S. I. El Ahrache and B. Hassan, Fake news detection using deep learning,
8th Int. Conf. Innovation and New Trends in Information Technology, 2020, Tangier,
Morocco, pp. 1–8.
17. J. A. Nasir, O. S. Khan and I. Varlamis, Fake news detection: A hybrid CNN-RNN
based deep learning approach, International Journal of Information Management
Data Insights 1 (2020) 100007, doi:10.1016/j.jjimei.2020.100007.
18. E. Amer, K. S. Kwak and S. El-Sappagh, Context-based fake news detection model
relying on deep learning models, Electronics 11(8) (2022) 1255.
19. A. Agarwal, M. Mittal, A. Pathak and L. M. Goyal, Fake news detection using a
blend of neural networks: An application of deep learning, SN Computer Science 1(3)
(2020) 1–9.
20. R. K. Kaliyar, A. Goswami and P. Narang, FakeBERT: Fake news detection in social
media with a BERT-based deep learning approach, Multimedia Tools and Applications
80(8) (2021) 11765–11788.
21. J. L. Elman, Learning and development in neural networks: The importance of starting
small, Cognition 48(1) (1993) 71–99.
22. Y. Bengio, J. Louradour, R. Collobert and J. Weston, Curriculum learning, in Proc.
26th Annual Int. Conf. Machine Learning (Association for Computing Machinery,
New York, 2009), pp. 41–48.
23. P. Soviany, R. T. Ionescu, P. Rota and N. Sebe, Curriculum learning: A survey,
International Journal of Computer Vision 130 (2022) 1526–1565.
24. D. Weinshall, G. Cohen and D. Amir, Curriculum learning by transfer learning: The-
ory and experiments with deep networks, in Proc. 35th Int. Conf. Machine Learning
(PMLR, 2018), pp. 5238–5246.
25. G. Hacohen and D. Weinshall, On the power of curriculum learning in training
deep networks, in Proc. 36th Int. Conf. Machine Learning (PMLR, 2019), pp. 2535–
2544.
26. X. Wei, L. Wei, H. Xiaolin, Y. Jie and Q. Song, Multi-modal self-paced learning for
image classification, Neurocomputing 309 (2018) 134–144.
27. M. G. Wisniewski et al., Easy-to-hard effects in perceptual learning depend upon the
degree to which initial trials are “easy”, Psychonomic Bulletin & Review 26 (2019)
1889–1895, doi:10.3758/s13423-019-01627-4.
28. P. Bojanowski, E. Grave, A. Joulin and T. Mikolov, Enriching word vectors with
subword information, Transactions of the Association for Computational Linguistics
5 (2017) 135–146.
29. A. R. Vijjini, K. Anuranjana and R. Mamidi, Analyzing curriculum learning for sen-
timent analysis along task difficulty, pacing and visualization axes, preprint (2021),
arXiv:2102.09990.
30. T. Pi, X. Li, Z. Zhang, D. Meng, F. Wu, J. Xiao and Y. Zhuang, Self-paced boost
learning for classification, in Proc. Twenty-Fifth Int. Joint Conf. Artificial Intelligence
(IJCAI’16) (AAAI Press, 2016), pp. 1932–1938.
31. M. H. Goldani, S. Momtazi and R. Safabakhsh, Detecting fake news with capsule
neural networks, Applied Soft Computing 101 (2021) 106991.
32. A. Iftikhar, Y. Muhammad, Y. Suhail and O. A. Muhammad, Fake news detection
using machine learning ensemble methods, Complexity 2020 (2020) 8885861.
33. T. Li, G. Kou and Y. Peng, Improving malicious URLs detection via feature engi-
neering: Linear and nonlinear space transformation methods, Information Systems 91
(2020) 101494, doi:10.1016/j.is.2020.101494.
34. I. Singh, P. Deepak and K. Anoop, On the coherence of fake news articles,
in ECML PKDD 2020 Workshops: Joint European Conf. Machine Learning and
Knowledge Discovery in Databases, eds. I. Koprinska et al., Communications in
Computer and Information Science, Vol. 1323 (Springer, Cham, 2020), pp. 591–
607.
35. P. Karuna et al., Enhancing cohesion and coherence of fake text to improve believabil-
ity for deceiving cyber attackers, in Proc. First Int. Workshop Language Cognition
and Computational Models, Santa Fe, New Mexico, United States (Association for
Computational Linguistics, 2018), pp. 31–40.
36. O. Yukari and K. Ichiro, Text classification based on the latent topics of important sen-
tences extracted by the PageRank algorithm, in Proc. ACL Student Research Work-
shop, Sofia, Bulgaria (Association for Computational Linguistics, 2013), pp. 46–51.
37. M. K. Elhadad, K. F. Li and F. Gebali, A novel approach for selecting hybrid features
from online news textual metadata for fake news detection, in Int. Conf. P2P, Parallel,
Grid, Cloud and Internet Computing (Springer, Cham, 2019), pp. 914–925.
38. O. Ajao, D. Bhowmik and S. Zargari, Fake news identification on twitter with hybrid
CNN and RNN models, in Proc. 9th Int. Conf. Social Media and Society (Association
for Computing Machinery, New York, 2018), pp. 226–230.
39. S. Gilda, Notice of violation of IEEE publication principles: Evaluating machine learn-
ing algorithms for fake news detection, in 2017 IEEE 15th Student Conf. Research
and Development(SCOReD) (IEEE, 2017), pp. 110–115.
40. K. Goseva et al., Identification of security related bug reports via text mining
using supervised and unsupervised classification, https://ntrs.nasa.gov/search.jsp?R=20180004739.
41. V. Chandola, A. Banerjee and V. Kumar, Anomaly detection: A survey, ACM Com-
puting Surveys 41(3) (2009) 1–58.
42. CWE-888, Software fault pattern (SFP) clusters, MITRE Corporation,
https://cwe.mitre.org/data/graphs/888.html.
43. N. Mansourov, Software fault patterns: Towards formal compliance points for
CWE (2011), https://buildsecurityin.uscert.gov/sites/default/files/MansourovSWFaultPatterns.pdf.
44. L. Manevitz and M. Yousef, One-class document classification via neural networks,
Neurocomputing 70(7–9) (2007) 1466–1481.
45. M. Aldwairi and A. Alwahedi, Detecting fake news in social media networks, Procedia
Computer Science 141 (2018) 215–222.
46. J. C. S. Reis, A. Correia, F. Murai, A. Veloso and F. Benevenuto, Supervised
learning for fake news detection, IEEE Intelligent Systems 34(2) (2019) 76–81,
doi:10.1109/MIS.2019.2899143.
47. T. Li, G. Kou, Y. Peng and Y. Shi, Classifying With adaptive hyper-spheres: An incre-
mental classifier based on competitive learning, IEEE Transactions on Systems, Man,
and Cybernetics: Systems 50(4) (2020) 1218–1229, doi:10.1109/TSMC.2017.2761360.
48. H. Ahmed, I. Traore and S. Saad, Detection of online fake news using n-gram analysis
and machine learning techniques, in Int. Conf. Intelligent, Secure, and Dependable
Systems in Distributed and Cloud Environments (Springer, Cham, 2017), pp. 127–
138.
49. B. D. Horne and S. Adali, This just in: Fake news packs a lot in title, uses simpler,
repetitive content in text body, more similar to satire than real news, in 2nd Int.
Workshop on News and Public Opinion at ICWSM (AAAI Press, 2017), pp. 759–766.
50. L. Waikhom and R. S. Goswami, Fake news detection using machine learning, in Proc.
Int. Conf. Advancements in Computing & Management (ICACM) (2019), pp. 252–
256, https://ssrn.com/abstract=3462938, http://dx.doi.org/10.2139/ssrn.3462938.
51. F. Pierri and S. Ceri, False news on social media: A data-driven survey, ACM Sigmod
Record 48(2) (2019) 18–27.
52. J. W. Pennebaker, R. L. Boyd, K. Jordan and K. Blackburn, The development and
psychometric properties of LIWC2015, University of Texas at Austin, Austin (2015).
53. J. Chung, C. Gulcehre, K. Cho and Y. Bengio, Empirical evaluation of gated recurrent
neural networks on sequence modeling, preprint (2014), arXiv:1412.3555.
54. S. Hochreiter and J. Schmidhuber, Long short-term memory, Neural Computation
9(8) (1997) 1735–1780.
55. S. Bajaj, “The pope has a new baby!” Fake news detection using deep learning,
Technical Report, Stanford University (2017), pp. 1–8.
56. Kaggle, Getting real about fake news (2017), https://www.kaggle.com/mrisdal/fake-news.
57. Signal Media, The signal media one-million news articles dataset (2017),
http://research.signalmedia.co/newsir16/signal-dataset.html.
58. J. Pennington, R. Socher and C. D. Manning, GloVe: Global vectors for word rep-
resentation, in Proc. 2014 Conf. Empirical Methods in Natural Language Processing
(EMNLP) (Association for Computational Linguistics, 2014), pp. 1532–1543.
59. S. Yoon et al., Detecting incongruity between news headline and body text via a deep
hierarchical encoder, in Proc. AAAI Conf. Artificial Intelligence (AAAI Press, 2019),
pp. 791–800, doi:10.1609/aaai.v33i01.3301791.
60. Kaggle, Fake news detection, San Francisco, CA, USA (2018), https://www.kaggle.com/jruvika/fake-news-detection.
61. P. Meel and D. K. Vishwakarma, Fake news, rumor, information pollution in social
media and web: A contemporary survey of state-of-the-arts, challenges and opportu-
nities, Expert Systems with Applications 153 (2020) 112986.
62. K. Shu, D. Mahudeswaran, S. Wang, D. Lee and H. Liu, Fakenewsnet: A data repos-
itory with news content, social context and spatialtemporal information for studying
fake news on social media, preprint (2018), arXiv:1809.01286.
63. Q. Le and T. Mikolov, Distributed representations of sentences and documents, in
Proc. 31st Int. Conf. Machine Learning (PMLR, 2014), pp. 1188–1196.
64. F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blon-
del, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau,
M. Brucher, M. Perrot and E. Duchesnay, Scikit-learn: Machine learning in Python,
Journal of Machine Learning Research 12 (2011) 2825–2830.
65. R. Řehůřek and P. Sojka, Software framework for topic modelling with large corpora,
in Proc. LREC 2010 Workshop on New Challenges for NLP Frameworks (University
of Malta, Malta, 2010), pp. 46–50.
66. H. Zhang and G. Kou, Role-based multiplex network embedding, in Proc. 39th
Int. Conf. Machine Learning (PMLR, 2022), pp. 26265–26280, https://proceedings.mlr.press/v162/zhang22m.html.
67. C. Spearman, The proof and measurement of association between two things, The
American Journal of Psychology 100(3/4) (1987) 441–471.
68. F. N. Ribeiro, M. Araújo, P. Gonçalves, M. A. Gonçalves and F. Benevenuto, Sen-
tiBench - A benchmark comparison of state-of-the-practice sentiment analysis meth-
ods, EPJ Data Science 5(1) (2016) 1–29.
69. S. Baccianella, A. Esuli and F. Sebastiani, SentiWordNet 3.0: An enhanced lexical
resource for sentiment analysis and opinion mining, in Proc. Seventh Int. Conf. Lan-
guage Resources and Evaluation (LREC’10) (European Language Resources Associ-
ation, 2010), pp. 2200–2204.
70. D. Ippolito et al., Automatic detection of generated text is easiest when humans
are fooled, in Proc. 58th Annual Meeting of the Association for Computational
Linguistics (Association for Computational Linguistics, 2020), pp. 1808–1822,
doi:10.18653/v1/2020.acl-main.164.
71. A. T. Ampa and D. M. Basri, Lexical and grammatical cohesions in the students’
essay writing as the English productive skills, Journal of Physics: Conference Series
1339(1) (2019) 012072.
72. X. Liu, H. Lai, D. F. Wong and L. S. Chao, Norm-based curriculum learning for neural
machine translation, preprint (2020), arXiv:2006.02014.
73. L. Bottou, Large-scale machine learning with stochastic gradient descent, in Proc.
COMPSTAT’2010 (Physica-Verlag HD, 2010), pp. 177–186.
74. J. Bergstra and Y. Bengio, Random search for hyper-parameter optimization, Journal
of Machine Learning Research 13(2) (2012) 281–305.
75. J. Kim, J. Kim, H. L. T. Thu and H. Kim, Long short term memory recurrent neural
network classifier for intrusion detection, in 2016 Int. Conf. Platform Technology and
Service (PlatCon) (IEEE, 2016), pp. 1–5.
76. T. A. Tang, L. Mhamdi, D. McLernon, S. A .R. Zaidi and M. Ghogho, Deep recurrent
neural network for intrusion detection in SDN-based networks, in 2018 4th IEEE Conf.
Network Softwarization and Workshops (NetSoft) (IEEE, 2018), pp. 202–206.
77. C. Yin, Y. Zhu, J. Fei and X. He, A deep learning approach for intrusion detection
using recurrent neural networks, IEEE Access 5 (2017) 21954–21961.
78. A. Onan, Bidirectional convolutional recurrent neural network architecture with
group-wise enhancement mechanism for text sentiment classification, Journal of King
Saud University - Computer and Information Sciences 34(5) (2022) 2098–2117.
79. L. Zhang, S. Wang and B. Liu, Deep learning for sentiment analysis: A survey, Wiley
Interdisciplinary Reviews: Data Mining and Knowledge Discovery 8(4) (2018) e1253.
80. G. Gutierrez, J. Canul-Reich, A. O. Zezzatti, L. Margain and J. Ponce, Mining:
Students comments about teacher performance assessment using machine learning
algorithms, International Journal of Combinatorial Optimization Problems and Infor-
matics 9(3) (2018) 26–40.
81. K. Cho, B. van Merriënboer, D. Bahdanau and Y. Bengio, On the properties of neural
machine translation: Encoder-decoder approaches, preprint (2014), arXiv:1409.1259.
82. A. S. Santra and J. L. Lin, Integrating long short-term memory and genetic algorithm
for short-term load forecasting, Energies 12(11) (2019) 2040.
83. T. Young, D. Hazarika, S. Poria and E. Cambria, Recent trends in deep learning
based natural language processing, IEEE Computational Intelligence Magazine 13(3)
(2018) 55–75.
84. S. Minaee, N. Kalchbrenner, E. Cambria, N. Nikzad, M. Chenaghlu and J. Gao, Deep
learning-based text classification: A comprehensive review, ACM Computing Surveys
54(3) (2021) 1–40.
85. R. C. Staudemeyer, Applying long short-term memory recurrent neural networks to
intrusion detection, South African Computer Journal 56(1) (2015) 136–154.
86. R. C. Staudemeyer and C. W. Omlin, Evaluating performance of long short-term
memory recurrent neural networks on intrusion detection data, in Proc. South African
Institute for Computer Scientists and Information Technologists Conf. (Association
for Computing Machinery, New York, 2013), pp. 218–224.
87. H. Hindy, D. Brosset, E. Bayne, A. Seeam, C. Tachtatzis, R. Atkinson and X.
Bellekens, A taxonomy and survey of intrusion detection system design techniques,
network threats and datasets, preprint (2018), arXiv:1806.03517v1.