Fake News Detection Using Feature Extraction, Natural Language Processing, Curriculum Learning, and Deep Learning
Mirmorsal Madani
Department of Computer Engineering
Gorgan Branch, Islamic Azad University, Gorgan, Iran
Homayun Motameni∗
Department of Computer Engineering
Sari Branch, Islamic Azad University, Sari, Iran
homayun [email protected]
Reza Roshani
Department of Computer Engineering
Technical and Vocational University (TVU)
Tehran, Iran
Following the advancement of the internet, social media gradually replaced traditional
media; consequently, the overwhelming and ever-growing generation and propagation
of fake news has become a widespread concern. Detecting such news is undoubtedly
necessary; however, this task faces certain challenges such as events, verification, and
datasets, and the reference datasets in this area suffer from various issues such as the
lack of sufficient information about news samples, the absence of subject diversity, etc.
To mitigate these issues, this paper proposes a two-phase model using natural language
processing and machine learning algorithms. In the first phase, two new structural
features, along with other key features, are extracted from news samples. In the second
phase, a hybrid method based on a curriculum strategy, consisting of statistical data
and a k-nearest neighbor algorithm, is introduced to improve the performance of deep
learning models. The obtained results indicated the higher performance of the proposed
model in detecting fake news compared to benchmark models.
∗ Corresponding author.
1. Introduction
Fake news refers to false news or incorrect information that follows a specific
agenda and is propagated in a society with the purpose of deceiving the audience.
Accordingly, false information is generated and spread among people in skillful and
attractive ways so that it appears to be true. Regrettably, insufficient knowledge
of people about the media1 leads them to believe fake news which, in turn, sig-
nificantly influences their overall approach to the presented issue. Notably, fake
news generation is not a new phenomenon and dates back to a time prior to the
emergence of the internet.2 Nonetheless, as the internet was further developed and
the traditional media gave way to social media, the increasing and overwhelming
process of generating and spreading this type of news has become a widespread con-
cern. Fake news can be classified into different categories including religion-based
news, politics-based news, and news related to important figures.3 Fake news cov-
ers a variety of topics such as racism, humor, conspiracy, and economy, and often
serves the purpose of creating fear in the society. Ultimately, the detection of fake
news is a necessity as it can potentially lead to serious problems in the society.4
Social media have been shown to be the primary platform for fake news.5 Given the
fact that approximately 47% of Americans have pointed to social media as their
dominant news source, the prevalence of fake news on social media and its implica-
tions appear to be considerably clear.6 Consequently, a suitable safeguard system is
definitely required. However, implementing such a system faces several challenges
and different issues including events, consumption, verification, diversion, and ref-
erence datasets.1 Despite the introduction of numerous methods by different studies
to solve these issues, several problems still remain regarding fake news detection
with high accuracy; subsequently, researchers are attempting to present methods
that involve higher performance levels.7,8 Considering the significance of fake news
detection, it remains an open issue among scientific communities and researchers
continue to conduct studies in this area. This study was also carried out to serve
the same purpose. Today, given the considerable capabilities of machine learning
models, strategies for employing these models in text classification are prevalent.
On the other hand, the issues present in reference datasets in the area of fake news
detection reduce the performance of said models. As a result, this study seeks to
examine these issues and offer an advanced model to mitigate them. After review-
ing many related common datasets, four reference datasets were selected which are
described in “Datasets” section; each dataset indicated specific problems regard-
ing fake news detection. Certain problems found in reference datasets are listed as
follows:
• Lack of subject diversity; the news samples present in the datasets are mostly
about a specific subject such as politics (e.g., Fake or Real News Dataset);9
• The short length of the news samples (e.g., Liar Dataset);10
• Lack of access to sufficient information and features required for the news samples
such as headlines and news sources (e.g., FakeNewsNet and ISOT Dataset);1
• And imbalanced datasets (e.g., FakeNewsNet Dataset).
All of these problems can raise the difficulty of the relevant datasets.11 For instance,
the difficulties (Eq. (1)) of the Liar and the FakeNewsNet datasets are 68% and
49.3%, respectively.
difficulty = 1 − (#matched samples in dataset)/(#dataset samples). (1)
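As an illustration, Eq. (1) can be computed with a few lines of Python. This is a minimal sketch, assuming that a "matched sample" is one whose predicted label agrees with its ground-truth label; the function name is ours, not the authors':

def dataset_difficulty(true_labels, predicted_labels):
    # Eq. (1): difficulty = 1 - (#matched samples) / (#dataset samples)
    matched = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return 1 - matched / len(true_labels)

# Example: 4 of 5 samples matched gives a difficulty of 0.2.
print(dataset_difficulty([1, 0, 1, 1, 0], [1, 0, 1, 0, 0]))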
First, to discover the causes behind the problems listed above, a number of the
referenced datasets were investigated and the following results were obtained:
• Several researchers have collected news samples and produced datasets related to
their subject of research. For example, the “Fake or Real News” dataset pertains
to the 2016 US elections. Therefore, a number of the datasets produced in this area
are limited to specific subjects.
• One of the methods of fake news generation is to adopt only a part of the news
body. Moreover, a large number of the news samples present in the datasets are
tweets written on Twitter or materials posted on Facebook. As a result, certain
datasets such as the “Liar” dataset contain news samples with short lengths.
• In many cases, the privacy policies as well as the rules governing social media
do not allow the propagation of further information about news samples. Conse-
quently, the information available on certain datasets such as the “FakeNewsNet”
and “ISOT” is insufficient.
• In real-life scenarios, such as fraud detection, reject inference, and credit evalua-
tion,12 the number of real news items about a specific subject (e.g., only political news
or only economic news) is greater than the number of fake ones. Therefore, col-
lecting news samples about a specific subject can lead to imbalance in the dataset.
To mitigate these problems, researchers have employed a variety of strategies
which, along with their shortcomings, are elaborated in the following. In many
studies, described in detail in Sec. 2, the authors have employed data-level strate-
gies including (i) using only one part of a dataset such as selecting news samples
related to a single subject,13 (ii) combining two datasets to solve the imbalance
problem in datasets,14 (iii) combining the Liar, and the Fake or Real datasets to
mitigate the problem of short news samples, increasing the average length of the
news samples from 18 to 644 words,15 and (iv) eliminating news samples that can
reduce the performance of classification algorithms.13 Though data-level methods
can help to enhance the performance of machine learning algorithms, particularly
with regards to the dataset imbalance problem, they face their own challenges. For
example, the elimination or duplication of news samples in the dataset may result
in the removal of important information or the generation of artificial data. On the
other hand, in a real-world situation, access to other data to be combined with
the main dataset may not be available. In addition, the manipulation of the main
dataset eliminates the possibility of assessing the ability of machine learning algo-
rithms under real conditions. There are certain researchers who have attempted
to increase the performance of machine learning models in fake news detection by
changing the architecture of these models or integrating several models together.
A number of algorithm-level methods can be found in Refs. 16–20. For instance,
Kaliyar et al.20 proposed a model called FakeBERT which was a combination of
Convolutional Neural Network (CNN) and BERT algorithms in line with using the
advantages of both algorithms to raise the accuracy of their model. “Curriculum
Learning” (CL) is another algorithm-level technique inspired by human learning
principles that is employed nowadays to improve the performance of deep learning
models.21 Nevertheless, it was first introduced by Bengio et al.22 as a training model
for machine learning models. In fact, this model is considered a proper substitute
for conventional training based on random mini-batch sampling. Certain benefits of
this method include enhanced training process performance in terms of convergence
speed and accuracy.23 Considering the deep learning components, this technique
can be employed at various levels including the task, model, data, and performance
metrics. The idea behind this technique involves indicating the difficulty level of
training data and arranging them based on a specific order (mostly from easy to
hard58 in terms of being used in deep learning models). Techniques based on the CL
have been proposed in numerous studies such as in Refs. 24–26, 28, and 29. Notably,
the majority of studies were conducted in the area of image classification and com-
puter vision, while less attention has been paid to the application of this strategy in
text classification. In many studies, the authors used feature-based methods to over-
come problems including the lack of sufficient information in datasets and the short
length of news samples (e.g., Refs. 17 and 31–33). They extracted the key features
from the news body and headlines in datasets. Results show that the extraction of
key features from texts can improve the overall performance of machine learning
models. However, in the majority of related studies, authors have solely focused
on extracting style, surface and polarity features using Natural Language Process-
ing (NLP) tools and have overlooked the structural features. Meanwhile, special
attention is paid to these features in news articles written by humans.34 Today, a
significant portion of fake news is generated by machines, based on altering real
news texts. Machines tend to overlook these features, as opposed to humans who
write news articles. Consequently, the extraction of these features can enhance the
performance of machine learning models in fake news detection. Examinations into
studies on fake news detection and their findings (addressed in Sec. 2) demonstrate
that techniques that employed a combination of the aforementioned strategies per-
formed better than those techniques in which a single strategy was employed. In
general, the shortcomings mentioned above were the motivations behind this study
to propose an advanced two-phase model based on a combination of Fea-
ture Extraction (FE) and CL strategies. This resulted in a model superior to
those in benchmark studies in terms of performance metrics such as accuracy and
area under the Receiver Operating Characteristic (ROC) curve (AUC). In the first
phase of the proposed model, important features were extracted from the body and
titles of the news samples. In the second phase, a method based on the CL strategy
was employed to reduce dimension and sort news samples. In the FE phase, carried
out using NLP, a number of common key features along with two new structural
features (i.e., coherence and cohesion) were extracted from the news sam-
ples and treated as metadata. Other studies on examining text readability34
and producing believable fake texts35 have employed these features as well, while
they have remained unused in research on fake news detection. The reasons behind
the extraction of said features in this paper are summarized below. As previously
mentioned, a significant portion of fake news is generated by machines through
altering real news texts. Machines tend to overlook a set of features that humans
pay special attention to when writing news articles. For example, Singh et al.34
calculated the coherence in the ISOT and HWB datasets and concluded that the degree of coherence
in real news is higher than that of fake news. Therefore, extracting this feature
can improve the performance of machine learning models in fake news detection.
In addition, in real news articles, there are correlation and coherence between the
news title and the news body. This may not be the case in fake news as it is gen-
erated through procedures such as manipulating real news or adopting only a part
of the real news; in turn, these procedures might disarray the structure of
sentences, distort the correlation between the news body and the news title, disrupt
the correlation and coherence between the sentences in the news body, and so on. For
instance, Karuna et al.35 examined the generation of believable fake news and con-
cluded that the believability of fake news can be increased by applying different
methods such as raising the level of coherence between news title and news body
and increasing the cohesion between the news sentences through creating synthetic
data. The proposed model addresses issues including the lack of subject diversity
in fake news detection datasets, the short length of sentences, and the insufficient
information on news samples. In this study, four different datasets with diverse sub-
jects were used. Each employed reference dataset is focused on a specific subject
or encompasses different subjects. For example, the Liar dataset entails political
news, while the news samples in the ISOT dataset include a variety of subjects
such as the government, left news, the world, etc. As a result, the proposed model
showed an adequate performance in addressing news articles with different topics.
Moreover, in the majority of studies in this area (e.g., Refs. 15, 17, 31, 32, 36,
and 46), researchers have addressed issues including short length of sentences and
the lack of sufficient information on news samples, using strategies such as FE and
algorithm-level techniques. The proposed model in this study also addresses these
issues through FE and CL strategies. The remainder of this paper is structured
as follows. The related works are discussed in Sec. 2. The datasets used in this
paper (which are notably the most popular datasets) are introduced in Sec. 3. The
methodology used in the study is presented in Sec. 4 which includes the following
sections: text preprocessing, FE, a hybrid method of statistical data and k-nearest
neighbor (KNN) named SDKNN (Statistical Data and KNN), and machine learn-
ing models. The experimental results along with a comparative analysis involving
several state-of-the-art methods used for fake news detection based on deep learn-
ing and classical machine learning are elaborated in Sec. 5. Finally, the conclusion
of the study and suggestions for future research are provided in Sec. 6.
2. Related Work
The proposed model in this study is based on FE from textual datasets and a
method aimed at enhancing the performance of the classical machine learning and
deep learning algorithms. In addition, four common datasets in the area of fake news
detection were used to assess the proposed model. This section entails a summary
of studies related to FE and deep learning, and FE and classical machine learning.
In addition, the related works were classified based on the datasets used, which are
presented in Table 1.
Table 1. Summary of related studies on fake news detection.

Ref.     Dataset        Features                    Model(s)                             Reported metrics
Ref. 22  —              Lexical/polarity            CNN/LSTM/Bi-LSTM/C-LSTM/Conv-HAN     0.95, 0.95, 0.89, 0.95
Ref. 22  —              Lexical/polarity/n-grams    NB                                   0.90, 0.91, 0.90, 0.90
Ref. 69  —              Clues/News contents         CNN                                  0.54, 0.498, 0.447, 0.405, 0.52
Ref. 17  —              (three feature sets)        LR/RF/Ada                            0.914, 0.91, 0.92, 0.93
Ref. 13  —              —                           CNN + RNN                            0.82
Ref. 38  —              —                           Stochastic Gradient Descent (LSTM+CNN)  0.772
Ref. 69  FakeNewsNet    Clues/News contents         BERT                                 0.588, 0.563, 0.628, 0.449, 0.578

Notes: Acc: accuracy, Prec: precision, AUC: area under ROC curve, BERT: bidirectional encoder representations from transformers, CNN: convolutional neural network, DT: decision tree, KNN: k-nearest neighbor, LR: logistic regression, LSTM: long short-term memory, NB: Naive Bayes, RF: random forest, SVM: support vector machine, Bi-LSTM: bi-directional long short-term memory, SGD: stochastic gradient descent.
Various studies have attempted to detect fake news by extracting specific features from news sources, news texts,
etc. Ultimately, a new set of grammatical and semantic features was introduced for
fake news detection. Ahmed et al.48 detected fake news using n-gram analysis
and machine learning techniques. They used Term Frequency (TF) and TF-IDF
to extract features, after preprocessing the text. To classify real and fake news,
they used traditional methods such as Support Vector Machine (SVM) and KNN,
as a well-known instance-based learning algorithm47 as well as a five-fold cross-
validation method for evaluation. A specialized team gathered and prepared the
used dataset, as well as the dataset in the study by Horne and Adali,49 and applied
the proposed techniques to both datasets. According to the results, the linear SVM
and TF–IDF FE techniques performed better than other methods. Younus Khan
et al.15 employed traditional algorithms and deep learning to detect fake news using
the Liar and Fake or Real datasets. Using the Sentiment Intensity Analyzer func-
tion78 from the NLTK library in Python, they first extracted lexical features such
as word count, article length, and sentiment analysis. Next, they used the TF–IDF
to extract bigrams and unigrams. They also utilized the Empath tool to extract
important topics and data. They implemented GloVe for word embedding and KNN
for traditional algorithms on the two datasets above. In the Liar dataset, the best
result was obtained from Naive Bayes (NB) with an accuracy of 60%. The NB algo-
rithm showed its best performance on the Fake or Real dataset as well, achieving
an accuracy of 90%. Reddy et al.14 detected fake news using text features such as stylometric
features and text-based word vector representation suggested in their own research.
Due to the imbalance in the FakeNewsNet, they combined it with the McIntire
dataset to form a single dataset. This allowed them to create a single dataset with
49.85% real news and 50.15% fake news. Then, three groups of stylometric features
were created during the FE phase: Group 1 entailed the number of unique words,
complexity and Gunning Fog index; Group 2 encompassed the number of words,
sentences, syllables, and capital letters; and Group 3 consisted of the number of
characters, figures, short words, etc. For fake news detection, they used a variety of
classification methods such as Random Forest, SVM, KNN, and Bagging. Waikhom and Goswami50
used the Liar dataset and embedding techniques suggested in their own study to
detect fake news. For simplicity purposes, they assumed the target class as binary,
that is, in the form of either real or fake news. They used the Bag of Words (BoW),
TF–IDF, and n-gram methods for embedding and FE, respectively. Subsequently,
they used Min–Max scaling to normalize the numerical features they had generated.
To classify and detect fake news, they used AdaBoost, Extra Trees, Random Forest,
XGBoost, and Bagging methods. After 150 tests, the Bagging method produced the
best results with 70% accuracy.
Pierri and Ceri51 surveyed the approaches proposed in the literature to detect and classify fake news. According to them,
there are certain challenges in fake news detection which include: (i) difficulty in
detecting fake news from real news, as the former is created using a metric of high
similarity to a real news broadcast in traditional media, (ii) the fast-paced fake
news propagation, (iii) the inability of experts in early detection of fake news, and
(iv) the limitations of social media platforms in authorizing access to information.
The algorithms were then classified based on content, context, or a combination
of both. A number of examined methods included Linguistic Inquiry and Word
Count (LIWC),52 Gated Recurrent Unit (GRU),53 and Long Short-Term Memory
(LSTM).54 Bajaj55 developed a classifier to determine the fake parts of a news
article, using only the title and the text. The datasets used in that study were Kag-
gle56 and Signal Media News.57 The author then converted the text into a feature
vector using GloVe58 with 300 dimensions. Subsequently, he attempted to detect
the fake segments of a piece of news using two-layer feed-forward neural networks,
Recurrent Neural Network (RNN), LSTM, and GRU. Notably, Goldani et al.31 used
capsule neural networks to detect fake news; they also used static and nonstatic
word embedding models for short and medium-to-long news samples. Moreover,
n-grams were applied on the Liar and ISOT datasets for FE. To evaluate the pro-
posed method and compare it to traditional methods such as SVM, the authors
only used the accuracy metric. Ultimately, the best result was obtained using the
ISOT dataset and the nonstatic capsule network with an accuracy of 99.8%. Due to
the problems with the dataset and the short length of the text, the authors concen-
trated on metadata for the Liar dataset. The best result was obtained on “history as
metadata” with an accuracy of 39.5%.59 The authors investigated how closely news
texts corresponded to their respective headlines. To this aim, they created a dataset
by extracting millions of news articles and separating their texts and titles, and then
used a deep hierarchical encoder to detect fake news. A method was also proposed
for summarizing news articles. Younus Khan et al.15 used deep learning methods
to detect fake news. They employed the Liar and Fake or Real datasets. First, they
extracted lexical features such as word count, article length, and sentiment anal-
ysis. Then, they used the TF–IDF to extract bigrams and unigrams. Notably, the
authors used CNN, LSTM, Conv-HAN, etc. models at the character level in their
deep learning models. The Conv-HAN model yielded the best results in the Liar
dataset with an accuracy of 59%.17 In this study a hybrid RNN-CNN deep learn-
ing model is presented for fake news detection. The ISOT and FA-KES reference
datasets were then used to test the model. After preprocessing the news texts, they
divided the dataset into two sets including training (80%) and test (20%). Next,
they used GloVe to embed the data. The proposed method was implemented on the
ISOT dataset with 99% accuracy. Iftikhar et al.32 utilized machine learning and ensemble
techniques, and extracted a set of LIWC features to classify news articles into two
categories of true and fake. They used the ISOT dataset as well as two Kaggle
datasets.60 Using the LIWC2015 tool, they extracted a total of 93 features, includ-
ing words related to positive and negative emotions in the text, the number of verbs,
adverbs, etc. They also divided the data into two groups including training (70%)
and testing (30%). Subsequently, they used different hyperparameters to train var-
ious learning algorithms to achieve the highest classification accuracy possible. To
this end, they applied various ensemble techniques such as Bagging and Boosting
and created two voting classifiers based on these algorithms. They employed perfor-
mance metrics such as accuracy for evaluation purposes. Finally, using the Random
Forest and Perez-LSVM algorithms, they achieved an accuracy of 99% (on average
95.25%) on the ISOT dataset. The authors of Ref. 69 developed a fake news detection method
based on deep learning and the BERT algorithm. They concentrated on searching for
clues in the news contents. After text preprocessing, three classification techniques
were used which included classical classification algorithms, deep learning approach,
and multimedia approach. Following data ingestion and preprocessing, fake news
was classified and detected using the two modules of NLP processing and multi-
media processing. Masciari et al.13 extracted statistical data (e.g., mean, variance,
etc.) from the Liar and FakeNewsNet datasets. The authors omitted news samples
with short lengths (less than ten words) from the Liar dataset; subsequently, this
dataset was reduced by 1675 news samples. In the FakeNewsNet dataset, they only
used political news and eliminated the news collected from the GossipCop website
to mitigate the imbalance in the dataset. Ultimately, the Google BERT model on
the Liar dataset produced the best results with an accuracy of 61.9%. In addition,
the Google BERT model performed best on the FakeNewsNet dataset. Amer et al.18
used classical machine learning models such as SVM, NB, and Decision Tree (DT) as
well as deep learning algorithms including LSTM and GRU to detect fake news on
the ISOT dataset. Also, a set of tests was carried out to compare the performance
of word embedding methods with BERT. The results showed that the performance
of deep learning models using word embedding methods was higher than the BERT
model. Ennejjai et al.16 implemented various operations on three datasets, includ-
ing Fake or Real dataset, for fake news detection. These operations, respectively,
involved preprocessing text features, converting them into numerical vectors via
different methods such as GloVe, Word2Vec, and TF–IDF, and using different deep
learning methods including LSTM and CNN. Agarwal et al.19 attempted to detect fake
news without taking the author’s name, news source, etc. into account. After pre-
processing the text, the authors used GloVe for word embedding and then sent the
numerical vectors to the hybrid model of CNN and LSTM for classification. Kaliyar
et al.20 employed a model called FakeBERT to detect fake news. The architecture
of the model was a combination of CNN and BERT algorithms. The authors tuned
the hyperparameters of CNN to achieve higher performance and managed to reach
99.8% accuracy in detecting fake news on the ISOT dataset.
3. Datasets
In this study, four popular datasets61 were used to test the performance of the
proposed model under a variety of conditions, including news articles with different
subjects, insufficient information about news samples, and news samples with var-
ious lengths. These datasets included the Liar,10 ISOT Fake News,48 FakeNews-
Net,62 and Fake or Real News,9 which are discussed in the following. The structural
information of the datasets is listed in Table 2.
4. Methodology
The model proposed in this study is illustrated in Fig. 1 and the relevant pseudo-
code is provided in this section. In the first phase of the proposed model, each news
sample from the datasets is partitioned into two parts including textual features
(named “Text”) and relevant features (named “Primitive Metadata”). The prepro-
cessing operation is carried out on “Text”. Then, “FE” is performed on the preprocessed
text, and the extracted features are added to the “Final Metadata”. To create
the same conditions as those of other studies, the dataset was split into two sets
including training (80%) and test (20%). Next, to prevent overfitting during model
training, 20% of the training data was considered as validation. The Doc2Vec para-
graph embedding was used to convert the preprocessed data into numerical vectors
with a fixed length of 400, which was considered as input for the second phase.
In the second phase, the statistical data were used to reduce feature dimension.
Subsequently, the “Input layer” was created through outer-joining the outputs of
the “Dimension reduction” and “Final Metadata” parts. Then, the news samples
of the “Final Training Set” were sorted in easy-to-hard (E2H) order using the
SDKNN method and were sent to the deep learning models. Additionally, the
“Final Training Set” was considered as input for the classical machine learning
models. The components of the proposed model, explained below, include text
preprocessing, FE, the SDKNN method, and classifiers (classical machine learning
and deep learning models used for fake news detection).
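To make the embedding step concrete, the following is a minimal sketch of the Doc2Vec paragraph embedding with a fixed vector length of 400 using the gensim library; the toy corpus and all parameter values other than vector_size are our assumptions:

from gensim.models.doc2vec import Doc2Vec, TaggedDocument

# Hypothetical tokenized, preprocessed news texts.
texts = [["senate", "passes", "budget", "bill"],
         ["celebrity", "spotted", "on", "mars"]]
corpus = [TaggedDocument(words=t, tags=[i]) for i, t in enumerate(texts)]

# Paragraph embedding with a fixed vector length of 400, as in the first phase.
model = Doc2Vec(vector_size=400, min_count=1, epochs=40)
model.build_vocab(corpus)
model.train(corpus, total_examples=model.corpus_count, epochs=model.epochs)

vector = model.infer_vector(["senate", "passes", "budget", "bill"])  # length 400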
The dataset ds is vertically partitioned into Text (which includes only textual features)
and Primitive Metadata (which includes only categorical or numerical features);
only the Liar dataset includes primitive metadata. We denote the main
dataset by ds, the number of news samples by N, the number of extracted fea-
tures by M, the scoring function by f, the data pacing function by F, and the loss
function by g_i(w).
Many studies in this field (e.g., Refs. 17, 31, 32, and 36) have attempted to extract lexical, linguistic,
stylometric, and text polarity features, among others, as the best features that affect
the performance of the classifier. In this study, after preprocessing the datasets, two
operations were carried out during the FE phase; first, the textual features of the
datasets were transformed into numerical vectors with fixed lengths, using Doc2Vec
paragraph embedding, to prepare them to be used in machine learning algorithms.
Second, as the employed datasets entailed the textual features of news body or news
title while providing no other key information, a number of important features were
extracted and taken into account as the “Final metadata”.
for k in Queue2:
    sum1 = sum1 + cosine_similarity(Queue1[0], Queue2[k])
    sum2 = sum2 + cosine_similarity(Queue1[n − 1], Queue2[k])
end for
FS_{K+7} = (sum1 + sum2)/(2n)
Add FS_{K+6} and FS_{K+7} to Final metadata
We denote the main dataset by ds, the number of news samples by N, the number
of null references by n_rt, the number of poorly structured sentences by n_rf, the
Ring Queue by Rqueue, the number of well-structured sentences by n_st, the number
of true anaphoric references by n_at, the number of true cataphoric references by
n_ct, and the number of extracted features by M.
Surface features
Numerous features were extracted from the news samples in the applied datasets
(with the exception of the Liar dataset in which, due to the short length of the news
samples, the extraction of these features was impossible) at the levels of paragraph,
sentence, and character. These extracted features included the number of sentences
and words in each paragraph, the number of words in each sentence, the number
of adjectives, adverbs, and specific names, the number of verbs in each paragraph
and each sentence, and the average number of characters in words. Finally, those
features in the text that had the highest dependence on the target output (class
label) based on Spearman’s rank correlation coefficient,67 e.g., the number of words,
adjectives, adverbs, specific nouns, and verbs, were maintained and the rest were
removed.
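A minimal sketch of this selection step, assuming the surface features are collected in a count matrix X with binary labels y; the helper function and its default parameter values are ours, not the authors':

import numpy as np
from scipy.stats import spearmanr

def select_surface_features(X, y, keep=5):
    # Keep the features with the highest absolute Spearman rank
    # correlation with the class label; drop the rest.
    scores = []
    for j in range(X.shape[1]):
        rho, _ = spearmanr(X[:, j], y)
        scores.append(abs(rho))
    keep_idx = np.argsort(scores)[::-1][:keep]
    return X[:, keep_idx], keep_idx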
Polarity features
Since the majority of fake news nowadays is generated by machines, often with the
news body and title misaligned, this paper determines whether the sentiment polarities
of the news body and title (if present in the dataset) match, and adds the result to the
final metadata.
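A minimal sketch of this polarity-match feature using NLTK's Sentiment Intensity Analyzer (the same tool the paper cites for lexical features); the compound-score thresholding is our assumption:

from nltk.sentiment.vader import SentimentIntensityAnalyzer  # needs: nltk.download("vader_lexicon")

sia = SentimentIntensityAnalyzer()

def polarity_match(title, body, threshold=0.05):
    # Returns 1 if the title and body carry the same sentiment sign, else 0.
    t = sia.polarity_scores(title)["compound"]
    b = sia.polarity_scores(body)["compound"]
    sign = lambda v: 0 if abs(v) < threshold else (1 if v > 0 else -1)
    return int(sign(t) == sign(b))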
Structural features
In this work, two features, which are considered structural features and have been
somewhat overlooked in previous studies on fake news detection, were extracted
from news body and news title. These features include cohesion and coherence and
they were calculated based on semantic similarity. Cohesion refers to the relation-
ship between the components of a given text, and it is measured by considering
factors such as reference pronouns, grammar, and linking words including conjunc-
tive adverbs, coordinates, subordinates, and correlative conjunctions. Coherence
refers to the semantic relationship and consistency between ideas in a text. Subse-
quently, the sentences in a text should be semantically linked to its title. In real
news, correlation and coherence exist between the news title and the news body.
For example, Singh et al.34 calculated the coherence in ISOT and HWB datasets
and concluded that the degree of coherence in real news is higher than that in fake
news. Therefore, these features can possibly be used as measures for distinguishing
real news from fake news. On the other hand, coherence may be nonexistent in fake
news because it is produced by operations such as manipulating the real news or
adopting only a part of the real news which, in turn, might disarray the structure
of sentences, distort the correlation between the news body and news title, dis-
rupt the correlation and coherence between the sentences in the news body, and so on. As an example, Karuna
et al.35 addressed believable fake news generation and concluded that the believ-
ability of the fake news could be increased by applying different methods such as
increasing the coherence between the news title and the news body and increasing
the cohesion between the news sentences through artificial data generation. Given
the above-mentioned arguments, these structural features were extracted from the
news samples in this study and then added to the “Final metadata” using the NLP
method as well as the appropriate tools and techniques.
Calculation of cohesion
In this paper, to determine the cohesion in each news sample, the distinguish-
able lexical and grammatical cohesion devices were extracted. Ampa and Basri71
reviewed 809 handwritten articles and reported that more than 69% of cases of
grammatical cohesion devices appear to be related to reference pronouns and con-
junctions. They also explored 1224 student papers and discovered that reiterations
(synonyms, repetition, and antonyms) were employed to create lexical cohesion,
82.84% of the time. To simplify the process in this study, the window size of
each news sample was set to three consecutive sentences. The number of reference
pronouns and conjunctions within each window was then counted.
Calculation of coherence
In this study, coherence is calculated by measuring the semantic relationship
between the sentences at the beginning and the end of each news sample, rather
than checking the entire sentences in the sample. This strategy was adopted for
three reasons; first, for the sake of simplicity; second, due to the fact that in many
cases, the first and last sentences of each paragraph are referred to as the topic
sentence and conclusion, respectively; and third, because the semantic relationship
between the other sentences had already been calculated in cohesion. To this end,
similar to the case of lexical cohesion, the Doc2Vec paragraph embedding and cosine
similarity were combined. Notably, since the inconsistency between keywords in the
news text and the title can significantly impact fake news detection, the semantic
relationship between the two sentences at the beginning and the end of the body
of each news sample was also computed, along with the sentences that formed the
title of news samples.
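A minimal sketch of this computation, reusing the Doc2Vec model from the embedding phase; the tokenization and the exact pairing of similarities follow our reading of the pseudocode above:

import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def coherence_features(model, title, first_sentence, last_sentence):
    # Semantic similarity of the opening and closing sentences to the
    # title, and to each other, via inferred Doc2Vec vectors.
    t = model.infer_vector(title.lower().split())
    f = model.infer_vector(first_sentence.lower().split())
    l = model.infer_vector(last_sentence.lower().split())
    return cosine(f, l), cosine(f, t), cosine(l, t)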
The other problem with the use of CL is making a decision on how the training
set data should be organized. Data can be organized in a specific order such as
E2H, hard to easy, or random, in terms of learning by deep learning models. Hav-
ing conducted numerous experiments, Wisniewski et al.27 concluded that sorting
the data according to the E2H order yields better results. Other studies have also
achieved similar results; including the model offered by Hacohen and Weinshall.25
Consequently, the E2H order was employed in this study to arrange the data. Schol-
ars have presented numerous algorithms for data arrangement. For instance, in the
model proposed by Vijjini et al.,29 the BERT algorithm and sentence length were
employed as data pace function and the difficulty metric, respectively, to arrange
textual data. The authors believed that sentences with shorter lengths are easier to
learn by deep learning models such as CNN and LSTM. Hacohen and Weinshall25
used bootstrapping and transfer learning to arrange training data. Liu et al.72 used
embedding vector norm to calculate data difficulty level. In this case, the authors
believed that longer sentences or sentences containing less-repetitive words are more
difficult to learn; as a result, they have a higher effect on changes in the loss func-
tion. Therefore, they counted the number of rare or significant words to indicate
the difficulty level of each sentence. In this study, the SDKNN method was used as
a simple CL for the data scoring process. As mentioned in the introduction section,
the employed datasets were textual and involved high difficulty levels in terms of
learning. As a result, SDKNN entails two basic steps aimed at increasing the
performance of deep learning models. In the first step, a transformation technique
using Doc2Vec embedding (dim = 400) as well as statistical data was applied to
transform textual data into numerical vectors, and to extract high-level statistical
features and reduce dimension, respectively. In the second step, the scoring function
(f) and the data pacing function (F) are specified based on the CL strategy. These steps
are explained in detail in the following. In the first step, the statistical data were
applied to the datasets; subsequently, the numerical vectors created by the Doc2Vec
method with a constant length of 400 were converted into numerical vectors with a
constant length of 4, which amounted to a dimension reduction by 99% (Eq. (2)).
The complexity of the probability function of the used datasets, especially the Liar
and FakeNewsNet datasets, indicated their high difficulty, which
results in news samples overlapping between the two classes present in these datasets.
Therefore, it was necessary to extract the features that could be utilized to separate
the news samples (i.e., to provide separable samples). Notably, given the presence
of overlapping samples in these two datasets, the distance metric was not suitable
for separating the samples; in turn, this necessitated more detailed features of the
news samples, which were extracted using high-order statistical data.
Reduction Rate = 1 − sf/tf, (2)
where sf and tf stand for the selected features and total features, respectively. For
each news sample, Variance (Eq. (3)) is a measure of spread and Skewness (Eq. (4))
is a measure of asymmetry.
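A minimal sketch of this reduction step; variance and skewness are stated in the text (Eqs. (3) and (4)), while the choice of mean and kurtosis as the remaining two of the four retained statistics is our assumption:

import numpy as np
from scipy.stats import skew, kurtosis

def reduce_doc_vector(doc_vec):
    # Collapse a 400-dim Doc2Vec vector into 4 summary statistics,
    # giving a reduction rate of 1 - 4/400 = 0.99 (Eq. (2)).
    return np.array([np.mean(doc_vec), np.var(doc_vec),
                     skew(doc_vec), kurtosis(doc_vec)])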
W∗ = min_W E(W) = min_W Σ_{i=1}^{N} L(y_i, ŷ_i) = min_W Σ_{i=1}^{N} L_i + regularizer(W). (7)
In the function f, the samples are sorted in ascending order from easy to hard
based on their difficulty level, i.e., the rank obtained from the number of nearest
neighbors sharing the same target class as the sample. This sorted collection is then
split into K separate sections (based on the specified ranks, in E2H order), numbered
from 1 to K. During the training process, in each iteration a section is selected
according to its numbered priority, and a mini-batch of samples from that section
is selected for training the network. The difficulty level is then determined based
on the gradient and error values: if the training error rate, denoted by l in the
algorithm, is decreasing, the weights and the model are updated, and that mini-batch
is added to the sorted training set. This operation is performed for every section
and all samples as batch processing, and the batches in each section are finally
sorted according to the loss function. Once this has been done for all sections, the
training-set samples are sorted based on the performance of the model, and this
sorted training set is sent as input to the machine learning models.
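A minimal sketch of the scoring-and-sectioning part of this procedure; the neighbor-agreement score and its parameters are our reading of the description, and the within-section, loss-based re-sorting during training is omitted:

import numpy as np
from sklearn.neighbors import NearestNeighbors

def e2h_sections(X, y, n_neighbors=5, k_sections=4):
    # X: reduced feature matrix, y: class labels (NumPy arrays).
    # Difficulty of a sample = share of its nearest neighbors whose
    # label differs from its own (higher = harder to learn).
    nn = NearestNeighbors(n_neighbors=n_neighbors + 1).fit(X)
    _, idx = nn.kneighbors(X)  # column 0 is the sample itself
    difficulty = (y[idx[:, 1:]] != y[:, None]).mean(axis=1)
    order = np.argsort(difficulty, kind="stable")  # easy-to-hard order
    return np.array_split(order, k_sections)  # sections numbered 1..k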
Hyperparameter settings of the RNN model.
Hyperparameter settings of the CNN model.
Hyperparameter settings of the GRU model.
Hyperparameter settings of the LSTM model.
The best results, reported in Table 7, were obtained with the LSTM model, based
on a random search over the hyperparameters listed in Table 6.
The learning rate determines the magnitude of the weight updates during the
training of the LSTM model. It can be chosen from the range [0.0, 1.0].
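A minimal sketch of such a random search; the search space below is hypothetical (the actual ranges are those of Table 6), and build_and_score_lstm is a stub standing in for model construction and validation:

from sklearn.model_selection import ParameterSampler

def build_and_score_lstm(learning_rate, lstm_units, dropout, batch_size):
    # Placeholder: train an LSTM with these settings and return
    # its validation accuracy. Stubbed here for illustration.
    return 0.0

# Hypothetical search space; the actual ranges are listed in Table 6.
space = {"learning_rate": [1e-4, 1e-3, 1e-2, 1e-1],
         "lstm_units": [32, 64, 128],
         "dropout": [0.2, 0.3, 0.5],
         "batch_size": [32, 64, 128]}

best = None
for params in ParameterSampler(space, n_iter=10, random_state=0):
    score = build_and_score_lstm(**params)
    if best is None or score > best[0]:
        best = (score, params)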
Accuracy (%) of the classical machine learning models on the four datasets:

Model   Liar   Fake or Real   ISOT   FakeNewsNet
LR      65     94             98.6   78
LDA     64     93             98.3   79
DT      69     85             99.9   74
SVM     66     95             98.5   79.5
KNN     68     91.5           99.5   76
NB      62     90             97     79
Table 8. Accuracy (%) of the proposed model (with LSTM) on the four datasets.

Stage              Liar   Fake or Real   ISOT   FakeNewsNet
Before FE          67     91             96     75
After FE           78     94.5           99     80
After FE & SDKNN   82     95.8           99.8   81

Table 9. AUC (%) of the proposed model (with LSTM) on the four datasets.

Stage              Liar   Fake or Real   ISOT   FakeNewsNet
Before FE          71     90             97     68
After FE           80     94             99.2   72
After FE & SDKNN   85     96             99.8   78.5
To give an overview of the whole result, Tables 8 and 9, respectively, present the accuracy and AUC
of the proposed model (using the LSTM model) on preprocessed datasets after
the following operations: paragraph embedding (called “before FE”), extraction of
key features such as polarity scores and primitive metadata in the Liar database,
or polarity scores and structural features35 in the other datasets (called “after FE”), and FE and
SDKNN phases (called “after FE & SDKNN”). In the Liar dataset, for instance,
the extraction of key features during the FE phase and the SDKNN technique
increased model performance rate to 80% (AUC = 80%) with 9% growth and 85%
(AUC = 85%) with 14% growth, respectively. Moreover, 10.5% growth was observed
following the implementation of the FE and SDKNN phases in the FakeNewsNet
dataset (AUC = 78.5%).
Figure 2 illustrates the effectiveness of the extracted features (from textual and
primitive features in the used datasets) on the performance of the proposed model
(AUC metric). The structural features have the greatest impact on method per-
formance, due to the medium or long-length news samples in ISOT, FakeNewsNet,
and Fake or Real datasets, as well as the availability of news body or title in each
news sample. Given the short length of sentences in the Liar dataset, these fea-
tures showed no appreciable effect on the performance of the classification methods.
Instead, in this dataset, the extracted features from primitive metadata and also
the surface features significantly improved the performance rate in these methods.
For example, extracting the surface features in the Liar dataset increased the AUC
of the model by 3%. Extracting the structural features in the FakeNewsNet dataset
raised the AUC of the model by 2.5%.
Considering the application of classical machine learning methods and the deep
learning methods in fake news detection in this study, the results of these meth-
ods were compared to those of the methods in other studies; the results of this
comparison are presented in two parts. The results of using the classical methods
on Fake or Real, ISOT, and Liar datasets in terms of “accuracy” are provided in
Table 10. As can be seen in this table, despite using the same classical methods
and the same datasets, these methods performed better in this study than
in other studies; this superiority results from
the extraction of important features in the proposed model. For example, using the
DT method on ISOT and Fake or Real datasets in this study offered 99.9% and
85% accuracy, respectively. These values are higher than the results presented in
studies by Younus Khan et al.15 and Nasir et al.17
Tables 11–14 present a comparison of the results of the proposed model (with
LSTM) with those of other deep learning models on the four datasets used in the
study. The datasets used in benchmark studies according to Tables 11–14 were
the same as those used in this paper, with the only difference being the presented
models. Notably, the models with higher performance metrics than other methods in
all the tables are highlighted to make them more recognizable. According to the
results, the proposed model performed better than the other benchmark models in
terms of performance metrics.
Table 10. Comparing the classical machine learning models (Acc %).
Dataset        Ref.             LR     DT     SVM    KNN    NB
Liar           Ref. 15          56     52     56     56.5   60
Liar           Ref. 37          62, 57, 62, 58 (as reported)
Liar           Proposed model   65     69     66     68     62
Fake or Real   Ref. 15          67     63     66     73     90
Fake or Real   Ref. 16          90 (as reported)
Fake or Real   Proposed model   94     85     95     91.5   90
ISOT           Ref. 17          52     96     60     60     92
ISOT           Ref. 18          96.2, 80.1, 80.2 (as reported)
ISOT           Proposed model   98.6   99.9   98.5   99.5   97
Table 11. Comparing the deep learning models for the Fake or Real
dataset.
CNN+GloVe22 86 86 86 86
Conv-HAN22 86 86 86 86
LSTM22 76 78 76 76
C-LSTM22 86 87 86 86
Bi-LSTM+GloVe22 85 86 85 85
LSTM+Word2Vec16 86 86 86 86
Bi-LSTM+GloVe16 84 84 84 84
Conv-HAN+GloVe16 80 82 80 80
Proposed model (with LSTM) 95.8 96 95.5 96
Table 12. Comparing the deep learning models for the ISOT dataset.
Boosting/Bagging32 99 99 99 1 95.2
CNN81 99 99 99 99
RNN81 98 98 98 98
(CNN+RNN)81 99 99 99 99
Capsule NN31 99.8
LSTM-single layer18 98.9
GRU-stacked-2-Layers18 99.1
BERT18 96.9
CNN+LSTM19 94.7
Fake BERT20 98.9
Proposed model (with LSTM) 99.8 99.8 99.7 99.9 99.8
Table 13. Comparing the deep learning models for the Liar dataset.
For example, on the FakeNewsNet dataset, the highest performance among the
methods provided by other authors belongs to the method presented in Ref. 13,
with an AUC value of 60.7%. However, a 17.8% difference can be observed when
comparing this finding against the results obtained from performing the proposed
model (with LSTM) on the same dataset, with an AUC of 78.5%.
Table 14. Comparing the deep learning methods for the FakeNewsNet dataset.
In
addition, the method presented by Waikhom and Goswami50 had an accuracy of
70% in the Liar database, which was the highest performance among the methods
offered by other authors. However, there still exists a 12% difference compared to
the performance of the proposed model (with LSTM) in this dataset.
References
1. T. Khan, A. Michalas and A. Akhunzada, Fake news outbreak 2021: Can we stop
the viral spread? Journal of Network and Computer Applications 190 (2021) 103112.
2. Y. Chen, N. Conroy and V. Rubin, News in an online world: The need for an “auto-
matic crap detector”, Proceedings of the Association for Information Science and
Technology 52 (2015) 1–4.
3. A. Giachanou, P. Rosso and F. Crestani, The impact of emotional signals on credibility
assessment, Journal of the Association for Information Science and Technology 72(9)
(2021) 1117–1132.
4. X. Zhang and A. A. Ghorbani, An overview of online fake news: Characteriza-
tion, detection, and discussion, Information Processing & Management 57(2) (2020)
102025.
5. Z. Ziegler, Michael Polányi’s fiduciary program against fake news and deepfake in the
digital age, AI & Soc. (2021), https://doi.org/10.1007/s00146-021-01217-w.
6. E. Shearer and J. Gottfried, News use across social media platforms 2017, Report,
Pew Research Center (2017).
7. Á. Figueira and L. Oliveira, The current state of fake news: Challenges and opportu-
nities, Procedia Computer Science 121 (2017) 817–825.
8. F. Li, X. Zhang, X. Zhang, C. Du, Y. Xu and Y. C. Tian, Cost-sensitive and hybrid-
attribute measure multi-decision tree over imbalanced data sets, Information Sciences
422 (2018) 242–256.
9. G. McIntire, Fake real news dataset, GeorgeMcIntire’s Github (2018).
10. W. Y. Wang, “Liar, liar pants on fire”: A new benchmark dataset for fake news
detection, preprint (2017), arXiv:1705.00648.
11. M. Koziarski, M. Woźniak and B. Krawczyk, Combined cleaning and resampling algo-
rithm for multi-class imbalanced data with label noise, Knowledge-Based Systems 204
(2020) 106223.
12. T. Li, G. Kou, Y. Peng and P. S. Yu, An integrated cluster detection, optimization,
and interpretation approach for financial data, IEEE Transactions on Cybernetics
52(12) (2022) 13848–13861, doi:10.1109/TCYB.2021.3109066.
13. E. Masciari, V. Moscato, A. Picariello and G. Sperli, A deep learning approach to
fake news detection, in Int. Symp. Methodologies for Intelligent Systems (Springer,
Cham, 2020), pp. 113–122.
14. H. Reddy, N. Raj, M. Gala and A. Basava, Text-mining-based fake news detection
using ensemble methods, International Journal of Automation and Computing 17(2)
(2020) 210–221.
15. J. Younus Khan et al., A benchmark study of machine learning models for
online fake news detection, Machine Learning with Applications 4 (2021) 100032,
doi:10.1016/j.mlwa.2021.100032.
16. I. Ennejjai, S. I. El Ahrache and B. Hassan, Fake news detection using deep learning,
8th Int. Conf. Innovation and New Trends in Information Technology, 2020, Tangier,
Morocco, pp. 1–8.
17. J. A. Nasir, O. S. Khan and I. Varlamis, Fake news detection: A hybrid CNN-RNN
based deep learning approach, International Journal of Information Management
Data Insights 1 (2020) 100007, doi:10.1016/j.jjimei.2020.100007.
18. E. Amer, K. S. Kwak and S. El-Sappagh, Context-based fake news detection model
relying on deep learning models, Electronics 11(8) (2022) 1255.
19. A. Agarwal, M. Mittal, A. Pathak and L. M. Goyal, Fake news detection using a
blend of neural networks: An application of deep learning, SN Computer Science 1(3)
(2020) 1–9.
20. R. K. Kaliyar, A. Goswami and P. Narang, FakeBERT: Fake news detection in social
media with a BERT-based deep learning approach, Multimedia Tools and Applications
80(8) (2021) 11765–11788.
21. J. L. Elman, Learning and development in neural networks: The importance of starting
small, Cognition 48(1) (1993) 71–99.
22. Y. Bengio, J. Louradour, R. Collobert and J. Weston, Curriculum learning, in Proc.
26th Annual Int. Conf. Machine Learning (Association for Computing Machinery,
New York, 2009), pp. 41–48.
23. P. Soviany, R. T. Ionescu, P. Rota and N. Sebe, Curriculum learning: A survey,
International Journal of Computer Vision 130 (2022) 1526–1565.
24. D. Weinshall, G. Cohen and D. Amir, Curriculum learning by transfer learning: The-
ory and experiments with deep networks, in Proc. 35th Int. Conf. Machine Learning
(PMLR, 2018), pp. 5238–5246.
25. G. Hacohen and D. Weinshall, On the power of curriculum learning in training
deep networks, in Proc. 36th Int. Conf. Machine Learning (PMLR, 2019), pp. 2535–
2544.
26. X. Wei, L. Wei, H. Xiaolin, Y. Jie and Q. Song, Multi-modal self-paced learning for
image classification, Neurocomputing 309 (2018) 134–144.
27. M. G. Wisniewski et al., Easy-to-hard effects in perceptual learning depend upon the
degree to which initial trials are “easy”, Psychonomic Bulletin & Review 26 (2019)
1889–1895, doi:10.3758/s13423-019-01627-4.
28. P. Bojanowski, E. Grave, A. Joulin and T. Mikolov, Enriching word vectors with
subword information, Transactions of the Association for Computational Linguistics
5 (2017) 135–146.
29. A. R. Vijjini, K. Anuranjana and R. Mamidi, Analyzing curriculum learning for sen-
timent analysis along task difficulty, pacing and visualization axes, preprint (2021),
arXiv:2102.09990.
30. T. Pi, X. Li, Z. Zhang, D. Meng, F. Wu, J. Xiao and Y. Zhuang, Self-paced boost
learning for classification, in Proc. Twenty-Fifth Int. Joint Conf. Artificial Intelligence
(IJCAI’16) (AAAI Press, 2016), pp. 1932–1938.
31. M. H. Goldani, S. Momtazi and R. Safabakhsh, Detecting fake news with capsule
neural networks, Applied Soft Computing 101 (2021) 106991.
32. A. Iftikhar, Y. Muhammad, Y. Suhail and O. A. Muhammad, Fake news detection
using machine learning ensemble methods, Complexity 2020 (2020) 8885861.
33. T. Li, G. Kou and Y. Peng, Improving malicious URLs detection via feature engi-
neering: Linear and nonlinear space transformation methods, Information Systems 91
(2020) 101494, doi:10.1016/j.is.2020.101494.
34. I. Singh, P. Deepak and K. Anoop, On the coherence of fake news articles,
in ECML PKDD 2020 Workshops: Joint European Conf. Machine Learning and
Knowledge Discovery in Databases, eds. I. Koprinska et al., Communications in
Computer and Information Science, Vol. 1323 (Springer, Cham, 2020), pp. 591–
607.
35. P. Karuna et al., Enhancing cohesion and coherence of fake text to improve believabil-
ity for deceiving cyber attackers, in Proc. First Int. Workshop Language Cognition
and Computational Models, Santa Fe, New Mexico, United States (Association for
Computational Linguistics, 2018), pp. 31–40.
36. O. Yukari and K. Ichiro, Text classification based on the latent topics of important sen-
tences extracted by the PageRank algorithm, in Proc. ACL Student Research Work-
shop, Sofia, Bulgaria (Association for Computational Linguistics, 2013), pp. 46–51.
37. M. K. Elhadad, K. F. Li and F. Gebali, A novel approach for selecting hybrid features
from online news textual metadata for fake news detection, in Int. Conf. P2P, Parallel,
Grid, Cloud and Internet Computing (Springer, Cham, 2019), pp. 914–925.
38. O. Ajao, D. Bhowmik and S. Zargari, Fake news identification on twitter with hybrid
CNN and RNN models, in Proc. 9th Int. Conf. Social Media and Society (Association
for Computing Machinery, New York, 2018), pp. 226–230.
39. S. Gilda, Notice of violation of IEEE publication principles: Evaluating machine learn-
ing algorithms for fake news detection, in 2017 IEEE 15th Student Conf. Research
and Development(SCOReD) (IEEE, 2017), pp. 110–115.
40. K. Goseva et al., Identification of security related bug reports via text mining
using supervised and unsupervised classification, https://ntrs.nasa.gov/search.jsp?
R=20180004739.
41. V. Chandola, A. Banerjee and V. Kumar, Anomaly detection: A survey, ACM Com-
puting Surveys 41(3) (2009) 1–58.
42. CWE-888, Software fault pattern (SFP) clusters, MITRE Corporation,
https://cwe.mitre.org/data/graphs/888.html.
43. N. Mansourov, Software fault patterns: Towards formal compliance points for
CWE (2011), https://buildsecurityin.uscert.gov/sites/default/files/MansourovSWFaultPatterns.pdf.
44. L. Manevitz and M. Yousef, One-class document classification via neural networks,
Neurocomputing 70(7–9) (2007) 1466–1481.
45. M. Aldwairi and A. Alwahedi, Detecting fake news in social media networks, Procedia
Computer Science 141 (2018) 215–222.
46. J. C. S. Reis, A. Correia, F. Murai, A. Veloso and F. Benevenuto, Supervised
learning for fake news detection, IEEE Intelligent Systems 34(2) (2019) 76–81,
doi:10.1109/MIS.2019.2899143.
47. T. Li, G. Kou, Y. Peng and Y. Shi, Classifying with adaptive hyper-spheres: An incre-
mental classifier based on competitive learning, IEEE Transactions on Systems, Man,
and Cybernetics: Systems 50(4) (2020) 1218–1229, doi:10.1109/TSMC.2017.2761360.
48. H. Ahmed, I. Traore and S. Saad, Detection of online fake news using n-gram analysis
and machine learning techniques, in Int. Conf. Intelligent, Secure, and Dependable
Systems in Distributed and Cloud Environments (Springer, Cham, 2017), pp. 127–
138.
49. B. D. Horne and S. Adali, This just in: Fake news packs a lot in title, uses simpler,
repetitive content in text body, more similar to satire than real news, in 2nd Int.
Workshop on News and Public Opinion at ICWSM (AAAI Press, 2017), pp. 759–766.
50. L. Waikhom and R. S. Goswami, Fake news detection using machine learning, in Proc.
Int. Conf. Advancements in Computing & Management (ICACM) (2019), pp. 252–
256, https://ssrn.com/abstract=3462938, http://dx.doi.org/10.2139/ssrn.3462938.
51. F. Pierri and S. Ceri, False news on social media: A data-driven survey, ACM SIGMOD
Record 48(2) (2019) 18–27.
52. J. W. Pennebaker, R. L. Boyd, K. Jordan and K. Blackburn, The development and
psychometric properties of LIWC2015, University of Texas at Austin, Austin (2015).
53. J. Chung, C. Gulcehre, K. Cho and Y. Bengio, Empirical evaluation of gated recurrent
neural networks on sequence modeling, preprint (2014), arXiv:1412.3555.
54. S. Hochreiter and J. Schmidhuber, Long short-term memory, Neural Computation
9(8) (1997) 1735–1780.
55. S. Bajaj, “The Pope has a new baby!” Fake news detection using deep learning,
Technical Report, Stanford University (2017), pp. 1–8.
56. Kaggle, Getting real about fake news (2017), https://www.kaggle.com/mrisdal/fake-
news.
57. Signal Media, The Signal Media one-million news articles dataset (2017),
http://research.signalmedia.co/newsir16/signal-dataset.html.
58. J. Pennington, R. Socher and C. D. Manning, GloVe: Global vectors for word rep-
resentation, in Proc. 2014 Conf. Empirical Methods in Natural Language Processing
(EMNLP) (Association for Computational Linguistics, 2014), pp. 1532–1543.
59. S. Yoon et al., Detecting incongruity between news headline and body text via a deep
hierarchical encoder, in Proc. AAAI Conf. Artificial Intelligence (AAAI Press, 2019),
pp. 791–800, doi:10.1609/aaai.v33i01.3301791.
60. Kaggle, Fake news detection, San Francisco, CA, USA (2018), https://www.kaggle.
com/jruvika/fake-news-detection.
61. P. Meel and D. K. Vishwakarma, Fake news, rumor, information pollution in social
media and web: A contemporary survey of state-of-the-arts, challenges and opportu-
nities, Expert Systems with Applications 153 (2020) 112986.
62. K. Shu, D. Mahudeswaran, S. Wang, D. Lee and H. Liu, FakeNewsNet: A data repos-
itory with news content, social context and spatialtemporal information for studying
fake news on social media, preprint (2018), arXiv:1809.01286.
63. Q. Le and T. Mikolov, Distributed representations of sentences and documents, in
Proc. 31st Int. Conf. Machine Learning (PMLR, 2014), pp. 1188–1196.
64. F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blon-
del, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau,
M. Brucher, M. Perrot and E. Duchesnay, Scikit-learn: Machine learning in Python,
Journal of Machine Learning Research 12 (2011) 2825–2830.
65. R. Řehůřek and P. Sojka, Software framework for topic modelling with large corpora,
in Proc. LREC 2010 Workshop on New Challenges for NLP Frameworks (University
of Malta, Malta, 2010), pp. 46–50.
66. H. Zhang and G. Kou, Role-based multiplex network embedding, in Proc. 39th
Int. Conf. Machine Learning (PMLR, 2022), pp. 26265–26280, https://proceedings.
mlr.press/v162/zhang22m.html.
67. C. Spearman, The proof and measurement of association between two things, The
American Journal of Psychology 100(3/4) (1987) 441–471.
68. F. N. Ribeiro, M. Araújo, P. Gonçalves, M. A. Gonçalves and F. Benevenuto, Sen-
tiBench - A benchmark comparison of state-of-the-practice sentiment analysis meth-
ods, EPJ Data Science 5(1) (2016) 1–29.
69. S. Baccianella, A. Esuli and F. Sebastiani, SentiWordNet 3.0: An enhanced lexical
resource for sentiment analysis and opinion mining, in Proc. Seventh Int. Conf. Lan-
guage Resources and Evaluation (LREC’10) (European Language Resources Associ-
ation, 2010), pp. 2200–2204.
70. D. Ippolito et al., Automatic detection of generated text is easiest when humans
are fooled, in Proc. 58th Annual Meeting of the Association for Computational
Linguistics (Association for Computational Linguistics, 2020), pp. 1808–1822,
doi:10.18653/v1/2020.acl-main.164.
71. A. T. Ampa and D. M. Basri, Lexical and grammatical cohesions in the students’
essay writing as the English productive skills, Journal of Physics: Conference Series
1339(1) (2019) 012072.
72. X. Liu, H. Lai, D. F. Wong and L. S. Chao, Norm-based curriculum learning for neural
machine translation, preprint (2020), arXiv:2006.02014.
73. L. Bottou, Large-scale machine learning with stochastic gradient descent, in Proc.
COMPSTAT’2010 (Physica-Verlag HD, 2010), pp. 177–186.
74. J. Bergstra and Y. Bengio, Random search for hyper-parameter optimization, Journal
of Machine Learning Research 13(2) (2012) 281–305.
75. J. Kim, J. Kim, H. L. T. Thu and H. Kim, Long short term memory recurrent neural
network classifier for intrusion detection, in 2016 Int. Conf. Platform Technology and
Service (PlatCon) (IEEE, 2016), pp. 1–5.
76. T. A. Tang, L. Mhamdi, D. McLernon, S. A. R. Zaidi and M. Ghogho, Deep recurrent
neural network for intrusion detection in SDN-based networks, in 2018 4th IEEE Conf.
Network Softwarization and Workshops (NetSoft) (IEEE, 2018), pp. 202–206.
77. C. Yin, Y. Zhu, J. Fei and X. He, A deep learning approach for intrusion detection
using recurrent neural networks, IEEE Access 5 (2017) 21954–21961.
78. A. Onan, Bidirectional convolutional recurrent neural network architecture with
group-wise enhancement mechanism for text sentiment classification, Journal of King
Saud University - Computer and Information Sciences 34(5) (2022) 2098–2117.
79. L. Zhang, S. Wang and B. Liu, Deep learning for sentiment analysis: A survey, Wiley
Interdisciplinary Reviews: Data Mining and Knowledge Discovery 8(4) (2018) e1253.
80. G. Gutierrez, J. Canul-Reich, A. O. Zezzatti, L. Margain and J. Ponce, Mining
students’ comments about teacher performance assessment using machine learning
algorithms, International Journal of Combinatorial Optimization Problems and Infor-
matics 9(3) (2018) 26–40.
81. K. Cho, B. van Merriënboer, D. Bahdanau and Y. Bengio, On the properties of neural
machine translation: Encoder-decoder approaches, preprint (2014), arXiv:1409.1259.
82. A. S. Santra and J. L. Lin, Integrating long short-term memory and genetic algorithm
for short-term load forecasting, Energies 12(11) (2019) 2040.
83. T. Young, D. Hazarika, S. Poria and E. Cambria, Recent trends in deep learning
based natural language processing, IEEE Computational Intelligence Magazine 13(3)
(2018) 55–75.
84. S. Minaee, N. Kalchbrenner, E. Cambria, N. Nikzad, M. Chenaghlu and J. Gao, Deep
learning-based text classification: A comprehensive review, ACM Computing Surveys
54(3) (2021) 1–40.
85. R. C. Staudemeyer, Applying long short-term memory recurrent neural networks to
intrusion detection, South African Computer Journal 56(1) (2015) 136–154.
86. R. C. Staudemeyer and C. W. Omlin, Evaluating performance of long short-term
memory recurrent neural networks on intrusion detection data, in Proc. South African
Institute for Computer Scientists and Information Technologists Conf. (Association
for Computing Machinery, New York, 2013), pp. 218–224.
87. H. Hindy, D. Brosset, E. Bayne, A. Seeam, C. Tachtatzis, R. Atkinson and X.
Bellekens, A taxonomy and survey of intrusion detection system design techniques,
network threats and datasets, preprint (2018), arXiv:1806.03517v1.