Fake News Classification Using Transformer Based Enhanced LSTM and BERT
Article info

Keywords: Fake News; classification; transformer; Natural language processing

Abstract

Fake news has been a concern all over the world, and social media has only amplified the phenomenon. It affects the world on a large scale because it is targeted to sway the decisions of the crowd in a particular direction. Since manually verifying the legitimacy of news is hard and costly, researchers have taken great interest in this field. Different approaches to identifying fake news have been examined, such as content-based classification, social context-based classification, image-based classification, sentiment-based classification, and hybrid context-based classification. This paper proposes a model for fake news classification based on news titles, following the content-based classification approach. The model uses a BERT model with its outputs connected to an LSTM layer. Training and evaluation of the model were done on the FakeNewsNet dataset, which contains two sub-datasets, PolitiFact and GossipCop. The model has been compared with base classification models. A vanilla BERT model has also been trained on the dataset under constraints similar to those of the proposed model in order to evaluate the impact of adding an LSTM layer. The results showed a 2.50% and 1.10% increase in accuracy on the PolitiFact and GossipCop datasets respectively over the vanilla pre-trained BERT model.
1. Introduction

Fake news is misinformation disseminated among the public by mainstream sources like media outlets and social media. It is generally misleading and is meant to shape the beliefs of the masses in one’s favour. Paskin defined it as “particular news articles that originate either on mainstream media (online or offline) or social media and have no factual basis, but presented as facts and not satire” (Paskin, 2018). It has been a topic of attention because it affects our lives in various ways, as many incidents have clearly demonstrated. Research has indicated that fake news was a significant factor in Donald Trump’s win as US president in the 2016 elections (Grinberg et al., 2019), and also in the Brexit polling decision of the UK citizens. During these events, fake news was targeted towards social media users depending upon their ideologies to persuade them to lean to a particular side and vote in its favour (Hopkin and Rosamond, 2018). Social media platforms like WhatsApp realized the scale of the widespread culture of sharing fake news and consequently had to run awareness campaigns against it in India (Farooq, 2017). Fake news ramped up following the COVID-19 outbreak throughout the world, and there was a profusion of rumors and fake news making the rounds on social and mainstream media (O’Connor and Murphy, 2020, Kadam and Atre, 2020). It created chaos and problems to an extreme extent for people around the world and even resulted in a huge number of fatalities.

The fake news ecosystem plays on basic human psychology: a) Naive Realism: believing that only one’s own perception of reality is correct and that others are wrong. b) Confirmation Bias: believing information that reinforces existing biases. Fake news has been a concern all over the world for the past several years. The concerns raised by it have only escalated with the constantly increasing time people spend on social media, which has thus become their main source of news. This attitude has become much more common as social media serves its users the stories and posts that align with their ideologies. This gives rise to an echo chamber effect where the interactions of the user with others or the content he/she consumes match their existing ideologies and thought processes (Törnberg, 2018). Thus, most people never bother to check the trustworthiness of the news as it is supported by their confirmation bias (Moravec et al., 2018).

A solution to this problem is imperative, considering the above-stated facts about the explosion of fake news in the past decade and how it has been exploited in various areas. Most people interact with social media daily and share posts and news articles whose legitimacy has not been determined, which amplifies the effect of fake news. Because of the widespread prevalence of fake news all around the world, there
∗ Corresponding author
E-mail address: [email protected] (D. Kumar).
https://doi.org/10.1016/j.ijcce.2022.03.003
Received 29 October 2021; Received in revised form 15 March 2022; Accepted 15 March 2022
Available online 19 March 2022
2666-3074/© 2022 The Authors. Publishing Services by Elsevier B.V. on behalf of KeAi Communications Co. Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
N. Rai, D. Kumar, N. Kaushik et al. International Journal of Cognitive Computing in Engineering 3 (2022) 98–105
are private media outlets that exclusively check the legitimacy of the news by verifying the facts presented in the report (Luengo and García-Marín, 2020). But as the world has moved towards social media, the amount of news content being published is colossal, and assessing the legitimacy of the news manually is becoming increasingly difficult. This process is getting more tedious and more costly day by day, with the influx of huge amounts of information being generated every day. With the rise of DeepFake, checking media for manipulations from the source is essential for verifying its integrity and the information conveyed by it (Singh et al., 2020). Thus, computer researchers have recently been attempting to automate the process. This also motivated the authors to propose a model to automate the process and identify fake news patterns in news articles and media.

SVM and Naive Bayes (NB) classifiers have been used by various researchers in this field. These models differ in their functioning and structure, but both produced similar results and were used as baseline models (Prasetijo et al., 2017, Granik and Mesyura, 2017). Various clustering algorithms and decision trees have been used extensively in the literature for experimentation (Goyal et al., 2016). Recurrent Neural Networks (RNNs) are very popular in this field, especially Long Short-Term Memory (LSTM) (Sundermeyer et al., 2012). However, RNNs usually face the problem of vanishing gradients, which hinders their capability of learning long data sequences, a problem that LSTMs address. Word embedding is an important factor to be considered while designing a model for NLP problems (Mikolov et al., 2018). To improve upon this, the proposed research used contextual word embedding using the BERT model (Devlin et al., 2018). BERT can learn contextualized word representations by utilizing a huge volume of unlabeled text corpora. BERT has performed well in NLP tasks because of its intricate structure and excellent nonlinear representation learning capability. LSTMs effectively boost performance by memorizing and finding the pattern of key information. Therefore, contextualized word representations from BERT are fed to an LSTM to improve fake news classification performance, thanks to their powerful ability to capture semantics and long-distance dependencies in news titles.

The research contributions have been summarized as follows:

• The proposed methodology classifies the news based on its linguistic features such as syntactical, grammatical and semantical aspects of the news reports and articles.
• An approach for fake news classification is proposed that combines the BERT model with an LSTM and classifies news articles as either fake or legitimate.
• Accuracy, Precision, Recall, and F1 Score have been used as the evaluation criteria for checking the robustness of the proposed methodology.
• Empirical evaluation of the proposed methodology has been conducted against state-of-the-art methodologies such as conventional TCNN-URG, LIWC, CSI, HAN, SAFE etc. based on the various training and testing phases.

The paper is organized as follows: Section 2 discusses the literature in the area of NLP and fake news detection. Section 3 explains the dataset description and the architectures of BERT and LSTM, followed by the architecture of the proposed model. Section 4 presents the detailed Results & Analysis, in which the performance of the proposed methodology is compared and analyzed against state-of-the-art methods, followed by the conclusion section.

2. Literature Review

Earlier studies in the field of fake news detection revealed that various lexical features can be useful in understanding the differences between more and less trustworthy digital news sources. The authors combined LIWC (Linguistic Inquiry and Word Count) measurements with the original text and found that the linguistic features improved the F1 score on the test data for most models. The authors state that fact-checking is a challenging task but that various lexical features can contribute to the understanding of the differences between more reliable and less reliable digital news sources (Rashkin et al., 2017). The dataset called FakeNewsNet was made publicly available for fake news detection. A comprehensive study has been done on the dataset and the results have been compared with classification models like SVM, Logistic Regression, NB, CNN, SAF/S (utilizing news content), and its variant SAF/A (utilizing social context). Amongst the compared models, a combined model of SAF/A and SAF/S with LSTM cells achieved the best accuracy, i.e. 70.6% on the PolitiFact dataset and 71.7% on the GossipCop dataset (Shu et al., 2020). A comprehensive study with hybrid models using N-Gram, Word-Embeddings, and Topic Models for content-based classification was proposed (Aggarwal et al., 2020, Walia et al., 2021). Hybrid models such as N-Gram, N-Gram + Topic, N-Gram + Word2Vec, Word2Vec + Topic and N-Gram + Word2Vec + Topic have been compared and analyzed, achieving accuracies of 80%, 77%, 72%, 42% and 40% respectively and F1 scores of 0.78, 0.76, 0.72, 0.39 and 0.46 respectively. It was noted that the performance of machine learning models decreased as more models were combined, most probably due to high bias (Oriola, 2022).

Classification of fake news has been performed using news content features or social context features. Twitter data has been mined for social context, and news content features have been extracted from the source, headline, body text of the news and images. The tweets can be targeted using specific words (Mittal et al., 2019). Retweet rate, the time difference between the retweets, and the users retweeting are some of the important features that provide social context along with the text content of the tweet. Another useful feature to mine is user comments on the tweet, which provide additional text content. Utilizing the text content and user comments, a study compared several groups of fake news classification models: models like RST, LIWC, text-CNN, and HAN, which use news content; models like HPA-BLSTM, which rely only on user comments, i.e. social context; and models like TCNN-URG and CSI, which utilise both news content and user comments. These models have been compared with a proposed model called dEFEND, consisting of a word encoder, a sentence encoder, a user comments encoder, a sentence-comment co-attention layer and a fake news prediction component. The authors reported that dEFEND improved on other models and had an accuracy of 90.4% with an F1 score of 0.928 on the PolitiFact dataset, and an accuracy of 80.8% with an F1 score of 0.755 on the GossipCop dataset. The authors observed a drop in accuracy when either the co-attention for news contents or the user comments were removed. The results showed that user comments were necessary to guide fake news detection in dEFEND (Shu et al., 2019). A similar hybrid model utilizing GRU and RNN for word encoding, sentence encoding and user comments encoding was proposed with SVM as the classifier unit. It had an accuracy of 91.2% on PolitiFact with an F1 score of 0.932, and an accuracy of 80.2% on GossipCop with an F1 score of 0.762, but with the same limitation of relying on user comments (Albahar, 2021).

The possibilities of using hierarchical propagation networks (HPN) to perform temporal analysis, i.e. time differences between post and user replies, and linguistic analysis, i.e. sentiment of the post and that of different levels of user replies, were investigated. Hybrid frameworks utilizing HPNs paired with existing content-based classification models showed an overall improvement in results over the same content-based classification models not utilizing HPNs. Amongst the models compared by the authors, RST_HPFN (Rhetorical Structure Tree + Hierarchical propagation network) had the highest accuracy on the PolitiFact dataset, i.e. 87.5% with an F1 score of 0.843, and LIWC_HPFN (Linguistic Inquiry and Word Count + Hierarchical propagation network) achieved the highest accuracy on the GossipCop dataset, i.e. 86.9% with an F1 score of 0.871. This model relied on the propagation data of the fake news, including the temporal data of retweets and information of users sharing the tweet (Shu et al., 2020). An early detection method for fake news has been proposed which utilizes a pre-trained BERT summarization model for text summarization and GEAR, a fact verification model based on BERT. This method had an accuracy of 68.2% on the PolitiFact dataset with an F1 score of 0.725, and an accuracy of 73.8% on the GossipCop dataset with an F1 score of 0.525. The authors state that the model can be computationally expensive to use in real-world applications (Li and Zhou, 2020). GloVe word embeddings and a 1-D CNN for n-gram feature extraction, followed by an LSTM layer for temporal feature extraction, have been used for content-based fake news classification (Agarwal et al., 2020). In another study, the use of LSTM cells with the Attention model (LSTM-ATT) for content-based classification was investigated, which had an accuracy of 83.3% on the PolitiFact dataset with an F1 score of 0.83, and an accuracy of 79.3% on the GossipCop dataset with an F1 score of 0.79. The authors concluded that the proposed model performed well compared with baseline models (Lin et al., 2019).

SpotFake+, a multimodal framework utilizing XLNet for text processing and VGG-19 for image processing, had an accuracy of 84.6% on PolitiFact and 85.6% on GossipCop. It was noticed that text and image classification on GossipCop had an accuracy of 83.6% and 80% respectively (Singhal et al., 2020). The importance of compound sentiment and retweet rate for the classification of fake news was explored, with a simple feed-forward neural network used as the classifier. An accuracy of 64% over the datasets with an F1 score of 0.64 was achieved. A neural network with only compound sentiment was found to perform similarly to one using both compound sentiment and retweet rate (Ezeakunne et al., 2020). In recent years, transformer-based models such as BERT have been explored for the task of fake news classification. One such proposed model utilizes three pre-trained BERT models for the statements, metadata and justifications present in the LIAR PLUS dataset. The proposed triple-BERT framework had an accuracy of 74% on the dataset. Notably, on the LIAR dataset, consisting of statement and metadata, a double-BERT model had an accuracy of 72% (Mehta et al., 2021). In another study, the authors proposed a framework utilizing a pre-trained BERT and three parallel blocks of 1d-CNN having different kernel-sized convolutional layers with different filters for better learning. It gained high accuracy for the task of fake news classification on real-world fake news datasets as compared to state-of-the-art deep learning models, indicating that the features output by BERT proved to be more useful for the task (Kaliyar et al., 2021). A study utilized BERT for sarcasm detection, focusing on the context-based feature technique for sarcasm identification using deep learning, transformer learning, and conventional machine learning models on different datasets. It was observed that BERT performed better than the deep learning models using GloVe embeddings, which shows that BERT performed well in learning contextual features from the data (Eke et al., 2021). Yet another study argued that BERT is best suited for fake news classification because of its deep contextualizing nature. Two models were proposed, i.e. BAKE, with weighted cross-entropy as the training loss, and exBAKE. Both models were trained on CNN and Daily Mail data, whereas the FNC-1 dataset was used for fine-tuning. The authors reported the F1 score of pre-trained BERT with cross-entropy as the training loss to be 0.656. The F1 scores of BAKE and exBAKE were reported to be 0.734 and 0.746 respectively. The authors concluded that exBAKE achieved better performance in both the majority and the minority categories of the FNC-1 dataset. The use of weighted cross-entropy was found to be crucial for showing competitive results in fake news detection on the FNC-1 dataset (Jwa et al., 2019).

A study using a bidirectional transformer approach with a feed-forward classification layer discussed the benefits of utilizing transformer-based models over machine learning models. The authors reported that fine-tuned BERT had an accuracy of 97.02% compared to XGBoost, which had an accuracy of 89.37% on the NewsFN dataset (Aggarwal et al., 2020). A comparative analysis of deep learning approaches for fake news identification based on the COVID-19 fake news dataset found that a fine-tuned BERT performed better than other models, including BiLSTM + attention. An LSTM model trained on word embeddings performed similarly to BiLSTM + attention. The importance of pre-training on a corpus from the target domain has also been discussed in the paper. It was concluded that the transformer models performed much better than the non-transformer and word-based models (Wani et al., 2021).

3. Methodology

This section details the architecture of the proposed model. It also contains details about the dataset on which the models are trained and evaluated. A brief background on the BERT and LSTM architectures, the dataset used and the preprocessing methods is given in this section. Fig. 1 depicts the proposed framework for content-based fake news classification.

BERT with an LSTM layer has been used as the classification model to classify the news titles. BERT uses a large volume of unlabeled text corpora to acquire contextualized word representations. Because of its complicated structure and high nonlinear representation learning power, BERT has scored well on NLP tasks. By memorizing and finding the pattern of vital information, LSTMs efficiently increase performance.

3.1. Data Preprocessing

Data preprocessing is an essential step for training the models; therefore, the news articles present in the dataset have been preprocessed. The following steps were taken to preprocess the data:

• Lowercased every word in the sentence
• Changed “’t” to “not”. For example, “can’t” is changed to “can not”
• Removed “@name”
• Isolated and removed punctuation except "?"
• Removed other special characters
• Removed stop words except "not" and "can"
• Removed trailing whitespace
• Tokenized the cleaned text content

After the text contents of the datasets were preprocessed, tokenization vectors and attention masks were obtained from the BERT tokenizer and packed together with their binary category for training and classification.

3.2. BERT

BERT (Bidirectional Encoder Representations from Transformers) is built on a transformer attention mechanism that learns contextual relationships among words. The transformer consists of an encoder responsible for reading the text input and a decoder responsible for prediction based on the task; BERT uses only the encoder stack. In contrast to directional models, which read the text input in sequential order, the transformer encoder reads all of the words simultaneously, which gives it a non-directional nature. This means that the model learns the context of a word from all of its surrounding words. Thus, it is termed bidirectional. Fig. 2 represents the architecture of the BERT sentence-level classification model.

BERT has many versions of pre-trained models for different use cases. Two of the most used models are:

• BERT-base: 12 encoder stack layers + 768 hidden units + 12 multi-head attention heads: 110M parameters.
• BERT-large: 24 encoder stack layers + 1024 hidden units + 16 multi-head attention heads: 340M parameters.

The input data needs to be converted into an appropriate format before using the pre-trained model. Relevant embeddings for each sentence have been obtained. Each encoder layer in these models takes a list of token embeddings and their attention masks as input. The same number of embeddings with the same hidden size is produced as the output.
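As an illustration of the input format described above, the following minimal sketch cleans a news title along the lines of the steps in Section 3.1 and then builds padded token IDs with an attention mask. It is only a sketch under stated assumptions: the stop-word list, the toy vocabulary, the special-token IDs, and the names clean_title and encode are hypothetical, and a whitespace tokenizer stands in for the WordPiece tokenizer that BERT actually uses.

```python
import re

# Hypothetical, deliberately tiny stop-word list; "not" and "can" are kept on purpose.
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "to", "in", "on", "be"}

# Toy special-token IDs chosen for illustration (not the IDs of any real BERT vocabulary).
PAD_ID, UNK_ID, CLS_ID, SEP_ID = 0, 1, 2, 3


def clean_title(text):
    """Apply cleaning steps similar to those listed in Section 3.1."""
    text = text.lower()                          # lowercase every word
    text = re.sub(r"@\w+", " ", text)            # remove "@name" mentions
    text = re.sub(r"['’]t\b", " not", text)      # "'t" -> " not", e.g. "can't" -> "can not"
    text = re.sub(r"\?", " ? ", text)            # isolate "?" so it survives as its own token
    text = re.sub(r"[^a-z0-9?\s]", " ", text)    # drop other punctuation and special characters
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    return " ".join(tokens)                      # split/join also removes stray whitespace


def encode(title, vocab, max_len=10):
    """Return (token_ids, attention_mask), both padded to max_len."""
    tokens = clean_title(title).split()
    body = [vocab.get(t, UNK_ID) for t in tokens][: max_len - 2]
    ids = [CLS_ID] + body + [SEP_ID]             # [CLS] first, [SEP] last
    mask = [1] * len(ids)                        # 1 marks real tokens
    pad = max_len - len(ids)
    return ids + [PAD_ID] * pad, mask + [0] * pad  # 0 marks padding


# Hypothetical vocabulary for the example title below.
vocab = {"can": 4, "not": 5, "believe": 6, "news": 7, "?": 8}
title = "@BBC Can't believe the news is TRUE?!"
ids, mask = encode(title, vocab, max_len=10)
print(clean_title(title))
print(ids)
print(mask)
```

The attention mask lets the encoder ignore padded positions, so titles of different lengths can share one fixed-size batch.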
A single vector representing the entire input sentence is fed to the classifier. The hidden state of the first token, [CLS], of the model’s output can be used to represent the entire sentence and is used for classification.

Fig. 2. The architecture of BERT sentence-level classification (Moravec et al., 2018)

3.3. LSTM

Long short-term memory is a type of RNN that can learn long-term dependencies. The chain-like structure of LSTMs is similar to that of RNNs, but the base module that makes up the LSTM is structurally distinct from other RNNs. RNNs are good at learning short data sequences (Luengo and García-Marín, 2020). However, they suffer from the vanishing gradient problem, which hampers their ability to learn and understand long sequences and context. LSTMs are special RNNs that do not face this problem and are well suited to learning long-term dependencies. All RNNs are made up of repeating modules, but these generally have a very simple architecture such as a single tanh layer. LSTMs, however, are made up of repeating modules called cells, each containing four neural network layers connected in a special manner, as shown in Fig. 3. Each cell passes two states to the next cell, i.e. the cell state and the hidden state (C_t and h_t). Each cell is used to remember things, and manipulations of this memory are done using three mechanisms called gates, namely the forget, input, and output gates. The forget gate removes the information that is no longer useful for the LSTM and consists of a sigmoid layer that makes the decision. The input gate is responsible for adding relevant information to the current cell state and uses tanh and sigmoid layers. The output gate is responsible for exposing the relevant information from the current cell and employs a sigmoid layer. Fig. 3 shows the LSTM architecture, and the gate computations are:

\[ i_t = \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right) \tag{1} \]

\[ f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right) \tag{2} \]

\[ o_t = \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right) \tag{3} \]

\[ C'_t = \tanh\left(W_c \cdot [h_{t-1}, x_t] + b_c\right) \tag{4} \]

\[ C_t = f_t * C_{t-1} + i_t * C'_t \tag{5} \]

\[ h_t = o_t * \tanh\left(C_t\right) \tag{6} \]
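To make Eqs. (1) to (6) concrete, the sketch below implements a single LSTM cell update in plain Python and runs it over a short sequence of vectors. It is an illustrative sketch, not the implementation used in the paper: the helper names (affine, lstm_step, init_params, run_lstm), the random initialisation, and the tiny dimensions are all assumptions, and the random input vectors merely stand in for the token representations that a BERT encoder would supply (768-dimensional for BERT-base).

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, b, v):
    """Return W . v + b for a weight matrix W (list of rows) and bias vector b."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]


def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM cell update following Eqs. (1)-(6)."""
    z = h_prev + x_t                                                  # concatenation [h_{t-1}, x_t]
    i_t = [sigmoid(a) for a in affine(p["W_i"], p["b_i"], z)]         # Eq. (1): input gate
    f_t = [sigmoid(a) for a in affine(p["W_f"], p["b_f"], z)]         # Eq. (2): forget gate
    o_t = [sigmoid(a) for a in affine(p["W_o"], p["b_o"], z)]         # Eq. (3): output gate
    c_cand = [math.tanh(a) for a in affine(p["W_c"], p["b_c"], z)]    # Eq. (4): candidate state
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_cand)]  # Eq. (5)
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]                # Eq. (6)
    return h_t, c_t


def init_params(input_size, hidden_size, seed=0):
    """Small random weights and zero biases for the four layers (hypothetical init)."""
    rng = random.Random(seed)
    p = {}
    for gate in "ifoc":
        p["W_" + gate] = [
            [rng.uniform(-0.1, 0.1) for _ in range(hidden_size + input_size)]
            for _ in range(hidden_size)
        ]
        p["b_" + gate] = [0.0] * hidden_size
    return p


def run_lstm(sequence, p, hidden_size):
    """Feed a sequence of vectors through the cell and return the final hidden state."""
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    for x_t in sequence:
        h, c = lstm_step(x_t, h, c, p)
    return h


# Stand-in for per-token encoder outputs: 5 tokens, 4 features each.
rng = random.Random(1)
tokens = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(5)]
params = init_params(input_size=4, hidden_size=3)
final_h = run_lstm(tokens, params, hidden_size=3)
print(final_h)
```

In a title classifier of the kind described here, a final hidden state such as final_h would typically be passed to a small output layer that produces the fake/real decision; that last step is omitted from the sketch.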
Table 2
Comparison of models for PolitiFact and GossipCop
[Table body not recoverable. Column headers, repeated once per dataset: Accuracy (%), Precision, Recall, F1 Score]
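The four metrics reported in Table 2 can all be derived from a binary confusion matrix C, where C[i][j] counts the samples of true class i that were predicted as class j. The sketch below shows that computation. It assumes class 1 is the positive (fake) class, which is a labelling convention chosen here for illustration and not stated in the paper; the function names are hypothetical and the labels are invented example values, not results from the paper.

```python
def confusion_matrix(y_true, y_pred, n_classes=2):
    """C[i][j] = number of samples with true class i predicted as class j."""
    C = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        C[t][p] += 1
    return C


def binary_metrics(C):
    """Accuracy, precision, recall and F1 with class 1 treated as positive."""
    tn, fp = C[0]
    fn, tp = C[1]
    total = tn + fp + fn + tp
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Invented example labels (1 = fake, 0 = real), for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
C = confusion_matrix(y_true, y_pred)
metrics = binary_metrics(C)
print(C)
print(metrics)
```

Reporting precision and recall alongside accuracy matters when the classes are imbalanced, since accuracy alone can look high for a model that rarely predicts the minority class.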
• HAN (Yang et al., 2016): For false news detection, HAN applies a hierarchical attention neural network structure to news material. It uses word-level attention in each sentence and sentence-level attention in each text to encode news material.
• SAFE(Multi-modal) (Zhou et al., 2020): Text-CNN is extended in Similarity-Aware Fake (SAFE) news detection by adding a fully connected layer that automatically extracts textual information for each news article. A convolutional layer and a maximum pooling layer are included. Each word is first embedded in a piece of material with several words, and then a convolutional layer is utilized to create a feature map from a succession of local inputs via a filter.

The proposed model (BERT+LSTM) has been implemented on the FakeNewsNet (PolitiFact and GossipCop) dataset, and the results have been compared with baseline models such as TCNN-URG (Qian et al., 2018), LIWC (Pennebaker et al., 2015), CSI (Ruchansky et al., 2017), HAN (Yang et al., 2016), SAFE(Multimodal) (Zhou et al., 2020) and BERT. The proposed model achieved a maximum accuracy of 88.75%, the highest among the compared models. Over the baseline models, the proposed model improved accuracy by a minimum of 1.35% and a maximum of 17.55% on the PolitiFact dataset, and by a minimum of 0.3% and a maximum of 10.5% on the GossipCop dataset. Fig. 6 and Fig. 7(a-b) show a pictorial representation of the evaluated metrics over PolitiFact and GossipCop respectively.

A confusion matrix has been visualized to show the correctness of the predictions made by the model on the test set. By definition, C is a confusion matrix where C_{i,j} is equal to the number of data points that belong to class i and are predicted to be in class j. Fig. 8(a-b) represent the confusion matrices of the proposed model over the PolitiFact and GossipCop datasets respectively.

5. Conclusion

Fake news is broadcast to the public through mainstream sources such as media outlets and social media and is often deceptive, with the intent of influencing public opinion in one’s favour. The importance of fake news classification in the modern day and the work done towards it have been discussed in the paper. A content-based classification model which classifies news as fake or real based on news titles has been proposed. To classify the news titles, BERT with an LSTM layer was utilized as the classification model. BERT can learn contextualized word representations from a large volume of unlabeled text datasets. BERT has performed well on NLP tasks due to its complex structure and great nonlinear representation learning capability. LSTMs effectively improve performance by memorizing and finding the pattern of crucial information. Contextualized word representations from BERT can be fed to an LSTM to improve fake news classification performance because of their high ability to capture semantics and long-distance relationships in news titles. The proposed model has been compared with
other classification methods and a vanilla BERT model. A slight improve- Jwa, H., Oh, D., Park, K., Kang, J. M., & Lim, H. (2019). exbake: Automatic fake news
ment has been seen with the proposed model which is indicative of the detection model based on bidirectional encoder representations from transformers
(bert). Applied Sciences, 9(19), 4062.
model learning linguistic patterns of news titles and their connection Kadam, A. B., & Atre, S. R. (2020). Negative impact of social media panic during the
with fake news. A few setbacks that the model may have faced is the COVID-19 outbreak in India. Journal of travel medicine, 27(3), taaa057.
thin line between fake news and real news titles. Often, the titles do not Kaliyar, R. K., Goswami, A., & Narang, P. (2021). FakeBERT: Fake news detection in social
media with a BERT-based deep learning approach. Multimedia Tools and Applications,
appear to have any difference as the writers of fake news begin to use 80(8), 11765–11788.
language similar to that used by real news. To overcome this problem, Li, Q., & Zhou, W. (2020). Connecting the Dots Between Fact Verification and Fake News
the news title has to be manually fact-checked, which may be a prospect. Detection. arXiv preprint arXiv:2010.05202.
Lin, Jun & Tremblay-Taylor, Glenna & Mou, Guanyi & You, Di & Lee, Kyumin. (2019).
The performance of deep learning models increases with more data and
Detecting Fake News Articles. 3021-3025. 10.1109/BigData47090.2019.9005980.
more data, in this case, would mean more instances where fake news Luengo, M., & García-Marín, D. (2020). The performance of truth: politicians, fact-check-
titles can be seen and a comprehensive study on the language used in it ing journalism, and the struggle to tackle COVID-19 misinformation. American Journal
of Cultural Sociology, 8(3), 405–427.
can be done. Social media is a mass spreader of fake news where fake
Mehta, D., Dwivedi, A., Patra, A., & Kumar, M. A. (2021). A transformer-based architecture
news is often tweeted and shared multiple times. Future work may in- for fake news classification. Social Network Analysis and Mining, 11(1), 1–12.
clude training the model on linguistic and lexical patterns of fake news Mikolov, T., Grave, E., Bojanowski, P., Puhrsch, C., & Joulin, A. (2018). Advances in
as seen on social media sites. This architecture can be tested in the future on a variety of
application domains, and it may even improve existing benchmarks. The proposed model can be
studied and tested with various set-ups in the hope of achieving better performance than it
currently does. We also intend to tune the hyperparameters of both the BERT model and the layers
that follow it, and to conduct a thorough analysis of their impact.

Compliance with Ethical Standards

The authors declare that they do not have any conflict of interest. This research did not involve
any human or animal participation. All authors have checked and agreed on the submission.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal
relationships that could have appeared to influence the work reported in this paper.
References conference on empirical methods in natural language processing (pp. 2931–2937).
Ruchansky, N., Seo, S., & Liu, Y. (2017). Csi: A hybrid deep model for fake news detection.
In Proceedings of the 2017 ACM on Conference on Information and Knowledge Manage-
Aggarwal, A., Chauhan, A., Kumar, D., Mittal, M., & Verma, S. (2020). Classification of ment (pp. 797–806).
fake news by fine-tuning deep bidirectional transformers-based language model. EAI Singh, A., Saimbhi, A. S., Singh, N., & Mittal, M. (2020). DeepFake Video Detection: A
Endorsed Transactions on Scalable Information Systems, 7(27). Time-Distributed Approach. SN Computer Science, 1(4), 1–8.
Aggarwal, A., Chauhan, A., Kumar, D., Mittal, M., Roy, S., & Kim, T. H. (2020). Video Sundermeyer, M., Schlüter, R., & Ney, H. (2012). LSTM neural networks for language
caption based searching using end-to-end dense captioning and sentence embeddings. modeling. Thirteenth annual conference of the international speech communication asso-
Symmetry, 12(6), 992. ciation.
Agarwal, A., Mittal, M., Pathak, A., & Goyal, L. M. (2020). Fake news detection using a Shu, K., Mahudeswaran, D., Wang, S., Lee, D., & Liu, H. (2020). Fakenewsnet: A data repos-
blend of neural networks: an application of deep learning. SN Computer Science, 1(3), itory with news content, social context, and spatiotemporal information for studying
1–9. fake news on social media. Big Data, 8(3), 171–188.
Albahar, M. (2021). A hybrid model for fake news detection: Leveraging news content and Shu, K., Cui, L., Wang, S., Lee, D., & Liu, H. (2019). defend: Explainable fake news de-
user comments in fake news. IET Information Security. tection. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge
Anand, I., Negi, H., Kumar, D., Mittal, M., Kim, T. H., & Roy, S. (2021). Residual U-Network Discovery & Data Mining (pp. 395–405).
for Breast Tumor Segmentation from Magnetic Resonance Images. Computers Materials Shu, K., Mahudeswaran, D., Wang, S., & Liu, H. (2020). Hierarchical propagation networks
& Continua, 67(3), 3107–3127. for fake news detection: Investigation and exploitation. Proceedings of the International
Deepak, S., & Chitturi, B. (2020). Deep neural approach to Fake-News identification. Pro- AAAI Conference on Web and Social Media, 14, 626–637.
cedia Computer Science, 167, 2236–2243. Singhal, S., Kabra, A., Sharma, M., Shah, R. R., Chakraborty, T., & Kumaraguru, P. (2020).
Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). Bert: Pre-training of deep bidi- Spotfake+: A multimodal framework for fake news detection via transfer learning
rectional transformers for language understanding. arXiv preprint arXiv:1810.04805. (student abstract). Proceedings of the AAAI Conference on Artificial Intelligence, 34(10),
Eke, C. I., Norman, A. A., & Shuib, L. (2021). Context-Based Feature Technique for Sarcasm 13915–13916.
Identification in Benchmark Datasets Using Deep Learning and BERT Model. IEEE Törnberg, P. (2018). Echo chambers and viral misinformation: Modeling fake news as
Access, 9, 48501–48518. complex contagion. PloS one, 13(9), Article e0203958.
Ezeakunne, U., Ho, S. M., & Liu, X. (2020). Sentiment and retweet analysis of user response Walia, I. S., Kumar, D., Sharma, K., Hemanth, J. D., & Popescu, D. E. (2021). An Integrated
for fake news detection. In Proceedings of the 2020 International Conference on Social Approach for Monitoring Social Distancing and Face Mask Detection Using Stacked
Computing, Behavioral-Cultural Modeling & Prediction and Behavior Representation. ResNet-50 and YOLOv5. Electronics, 10(23), 2996.
Farooq, G. (2017). Politics of Fake News: how WhatsApp became a potent propaganda Wani, A., Joshi, I., Khandve, S., Wagh, V., & Joshi, R. (2021). Evaluating deep learn-
tool in India. Media Watch, 9(1), 106–117. ing approaches for covid19 fake news detection. In International Workshop on Com-
Goyal, L. M., Mittal, M., & Sethi, J. K. (2016). Fuzzy model generation using Subtractive bating On line Ho st ile Posts in Regional Languages during Emergency situation
and Fuzzy C-Means clustering. CSI transactions on ICT, 4(2-4), 129–133. (pp. 153–163). Cham: Springer.
Granik, M., & Mesyura, V. (2017). Fake news detection using naive Bayes classifier. In Yang, Z., Yang, D., Dyer, C., He, X., Smola, A., & Hovy, E. (2016). Hierarchical atten-
2017 IEEE First Ukraine Conference on Electrical and Computer Engineering (UKRCON) tion networks for document classification. In Proceedings of the 2016 conference of the
(pp. 900–903). IEEE. North American chapter of the association for computational linguistics: human language
Grinberg, N., Joseph, K., Friedland, L., Swire-Thompson, B., & Lazer, D. (2019). Fake news technologies (pp. 1480–1489).
on Twitter during the 2016 US presidential election. Science, 363(6425), 374–378. Zhou, X., Wu, J., & Zafarani, R. (2020). Similarity-Aware Multi-modal Fake News Detec-
Hopkin, J., & Rosamond, B. (2018). Post-truth politics, bullshit and bad ideas:‘Deficit tion. Advances in Knowledge Discovery and Data Mining, 12085, 354.
Fetishism’in the UK. New political economy, 23(6), 641–655.