ART16000108
MEDWIN PUBLISHERS
Research Article
Volume 2 Issue 1
Received Date: January 25, 2024
Published Date: February 09, 2024
DOI: 10.23880/art-16000108

Mehta DK1, Patel MB2, Dangi A2, Patwa N2, Patel Z2, Jain R2*, Shah PD3 and Suthar BR3
1 Department of Computer Application, Mandsaur University, Mandsaur, Madhya Pradesh, India
2 Department of Computer Engineering, UVPCE, Ganpat University, Mehsana, Gujarat, India
3 Department of Information Technology, UVPCE, Ganpat University, Mehsana, Gujarat, India
Abstract
This research article investigates the effectiveness of natural language processing (NLP) and supervised learning in classifying
fake news articles. With the increasing prevalence of fake news in online media, it has become critical to identify and categorize
such articles accurately. In this study, we apply NLP techniques to extract features from textual data, and use a supervised
learning algorithm to train a classification model. We use a dataset of fake news articles to evaluate the performance of
our model in terms of accuracy, precision, recall, and F1 score. Our results demonstrate that our approach achieved high
accuracy and robustness in the classification of fake news articles. Furthermore, we perform a feature importance analysis to
identify the most significant features that contribute to the classification of fake news. The findings of this study have practical
implications for identifying and combating fake news in online media, and also provide insights into the effectiveness of NLP
and supervised learning for text classification tasks.
Keywords: Natural Language Processing; Supervised Learning; Support Vector Machine; Classification; Machine Learning
Exploring the Efficacy of Natural Language Processing and Supervised Learning in the Classification
of Fake News Articles
2 Advances of Robotic Technology
decision trees, random forests, and support vector machines.

To achieve this goal, we collected a large dataset of news articles from various online sources and
labelled them as either fake or legitimate. We then trained several supervised learning models on
this dataset and evaluated their performance using several metrics such as accuracy, precision,
recall, and F1 score.

The significance of this research lies in its potential to contribute to the development of
automated tools for detecting and combating fake news. By exploring the efficacy of NLP and
supervised learning, we aim to provide insights into the most effective techniques for classifying
fake news articles. These insights could inform the development of more accurate and efficient
tools for detecting and combatting fake news, which is critical for maintaining the integrity of
democratic processes and ensuring the public’s access to reliable information.

Supervised learning is a machine learning technique in which an algorithm is trained on labelled
data to make predictions on new, unseen data. In the context of fake news classification,
supervised learning algorithms are trained on a dataset of news articles labelled as either true or
false, and then used to predict the label of new, unseen articles.

In summary, this research article provides an in-depth exploration of the efficacy of NLP and
supervised learning in the classification of fake news articles. By investigating the performance
of several NLP techniques and supervised learning models, we aim to contribute to the development
of more accurate and efficient tools for detecting and combatting fake news, which is essential for
maintaining the integrity of democratic processes and ensuring the public’s access to reliable
information.

Fact: Pew Research - 68% of Americans feel overwhelmed by political news.
Health News:
Definition: Covers medical research, public health, and healthcare policies.
Impact: Influences health behaviors and vaccine uptake.
Fact: Journal of Health Communication - Misinformation leads to vaccine hesitancy.
Technology News:
Definition: Information on tech advancements, innovations, and trends.
Impact: Shapes consumer preferences and influences stock markets.
Fact: World Economic Forum - 75 million jobs at risk due to technological advancements.

Literature Review

Several studies have explored the use of NLP and supervised learning in the classification of fake
news articles. In a study by Shu K, a dataset of news articles from the 2016 U.S. presidential
election was used to train and evaluate different supervised learning algorithms for fake news
classification. The authors found that a combination of NLP techniques and supervised learning
algorithms, such as logistic regression and random forest, could effectively classify fake news
articles.

This research article is a dedicated exploration into the effectiveness of NLP and supervised
learning in classifying fake news articles. The investigation delves into various NLP techniques,
encompassing text pre-processing, feature engineering, and sentiment analysis, coupled with the
application of supervised learning models such as decision trees, random forests, and support
vector machines.
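
The text pre-processing and feature-engineering steps mentioned above can be illustrated with a minimal sketch. It assumes a simple pipeline of lowercasing, alphabetic tokenization, stop-word removal, and bag-of-words counting; the stop-word list, the function names, and the example headline are hypothetical and are not taken from the study.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch);
# practical pipelines use much larger lists.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "in", "and", "on"}


def preprocess(text):
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def bag_of_words(text):
    """Return a mapping from each remaining token to its count in the text."""
    return Counter(preprocess(text))


# Invented headline, for illustration only.
headline = "The senator said the moon landing is a hoax, a HOAX!"
features = bag_of_words(headline)
print(features)
```

Counts of this kind are one possible input representation for the supervised models discussed in this article; weighting schemes such as TF-IDF would be a natural alternative.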
Jain R, et al. Exploring the Efficacy of Natural Language Processing and Supervised Learning in the Copyright© Jain R, et al.
Classification of Fake News Articles. Adv Rob Tec 2024, 2(1): 000108.
In addition, a study by Khanam Z explored the use of a hybrid approach combining NLP techniques,
such as part-of-speech tagging and named entity recognition, and supervised learning algorithms,
such as decision trees and support vector machines, for fake news classification. The authors used
a dataset of news articles from different sources and found that the hybrid approach outperformed
other classification models [2].

Shu K explored the use of NLP and supervised learning algorithms for the classification of fake
news on social media. The authors used a dataset of news articles from the 2016 U.S. presidential
election and found that a combination of NLP techniques and supervised learning algorithms, such as
logistic regression and random forest, could effectively classify fake news articles [3].

Al-Ayyoub M provides a comprehensive review of the challenges and opportunities related to fake
news detection. The authors discuss various NLP techniques and supervised learning algorithms that
have been proposed for fake news detection, including bag-of-words, word embeddings, decision
trees, and neural networks [4].

Reis JC explored the use of deep learning techniques, such as convolutional neural networks (CNNs)
and recurrent neural networks (RNNs), for fake news classification. The authors used a dataset of
news articles from different sources and found that CNNs could effectively classify fake news
articles [5].

Rashid SF proposed a hybrid approach combining NLP techniques, such as part-of-speech tagging and
named entity recognition, with supervised learning algorithms, such as decision trees and support
vector machines, for fake news classification. The authors used a dataset of news articles from
different sources and found that the hybrid approach outperformed other classification models [6].

Monti F explored the use of geometric deep learning techniques, such as graph convolutional neural
networks, for fake news classification on social media. The authors used a dataset of news articles
from Twitter and found that the proposed approach could effectively classify fake news articles
[7].

Patel K and Mehta M compared the performance of several supervised learning algorithms, including
logistic regression, decision trees, and neural networks, for fake news classification. They used a
dataset of news articles from different sources and found that the neural network-based approach
outperformed other classification models [8].

Choudhury D studied fake news and fake job postings. The authors compared SVM, Naive Bayes, Random
Forest, and Logistic Regression classifiers for identifying fake news on various datasets. The SVM
classifier achieved the highest accuracy, with 61%, 97%, and 96% on the Liar, Fake Job Posting, and
Fake News datasets, respectively. In their genetic algorithm (GA) based fake news detection
approach, SVM, Naive Bayes, Random Forest, and Logistic Regression serve as fitness functions [9].

Kong SH applied natural language processing (NLP) techniques for text analytics and trained deep
learning models to detect fake news based on the news title or the news content. The authors
proposed a solution for social media that spares users the bad experience of receiving fake stories
posted by non-reputed sources. The TensorFlow framework with its built-in Keras deep learning
libraries was used for this work. Their findings show that models trained on news titles need less
computation time to reach decent performance, whereas models trained on news content can achieve
greater performance at the expense of increased computation time [10].

Sahoo SR presented an automatic method for detecting false news in the Chrome browser environment,
which allows it to identify false news on Facebook. The authors employed a variety of Facebook
account-related features along with certain news article attributes to examine the account’s
activity using deep learning. According to an experimental analysis of real-world data, the
proposed fake news detection system outperformed the current state-of-the-art solutions [11].

Khan JY used three different datasets and applied machine learning techniques to detect fake news
articles spreading on social media. The authors found that BERT and similar pre-trained models
performed better for fake news detection on very small datasets. They used lexical and sentiment
features, n-grams, and Empath-generated features for traditional machine learning models, and
pre-trained word embeddings for deep learning models [12].

In addition, Chauhan T proposed a deep learning prediction model based on an LSTM neural network.
The authors used GloVe word embeddings for the vector representation of textual words and applied
vectorization and tokenization techniques for feature extraction [13].

Methodology

This section outlines the methodology used to classify fake articles. A supervised machine learning approach
was employed, which involved collecting a dataset, pre-processing it, selecting features, and
training and testing various classifiers such as Random Forest, SVM, Naive Bayes, and others. The
proposed system methodology is shown in Figure 1. To achieve the highest accuracy and precision,
different experiments were conducted on each algorithm individually and in combination. The tool
was implemented based on the classification model to detect fake articles.

The primary objective is to use a series of classification algorithms to develop a classification
model that can serve as a fake news scanner by identifying specific details in news articles. This
model will then be integrated into a Python application, allowing for the detection of fake news
data. Additionally, the Python code has been optimized through appropriate refactoring techniques.

We used the LIAR-PLUS Master dataset, which is a collection of labelled statements created for the
purpose of training and testing fake news detection systems. It is an extension of the LIAR
dataset, which was released in 2017 and contained 12,836 short statements labelled as either true,
mostly true, half true, barely true, false, or pants on fire. The LIAR-PLUS Master dataset,
released in 2019, includes the original LIAR dataset as well as additional statements that were
fact-checked by PolitiFact and added to the dataset.

The LIAR-PLUS Master dataset contains 14,787 statements labelled as either true, mostly true, half
true, barely true, false, or pants on fire. Each statement is accompanied by metadata such as the
speaker, the publication, the date, and the subject. The dataset also includes additional features
such as the statement’s length and its sentiment score. We then used NLP and supervised learning
(SL) techniques to classify the articles as either real or fake. We compared the performance of
several models, including Naive Bayes, Logistic Regression, and Support Vector Machines.
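
To make the supervised classification step more concrete, the following is a minimal sketch rather than the implementation used in this study. It assumes (i) one possible collapse of the six truthfulness labels into the two classes real and fake, (ii) plain bag-of-words counts as features, and (iii) a small pure-Python multinomial Naive Bayes classifier with add-one smoothing. The label spellings, the mapping, the class name, and the toy statements are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

# ASSUMPTION: one possible six-way to two-way label mapping. The article does
# not state which mapping was used, and the label spellings are assumed.
BINARY_LABEL = {
    "true": "real", "mostly-true": "real", "half-true": "real",
    "barely-true": "fake", "false": "fake", "pants-fire": "fake",
}


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)        # documents per class
        self.word_counts = defaultdict(Counter)    # word counts per class
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for word in tokenize(text):
                if word in self.vocab:  # skip words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy statements invented for illustration; they are NOT from LIAR-PLUS.
train = [
    ("unemployment fell last quarter according to official figures", "true"),
    ("the budget report was published on schedule", "mostly-true"),
    ("aliens secretly control the national bank", "pants-fire"),
    ("miracle pill cures every disease overnight", "false"),
]
texts = [text for text, _ in train]
labels = [BINARY_LABEL[label] for _, label in train]

model = NaiveBayes().fit(texts, labels)
print(model.predict("official figures published in the report"))
print(model.predict("secret miracle pill controls aliens"))
```

Logistic Regression and Support Vector Machines would consume the same kind of feature representation; Naive Bayes is shown here only because it fits in a few lines without external libraries.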
The classification algorithms applied in this model are Support Vector Machine, Naive Bayes, and
Logistic Regression. The most significant features used in the proposed methodology are:
1. Word Frequency: Analyzing the frequency of certain words in the text, as certain words might be
more common in fake news articles.
2. Sentiment Analysis: Determining the overall sentiment of the text to see if it tends to be more
positive, negative, or neutral.
3. Source Credibility: Considering the credibility of the source or the website where the news is
published.
4. Contextual Information: Taking into account contextual information, such as the presence of
quotes, references, or links in the article.

Results

Our results, shown in Table 1 and Figure 2, indicate that NLP and SL techniques can effectively
distinguish between real and fake news articles. The best-performing model was a Support Vector
Machine with a classification accuracy of 92%. Naive Bayes and Logistic Regression also performed
well, with classification accuracies of 87% and 89%, respectively. Our findings suggest that NLP
and SL can be valuable tools in the fight against fake news.
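
For reference, the evaluation metrics named in this article (accuracy, precision, recall, and F1 score) can be computed from predicted and true labels as in the sketch below. It assumes a binary setting with "fake" treated as the positive class; the function name and the toy label lists are hypothetical and do not reproduce the reported results.

```python
def binary_metrics(y_true, y_pred, positive="fake"):
    """Compute accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    tn = sum(t != positive and p != positive for t, p in pairs)  # true negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels invented for illustration; they are not the study's predictions.
y_true = ["fake", "fake", "fake", "real", "real", "real", "real", "fake"]
y_pred = ["fake", "fake", "real", "real", "real", "fake", "real", "fake"]
scores = binary_metrics(y_true, y_pred)
print(scores)
```

Accuracy alone can be misleading when the two classes are imbalanced, which is why precision, recall, and F1 score are usually reported alongside it.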
and reliability of information disseminated to the public. By implementing these advanced
technologies, we can not only identify false news articles with greater precision but also bolster
the public’s trust in the authenticity of the information they encounter. As a consequence, the
rampant spread of misinformation can be curtailed, mitigating the potential damage to public
perception, discourse, and decision-making.

The significance of our study extends beyond the realms of academia and research. It resonates with
the broader societal imperative to safeguard the integrity of information and fortify the public
against the deleterious effects of fake news. A populace armed with accurate and reliable
information is better equipped to navigate the complexities of the modern world, make informed
decisions, and actively contribute to a thriving democratic society.

In essence, our research underscores the transformative potential of leveraging NLP and SL
techniques as powerful tools in the battle against fake news. By adopting these advanced
methodologies, we have the opportunity to not only enhance our ability to discern truth from
falsehood but also to contribute to the broader societal endeavour of fostering an informed and
resilient public discourse. As technology continues to evolve, the integration of NLP and SL into
our information verification processes remains a pivotal step towards fortifying the foundations of
a reliable and trustworthy information ecosystem.

References

1. Reis JCS, Correia A, Murai F, Veloso A, Benevenuto F, et al. (2019) Supervised Learning for Fake News Detection. IEEE Intelligent Systems 34(2): 76-81.

2. Khanam Z, Alwasel BN, Sirafi H, Rashid M (2021) Fake News Detection Using Machine Learning Approaches. IOP Conference Series: Materials Science and Engineering 1099(1): 012040.

3. Shu K, Sliva A, Wang S, Tang J, Liu H (2017) Fake News Detection on Social Media: A Data Mining Perspective. ACM SIGKDD Explorations Newsletter 19(1): 22-36.

4. Al-Ayyoub M, Jararweh Y, Al-Betar MA (2020) A Survey on Fake News: Challenges and Opportunities. Journal of Information Science 46(2): 131-148.

5. Reis JC, Lacerda A, Silva TF (2019) Deep Learning for Fake News Detection: An Investigation. Expert Systems with Applications 123: 205-213.

6. Rashid SF, Kaur K, Rajpal N (2020) Fake News Detection Using Hybrid Approach. Journal of Ambient Intelligence and Humanized Computing 11(7): 3043-3054.

7. Monti F, Frasca F, Eynard D, Mannion D, Bronstein MM (2019) Fake News Detection on Social Media Using Geometric Deep Learning. Information Sciences 495: 199-210.

8. Patel K, Mehta M (2020) A Comparative Study of Machine Learning Techniques for Fake News Detection. Journal of Intelligent & Fuzzy Systems 39(1): 585-593.

9. Choudhury D, Acharjee T (2022) A novel approach to fake news detection in social networks using genetic algorithm applying machine learning classifiers. Multimedia Tools and Applications 82(6): 9029-9045.

10. Kong SH, Tan LM, Gan KH, Samsudin NH (2020) Fake News Detection using Deep Learning. IEEE Conference Publication.

11. Sahoo SR, Gupta BB (2021) Multiple features based approach for automatic fake news detection on social networks using deep learning. Applied Soft Computing 100: 106983.

12. Khan JY, Khondaker MTI, Afroz S, Uddin G, Iqbal A (2021) A benchmark study of machine learning models for online fake news detection. Machine Learning with Applications 4: 100032.

13. Chauhan T, Palivela H (2021) Optimization and improvement of fake news detection using deep learning approaches for societal benefit. International Journal of Information Management Data Insights 1(2): 100051.