
Advances of Robotic Technology

MEDWIN PUBLISHERS
Committed to Create Value for Researchers

Research Article
Volume 2 Issue 1
Received Date: January 25, 2024
Published Date: February 09, 2024
DOI: 10.23880/art-16000108

Exploring the Efficacy of Natural Language Processing and Supervised Learning in the Classification of Fake News Articles

Mehta DK¹, Patel MB², Dangi A², Patwa N², Patel Z², Jain R²*, Shah PD³ and Suthar BR³

¹Department of Computer Application, Mandsaur University, Mandsaur, Madhya Pradesh, India
²Department of Computer Engineering, UVPCE, Ganpat University, Mehsana, Gujarat, India
³Department of Information Technology, UVPCE, Ganpat University, Mehsana, Gujarat, India

*Corresponding author: Rahul Jain, UVPCE, FoET Department of Computer Engineering, Ganpat University, India, Tel: +91-9993671809; Email: [email protected]

Abstract

This research article investigates the effectiveness of natural language processing (NLP) and supervised learning in classifying fake news articles. With the increasing prevalence of fake news in online media, it has become critical to identify and categorize such articles accurately. In this study, we apply NLP techniques to extract features from textual data and use supervised learning algorithms to train classification models. We use a dataset of fake news articles to evaluate the performance of our models in terms of accuracy, precision, recall, and F1 score. Our results demonstrate that our approach achieves high accuracy in the classification of fake news articles, with the best model, a Support Vector Machine, reaching 92% accuracy. Furthermore, we perform a feature importance analysis to identify the most significant features that contribute to the classification of fake news. The findings of this study have practical implications for identifying and combating fake news in online media, and also provide insights into the effectiveness of NLP and supervised learning for text classification tasks.

Keywords: Natural Language Processing; Supervised Learning; Support Vector Machine; Classification; Machine Learning

Introduction

The proliferation of social media platforms and the democratization of access to information have revolutionized the way people consume news and information. However, this democratization has also enabled the spread of fake news, which is designed to deceive and manipulate the public. Fake news articles have the potential to sway public opinion and affect democratic processes, making it crucial to develop effective tools for detecting and combating them.

Recent advances in machine learning, particularly in natural language processing (NLP) and supervised learning, have shown great promise in detecting and classifying fake news articles. NLP algorithms can analyse the content and language used in news articles, while supervised learning models can be trained on labelled datasets to identify the patterns and features that distinguish fake news from legitimate news.

The purpose of this research article is to explore the efficacy of NLP and supervised learning in the classification of fake news articles. Specifically, we aim to investigate the performance of several NLP techniques, including text pre-processing, feature engineering, and sentiment analysis, in conjunction with supervised learning models such as decision trees, random forests, and support vector machines. To achieve this goal, we collected a large dataset of news articles from various online sources and labelled them as either fake or legitimate. We then trained several supervised learning models on this dataset and evaluated their performance using metrics such as accuracy, precision, recall, and F1 score.

The significance of this research lies in its potential to contribute to the development of automated tools for detecting and combating fake news. By exploring the efficacy of NLP and supervised learning, we aim to provide insights into the most effective techniques for classifying fake news articles. These insights could inform the development of more accurate and efficient tools for detecting and combating fake news, which is critical for maintaining the integrity of democratic processes and ensuring the public's access to reliable information.

Supervised learning is a machine learning technique in which an algorithm is trained on labelled data to make predictions on new, unseen data. In the context of fake news classification, supervised learning algorithms are trained on a dataset of news articles labelled as either true or false, and are then used to predict the labels of new, unseen articles.

In summary, this research article provides an in-depth exploration of the efficacy of NLP and supervised learning in the classification of fake news articles. By investigating the performance of several NLP techniques and supervised learning models, we aim to contribute to the development of more accurate and efficient tools for detecting and combating fake news, which is essential for maintaining the integrity of democratic processes and ensuring the public's access to reliable information.

Background

In the digital age, misinformation poses a significant threat, especially in the form of fake news. Natural Language Processing (NLP) and Supervised Learning offer promising solutions to this problem. NLP enables computers to understand and generate human-like text, while Supervised Learning uses labelled datasets to train classifiers.

Types of News and Impact:
Fake News:
Definition: False information presented as genuine.
Impact: Causes public panic and erodes trust in media.
Fact: MIT study - False information is 70% more likely to be retweeted.
Political News:
Definition: Pertains to government, politics, and legislation.
Impact: Shapes public opinion and influences elections.
Fact: Pew Research - 68% of Americans feel overwhelmed by political news.
Health News:
Definition: Covers medical research, public health, and healthcare policies.
Impact: Influences health behaviors and vaccine uptake.
Fact: Journal of Health Communication - Misinformation leads to vaccine hesitancy.
Technology News:
Definition: Information on tech advancements, innovations, and trends.
Impact: Shapes consumer preferences and influences stock markets.
Fact: World Economic Forum - 75 million jobs at risk due to technological advancements.

Literature Review

Several studies have explored the use of NLP and supervised learning in the classification of fake news articles. In a study by Shu K, a dataset of news articles from the 2016 U.S. presidential election was used to train and evaluate different supervised learning algorithms for fake news classification. The authors found that a combination of NLP techniques and supervised learning algorithms, such as logistic regression and random forest, could effectively classify fake news articles.

This research article is a dedicated exploration of the effectiveness of NLP and supervised learning in classifying fake news articles. The investigation covers various NLP techniques, including text pre-processing, feature engineering, and sentiment analysis, coupled with supervised learning models such as decision trees, random forests, and support vector machines.

To support the research, an extensive dataset of news articles from diverse online sources was curated and categorized as either fake or legitimate. Multiple supervised learning models were then trained and assessed using key metrics such as accuracy, precision, recall, and F1 score. The outcomes of this analysis could pave the way for automated tools capable of detecting and combating fake news.

Another study by Reis JC explored the use of deep learning techniques, such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs), for fake news classification. The authors used a dataset of news articles from different sources and found that deep learning techniques, particularly CNNs, could effectively classify fake news articles [1].

Jain R, et al. Exploring the Efficacy of Natural Language Processing and Supervised Learning in the Classification of Fake News Articles. Adv Rob Tec 2024, 2(1): 000108. Copyright© Jain R, et al.

In addition, a study by Khanam Z explored a hybrid approach combining NLP techniques, such as part-of-speech tagging and named entity recognition, with supervised learning algorithms, such as decision trees and support vector machines, for fake news classification. The authors used a dataset of news articles from different sources and found that the hybrid approach outperformed other classification models [2].

In a study by Shu K, the authors explored the use of NLP and supervised learning algorithms for the classification of fake news on social media. They used a dataset of news articles from the 2016 U.S. presidential election and found that a combination of NLP techniques and supervised learning algorithms, such as logistic regression and random forest, could effectively classify fake news articles [3].

A further study provides a comprehensive review of the challenges and opportunities related to fake news detection. The authors discuss various NLP techniques and supervised learning algorithms that have been proposed for fake news detection, including bag-of-words, word embeddings, decision trees, and neural networks [4].

Another study explored the use of deep learning techniques, such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs), for fake news classification. The authors used a dataset of news articles from different sources and found that CNNs could effectively classify fake news articles [5].

A hybrid approach combining NLP techniques, such as part-of-speech tagging and named entity recognition, with supervised learning algorithms, such as decision trees and support vector machines, has also been proposed for fake news classification. Using a dataset of news articles from different sources, the authors found that the hybrid approach outperformed other classification models [6].

Geometric deep learning techniques, such as graph convolutional neural networks, have been applied to fake news classification on social media. Using a dataset of news articles from Twitter, the authors found that the proposed approach could effectively classify fake news articles [7].

Another study compared the performance of several supervised learning algorithms, including logistic regression, decision trees, and neural networks, for fake news classification. Using a dataset of news articles from different sources, the authors found that the neural network-based approach outperformed the other classification models [8].

Choudhury D studied fake job postings. They compared SVM, Naive Bayes, Random Forest, and Logistic Regression classifiers for identifying fake news across several datasets; the SVM classifier achieved the highest accuracy, with 61%, 97%, and 96% on the LIAR, Fake Job Posting, and Fake News datasets, respectively. In their genetic algorithm (GA)-based fake news detection approach, the SVM, Naive Bayes, Random Forest, and Logistic Regression classifiers serve as fitness functions [9].

Kong SH applied natural language processing (NLP) techniques for text analytics and trained deep learning models to detect fake news based on either the news title or the news content. They proposed the solution for use on social media, to spare users the bad experience of receiving fake stories posted by non-reputable sources. The work used the TensorFlow framework with its built-in Keras deep learning library. Their findings show that models trained on news titles need less computation time to reach decent performance, whereas models trained on news content achieve better performance at the cost of increased computation time [10].

Sahoo SR presented an automatic method for detecting false news within the Chrome browser environment, which allows it to identify false news on Facebook. The authors used a variety of Facebook account-related features, along with certain news article attributes, to examine account activity using deep learning. According to an experimental analysis of real-world data, the proposed fake news detection system outperformed current state-of-the-art solutions [11].

Khan JY used three different datasets and applied machine learning techniques to detect fake news articles spreading on social media. The authors found that BERT and other similar pre-trained models performed better for fake news detection on very small datasets. They used lexical and sentiment features, n-grams, and Empath-generated features for traditional machine learning models, and pre-trained word embeddings for deep learning models [12].

In addition, Chauhan T proposed a deep learning prediction model based on an LSTM neural network. They used GloVe word embeddings for the vector representation of textual words, and applied tokenization and vectorization techniques for feature extraction [13].

Methodology

This section outlines the methodology used to classify fake articles. A supervised machine learning approach was employed, which involved collecting a dataset, pre-processing it, selecting features, and training and testing models using various classifiers such as Random Forest, SVM, Naïve Bayes, and others. The proposed methodology is shown in Figure 1. To achieve the highest accuracy and precision, experiments were conducted on each algorithm individually and in combination. A detection tool was then implemented on top of the resulting classification model to detect fake articles.

The primary objective is to use a series of classification algorithms to develop a classification model that can serve as a fake news scanner by identifying specific details in news articles. This model will then be integrated into a Python application for detecting fake news. Additionally, the Python code has been optimized through appropriate refactoring.

We used the LIAR-PLUS Master dataset, a collection of labelled statements created for training and testing fake news detection systems. It is an extension of the LIAR dataset, which was released in 2017 and contained 12,836 short statements, each labelled as true, mostly true, half true, barely true, false, or pants on fire. The LIAR-PLUS Master dataset, released in 2019, includes the original LIAR dataset as well as additional statements that were fact-checked by PolitiFact and added to the dataset.

The LIAR-PLUS Master dataset contains 14,787 statements, each labelled as true, mostly true, half true, barely true, false, or pants on fire. Each statement is accompanied by metadata such as the speaker, the publication, the date, and the subject. The dataset also includes additional features such as the statement's length and its sentiment score. We then used NLP and supervised learning (SL) techniques to classify the articles as either real or fake, and compared the performance of several models, including Naive Bayes, Logistic Regression, and Support Vector Machines.
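LIAR-PLUS assigns each statement one of six truthfulness labels, while the experiments classify articles as real or fake, so the six labels must be collapsed into two. The paper does not state where the split falls. The sketch below assumes a common convention in which true, mostly true, and half true count as real and the other three as fake; the dictionary name `LABEL_TO_BINARY`, the helper `to_binary`, and the label spellings (for example `pants-fire`) are all assumptions made for illustration.

```python
# Hypothetical grouping of the six LIAR-PLUS truthfulness labels into the
# binary real/fake target. The split point (half-true -> real) and the exact
# label spellings are assumptions, not taken from the paper.
LABEL_TO_BINARY = {
    "true": "real",
    "mostly-true": "real",
    "half-true": "real",
    "barely-true": "fake",
    "false": "fake",
    "pants-fire": "fake",
}


def to_binary(label: str) -> str:
    """Collapse a six-way LIAR-PLUS label into 'real' or 'fake'.

    Unknown labels raise KeyError instead of being silently mislabelled.
    """
    return LABEL_TO_BINARY[label.strip().lower()]


if __name__ == "__main__":
    print([to_binary(x) for x in ["true", "pants-fire", "half-true", "barely-true"]])
    # ['real', 'fake', 'real', 'fake']
```

Placing half true on the fake side instead is an equally defensible choice and changes the class balance, so whichever grouping is used affects the reported metrics.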

Figure 1: Proposed Methodology.
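The steps of the methodology (pre-processing, feature extraction, training, and testing) can be illustrated with a minimal, standard-library-only multinomial Naive Bayes classifier, one of the three models compared in this study. This is a hedged sketch, not the authors' implementation: the stop-word list, the regular-expression tokenizer, the class name `MultinomialNaiveBayes`, and the toy training sentences are all assumptions made for the example.

```python
import math
import re
from collections import Counter, defaultdict

# Small illustrative stop-word list (assumption; the paper does not list one).
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "are", "was"}


def preprocess(text: str) -> list[str]:
    """Pre-processing: lowercase, keep alphabetic tokens, drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


class MultinomialNaiveBayes:
    """Bag-of-words multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs: list[str], labels: list[str]) -> "MultinomialNaiveBayes":
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)  # label -> token -> count
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(preprocess(doc))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_words = {c: sum(cnt.values()) for c, cnt in self.word_counts.items()}
        self.n_docs = len(labels)
        return self

    def _log_score(self, tokens: list[str], label: str) -> float:
        # log P(label) + sum of log P(token | label): the log posterior up to a constant.
        score = math.log(self.class_counts[label] / self.n_docs)
        denom = self.total_words[label] + len(self.vocab)
        for t in tokens:
            if t in self.vocab:  # tokens never seen in training are skipped
                score += math.log((self.word_counts[label][t] + 1) / denom)
        return score

    def predict(self, doc: str) -> str:
        tokens = preprocess(doc)
        return max(self.class_counts, key=lambda c: self._log_score(tokens, c))


if __name__ == "__main__":
    # Toy sentences invented for illustration; not drawn from LIAR-PLUS.
    train_docs = [
        "Senate passes budget bill after long debate",
        "Official report confirms unemployment rate fell",
        "Shocking secret cure doctors do not want you to know",
        "Miracle pill melts fat overnight, experts stunned",
    ]
    train_labels = ["real", "real", "fake", "fake"]
    model = MultinomialNaiveBayes().fit(train_docs, train_labels)
    print(model.predict("Secret miracle cure stuns doctors"))  # fake
```

The testing step would score a held-out portion of the labelled statements with `predict`. For real experiments, scikit-learn's `TfidfVectorizer` combined with `MultinomialNB`, `LogisticRegression`, or `LinearSVC` covers the same steps for all three classifiers and handles large sparse vocabularies efficiently.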

The classification algorithms applied in this model are Support Vector Machine, Naive Bayes, and Logistic Regression. The most significant features used in the proposed methodology are:
1. Word Frequency: Analyzing the frequency of certain words in the text, as some words may be more common in fake news articles.
2. Sentiment Analysis: Determining whether the overall sentiment of the text tends to be positive, negative, or neutral.
3. Source Credibility: Considering the credibility of the source or the website where the news is published.
4. Contextual Information: Taking into account contextual information, such as the presence of quotes, references, or links in the article.

Results

Our results, shown in Table 1 and Figure 2, indicate that NLP and SL techniques can effectively distinguish between real and fake news articles. The best-performing model was a Support Vector Machine with a classification accuracy of 92%. Naive Bayes and Logistic Regression also performed well, with classification accuracies of 87% and 89%, respectively. Our findings suggest that NLP and SL can be valuable tools in the fight against fake news.
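Three of the four feature families listed in the Methodology can be computed from the article text alone; source credibility needs an external rating of the publisher and is left out here. The sketch below is a hedged illustration rather than the authors' feature extractor: the tiny positive and negative word lists stand in for a real sentiment lexicon, and counting hyperlinks and paired double quotes is one assumed way to capture contextual information. The function name `extract_features` and the `vocabulary` parameter are hypothetical.

```python
import re
from collections import Counter

# Illustrative stand-in for a sentiment lexicon (hypothetical word lists).
POSITIVE_WORDS = {"good", "great", "success", "improve", "win"}
NEGATIVE_WORDS = {"bad", "shocking", "disaster", "fail", "scandal"}
URL_RE = re.compile(r"https?://\S+")


def extract_features(text: str, vocabulary: list[str]) -> dict[str, float]:
    """Word-frequency, sentiment, and contextual features for one article.

    `vocabulary` is an assumed list of words whose relative frequency is
    tracked, e.g. the most frequent words in the training set.
    """
    num_links = len(URL_RE.findall(text))
    body = URL_RE.sub(" ", text)  # keep URL fragments out of the word counts
    tokens = re.findall(r"[a-z']+", body.lower())
    counts = Counter(tokens)
    n = len(tokens) or 1  # avoid division by zero on empty text

    # 1. Word frequency: relative frequency of each tracked word.
    features = {f"freq_{w}": counts[w] / n for w in vocabulary}
    # 2. Sentiment: (positive hits - negative hits) / token count, in [-1, 1].
    pos = sum(counts[w] for w in POSITIVE_WORDS)
    neg = sum(counts[w] for w in NEGATIVE_WORDS)
    features["sentiment"] = (pos - neg) / n
    # 4. Contextual information: hyperlinks and quoted passages.
    features["num_links"] = float(num_links)
    features["num_quotes"] = float(text.count('"') // 2)
    return features


if __name__ == "__main__":
    print(extract_features('Shocking "claim" spreads, see https://example.com now', ["claim"]))
```

Each article then becomes a fixed-length numeric vector that any of the three classifiers can consume, and the same dictionary keys make it easy to inspect which features a trained model weights most.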

Model                     Accuracy   Precision   Recall   F1 Score
Naive Bayes               0.87       0.86        0.88     0.87
Logistic Regression       0.91       0.89        0.89     0.91
Random Forest             0.88       0.87        0.90     0.88
Support Vector Machines   0.92       0.88        0.91     0.89

Table 1: Results Obtained for Different Classifiers.
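The four metrics in Table 1 follow from the confusion matrix of the binary real/fake task. The sketch below computes them from raw counts and also recomputes F1 as the harmonic mean of each row's precision and recall. Treating "fake" as the positive class, and the names `metrics_from_counts`, `f1_score`, and `TABLE_1`, are assumptions for illustration. For Logistic Regression, whose precision and recall are both 0.89, the harmonic mean is also 0.89, which differs from the F1 of 0.91 listed in that row; the other three rows agree to two decimal places.

```python
def f1_score(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall (0 if both are 0)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def metrics_from_counts(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Accuracy, precision, recall, and F1 with 'fake' as the positive class."""
    n = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "accuracy": (tp + tn) / n if n else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1_score(precision, recall),
    }


# (precision, recall, reported F1) as listed in Table 1.
TABLE_1 = {
    "Naive Bayes": (0.86, 0.88, 0.87),
    "Logistic Regression": (0.89, 0.89, 0.91),
    "Random Forest": (0.87, 0.90, 0.88),
    "Support Vector Machines": (0.88, 0.91, 0.89),
}

if __name__ == "__main__":
    for model, (p, r, reported) in TABLE_1.items():
        print(f"{model}: reported F1 {reported:.2f}, from P and R {f1_score(p, r):.2f}")
```

Because the harmonic mean never exceeds the arithmetic mean, an F1 above the average of the reported precision and recall cannot arise from those two values, which makes this a quick sanity check for any results table.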

Figure 2: Performance of Different Algorithms.

Conclusion

The proliferation of fake news has become a significant problem, and it is crucial to develop effective methods for detecting and classifying false information. Our study shows that NLP and SL techniques can be highly effective in distinguishing between real and fake news articles. By using these methods, we can improve the accuracy and reliability of the information available to the public, helping to prevent the spread of false information.

Discussion and Future Scope

The escalating proliferation of fake news has emerged as a formidable challenge, posing a serious threat to the veracity of information circulating in society. In response, it has become imperative to strengthen our defences against the dissemination of false information by implementing robust and effective methods for detecting and classifying deceptive content. Our study underscores the potential of Natural Language Processing (NLP) and Supervised Learning (SL) techniques as tools against misinformation.

The findings of our research underscore the efficacy of NLP, a branch of artificial intelligence, in unravelling the linguistic nuances embedded in news articles. Through language analysis and comprehension, NLP algorithms can discern subtle patterns, identify contextual cues, and uncover discrepancies that signal the presence of false information. This capability is especially important in an era where misinformation often masquerades as legitimate news, making it difficult for the public to distinguish fact from fiction.

Supervised Learning, as demonstrated by our study, amplifies the effectiveness of NLP by harnessing labelled datasets. By training models on curated datasets that distinguish between authentic and fake news, supervised learning algorithms learn to recognize the underlying patterns and characteristics associated with deceptive content. This enables the algorithms to make informed predictions on new, unseen articles, thereby enhancing their capacity to identify and categorize false information.

The overarching goal of incorporating NLP and SL techniques into our methodology is to improve the accuracy and reliability of information disseminated to the public. By implementing these technologies, we can not only identify false news articles with greater precision but also bolster the public's trust in the authenticity of the information they encounter. As a consequence, the spread of misinformation can be curtailed, mitigating potential damage to public perception, discourse, and decision-making.

The significance of our study extends beyond academia and research. It resonates with the broader societal imperative to safeguard the integrity of information and to protect the public against the harmful effects of fake news. A populace armed with accurate and reliable information is better equipped to navigate the complexities of the modern world, make informed decisions, and actively contribute to a thriving democratic society.

In essence, our research highlights the potential of NLP and SL techniques as powerful tools in the battle against fake news. By adopting these methodologies, we have the opportunity not only to enhance our ability to discern truth from falsehood but also to contribute to the broader societal endeavour of fostering an informed and resilient public discourse. As technology continues to evolve, integrating NLP and SL into information verification processes remains a pivotal step towards a reliable and trustworthy information ecosystem.

References

1. Reis JCS, Correia A, Murai F, Veloso A, Benevenuto F, et al. (2019) Supervised Learning for Fake News Detection. IEEE Intelligent Systems 34(2): 76-81.
2. Khanam Z, Alwasel BN, Sirafi H, Rashid M (2021) Fake News Detection Using Machine Learning Approaches. IOP Conference Series: Materials Science and Engineering 1099(1): 012040.
3. Shu K, Silva A, Wang S, Tang J, Liu H (2017) Fake News Detection on Social Media: A Data Mining Perspective. ACM SIGKDD Explorations Newsletter 19(1): 22-36.
4. Al-Ayyoub M, Jararweh Y, Al-Betar MA (2020) A Survey on Fake News: Challenges and Opportunities. Journal of Information Science 46(2): 131-148.
5. Reis JC, Lacerda A, Silva TF (2019) Deep Learning for Fake News Detection: An Investigation. Expert Systems with Applications 123: 205-213.
6. Rashid SF, Kaur K, Rajpal N (2020) Fake News Detection Using Hybrid Approach. Journal of Ambient Intelligence and Humanized Computing 11(7): 3043-3054.
7. Monti F, Frasca F, Eynard D, Mannion D, Bronstein MM (2019) Fake News Detection on Social Media using Geometric Deep Learning. Information Sciences 495: 199-210.
8. Patel K, Mehta M (2020) A Comparative Study of Machine Learning Techniques for Fake News Detection. Journal of Intelligent & Fuzzy Systems 39(1): 585-593.
9. Choudhury D, Acharjee T (2022) A novel approach to fake news detection in social networks using genetic algorithm applying machine learning classifiers. Multimedia Tools and Applications 82(6): 9029-9045.
10. Kong SH, Tan LM, Gan KH, Samsudin NH (2020) Fake News Detection using Deep Learning. IEEE Conference Publication.
11. Sahoo SR, Gupta BB (2021) Multiple features based approach for automatic fake news detection on social networks using deep learning. Applied Soft Computing 100: 106983.
12. Khan JY, Khondaker MTI, Afroz S, Uddin G, Iqbal A (2021) A benchmark study of machine learning models for online fake news detection. Machine Learning with Applications 4: 100032.
13. Chauhan T, Palivela H (2021) Optimization and improvement of fake news detection using deep learning approaches for societal benefit. International Journal of Information Management Data Insights 1(2): 100051.
