Machine Learning-Based Approach For Fake News Detection
1 Department of Information Technology, Manipal Institute of Technology Bengaluru, Manipal Academy of Higher Education, Manipal, India
2 Department of Information Science and Engineering, Vidyavardhaka College of Engineering, Mysuru, India
3 Department of Artificial Intelligence and Machine Learning, Alva's Institute of Engineering and Technology, Mangalore, India
4 IDSIA USI-SUPSI, Department of Innovative Technologies, University of Applied Sciences and Arts of Southern Switzerland, CH
5 Department of Computer Science and Engineering, Vidyavardhaka College of Engineering, Mysuru, India
E-mail: [email protected], [email protected]
∗ Corresponding Author
Abstract
In the modern era, the ubiquity of the internet and the rapid adoption of social media have led to a spread of information never before seen in human history. On social media platforms, consumers create and share ever more information, much of it misleading and with no relevance to reality. Automatically classifying a text article as misinformation is a challenging task. This work addresses how automated classification of text articles can be done. We use a machine learning approach to classify news articles. Our study explores different textual properties that can be used to distinguish fake content from real content. Using those properties, we train models with different machine learning algorithms and evaluate their performance. The classifier with the best performance is then used to build the classification model that predicts the reliability of the news articles in the dataset.
1 Introduction
Social media and the internet have made access to information far simpler and more convenient. As more of our lives are spent interacting online through social media platforms, we increasingly seek out and consume news from web-based platforms rather than traditional news organizations, because it is easy to share and discuss the information with friends and other readers. Despite the advantages offered by social media, the standard and quality of its stories are lower than those of traditional news organizations. News outlets have benefited from the enormous use of social media platforms by delivering up-to-date information to their subscribers in near real time. News articles with deliberately false facts are produced online to spread news for financial or political gain. The wider spread of fake news can negatively impact society and individuals. False information can break the authenticity equilibrium of the news ecosystem. Fake news is typically manipulative, and it changes the way people interpret and react to real news. Fake news is also used by spammers to generate advertising revenue through click-baits.
Fake news is one of the greatest threats to commerce, journalism, and democracy worldwide, with huge collateral damage. A US $130 billion loss in the stock market was the direct result of a fake news report that US President Barack Obama had been injured in an explosion [1]. Other fake news campaigns that demonstrate the enormous impact fake news can have include the sudden shortage of salt in Chinese supermarkets after a false report that iodized salt would help counteract the effects of radiation from the Fukushima nuclear leak in Japan [2], and an escalation of tensions between India and Pakistan that began with false reporting around the Balakot strike and resulted in the deaths of military personnel and the loss of expensive military equipment.
Social media has a big impact on society, and some people take advantage of this fact. This results in articles that are not real or are possibly fake. Some websites deliberately produce fake news, posting half-truths, hoaxes, and disinformation that claim to be real information. They use social networks to drive traffic to their websites. The essential intention of fake news websites is to influence public opinion on certain topics.
2 Literature Survey
Mykhailo Granik et al., in their paper, proposed a very simple technique for fake news detection using a naive Bayes classifier. It was implemented as a software system and tested against a set of Facebook news posts. These were collected from three large Facebook pages, each carrying news from the right and the left, along with three large mainstream political news pages. They achieved a classification accuracy of approximately 80%; classification precision for fake news was somewhat worse. This may be caused by the skewness of the dataset, in which only 4.7% of the items are fake news.
Himank Gupta et al. proposed a framework based on several machine learning approaches that addresses various issues, including low accuracy, interference, and the time needed to handle many tweets within a few seconds. First, they collected 40,000 tweets from the HSpam14 dataset. They then characterized 150,000 tweets as spam and 250,000 tweets as non-spam. They also derived a collection of lightweight features, along with the Top-30 words providing the highest information gain from a Bag-of-Words model. They were able to achieve an accuracy of around 92%, surpassing the previous solution by approximately 18%.
Marco L. Della Vedova et al. first proposed a machine learning fake news detection method which, by combining news content and social context features, outperforms existing methods in the literature, increasing accuracy up to 79.8%. Furthermore, they implemented their method within a Facebook Messenger chatbot and validated it in a real-world application, obtaining a fake news detection accuracy of around 82%. Their goal was to classify a post as reliable or fake: they first described the dataset they used for their test, then presented the content-based approach they implemented, together with the method they proposed for combining it with a social-based approach available in the literature. The resulting dataset comprises 15,600 posts, coming from 33 pages with more than 220,000 likes by over 800,000 users: 8,922 hoax posts and 6,578 non-hoax posts.
Cody Buntain et al. developed a method for automating fake news detection on platforms such as Twitter by learning to predict accuracy assessments from two credibility-focused sets of journalistic accuracy ratings. The method was applied to Twitter content sourced from BuzzFeed's fake news dataset. Their feature analysis identifies the features that are most predictive for crowdsourced and journalistic accuracy assessments, with results consistent with prior work. The approach relies on identifying highly retweeted conversation threads and uses the features of these threads to classify stories, which limits the applicability of the work to popular sets of tweets. Since the majority of tweets are rarely retweeted, this method can consequently be used only on a minor portion of Twitter conversation threads.
Shivam B. Parikh et al. present an overview of how an article is characterized in the context of recent events, along with the differentiable content styles of articles and their impact on readers. They then review existing fake news detection approaches, which are heavily based on text analysis, and describe popular fake news datasets. They conclude the paper by identifying five key open research challenges that can guide future research. Theirs is a theoretical approach that provides an outline of fake news detection and analyzes contributing factors such as psychological ones.
One of the earliest works on the automated detection of fake news was by Vlachos and Riedel. The authors defined the task of fact-checking, gathered a dataset from two popular fact-checking websites, and considered k-Nearest Neighbors classifiers for handling fact-checking as a classification task. Wang (2017) released the LIAR dataset, which contains 12.8K manually labeled brief statements from PolitiFact.
Table 1 shows the techniques and datasets that have been used by various researchers for fake news detection.
3 Methodology
This section provides fundamental theoretical information about every component and aspect of the project.
Table 1 Techniques and datasets that have been used for fake news detection

Work                  Year  Detection                          Model
Wang                  2016  Fake news, 6 levels                Majority, LR, SVM, bi-LSTM, CNN
Ma et al.             2016  Rumors, 2 levels                   RF, DT, SVM, RNN
Ruchansky             2017                                     LSTM
Popat                 2018  Credibility, 2 or 5 levels         bi-LSTM, LSTM, CNN
Buntain and Golbeck   2017  Credibility                        RF
Yang et al.           2018  Fake news, 2 classes               TI-CNN, LSTM, RNN
Karimi et al.         2019  Fake news, 2 classes               N-grams, LIWC, RST, BiGRNN-CNN, LSTM, HDSF
Ahmed et al.          2018  Fake news & reviews                SVM, SGD, LR, KNN, DT
Zhou et al.           2020  Fake news, clickbaits, disinform.  SVM, RF, XGB, LR, NB
Pamungkas et al.      2016  Stance                             LR
3.2 Random Forest
Random forest is a classification algorithm that consists of many decision trees. Each tree produces a classification, and we say the tree "votes" for that specific class; the random forest then picks the classification with the most votes. Random forest uses feature randomness and bagging when building every single tree, so that an uncorrelated forest of trees is created whose committee prediction is more accurate than that of any single tree. As the name suggests, a random forest comprises a large number of individual decision trees that operate as an ensemble. Every individual tree in the forest spits out a class prediction, and the class with the highest number of votes becomes the model's prediction. The reason the model works so well is that a large number of relatively uncorrelated models operating as a committee can outperform any of the individual constituent models. To ensure that the behavior of each individual tree is not too strongly correlated with the behavior of any other tree in the model, random forest uses the two following methods:
3.2.1.1 Bagging
Decision trees are very sensitive to the data they are trained on, and small modifications to the training set can result in substantially different tree structures. Random forest exploits this by letting every individual tree sample from the dataset randomly with replacement, which leads to different trees. This process is called bootstrapping, or bagging.
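A minimal sketch of bagging inside a random forest, assuming scikit-learn (the paper does not name a library) and synthetic data standing in for the news features:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Toy data in place of TF-IDF features of news articles (illustrative only).
X, y = make_classification(n_samples=500, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# bootstrap=True resamples the training set with replacement for each tree
# (bagging); max_features limits the candidate features per split, which
# decorrelates the trees.
forest = RandomForestClassifier(n_estimators=100, bootstrap=True,
                                max_features="sqrt", random_state=42)
forest.fit(X_train, y_train)
score = forest.score(X_test, y_test)
print(score)
```

Each of the 100 trees sees a different bootstrap sample, so the ensemble vote is more stable than any single tree.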
As the aim of our model is simply to classify a news article as true/false, logistic regression is a good choice.
1. We first create a Logistic Regression object.
2. The logistic regression classifier is trained by passing the news article training set into the fit function. After it is trained, predictions are made on the test set using the predict function.
3. Accuracy is calculated to understand the performance of the classifier.
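The three steps above can be sketched with scikit-learn (an assumption; the paper's actual feature matrix is replaced here by synthetic data):

```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the vectorized news-article training set.
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000)  # step 1: create the classifier object
clf.fit(X_train, y_train)                # step 2: train via the fit function
pred = clf.predict(X_test)               # predictions via the predict function
acc = accuracy_score(y_test, pred)       # step 3: evaluate the classifier
print(acc)
```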
1: Procedure STOCHASTICGRADIENTDESCENT(D, Labels, Iter)
   Input: dataset D, labels of the dataset, iteration count Iter
   Output: optimal weights of the logistic regression
2:   w ← [1, 1, . . . , 1]
3:   initialize qi with zero for all i
4:   for k = 1 → Iter do
5:     chooseData ← D
6:     for i = 1 → m do
7:       γ ← learning rate
8:       λ ← regularization lambda
9:       u ← u + γλ
10:      select an index idx of chooseData randomly
11:      x ← chooseData[idx]
12:      delete chooseData[idx]
13:      for each feature i in sample x do
14:        wi ← wi − γ ∂loss(w, xi)/∂wi
15:        wh ← wi
16:        if wi > 0 then
17:          wi ← max(0, wi − (u + qi))
18:        else if wi < 0 then
19:          wi ← min(0, wi + (u − qi))
20:        end if
21:        qi ← qi + (wi − wh)
22:      end for
23:    end for
24:  end for
Passive-aggressive algorithms are online learning algorithms: they take an example, learn from it, and afterward put it aside. An algorithm of this kind remains passive for correctly predicted outcomes and turns aggressive for incorrectly predicted outcomes, updating and adjusting the model as required. Unlike other algorithms, it does not converge. These are called passive-aggressive algorithms for the following reasons:
• PASSIVE: if the prediction is correct, keep the model the same and do not make any changes.
• AGGRESSIVE: if the prediction is incorrect, make changes to the model.
INPUT: aggressiveness parameter C > 0
INITIALIZE: w1 = (0, . . . , 0)
For t = 1, 2, . . .
• Receive instance: xt ∈ Rn
• Predict: ŷt = sign(⟨wt, xt⟩)
• Receive correct label: yt ∈ {−1, +1}
• Suffer loss: lt = max{0, 1 − yt⟨wt, xt⟩}
• Update
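The update step is elided above; in the standard PA-I formulation (an assumption here) it is wt+1 = wt + τt·yt·xt with τt = min{C, lt/∥xt∥²}. A self-contained sketch of the loop under that assumption:

```python
import numpy as np

def passive_aggressive_fit(X, y, C=1.0, epochs=5):
    """PA-I online updates: passive when the hinge loss is zero,
    aggressive (w += tau * y_t * x_t) on a margin violation."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x_t, y_t in zip(X, y):
            loss = max(0.0, 1.0 - y_t * np.dot(w, x_t))  # suffer hinge loss
            if loss > 0:  # aggressive step only when the prediction is off
                tau = min(C, loss / np.dot(x_t, x_t))
                w += tau * y_t * x_t
    return w

# Tiny linearly separable toy set with labels in {-1, +1}.
X = np.array([[2.0, 1.0], [1.5, 2.0], [-1.0, -1.5], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])
w = passive_aggressive_fit(X, y)
print(np.sign(X @ w))  # → [ 1.  1. -1. -1.]
```

In practice a library implementation (e.g. scikit-learn's PassiveAggressiveClassifier) would be used instead of this hand-rolled loop.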
SVM Pseudocode
F[0 . . . N−1]: a feature set of N features, sorted by information gain in decreasing order
accuracy(i): accuracy of the prediction model based on an SVM trained with the feature set F[0 . . . i]
Figure 2 Flowchart.
low = 0
high = N − 1
value = accuracy(N − 1)
IG_RFE_SVM(F[0 . . . N−1], value, low, high) {
  if (high <= low) then
    return F[0 . . . N−1] and value
  end if
  mid = (low + high) / 2
  value_2 = accuracy(mid)
  if (value_2 >= value) then
    return IG_RFE_SVM(F[0 . . . mid], value_2, low, mid)
  else  // value_2 < value
    return IG_RFE_SVM(F[0 . . . high], value, mid, high)
  end if
}
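The recursion above is a binary search over the number of top-ranked features. A sketch of the control flow, with a simulated accuracy table standing in for actually training an SVM at each step; note the lower bound is bumped to mid + 1 in the else branch (a small deviation from the pseudocode) so the recursion is guaranteed to terminate:

```python
def ig_rfe_svm(features, accuracy, low, high, value):
    """Binary search for the smallest top-ranked feature prefix whose
    (simulated) SVM accuracy is at least as good as the current best."""
    if high <= low:
        return features[:high + 1], value
    mid = (low + high) // 2
    value_mid = accuracy(mid)
    if value_mid >= value:
        # the smaller feature set is at least as good: search the lower half
        return ig_rfe_svm(features[:mid + 1], accuracy, low, mid, value_mid)
    # otherwise keep the larger set and search the upper half
    return ig_rfe_svm(features, accuracy, mid + 1, high, value)

# Hypothetical accuracies of an SVM trained on the top-(i+1) features.
acc_table = [0.60, 0.70, 0.80, 0.90, 0.90, 0.90, 0.90, 0.88, 0.87, 0.86]
accuracy = lambda i: acc_table[i]
features = list(range(10))  # feature indices, pre-sorted by information gain
best_features, best_acc = ig_rfe_svm(features, accuracy, 0, 9, accuracy(9))
print(best_features, best_acc)  # → [0, 1, 2, 3] 0.9
```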
3.6 Flowchart
The flowchart given in Figure 2 starts with the collection of the dataset. The dataset is preprocessed and then subjected to feature selection. Four different machine learning algorithms are used to train the model. A confusion matrix is used to calculate performance and accuracy. The results of all classifiers are compared, and the classifier that gives the best accuracy on the given dataset is used to build the classification model that predicts the reliability of the news.
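The pipeline described by the flowchart (vectorize, split, train several classifiers, compare accuracies, keep the best) can be sketched as follows, assuming scikit-learn and a small placeholder corpus in place of the real news dataset:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression, PassiveAggressiveClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Placeholder corpus and labels (0 = fake, 1 = true); a real run would
# load and preprocess the actual news dataset here.
texts = ["shocking miracle cure found", "senate passes budget bill",
         "aliens endorse candidate", "court upholds ruling today"] * 25
labels = [0, 1, 0, 1] * 25

X = TfidfVectorizer().fit_transform(texts)
X_tr, X_te, y_tr, y_te = train_test_split(X, labels, random_state=0)

models = {"logreg": LogisticRegression(max_iter=1000),
          "pa": PassiveAggressiveClassifier(random_state=0),
          "rf": RandomForestClassifier(random_state=0)}
scores = {name: accuracy_score(y_te, m.fit(X_tr, y_tr).predict(X_te))
          for name, m in models.items()}
best = max(scores, key=scores.get)  # the best classifier becomes the final model
print(best, scores[best])
```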
4 Implementation
The implementation part follows the framework and system design part, describing the system at the finest level of detail, down to the code level. This section concerns the realization of the previously developed ideas.
TF addresses the Term Frequency and computes how often a term appears within each document. IDF stands for Inverse Document Frequency and weighs down terms that appear in many documents. TF-IDF is applied to the article body to store the relative count of each word in the document matrix.

TF(t, d) = (number of times t occurs in document d) / (total word count of document d)

IDF(t) = (total number of documents) / (number of documents with term t in it)

TF-IDF(t, d) = TF(t, d) × IDF(t)
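A sketch of these formulas using scikit-learn's TfidfVectorizer (an assumption; note that scikit-learn additionally applies a logarithm and smoothing to the IDF term, so its values differ from the plain ratio above):

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Tiny illustrative corpus standing in for the news-article bodies.
docs = ["the election was rigged claims post",
        "official results confirm the election outcome",
        "celebrity cures disease with miracle fruit"]

vec = TfidfVectorizer()
X = vec.fit_transform(docs)  # sparse document-term matrix of TF-IDF weights
print(X.shape)               # (3 documents, vocabulary size)
```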
Figure 8 Graph of accuracy versus models, obtained for all the algorithms.
The number of predictions is computed with a value count and further broken down by class. A confusion matrix is a summary of the prediction results of a classification model. It describes where the classification model gets confused when it makes predictions. It gives insight not only into the errors made by the classifier but, more importantly, into the types of errors that are being made.
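A minimal illustration with hypothetical labels, assuming scikit-learn's confusion_matrix:

```python
from sklearn.metrics import confusion_matrix

# Hypothetical true vs. predicted labels (0 = fake news, 1 = true news).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Rows are true classes, columns predicted: cm[0][0] counts fake articles
# correctly flagged, cm[0][1] fake articles passed off as true, and so on.
cm = confusion_matrix(y_true, y_pred)
print(cm)  # → [[3 1]
           #    [1 3]]
```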
Figure 5 shows the confusion matrix of predicted versus true labels obtained with the Random Forest algorithm. The true label takes two values: True and Fake.
Figure 6 shows the confusion matrix of predicted versus true labels obtained with the Logistic Regression algorithm.
6 Conclusion
Fake news detection is a research area with a lot of scope and large available datasets. Our model was run against an existing dataset. From Table 2, we conclude that the passive-aggressive algorithm shows the maximum accuracy, up to 94%. Using this classifier, we therefore built our classification model for the fake news detector. The user can enter text or a keyword on the web page and check the reliability of the news.

In future work, we look forward to building our own dataset, which will be kept up to date with all relevant news. The latest data and live news will be updated in the database, and the subsequent stage is to train the model and analyze how the accuracy changes with the new data in order to improve it further.
References
[1] Iftikhar Ahmad et al. “Fake news detection using machine learning
ensemble methods”. In: Complexity 2020 (2020).
[2] Monther Aldwairi and Ali Alwahedi. “Detecting fake news in
social media networks”. In: Procedia Computer Science 141 (2018),
pp. 215–222.
[3] Cody Buntain and Jennifer Golbeck. “Automatically identifying fake
news in popular twitter threads”. In: 2017 IEEE International Confer-
ence on Smart Cloud (SmartCloud). IEEE. 2017, pp. 208–215.
[4] Nadia K Conroy, Victoria L Rubin, and Yimin Chen. “Automatic decep-
tion detection: Methods for finding fake news”. In: Proceedings of the
association for information science and technology 52.1 (2015), pp. 1–4.
[5] Marco L Della Vedova et al. “Automatic online fake news detection
combining content and social signals”. In: 2018 22nd Conference of
Open Innovations Association (FRUCT). IEEE. 2018, pp. 272–279.
[6] Mykhailo Granik and Volodymyr Mesyura. “Fake news detection using
naive Bayes classifier”. In: 2017 IEEE first Ukraine conference on elec-
trical and computer engineering (UKRCON). IEEE. 2017, pp. 900–903.
[7] A Santhosh Kumar et al. “Fake News Detection on Social Media
Using Machine Learning”. In: Journal of Physics: Conference Series.
Vol. 1916. 1. IOP Publishing. 2021, p. 012235.
[8] Benjamin Markines, Ciro Cattuto, and Filippo Menczer. “Social spam
detection”. In: Proceedings of the 5th international workshop on adver-
sarial information retrieval on the web. 2009, pp. 41–48.
[9] Cade Metz. “The bittersweet sweepstakes to build an AI that destroys
fake news”. In: Wired.com (2016).
[10] Rada Mihalcea and Carlo Strapparava. “The lie detector: Explorations
in the automatic recognition of deceptive language”. In: Proceedings of
the ACL-IJCNLP 2009 conference short papers. 2009, pp. 309–312.
[11] Shivam B Parikh and Pradeep K Atrey. “Media-rich fake news detec-
tion: A survey”. In: 2018 IEEE conference on multimedia information
processing and retrieval (MIPR). IEEE. 2018, pp. 436–441.
[12] Kai Shu et al. “Fake news detection on social media: A data mining
perspective”. In: ACM SIGKDD explorations newsletter 19.1 (2017),
pp. 22–36.
[13] Kelly Stahl. “Fake news detection in social media”. In: California State
University Stanislaus 6 (2018), pp. 4–15.
[14] William Yang Wang. "Liar, liar pants on fire: A new benchmark dataset
for fake news detection". In: arXiv preprint arXiv:1705.00648 (2017).
[15] Jiawei Zhang, Bowen Dong, and S Yu Philip. “Fake detector: Effec-
tive fake news detection with deep diffusive neural network”. In: 2020
IEEE 36th International Conference on Data Engineering (ICDE). IEEE.
2020, pp. 1826–1829.
Biographies