Fake News Detection Using Machine Learning Algorithm
Abstract - In our modern era, where the internet is ubiquitous, everyone relies on various online resources for news. With the increase in the use of social media platforms like Facebook and Twitter, news spreads rapidly among millions of users within a very short span of time. The spread of fake news has far-reaching consequences, from the creation of biased opinions to the swaying of election outcomes for the benefit of certain candidates. Moreover, spammers use appealing news headlines to generate advertisement revenue via clickbait. In this paper, we aim to perform binary classification of various news articles available online with the help of concepts pertaining to Artificial Intelligence, Natural Language Processing and Machine Learning. We aim to provide the user with the ability to classify news as fake or real, and also to check the authenticity of the website publishing the news.
Key Words: Internet, Social Media, Fake News,
Classification, Artificial Intelligence, Machine
Learning, Websites, Authenticity.
1. INTRODUCTION

On social media, the standard of stories is lower than in traditional news organizations. However, because it is inexpensive to supply news online, and far faster and easier to disseminate, a great deal of fake news is produced, much of it concerning political matters. Examples of such websites can be found in Ukraine, the United States of America, Germany, China and many other countries [4]. Thus, fake news is a global issue as well as a worldwide challenge. Many scientists believe that the fake news issue may be addressed by means of machine learning and AI [5]. There is a reason for that: recently, AI algorithms have begun to work far better on many classification problems (image recognition, voice detection and so on) because hardware is cheaper and larger datasets are available.

There are several influential articles about automatic deception detection. In [6] the authors provide a general overview of the techniques available for the matter. In [7] the authors describe their method for fake news detection based on feedback for specific news items in microblogs. In [8] the authors develop two systems for deception detection, based on support vector machines and a Naive Bayes classifier (the latter is also employed in the system described in this paper), respectively. They collected their data by asking people to directly provide true or false information on several topics: abortion, execution and friendship. The detection accuracy achieved by their system is around 70%.

This article describes a simple fake news detection method based on artificial intelligence algorithms: the naïve Bayes classifier, Random Forest and Logistic Regression. The goal of the research is to examine how these particular methods work for this particular problem, given a manually labelled news dataset, and to support (or not) the idea of using AI for fake news detection. The difference between this article and articles on similar topics is that here Logistic Regression was specifically used for fake news detection; also, the developed system was tested on a comparatively new dataset, which gave a chance to evaluate its performance on recent data.

A. Characteristics of Fake News:

They often have grammatical mistakes. They are often emotionally coloured. They often try to affect readers' opinion on some topics. Their content is not always true. They often use attention-seeking words, news-like formats and clickbait. They are too good to be true. Their sources are not genuine most of the time [9].

2. Body of Paper

Mykhailo Granik et al. in their paper [3] show a simple approach for fake news detection using a naive Bayes classifier. This approach was implemented as a software system and tested against a dataset of Facebook news posts, collected from three large Facebook pages each from the right and from the left, as well as three large mainstream political news pages (Politico, CNN, ABC News). They achieved a classification accuracy of approximately 74%. Classification accuracy for fake news is slightly worse, which may be caused by the skewness of the dataset: only 4.9% of it is fake news.

Himank Gupta et al. [10] gave a framework based on several machine learning approaches that deals with various problems, including accuracy shortage, time lag (BotMaker) and the high processing time needed to handle thousands of tweets per second. First, they collected 400,000 tweets from the HSpam14 dataset and further characterized 150,000 spam tweets and 250,000 non-spam tweets. They also derived some lightweight features, along with the Top-30 words providing the highest information gain from a Bag-of-Words model. They were able to achieve an accuracy of 91.65%, surpassing the existing solution by approximately 18%.

Marco L. Della Vedova et al. [11] first proposed a novel ML fake news detection method which, by combining news content and social context features, outperforms existing methods in the literature, increasing accuracy up to 78.8%. Second, they implemented their method within a Facebook Messenger chatbot and validated it in a real-world application, obtaining a fake news detection accuracy of 81.7%. Their goal was to classify a news item as reliable or fake; they first described the datasets they used for their tests, then presented the content-based approach they implemented and the method they proposed to combine it with a social-based approach available in the literature. The resulting dataset is composed of 15,500 posts coming from 32 pages (14 conspiracy pages, 18 scientific pages), with more than 2,300,000 likes by 900,000+ users; 8,923 (57.6%) of the posts are hoaxes and 6,577 (42.4%) are non-hoaxes.

Cody Buntain et al. [12] developed a method for automating fake news detection on Twitter by learning to predict accuracy assessments in two credibility-focused Twitter datasets: CREDBANK, a crowdsourced dataset of accuracy assessments for events on Twitter, and PHEME, a dataset of potential rumours on Twitter together with journalistic assessments of their accuracy. They applied this method to Twitter content sourced from BuzzFeed's fake news dataset. A feature analysis identified the features most predictive of crowdsourced and journalistic accuracy assessments, with results consistent with prior work. They rely on identifying highly retweeted threads of conversation and use the features of these threads to classify stories, which limits this work's applicability to the set of popular tweets. Since the majority of tweets are rarely retweeted, the method is only usable on a minority of Twitter conversation threads.

In their paper, Shivam B. Parikh et al. [13] aim to present an insight into the characterization of news stories in the modern diaspora, combined with the differential content types of news stories and their impact on readers. Subsequently, they dive into existing fake news detection approaches, which are heavily based on text-based analysis, and also describe popular fake news datasets. They conclude the paper by identifying four key open research challenges that can guide future research. It is a theoretical approach.
B. System Architecture

i) Static Search: The architecture of the static part of the fake news detection system is quite simple and follows the basic machine learning process flow. The system design is self-explanatory; its main processes are data collection, pre-processing, feature extraction, classification and evaluation.

IV. IMPLEMENTATION

4.1 DATA COLLECTION AND ANALYSIS

We can get online news from different sources, such as social media websites, search engines, the homepages of news agency websites or fact-checking websites. On the Internet, there are a few publicly available datasets for fake news classification, such as BuzzFeed News, LIAR [15] and BS Detector. These datasets have been widely used in different research papers for determining the veracity of news. The sources of the datasets used in this work are discussed briefly below.

Online news can be collected from different sources, such as news agency homepages, search engines and social media websites. However, manually determining the veracity of news is a challenging task, usually requiring annotators with domain expertise who perform careful analysis of claims and of additional evidence, context, and reports from authoritative sources. Generally, news data with annotations can be gathered in the following ways: expert journalists, fact-checking websites, industry detectors and crowdsourced workers. However, there are no agreed-upon benchmark datasets for the fake news detection problem. Data gathered must be pre-processed, that is, cleaned, transformed and integrated, before it can undergo the training process [16].
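Once gathered, such annotated data is typically loaded into a tabular structure before cleaning. A minimal sketch, assuming the pandas library and a two-column (Statement, Label) CSV layout like the ones used in this work; the sample rows are invented purely for illustration:

```python
import io
import pandas as pd

# Invented stand-in for a training CSV, mirroring the two-column
# (Statement, Label) layout described in this section.
sample_csv = io.StringIO(
    "Statement,Label\n"
    "Economy added thousands of jobs last month,REAL\n"
    "Celebrity endorses miracle cure in secret video,FAKE\n"
)

df = pd.read_csv(sample_csv)
# Encode the class label as an integer target for the classifiers.
df["target"] = (df["Label"] == "REAL").astype(int)
print(df[["Statement", "target"]])
```

The real files are read the same way by passing a file path to `pd.read_csv` instead of the in-memory buffer.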
The datasets that we used are explained below:

1. LIAR: This dataset was collected from the fact-checking website PolitiFact through its API [15]. It includes 12,836 human-labelled short statements, which were sampled from various contexts, such as news releases, TV or radio interviews, campaign speeches, etc. The labels for news truthfulness are fine-grained multiple classes: pants-fire, false, barely-true, half-true, mostly-true, and true. (William Yang Wang, "Liar, Liar Pants on Fire": A New Benchmark Dataset for Fake News Detection, Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (ACL 2017), short paper, Vancouver, BC, Canada, July 30-August 4, ACL.) The dataset comes as three .csv files, named train.csv, test.csv and valid.csv, and the following columns were used in this project:
Column 1: Statement (news headline or text).
Column 2: Label (class label: True, False).
2. REAL_OR_FAKE.CSV: We used this dataset for the Passive Aggressive classifier. It contains three columns, viz. 1) Text/keyword, 2) Statement, 3) Label (Fake/True).

4.2 DEFINITIONS AND DETAILS

A. Pre-processing the data: Social media data is highly unstructured; the majority of it is informal communication with typos, slang, bad grammar, etc. [17]. The quest for increased performance and reliability has made it imperative to develop techniques for the utilization of resources to make informed decisions [18]. To achieve better insights, it is necessary to clean the data before it can be used for predictive modelling. For this purpose, basic pre-processing was done on the news training data. This step comprised:

Data Cleaning: While reading data, we get it in a structured or unstructured format. A structured format has a well-defined pattern, whereas unstructured data has no proper structure. In between the two, we have a semi-structured format, which is comparably better structured than the unstructured format. Cleaning up the text data is necessary to highlight the attributes we want our machine learning system to pick up on. Cleaning (or pre-processing) the data typically consists of a number of steps:

1. Remove punctuation: Punctuation can provide grammatical context to a sentence, which supports our understanding, but for our vectorizer, which counts words rather than context, it adds no value, so we remove all special characters. e.g.: How are you? -> How are you
2. Remove stopwords: Stopwords are common words that will likely appear in any text. They don't tell us much about our data, so we remove them. e.g.: silver or lead is fine for me -> silver, lead, fine
3. Stemming: Stemming helps reduce a word to its stem form. It often makes sense to treat related words in the same way. It removes suffixes such as "ing", "ly" and "s" by a simple rule-based approach. It reduces the corpus of words, but the actual words often get neglected. e.g.: Entitling, Entitled -> Entitle. Note: some search engines treat words with the same stem as synonyms.
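These cleaning steps can be sketched in plain Python; the stopword list and suffix rules below are toy stand-ins for illustration (a real system would use a fuller resource such as NLTK's stopword list and a proper stemmer):

```python
import re
import string

# Toy stopword list and suffix rules, for illustration only.
STOPWORDS = {"or", "is", "for", "me", "the", "a", "an"}
SUFFIXES = ("ing", "ly", "ed", "s")

def clean(text: str) -> list[str]:
    # 1. Remove punctuation / special characters.
    text = re.sub(f"[{re.escape(string.punctuation)}]", "", text)
    # 2. Lowercase, tokenize, and drop stopwords.
    tokens = [t for t in text.lower().split() if t not in STOPWORDS]
    # 3. Naive rule-based stemming: strip one known suffix.
    stemmed = []
    for t in tokens:
        for suf in SUFFIXES:
            if t.endswith(suf) and len(t) > len(suf) + 2:
                t = t[: -len(suf)]
                break
        stemmed.append(t)
    return stemmed

print(clean("silver or lead is fine for me"))  # ['silver', 'lead', 'fine']
```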
B. Feature Generation: We can use text data to generate a number of features, such as word count, frequency of long words, frequency of unique words, n-grams, etc. By creating a representation of words that captures their meanings, semantic relationships and the numerous types of context they are used in, we can enable a computer to understand text and perform clustering, classification, etc. [19].

C. Vectorizing Data: Vectorizing is the process of encoding text as integers, i.e. in numeric form, to create feature vectors so that machine learning algorithms can understand our data.

1. Vectorizing Data: Bag-of-Words: Bag of Words (BoW), or CountVectorizer, describes the presence of words within the text data. It gives a result of 1 if a word is present in the sentence and 0 if not. It therefore creates a bag of words with a document-matrix count for each text document.
2. Vectorizing Data: N-Grams: N-grams are simply all combinations of adjacent words or letters of length n that we can find in our source text. N-grams with n=1 are called unigrams; similarly, bigrams (n=2), trigrams (n=3) and so on can also be used. Unigrams usually don't contain much information compared to bigrams and trigrams. The basic principle behind n-grams is that they capture which letter or word is likely to follow a given one. The longer the n-gram (the higher n), the more context you have to work with [20].

3. Vectorizing Data: TF-IDF: It computes the "relative frequency" with which a word appears in a document compared to its frequency across all documents. The TF-IDF weight represents the relative importance of a term in the document and the entire corpus [17].

TF stands for Term Frequency: it calculates how frequently a term appears in a document. Since document sizes vary, a term may appear more often in a long document than in a short one; thus, the term frequency is often divided by the document length. Note: TF-IDF is used for search-engine scoring, text summarization and document clustering.

IDF stands for Inverse Document Frequency: a word is not of much use if it is present in all the documents. Certain terms like "a", "an", "the", "on", "of", etc. appear many times in a document but are of little importance. IDF weighs down the importance of these terms and increases the importance of rare ones; the higher the IDF value, the more unique the word. TF-IDF is applied to the body text, so the relative count of each word in the sentences is stored in the document matrix.

Note: vectorizers output sparse matrices. A sparse matrix is a matrix in which most entries are 0 [21].

B. Algorithms Used for Classification: This section deals with training the classifier. Different classifiers were investigated to predict the class of the text. We explored specifically four different machine learning algorithms: Multinomial Naïve Bayes, the Passive Aggressive Classifier, Logistic Regression and Random Forest. The implementations of these classifiers were done using the Python library Scikit-Learn.

C. Brief Introduction to the Algorithms:

1. Naïve Bayes Classifier: This classification technique is based on Bayes' theorem, which assumes that the presence of a particular feature in a class is independent of the presence of any other feature. It provides a way of calculating the posterior probability, P(c|x) = P(x|c) P(c) / P(x), where P(c|x) is the posterior probability of the class given the predictor, P(c) is the prior probability of the class, P(x|c) is the likelihood (the probability of the predictor given the class), and P(x) is the prior probability of the predictor.

2. Random Forest: Random Forest is a trademark term for an ensemble of decision trees. In a Random Forest, we have a collection of decision trees (hence "forest"). To classify a new object based on its attributes, each tree gives a classification, and we say the tree "votes" for that class. The forest chooses the classification having the most votes over all the trees. Random Forest uses bagging and feature randomness when building each individual tree, to try to create an uncorrelated forest of trees whose prediction by committee is more accurate than that of any individual tree. The reason the random forest model works so well is that a large number of relatively uncorrelated models (trees) operating as a committee will outperform any of the individual constituent models. So how does a random forest ensure that the behaviour of each individual tree is not too correlated with the behaviour of any other tree in the model? It uses the following two methods:
2.1 Bagging (Bootstrap Aggregation): Decision trees are very sensitive to the data they are trained on; small changes to the training set can result in significantly different tree structures. Random forest takes advantage of this by allowing each individual tree to randomly sample from the dataset with replacement, resulting in different trees. This process is known as bagging or bootstrapping.

2.2 Feature Randomness: In a normal decision tree, when it is time to split a node, we consider every possible feature and pick the one that produces the most separation between the observations in the left node and those in the right node. In contrast, each tree in a random forest can pick only from a random subset of features. This forces even more variation amongst the trees in the model and ultimately results in lower correlation across trees and more diversification [22].

3. Logistic Regression: It is a classification algorithm, not a regression algorithm. It is used to estimate discrete values (binary values like 0/1, yes/no, true/false) based on a given set of independent variables. In simple words, it predicts the probability of occurrence of an event by fitting the data to a logit function; hence, it is also known as logit regression. Since it predicts a probability, its output values lie between 0 and 1 (as expected). Mathematically, the log odds of the outcome are modelled as a linear combination of the predictor variables: ln(p / (1 - p)) = b0 + b1x1 + ... + bnxn.

4. Passive Aggressive Classifier: The Passive Aggressive algorithm is an online algorithm, ideal for classifying massive streams of data (e.g. Twitter). It is easy to implement and very fast. It works by taking an example, learning from it and then throwing it away [24]. Such an algorithm remains passive on a correct classification outcome and turns aggressive in the event of a miscalculation, updating and adjusting. Unlike most other algorithms, it does not converge; its purpose is to make updates that correct the loss while causing very little change in the norm of the weight vector [25].

4.3 IMPLEMENTATION STEPS

A. Static Search Implementation: In the static part, we trained and used three of the four algorithms for classification: Naïve Bayes, Random Forest and Logistic Regression.

Step 1: In the first step, we extracted features from the already pre-processed dataset. These features are Bag-of-Words, TF-IDF features and n-grams.
Step 2: Here, we built all the classifiers for fake news detection. The extracted features are fed into the different classifiers: we used the Naïve Bayes, Logistic Regression and Random Forest classifiers from sklearn, and each of the extracted features was used in all of the classifiers.
Step 3: Once the models were fitted, we compared their F1 scores and checked their confusion matrices.
Step 4: After fitting all the classifiers, the two best-performing models were selected as candidate models for fake news classification.
Step 5: We performed parameter tuning by applying GridSearchCV to these candidate models and chose the best-performing parameters for these classifiers.
Step 6: The finally selected model was used for fake news detection, along with the probability of truth.
Step 7: Our finally selected, best-performing model takes a news article as input from the user; it is then used to produce the final classification output, which is shown to the user along with the probability of truth.

B. The problem can be broken down into three statements: 1) use NLP to check the authenticity of a news article; 2) if the user has a query about the authenticity of a search query, he or she can search directly on our platform, and our custom algorithm outputs a confidence score; 3) check the authenticity of a news source. These have been produced as search fields that take inputs in three different forms in our implementation of the problem statement.

4.4 EVALUATION METRICS

To evaluate the performance of algorithms for the fake news detection problem, various evaluation metrics have been used. In this subsection, we review the metrics most widely used for fake news detection. Most existing approaches consider the fake news problem as a classification problem that predicts whether a news article is fake or not: True Positive (TP): fake news pieces correctly predicted as fake news; True Negative (TN): true news pieces correctly predicted as true news; False Negative (FN): fake news pieces wrongly predicted as true news; False Positive (FP): true news pieces wrongly predicted as fake news.

Confusion Matrix: A confusion matrix is a table that is often used to describe the performance of a classification model (or "classifier") on a set of test data for which the true values are known. It allows the visualization of the performance of an algorithm.
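The fit-and-compare loop described in the steps above can be sketched with scikit-learn; the tiny labelled corpus is invented for illustration, so the resulting scores mean nothing beyond showing the API flow:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, f1_score
from sklearn.naive_bayes import MultinomialNB

# Invented mini-corpus: 1 = fake, 0 = real.
texts = ["shocking miracle cure found", "stocks closed higher today",
         "aliens secretly rule the senate", "rain expected this weekend",
         "click here to win millions", "council approves new budget"]
labels = [1, 0, 1, 0, 1, 0]

X_train, X_test = texts[:4], texts[4:]
y_train, y_test = labels[:4], labels[4:]

# Step 1: extract TF-IDF features from the (pre-processed) text.
vectorizer = TfidfVectorizer()
Xtr = vectorizer.fit_transform(X_train)
Xte = vectorizer.transform(X_test)

# Steps 2-3: fit each classifier, then compare F1 and confusion matrix.
for clf in (MultinomialNB(), LogisticRegression()):
    clf.fit(Xtr, y_train)
    pred = clf.predict(Xte)
    print(type(clf).__name__, f1_score(y_test, pred, zero_division=0))
    print(confusion_matrix(y_test, pred, labels=[0, 1]))
```

The same fitted models can then be passed to `GridSearchCV` for the parameter-tuning step.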
A. The implementation was done using the above algorithms with vector features: count vectors and TF-IDF vectors at word level and n-gram level. Accuracy was noted for all models. We used the k-fold cross-validation technique to improve the effectiveness of the models.

This cross-validation technique was used to split the dataset randomly into k folds; (k-1) folds were used for building the model, while the k-th fold was used to check its effectiveness. This was repeated until each of the k folds had served as the test set. We used 3-fold cross-validation for this experiment, where 67% of the data is used for training the model and the remaining 33% for testing.
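This 3-fold procedure corresponds to scikit-learn's `cross_val_score`; the synthetic dataset below is invented purely to show the flow:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the vectorized news features.
X, y = make_classification(n_samples=90, n_features=20, random_state=0)

# 3-fold CV: each fold serves once as the held-out test set, so every
# split trains on roughly 67% of the data and tests on the other 33%.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=3)
print(scores)        # one accuracy value per fold
print(scores.mean())
```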
B. Confusion Matrices for the Static System: After applying the various extracted features (Bag-of-Words, TF-IDF, n-grams) to the three different classifiers (Naïve Bayes, Logistic Regression and Random Forest), their confusion matrices, showing the actual and predicted sets, are given below:
V. RESULTS

As evident above, our best model came out to be Logistic Regression, with an accuracy of 65%. We then used grid-search parameter optimization to increase the performance of logistic regression, which gave us an accuracy of 80%. Hence, we can say that if a user feeds a particular news article or its headline into our model, there is an 80% chance that it will be classified according to its true nature.

3. CONCLUSIONS

In the 21st century, the majority of tasks are done online. Newspapers that were earlier preferred as hard copies are now being substituted by applications like Facebook and Twitter and by news articles read online; WhatsApp forwards are also a major source. The growing problem of fake news only makes things more complicated, and it tries to change or hamper people's opinion of and attitude towards the use of digital technology. When a person is deceived by fake news, people start believing that their perceptions about a particular topic are true as assumed. Thus, in order to curb this phenomenon, we have developed our fake news detection system, which takes input from the user and classifies it as true or fake. To implement this, various NLP and machine learning techniques are used. The model is trained using an appropriate dataset, and performance evaluation is done using various performance measures. The best model, i.e. the model with the highest accuracy, is used to classify the news headlines or articles. As evident above, for static search our best model came out to be Logistic Regression, with an accuracy of 65%. We then used grid-search parameter optimization to increase the performance of logistic regression, which gave us an accuracy of 75%. Hence, we can say that if a user feeds a particular news article or its headline into our model, there is a 75% chance that it will be classified according to its true nature. The user can check the news article or keywords online; he can also check the authenticity of the website. The accuracy for the dynamic system is 93%, and it increases with every iteration. We intend to build our own dataset, which will be kept up to date according to the latest news; all the live news and latest data will be kept in a database using a web crawler and an online database.

REFERENCES

[1] Kai Shu, Amy Sliva, Suhang Wang, Jiliang Tang, and Huan Liu, "Fake News Detection on Social Media: A Data Mining Perspective", arXiv:1708.01967v3 [cs.SI], 3 Sep 2017.
[2] Kai Shu, Amy Sliva, Suhang Wang, Jiliang Tang, and Huan Liu, "Fake News Detection on Social Media: A Data Mining Perspective", arXiv:1708.01967v3 [cs.SI], 3 Sep 2017.
[3] M. Granik and V. Mesyura, "Fake news detection using naive Bayes classifier", 2017 IEEE First Ukraine Conference on Electrical and Computer Engineering (UKRCON), Kiev, 2017, pp. 900-903.
[4] Fake news websites. (n.d.) Wikipedia. [Online]. Available: https://en.wikipedia.org/wiki/Fake_news_website. Accessed Feb. 6, 2017.
[5] Cade Metz. (2016, Dec. 16). The bittersweet sweepstakes to build an AI that destroys fake news.
[6] Conroy, N., Rubin, V. and Chen, Y. (2015). "Automatic deception detection: Methods for finding fake news", Proceedings of the Association for Information Science and Technology, 52(1), pp. 1-4.
[7] Markines, B., Cattuto, C., & Menczer, F. (2009, April). "Social spam detection", Proceedings of the 5th International Workshop on Adversarial Information Retrieval on the Web, pp. 41-48.
[8] Rada Mihalcea and Carlo Strapparava, "The lie detector: explorations in the automatic recognition of deceptive language", Proceedings of the ACL-IJCNLP.
[9] Kushal Agarwalla, Shubham Nandan, Varun Anil Nair, D. Deva Hema, "Fake News Detection using Machine Learning and Natural Language Processing", International Journal of Recent Technology and Engineering (IJRTE), ISSN: 2277-3878, Volume-7, Issue-6, March 2019.
[10] H. Gupta, M. S. Jamal, S. Madisetty and M. S. Desarkar, "A framework for real-time spam detection in Twitter", 2018 10th International Conference on Communication Systems & Networks (COMSNETS), Bengaluru, 2018, pp. 380-383.
[11] M. L. Della Vedova, E. Tacchini, S. Moret, G. Ballarin, M. DiPierro and L. de Alfaro, "Automatic Online Fake News Detection Combining Content and Social Signals", 2018 22nd Conference of Open Innovations Association (FRUCT), Jyvaskyla, 2018, pp. 272-279.
[12] C. Buntain and J. Golbeck, "Automatically Identifying Fake News in Popular Twitter Threads", 2017 IEEE International Conference on Smart Cloud (SmartCloud), New York, NY, 2017, pp. 208-215.
[13] S. B. Parikh and P. K. Atrey, "Media-Rich Fake News Detection: A Survey", 2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR), Miami, FL, 2018, pp. 436-441.
[14] Scikit-Learn: Machine Learning in Python.
[15] William Yang Wang, "Liar, Liar Pants on Fire": A New Benchmark Dataset for Fake News Detection, arXiv preprint arXiv:1705.00648, 2017.
[16] Shankar M. Patil, Dr. Praveen Kumar, "Data mining model for effective data analysis of higher education students using MapReduce", IJERMT, April 2017 (Volume-6, Issue-4).
[17] Aayush Ranjan, "Fake News Detection Using Machine Learning", Department of Computer Science & Engineering, Delhi Technological University, July 2018.