Fake News Detection by Using Machine Learning
A Project Report
Submitted in partial fulfillment for the requirements of the award of the
degree
Of
BACHELOR OF TECHNOLOGY
IN
INFORMATION TECHNOLOGY
Submitted by
Supervised by
Session 2021-22
DECLARATION
We declare that
a. The work contained in this report is original and has been done by us under the guidance
of our supervisor.
b. The work has not been submitted to any other institute for any degree or diploma.
c. We have followed the guidelines provided by the institute to prepare the report.
d. We have conformed to the norms and guidelines given in the ethical code of conduct
of the institute.
e. Wherever we have used materials (data, theoretical analysis, figures and text) from
other sources, we have given due credit to them by citing them in the text of the report
and giving their details in the references.
CERTIFICATE
This is to certify that the project Report entitled, “Fake news detection by using
Machine Learning” submitted by Shivam Tripathi, Rakesh Gupta, Shivam Tayal, in
the Department of Information Technology of KIET Group of Institutions, Ghaziabad,
affiliated to Dr. A. P. J. Abdul Kalam Technical University, Lucknow, Uttar Pradesh,
India, is a record of bona fide project work carried out by them under my supervision
and guidance and is worthy of consideration for the award of the degree of Bachelor of
Technology in Information Technology of the Institute.
Signature of Supervisor:
List of Figures
1.1 Survey results about COVID-19 and fake news in India (2020).
1.2 Perceived manipulativeness of fake news (left) and real news (right),
averaged and per individual item. Error bars show 95% confidence intervals [12].
List of Tables
1.1 Inductive typology of claims regarding COVID-19-related misinformation.
List of Acronyms
ML MACHINE LEARNING
LR LOGISTIC REGRESSION
NN NEURAL NETWORK
RF RANDOM FOREST
CONTENTS
PAGE No.
Declaration i
Certificate ii
List of Figures iii
List of Tables iv
List of Acronyms v
Abstract vii
CHAPTER 1: Introduction 1
1.1 Methodology 6
1.2 Approach 7
1.2.1 Naïve Bayes 7
1.2.2 Support Vector Machine 12
1.2.3 Logistic Regression 15
CHAPTER 2: System Architecture 20
CHAPTER 3: Literature Review 22
CHAPTER 4: Result 28
CHAPTER 5: Conclusion 30
References 31
Appendices
Abstract
Fake information is all around us, whether we can identify it or not. Individuals and
organizations publish fake news all the time to override unfavorable truths. A good example
of fake news is the COVID-19 vaccine: before the vaccine came out, huge amounts of fake news
and altered images were circulating on the internet.
Some sources stated that there was already a fully effective vaccine available, some stated
that it was coming very soon, and others stated that it would take a decade for a safe and
functional one to be released, but trusting and following the wrong sources can do more harm
than good. This paper takes a look at the application of Support Vector Machine, Logistic
Regression and Naïve Bayes learning techniques to identify fake news accurately.
CHAPTER 1
Introduction
The main motive behind creating fake news is largely to mislead people by
making them fall prey to a range of hoaxes, propaganda and inaccurate
information. There are articles that are either completely false or simply the
random opinion of a single person presented as news.
Nowadays all the key social media platforms, like Facebook, Twitter,
WhatsApp and Reddit, spread fake news rapidly. In our research paper we
propose a technique for identification of fake news employing a few ML
methods: Naïve Bayes, Support Vector Machine and Logistic Regression.
A good example of fake news is the Covid-19 vaccine. Prior to the launch of this
vaccine, a large amount of fake news and altered images were widespread on the
Internet.
Some sources say that a fully effective vaccine is available, some say it's coming
soon, and some say it will take 10 years for a safe and effective vaccine to be
released. But trusting and following bad sources can do more harm than good.
According to the study, about thirty percent of Indians used WhatsApp as a
source of COVID-19 information, and only about half of those who used WhatsApp for
COVID-19 fact-checked fewer than half of the texts before sharing them. Even more
shocking, 13% of respondents indicated they never fact-checked communications
before forwarding them on. Individuals and organizations constantly post fake
news to override unfavorable facts.
The study also looked at age groups, finding that those over 65 were more likely
to encounter disinformation, as well as to believe and act on it, while those under
25 were the least likely. Between twenty-four and twenty-seven percent of
participants held that they had contemplated using COVID-19 therapies that were
herbal, Ayurvedic, or homoeopathic.
Seven to eight percent stated they had tried them, while twelve percent said they
had tried home treatments.
Even though an attached link or reference to a source does not necessarily make
a claim authentic, three-quarters of Indians believe it makes a message more
trustworthy. Only a third of Indians said they believed messages from foreign
countries.
Figure 1.1 shows survey results about COVID-19 and fake news in India (2020)
Figure 1.1
Table 1.1 shows the inductive typology of claims regarding COVID-19-related
misinformation [11].
Table 1.1
Figure 1.2 shows the perceived manipulativeness of fake news (left) and real news
(right), averaged and per individual item. Error bars show 95% confidence
intervals [12].
Figure 1.2
1.1 Methodology
Artificial Intelligence was used to build the system. Given the pace at which
news spreads, it is critical to classify items precisely as real or counterfeit
using computations, even though computations cannot reflect a person's feelings
or capacities.
1.2 Approach
The proposed technique is a combination of Naïve Bayes, Support Vector Machine
and Logistic Regression. The three-part strategy is a hybrid of machine learning
calculations, divided into supervised-learning procedures, and a distinctive
language-preparation technique.
Eq. (1): P(h | d1, …, dn) = P(d1, …, dn | h) · P(h) / P(d1, …, dn)

Eq. (2): h* = argmax over h of P(h) · P(d1 | h) · P(d2 | h) · … · P(dn | h)

Equations 1 and 2 drive the classification. Naïve Bayes models are easy to
build and particularly useful for small and medium-sized data sets (the one used
in this report is evidence of that!). Along with simplicity, Naïve Bayes is known
to outperform even highly sophisticated classification methods.
Rather than attempting to calculate the joint value P(d1, d2, d3 | h) directly,
the attributes are assumed to be conditionally independent given the target value
and calculated as P(d1 | h) · P(d2 | h) and so on.
Given a naïve Bayes model, we can make predictions for new data using Bayes'
theorem.
Using the example above, if we had a new instance with weather = sunny, we can
calculate a score for each class.
We then choose the class that has the largest calculated value. We can turn these
values into probabilities by normalizing them.
If we had more input variables, we could extend the example. For instance,
pretend we have a "car" attribute with the values "working" and "broken".
We can multiply its conditional probability into the equation; for example, the
"go-out" class label with the car input variable set to "working" gains the
factor P(car = working | go-out).
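The scoring and normalization steps above can be sketched in a few lines of Python. This is a minimal illustration, not the report's implementation: the tiny weather/car dataset, its counts, and the class names ("go-out", "stay-home") are invented for the example.

```python
from collections import Counter

# Invented toy data: (weather, car, class). Stands in for the example table.
data = [
    ("sunny", "working", "go-out"),
    ("rainy", "broken",  "stay-home"),
    ("sunny", "working", "go-out"),
    ("sunny", "broken",  "go-out"),
    ("rainy", "working", "stay-home"),
]

class_counts = Counter(c for _, _, c in data)
total = len(data)

def likelihood(attr_index, value, cls):
    """P(attribute = value | class), estimated by counting."""
    rows = [r for r in data if r[2] == cls]
    return sum(1 for r in rows if r[attr_index] == value) / len(rows)

def score(weather, car, cls):
    """Unnormalized P(class) * P(weather | class) * P(car | class)."""
    prior = class_counts[cls] / total
    return prior * likelihood(0, weather, cls) * likelihood(1, car, cls)

raw = {c: score("sunny", "working", c) for c in class_counts}
norm = sum(raw.values())
probs = {c: v / norm for c, v in raw.items()}  # normalize into probabilities
prediction = max(probs, key=probs.get)
print(prediction, probs)
```

With these toy counts the "go-out" class wins, since every "sunny" row in the data belongs to it.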
Figure 1.3 shows the classification of Naïve Bayes; X1, X2, …, Xn are
probabilistic events.
Figure 1.3
Types of Naïve Bayes Classifier
Bernoulli Naïve Bayes: like the multinomial model, this model is popular for
document classification tasks, where binary term-occurrence features (whether a
word occurs in a document or not) are used rather than term frequencies (how
often a word occurs in the document).
In Gaussian Naïve Bayes, the continuous values associated with each feature are
assumed to be distributed according to a Gaussian (normal) distribution.
When plotted, it gives a bell-shaped curve which is symmetric about the mean
of the feature values as shown in Figure 1.4.
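The Gaussian likelihood used by this variant can be sketched as a small function. This is an illustrative sketch only; the mean and variance values are arbitrary.

```python
import math

def gaussian_pdf(x, mean, var):
    """Normal density: bell-shaped and symmetric about the mean."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

# Highest at the mean, and symmetric around it (arbitrary mean/variance).
peak = gaussian_pdf(5.0, 5.0, 1.0)
assert peak > gaussian_pdf(7.0, 5.0, 1.0)
assert abs(gaussian_pdf(4.0, 5.0, 1.0) - gaussian_pdf(6.0, 5.0, 1.0)) < 1e-12
```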
Figure 1.4
Eq. (3)
The Support Vector Machine (SVM) was created with the goal of determining
the best boundary for classifying positive and negative data points. The original
data points can be mapped into a high-dimensional vector space using a
well-defined kernel function, allowing features to be extracted; thus, SVM is
regarded as an important machine learning technique for regression and
classification [9].
According to the SVM algorithm, we find the points closest to the line from both
classes. These points are called support vectors. We then compute the distance
between the line and the support vectors; this distance is called the margin. Our
goal is to maximize the margin: the hyperplane for which the margin is maximum
is the optimal hyperplane.
This data is clearly not linearly separable: we cannot draw a straight line that
can classify it. But the data can be converted to linearly separable data in a
higher dimension. Let's add one more dimension and call it the z-axis, with the
coordinates on the z-axis governed by the constraint shown in Eq. (4),
z = x² + y².
Now the data is clearly linearly separable. Let the purple line separating the
data in the higher dimension be z = k, where k is a constant. Since z = x² + y²,
we get x² + y² = k, which is the equation of a circle. So we can project this
linear separator in the higher dimension back into the original dimensions using
this transformation.
For example, let's assume a line to be our one-dimensional Euclidean space (i.e.
our datasets lie on a line). Now pick a point on the line; this point divides the
line into two parts. The line has one dimension, while the point has zero
dimensions. So a point is a hyperplane of the line.
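The z = x² + y² lift described above can be checked with a short sketch. The two concentric rings of points below are synthetic data invented for the illustration.

```python
import math

def circle(radius, n=8):
    """n points evenly spaced on a circle of the given radius."""
    return [(radius * math.cos(2 * math.pi * i / n),
             radius * math.sin(2 * math.pi * i / n)) for i in range(n)]

inner = circle(1.0)   # class A: not linearly separable from...
outer = circle(3.0)   # ...class B in the original two dimensions

def lift(p):
    """The added z-coordinate from Eq. (4): z = x^2 + y^2."""
    return p[0] ** 2 + p[1] ** 2

# In the lifted space a plane z = k separates the classes; projected
# back to 2-D it becomes the circle x^2 + y^2 = k.
k = 4.0
assert all(lift(p) < k for p in inner)
assert all(lift(p) > k for p in outer)
```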
Kernel functions
Linear
These are commonly recommended for text classification because most of these
types of classification problems are linearly separable.
The linear kernel works really well when there are a lot of features, and text
classification problems have a lot of features. Linear kernel functions are faster
than most of the others and you have fewer parameters to optimize.
The linear kernel's decision boundary can be written as Eq. (5),
f(x) = w · x + b, where w is the weight vector that you want to minimize, x is
the data point that you're trying to classify, and b is the linear coefficient
estimated from the training data. This equation defines the decision boundary
that the SVM returns.
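The decision rule of Eq. (5) can be sketched as follows. The weight vector and bias below are hypothetical values standing in for parameters an SVM would learn from training data.

```python
# Hypothetical learned parameters (an SVM would fit these from data).
w = [2.0, -1.0]   # weight vector
b = -0.5          # bias / linear coefficient

def decide(x):
    """Classify by which side of the hyperplane w.x + b = 0 the point lies on."""
    activation = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if activation >= 0 else -1

print(decide([1.0, 0.0]), decide([0.0, 1.0]))
```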
Figure 1.5
Figure 1.6
Figure 1.6 shows an example of classification of two different samples. Class A and
Class B are respective datasets.
The binary logistic model classifies specimens into two classes, whereas the
multinomial logistic model extends this to an arbitrary number of classes without
ordering them.
The mathematics of logistic regression relies on the concept of the "odds" of an
event: the probability of the event occurring divided by the probability of it
not occurring.
Just as in linear regression, logistic regression has weights associated with the
dimensions of the input data. In contrast to linear regression, the relationship
between the weights and the output of the model (the "odds") is exponential, not
linear.
Logistic regression is another excellent method for classification problems.
Despite the word "regression" in its name, it is used to solve classification
hurdles. The function used in logistic regression is given below.
The main difference between linear and logistic regression is that the latter's
output is restricted to the range (0, 1). In addition, unlike linear regression,
logistic regression does not require a linear relationship between the input and
output variables. For clarification, refer to Eq. (6).

Eq. (6): f(x) = 1 / (1 + e^(−x))

In Eq. (6), x is the input variable of the logistic function. Let's input values
ranging from −20 to 20.
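A minimal sketch of evaluating the logistic function over that range, confirming its output stays inside (0, 1):

```python
import math

def sigmoid(x):
    """Logistic function: maps any real input into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

values = [sigmoid(x) for x in range(-20, 21)]

assert all(0.0 < v < 1.0 for v in values)   # confined to (0, 1)
assert abs(sigmoid(0) - 0.5) < 1e-12        # midpoint at x = 0
assert sigmoid(20) > 0.999 and sigmoid(-20) < 0.001
```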
Figure 1.7
Confusion matrix is one of the easiest and most intuitive metrics used for finding
the accuracy of a classification model, where the output can be of two or more
categories.
This is the most popular method used to evaluate logistic regression. Below is
the confusion matrix for logistic regression.
A true positive is the case where the actual value as well as the predicted
value is true: the patient has cancer, and the model also predicted that the
patient has cancer.
In a false negative, the actual value is true but the predicted value is false:
the patient has cancer, but the model predicted that the patient did not have
cancer. This is also known as a Type 2 error.
In a false positive, the predicted value is true but the actual value is false:
the model predicted that the patient had cancer, but in reality the patient does
not have cancer. This is also known as a Type 1 error.
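Counting the four confusion-matrix cells can be sketched directly. The label vectors below are invented for illustration.

```python
# Invented ground-truth and predicted labels for a binary classifier.
actual    = [1, 1, 0, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]

pairs = list(zip(actual, predicted))
tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # true positives
tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # true negatives
fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # false positives (Type 1)
fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # false negatives (Type 2)

accuracy = (tp + tn) / len(pairs)
print(tp, tn, fp, fn, accuracy)
```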
Figure 1.8
Ordinal logistic regression deals with three or more classes in a
predetermined order.
Figure 1.9
Multinomial logistic regression is a model where there are multiple classes that
an item can be classified as. There is a set of three or more predefined classes set
up prior to running the model.
Ordinal logistic regression is also a model where there are multiple classes that
an item can be classified as; however, in this case an ordering of classes is
required. Classes do not need to be proportionate. The distance between each
class can vary.
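The multinomial case can be illustrated with the softmax normalization that turns per-class scores into probabilities; the scores and the idea of three predefined classes below are hypothetical.

```python
import math

def softmax(scores):
    """Normalize per-class scores into probabilities that sum to 1."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three predefined classes.
probs = softmax([2.0, 1.0, 0.1])
print(probs)
```

The item is assigned to the class with the highest resulting probability.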
CHAPTER 2
System Architecture
First, we gather news circulating in the media from various sources.
Then we classify the news with tags such as social, controversial, etc.,
depending on the nature of the news. Then we perform data analysis using
algorithms such as SVM, Naïve Bayes and Logistic Regression.
Figure 2.1 shows the flow of the system methodology, while Figure 2.2 describes
the proposed model architecture and briefly classifies the role of each domain.
Figure 2.2 Proposed Model
This is the overall system architecture of our model, which helps us classify
and filter news.
CHAPTER 3
Literature Review
Many automatic detection algorithms for hoaxes have been described in the
literature. Hoaxes take many forms, ranging from chatbots that promote
misinformation to clickbait used to spread rumors; many clickable links on
platforms such as Facebook encourage people to share and like postings,
spreading false information. There has been a lot of effort put towards
detecting fake data.
Shivam B. Parikh and Pradeep K. Atrey (2018) [2] developed a Naïve Bayes
classifier based on the idea that fake news pieces frequently utilize the same
set of terms, whereas actual news uses a unique set of words. The overall
accuracy of the model utilizing the Naïve Bayes classifier was around 70%.
In their study, Mykhailo Granik et al. [3] provide a simple strategy for
detecting fake news using a naïve Bayes classifier. This method was turned into
a software system and put to the test on a collection of Facebook news posts.
They came from three major Facebook pages on the right and left, as well as
three major mainstream political news sites (Politico, CNN, and ABC News).
They were able to attain a classification accuracy of around 74%. The accuracy
of fake news classification is slightly lower.
This could be due to the dataset's skewness: only 4.9 percent of it is bogus news.
During the 2016 U.S. presidential election, the authors of [4] observed roughly
14 million tweets, retweeted about 400,000 times on Twitter, spreading
low-credibility content, much of it amplified by social bots; this motivates
methods to categorize the posts spread by bots.
Linguistic Cue Approaches with Machine Learning, BOW Approach, Rhetorical
Structure and Discourse Analysis are all described by the authors in [5].
The authors of [6] suggested Fake Detector, an automatic hoax-spotting model
based on textual categorization that uses an extensive diffusive network model
to simultaneously grasp the depiction of hoaxes, authors, and subjects. Fake
Detector covers two primary components: representation feature learning and
credibility label inference, which are combined to form the Fake Detector deep
diffusive network model.
The authors of [7] have developed a model that uses Machine Learning
techniques. They used a variety of machine learning algorithms to improve
accuracy, including the Linear Support Vector Machine, Multilayer Perceptron,
Random Forest Algorithm, and KNN.
Using random forest, the authors were able to obtain 98 percent accuracy. Their
goal in that paper is to examine short sentences and the news in question and to
construct a reliability count for the news by serially applying feature
extraction and credibility scoring.
Figure 3.1 shows the characteristics observed in authentic news.
Figure 3.1
By analyzing the bulk of data related to true news, the presence of distinct
patterns, absent from fake hoaxes, can be observed. The analysis takes into
account the number of words in an article of authentic news together with the
title of the news.
By analyzing the bulk of data related to false news, the presence of distinct
patterns in fake hoaxes can be observed. Figure 3.2 likewise takes into account
the number of words in an article together with the title of the news.
Figure 3.2
Table 3.1 summarizes the related works [9][14][15].
CHAPTER 4
Result
For implementation purposes, the three existing approaches are considered. The
dataset [10] comprises 4,897 rows, labelled 1 for true and 0 for false; the id
column holds the id of each text and the text column holds the text of the
articles. Figure 4.1 and Figure 4.2 show the implementation.
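The evaluation set-up can be sketched in miniature. Everything below is a placeholder: the four rows stand in for the real dataset [10], and the keyword rule stands in for a trained classifier; only the shape of the accuracy computation matches the report.

```python
# Placeholder rows: the real dataset [10] has 4,897 rows with columns
# id, text, and label (1 = true, 0 = false).
dataset = [
    {"id": 0, "text": "vaccine approved after clinical trials", "label": 1},
    {"id": 1, "text": "miracle herb cures virus overnight",     "label": 0},
    {"id": 2, "text": "ministry releases official case data",   "label": 1},
    {"id": 3, "text": "secret cure hidden by doctors",          "label": 0},
]

def toy_model(text):
    """Stand-in for a trained classifier: flags sensational keywords."""
    return 0 if any(w in text for w in ("miracle", "secret")) else 1

def accuracy(model, rows):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(model(r["text"]) == r["label"] for r in rows) / len(rows)

print(accuracy(toy_model, dataset))
```

In the report, the same accuracy computation is applied to SVM, Naïve Bayes and Logistic Regression models trained on the full dataset.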
Figure 4.1
Figure 4.2
Table 4.3 shows the accuracy of each model:

Model | Accuracy
Support Vector Machine | 0.94
Naïve Bayes | 0.81
Logistic Regression | 0.85
CHAPTER 5
CONCLUSION
Finding the accuracy of news that is available on the internet is critical. The
components for spotting fake news are listed in the studies discussed.
Not everything shared on web-based social networks is genuine, yet those
networks spread the word rapidly. Currently, the recommended solution is being
tested: SVM, the Naïve Bayes classifier and Logistic Regression are employed.
The resulting algorithm could be useful in the future.
Hybrid techniques yield superior results for the same goal. As previously
described, the technology detects bogus news based on the models that were used.
It also supplies some assistance and suggests breaking news on the subject,
which is highly interesting. Any user will find this handy. The efficiency of
the system will improve in the future, and the prototype's precision can be
improved to a certain extent, as well as the user interface.
REFERENCES
[1] J. A. Bapaye and A. Bapaye, "Impact of WhatsApp as a Source of Misinformation for the
Novel Coronavirus Pandemic in a Developing Country: Cross-Sectional Questionnaire Study,"
Jun. 2021.
[2] S. B. Parikh and P. K. Atrey, "Media-Rich Fake News Detection: A Survey," in Proceedings -
IEEE 1st Conference on Multimedia Information Processing and Retrieval, MIPR 2018, Jun. 2018.
[3] M. Granik and V. Mesyura, "Fake Statements Detection with Ensemble of Machine Learning
Algorithms," Problems of Information Technology, vol. 09, no. 2, pp. 48–52, Jul. 2018.
[4] C. Shao, G. L. Ciampaglia, O. Varol, K. Yang, A. Flammini, and F. Menczer, "The spread of
low-credibility content by social bots," Jul. 2017.
[5] N. J. Conroy, V. L. Rubin, and Y. Chen, "Automatic Deception Detection: Methods for
Finding Fake News," 2015.
[6] S. I. Manzoor, J. Singla, and Nikita, "Fake news detection using machine learning
approaches: A systematic review," in Proceedings of the International Conference on Trends
in Electronics and Informatics, ICOEI 2019, Apr. 2019.
[7] I. Ahmad, M. Yousaf, S. Yousaf, and M. O. Ahmad, "Fake News Detection Using Machine
Learning Ensemble Methods," Complexity, vol. 2020, 2020.
[8] P. Kaviani and M. S. Dhotre, "Short Survey on Naive Bayes Algorithm," International
Journal of Advance Engineering and Research Development, vol. 4, no. 11, 2017.
[9] A. Jain, A. Shakya, H. Khatter, and A. K. Gupta, "A Smart System for Fake News Detection
Using Machine Learning," Sep. 2019.
[10] Dataset: https://www.kaggle.com/c/fakenewskdd2020/data?select=train.csv, accessed
2 March 2022.
[11] F. Simon, P. N. Howard, and R. K. Nielsen, "Types, sources, and claims of COVID-19
misinformation," 7 Apr. 2020.
[12] M. Basol, M. Berrich, F. Uenal, and S. van der Linden, "Towards psychological herd
immunity: Cross-cultural evidence for two prebunking interventions against COVID-19
misinformation," May 2021.
[14] S. B. Parikh, V. Patil, and P. K. Atrey, "On the Origin, Proliferation and Tone of Fake
News," in Proc. 2nd Int. Conf. Multimedia Information Processing and Retrieval, MIPR 2019,
pp. 135–140, 2019.
APPENDIX A
Authors: Shivam Tripathi, Ajay Agarwal, Shivam Tayal and Rakesh Gupta
Title: Survey paper on Fake News Detection Using Machine Learning
Number: 23
Authors: Shivam Tripathi, Ajay Agarwal, Shivam Tayal and Rakesh Gupta
Title: Research paper on Fake News Detection Using Machine Learning
Number: 6713
APPENDIX B
APPENDIX C
[email protected] [email protected]
[email protected] [email protected]
Abstract: Fake information is all around us, whether we can identify it or not. Individuals
and organizations publish fake news all the time to override unfavorable truths. A good
example of fake news is the COVID-19 vaccine: before the vaccine came out, huge amounts of
fake news and altered images were circulating on the internet. There were some sources that
stated that there was already a fully effective vaccine available, some stated that it was
coming very soon, and others stated that it would take a decade for a safe and functional one
to be released, but trusting and following the wrong sources can do more harm than good. This
paper takes a look at the application of Support Vector Machine, Logistic Regression and
Naïve Bayes learning techniques to identify fake news accurately. Our research looks into
many textual qualities that can be used to tell the difference between bogus and genuine
content. We train a variety of machine learning algorithms using various ensemble approaches
and then evaluate their performance on real-world datasets using those properties. According
to experimental results, the performance of our suggested ensemble learner strategy is
superior to that of individual learners.

Keywords: fake news; clickbait; social media; classification; machine learning.

I. Introduction

The main motive behind creating fake news is largely to mislead people by making them fall
prey to a range of hoaxes, propaganda and inaccurate information. There are articles that are
either completely false or simply the random opinion of a single person presented as news.
Nowadays all the key social media platforms, like Facebook, Twitter, WhatsApp and Reddit,
spread fake news rapidly. In our research paper we propose a technique for identification of
fake news employing a few ML methods: Naïve Bayes, Support Vector Machine and Logistic
Regression.

The types of fake news are as follows:
1. For the sake of politics.
2. For unrelated stuff, use a fictitious image.
3. Content that is completely unfounded.
4. The IT cell spreads rumors.
5. Religious content that is deceptive.

As we all know, there was a lot of misinformation surrounding COVID-19 during the pandemic. A
new wave of COVID-19 poured across India, bringing with it a new flood of fake news. A
scientific study [1] issued in a journal of medical research by medical practitioners from
Rochester, New York, and Pune, India, supplies perception into the conduct of Indian cyberspace
customers during the epidemic on social media, a major origin of coronavirus hoaxes in the
nation.

Table 1

According to the study, about thirty percent of Indians used WhatsApp as a source of COVID-19
information, and only about half of those who used WhatsApp for COVID-19 fact-checked fewer
than half of the texts before sharing them. Even more shocking, 13% of respondents indicated
they never fact-checked communications before forwarding them on.

The study also looked at age groups, finding those over 65 were more likely to encounter
disinformation, as well as to believe and act on it, while those under the age of 25 were the
least likely. Between twenty-four and twenty-seven percent of participants held that they had
contemplated using COVID-19 therapies that were herbal, ayurvedic, or homoeopathic. Seven to
eight percent stated they had tried them, while twelve percent said they had tried home
treatments.

Figure 2

Figure 2 shows the perceived manipulativeness of fake news (left) and real news (right),
averaged and per individual item. Error bars show 95% confidence intervals [12].
same set of terms, whereas actual news uses a unique set of words. The overall accuracy of
the model utilizing the naïve Bayes classifier was around 70%.

In their study, Mykhailo Granik et al. [3] provide a simple strategy for detecting fake news
using a naïve Bayes classifier. This method was turned into a software system and put to the
test on a collection of Facebook news posts. They came from three major Facebook pages on the
right and left, as well as three major mainstream political news sites (Politico, CNN, and
ABC News). They were able to attain a classification accuracy of around 74%. The accuracy of
fake news classification is slightly lower. This could be due to the dataset's skewness: only
4.9 percent of it is bogus news.

During the 2016 U.S. presidential election, the authors of [4] observed roughly 14 million
tweets, retweeted about 400,000 times on Twitter, spreading low-credibility content, much of
it amplified by social bots; this motivates methods to categorize the posts spread by bots.

Linguistic Cue Approaches with Machine Learning, the BOW Approach, Rhetorical Structure and
Discourse Analysis are all described by the authors in [5].

The authors of [6] suggested Fake Detector, an automatic hoax-spotting model based on textual
categorization that uses an extensive diffusive network model to simultaneously grasp the
depiction of hoaxes, authors, and subjects. Fake Detector covers two primary components:
representation feature learning and credibility label inference, which are combined to form
the Fake Detector deep diffusive network model. Good deception modelling algorithms (bag of
words, rhetorical algorithm) give a robust and solid architecture, and the author focuses on
specialized linguistic content.

The authors of [7] have developed a model that uses machine learning techniques. They used a
variety of machine learning algorithms to improve accuracy, including the Linear Support
Vector Machine, Multilayer Perceptron, Random Forest Algorithm, and KNN. Using random forest,
the authors were able to obtain 98 percent accuracy. Their goal is to examine short sentences
and the news in question and to construct a reliability count for the news by serially
applying feature extraction and credibility scoring.
III. Methodology

III.1 Approach

Recognizing the category of news is difficult due to the multi-dimensional nature of fake
news. It is self-evident that a realistic approach is required. To be effective, a technique
must include a variety of viewpoints to deal with the problem precisely. For this reason the
proposed technique is a combination of Naïve Bayes, Support Vector Machine and Logistic
Regression. It is critical to classify items precisely as real or counterfeit using
computations, even though computations cannot reflect a person's feelings or capacities. The
three-part strategy is a hybrid of machine learning calculations, divided into
supervised-learning procedures, and a distinctive language-preparation technique.

III.1.1 Naïve Bayes

Eq. (1)

We can use maximum likelihood or MAP estimation to estimate these parameters from (labelled)
data. After training a naïve Bayes classifier from the input, we can label new examples by
determining the class label c* with the highest posterior probability given the observations
x1, …, xn.

Eq. (2)

A graphical representation of the above equation is as follows:
x is the input variable in the logistic function equation. Let's input logistic function
values ranging from −20 to 20.

Figure 3

Figure 4

Figure 5

Figure 6. Proposed System Methodology

Figure 6 shows the flow of the system methodology, while Figure 7 describes the proposed
model architecture and briefly classifies the role of each domain. This is the overall system
architecture of our model, which helps us classify and filter news.

Figure 7. Proposed Model

V. IMPLEMENTATION AND RESULTS

Model | Accuracy
Support Vector Machine | 0.94
Naïve Bayes | 0.81
Logistic Regression | 0.85

Figure 9

Figure 10
Confusion Matrix:

Figure 11

V. CONCLUSION

Finding the accuracy of news that is available on the internet is critical. The components
for spotting fake news are listed in the studies discussed. Not everything shared on
web-based social networks is genuine, yet those networks spread the word rapidly. Currently,
the recommended solution is being tested: SVM, the Naïve Bayes classifier and Logistic
Regression are employed. The resulting algorithm could be useful in the future. Hybrid
techniques yield superior results for the same goal. Any user will find this handy. The
efficiency of the system will improve in the future, and the prototype's precision can be
improved to a certain extent, as well as the user interface. There are numerous outstanding
issues in the detection of fake news that researchers must address. Identifying essential
aspects involved in the distribution of news, for example, is a vital step in reducing the
spread of fake news. Graph theory and machine learning approaches can be used to identify the
primary sources engaged in the dissemination of fake news. Real-time fake news detection in
videos could also be a promising future option.
References

[1] J. A. Bapaye and A. Bapaye, "Impact of WhatsApp as a Source of Misinformation for the
Novel Coronavirus Pandemic in a Developing Country: Cross-Sectional Questionnaire Study,"
Jun. 2021.
[2] S. B. Parikh and P. K. Atrey, "Media-Rich Fake News Detection: A Survey," in Proceedings -
IEEE 1st Conference on Multimedia Information Processing and Retrieval, MIPR 2018, Jun. 2018.
[3] M. Granik and V. Mesyura, "Fake Statements Detection with Ensemble of Machine Learning
Algorithms," Problems of Information Technology, vol. 09, no. 2, pp. 48–52, Jul. 2018.
[4] C. Shao, G. L. Ciampaglia, O. Varol, K. Yang, A. Flammini, and F. Menczer, "The spread of
low-credibility content by social bots," Jul. 2017.
[5] N. J. Conroy, V. L. Rubin, and Y. Chen, "Automatic Deception Detection: Methods for
Finding Fake News," 2015.
[6] S. I. Manzoor, J. Singla, and Nikita, "Fake news detection using machine learning
approaches: A systematic review," in Proceedings of the International Conference on Trends
in Electronics and Informatics, ICOEI 2019, Apr. 2019.
[7] I. Ahmad, M. Yousaf, S. Yousaf, and M. O. Ahmad, "Fake News Detection Using Machine
Learning Ensemble Methods," Complexity, vol. 2020, 2020.
[8] P. Kaviani and M. S. Dhotre, "Short Survey on Naive Bayes Algorithm," International
Journal of Advance Engineering and Research Development, vol. 4, no. 11, 2017.
[9] A. Jain, A. Shakya, H. Khatter, and A. K. Gupta, "A Smart System for Fake News Detection
Using Machine Learning," Sep. 2019.
[10] Dataset: https://www.kaggle.com/c/fakenewskdd2020/data?select=train.csv, accessed
2 March 2022.
[11] F. Simon, P. N. Howard, and R. K. Nielsen, "Types, sources, and claims of COVID-19
misinformation," 7 Apr. 2020.
[12] M. Basol, M. Berrich, F. Uenal, and S. van der Linden, "Towards psychological herd
immunity: Cross-cultural evidence for two prebunking interventions against COVID-19
misinformation," May 2021.