Fake News Detection System Using Machine Learning


ISSN 2320 - 2602

Volume 10 No.6, June 2021


Anagha S Anand et al., International Journal of Advances in Computer Science and Technology, 10(6), June 2021, 12 - 15
International Journal of Advances in Computer Science and Technology
Available Online at http://www.warse.org/IJACST/static/pdf/file/ijacst021062021.pdf
https://doi.org/10.30534/ijacst/2021/021062021

Fake News Detection System Using Machine Learning


Anagha S Anand1, Aneeta Solaman2, Berin John3, Vineeta Samson4, Asst Prof Teenu Jose5
1 Department of Computer Science and Engineering, Albertian Institute of Science and Technology, Kalamassery, Kerala, India, [email protected]
2 Department of Computer Science and Engineering, Albertian Institute of Science and Technology, Kalamassery, Kerala, India, [email protected]
3 Department of Computer Science and Engineering, Albertian Institute of Science and Technology, Kalamassery, Kerala, India, [email protected]
4 Department of Computer Science and Engineering, Albertian Institute of Science and Technology, Kalamassery, Kerala, India, [email protected]
5 Department of Computer Science and Engineering, Albertian Institute of Science and Technology, Kalamassery, Kerala, India, [email protected]

ABSTRACT

The proliferation of misleading information in everyday news sources such as social media feeds, news blogs, and online newspapers has made it challenging to identify trustworthy news sources, increasing the need for computational tools that can provide insight into the reliability of online content. In this paper, we focus on the automatic identification of fake content in news articles. First, we present a dataset for the task of fake news detection. We describe the preprocessing, feature extraction, classification, and prediction processes in detail. We use logistic regression together with language processing techniques to classify fake news. The preprocessing functions perform operations such as tokenizing and stemming, along with exploratory data analysis such as response variable distribution and data quality checks (for example, null or missing values). Simple bag-of-words, n-grams, and TF-IDF are used as feature extraction techniques. A logistic regression model is used as the classifier for fake news detection, reporting the probability of truth.

Key words: Fake news detection, Logistic regression, TF-IDF vectorization.

1. INTRODUCTION

The spread of fake news should not be underestimated, as it can have negative consequences for the public in the long run. Serious problems may arise from fake news, such as slander, confusion, misunderstanding, and provocative lies, escalating to disputes stirred up by irresponsible groups or individuals who seek to spread hatred and chaos. News whose content has been manipulated or fabricated with material that is unrelated, or that is wholly or partly false, is classified as fake news. Determining whether a news item is fake or not is genuinely difficult. However, with the capabilities of artificial intelligence and machine learning, it has become possible to detect fake news. Several countries have shown their commitment to dealing with fake news because of its expected effect on society. In short, fake news is misleading and influences people to believe something that is not true and has most likely been manipulated.

In the past, anyone who wanted news had to wait for the next day's newspaper. With the growth of online newspapers that update news almost instantly, however, people have found a better and faster way to stay informed about matters of interest to them. Nowadays social networking systems, online news portals, and other online media have become the main sources of news, through which interesting and breaking news is shared at a rapid pace. Nevertheless, many news portals serve special interests by feeding readers distorted, partially correct, and sometimes imaginary news that is likely to attract the attention of a target group of people. Fake news has become a major concern because it is dangerous, at times spreading confusion and deliberate disinformation among people.

The term "fake news" has become a buzzword these days. However, an agreed definition of the term is still to be found. It can be defined as a type of yellow journalism or propaganda that consists of deliberate misinformation or hoaxes spread via traditional print and broadcast news media or online social media. Such news is mostly published with the intent to mislead in order to damage a community or person, create chaos, or gain financially or politically. Since people are often unable to spend enough time cross-checking references and verifying the credibility of news, automated detection of fake news is essential. Accordingly, it is receiving great attention from the research community.

2. LITERATURE REVIEW

In general, fake news can be categorized into three groups. The first group is fake news proper, which is news that is completely fake and made up by the writers of the articles. The second group is fake satire news, which is fake news whose main purpose is to provide humour to readers.
The third group is poorly written news articles, which contain some real news but are not entirely accurate. In short, this is news that uses, for example, quotes from political figures to report a totally fake story. Usually, this type of story is meant to promote a certain agenda or biased opinion [1]. In the article published by Kai Shu, Amy Sliva, Suhang Wang, Jiliang Tang, and Huan Liu [2], the authors explored the fake news problem by reviewing existing literature in two phases: characterization and detection. In the characterization phase, they introduced the essential concepts and principles of fake news in both traditional media and social media. In the detection phase, they reviewed existing fake news detection approaches from a data mining perspective, including feature extraction and model construction.

Hadeer Ahmed, Issa Traore, and Sherif Saad [3] proposed a fake news detection model that uses n-gram analysis and machine learning techniques. They investigated and compared two different feature extraction techniques and six different machine classification techniques. Experimental evaluation yielded the best performance using Term Frequency-Inverse Document Frequency (TF-IDF) as the feature extraction technique and a Linear Support Vector Machine (LSVM) as the classifier, with an accuracy of 92%.

Verónica Pérez-Rosas, Bennett Kleinberg, Alexandra Lefevre, and Rada Mihalcea [4], in their publication "Automatic Detection of Fake News", focus on the automatic identification of fake content in online news. For this they introduced two different datasets, one obtained through crowdsourcing and covering six news domains (sports, business, entertainment, politics, technology, and education) and another obtained from the web and covering celebrities. They developed classification models using a linear SVM classifier and five-fold cross-validation, with accuracy, precision, recall, and F1 measures averaged over the five iterations. The models rely on a combination of lexical, syntactic, and semantic information, as well as features representing text readability properties, and reach a performance comparable to the human ability to identify fakes.

E. M. Okoro, B. A. Abara, A. O. Umagba, A. A. Ajonye, and Z. S. Isa [5], in their publication "A Hybrid Approach to Fake News Detection on Social Media", employ a combination of human-based and machine-based approaches. Traditional and machine-based approaches each have limitations and cannot single-handedly solve the problem, owing to human literacy and cognitive limitations on one side and the inadequacy of machine-based approaches on the other. To address these problems, they proposed a Machine-Human (MH) model for fake news detection in social media. This model combines a human literacy news detection tool with machine linguistic and network-based approaches. In this way, parallel approaches to detection are at work, each helping to provide a balance for the other.

The existing systems and research work reveal that most classification algorithms perform well at detecting or predicting the fakeness of a news story. Since logistic regression serves this purpose well, our system builds on this finding and focuses on classification algorithms such as logistic regression.

3. METHODOLOGY

Figure 1: Flowchart of the proposed system

3.1 Data Preprocessing
This module contains all the preprocessing functions needed to process the input documents and texts. First we read the train, test, and validation data files, then perform preprocessing such as tokenizing and stemming. Some exploratory data analysis is also performed, such as examining the response variable distribution, along with data quality checks such as looking for null or missing values.

3.2 Stemming
In linguistic morphology and information retrieval, stemming is the process of reducing inflected (or sometimes derived) words to their word stem, base, or root form, generally a written word form. The stem need not be identical to the morphological root of the word; it is usually sufficient that related words map to the same stem, even if this stem is not in itself a valid root.

3.3 Tokenizing
In text processing, tokenization is the process of splitting a document into smaller units (tokens), such as words, on which the later feature extraction steps operate. This is distinct from the data-security sense of the term, in which sensitive data are replaced with unique identification symbols that retain all the essential information about the data without compromising its security. That form of tokenization, which seeks to minimize the amount of data a business must keep on hand, has become a popular way for small and mid-sized businesses to bolster the security of credit card and e-commerce transactions while minimizing the cost and complexity of compliance with industry standards and government regulations.
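
The tokenizing and stemming steps of Sections 3.2 and 3.3 can be illustrated with a minimal sketch. The paper does not say which tokenizer or stemmer was used, so everything below is an assumption made for illustration: the function names `tokenize`, `stem`, and `preprocess` are hypothetical, and the stemmer is a naive suffix stripper, not the Porter algorithm or any library implementation.

```python
import re

# Suffixes stripped by the toy stemmer, longest first so "ies" wins over "s".
# This list is an illustrative assumption, not taken from the paper.
SUFFIXES = ("ing", "ies", "ed", "es", "ly", "s")


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def stem(token):
    """Strip one common suffix, keeping at least three leading characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize a document and reduce every token to its stem."""
    return [stem(tok) for tok in tokenize(text)]


print(preprocess("Reporters claimed the claims were spreading quickly"))
# ['reporter', 'claim', 'the', 'claim', 'were', 'spread', 'quick']
```

Note that "claimed" and "claims" map to the same stem, which is the property Section 3.2 asks for. A production system would more likely rely on an established stemmer than on a hand-written suffix list.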

3.4 Feature Selection
In this module we performed feature extraction and selection using the scikit-learn Python library.

3.5 Count Feature
The CountVectorizer provides an easy way both to tokenize a set of text documents and build a vocabulary of known words, and to encode new documents using that vocabulary. It is used as follows:
1. Create an instance of the CountVectorizer class.
2. Call the fit() function to learn a vocabulary from one or more documents.
3. Call the transform() function on one or more documents as required to encode each as a vector.

An encoded vector is returned with the length of the entire vocabulary and an integer count of the number of times each word appeared in the document. Because these vectors contain many zeros, we call them sparse. Python provides an efficient way of handling sparse vectors in the scipy.sparse package. The vectors returned from a call to transform() are sparse vectors, and they can be converted back to NumPy arrays for inspection by calling the toarray() function.

3.6 Classifier
In this module we build the classifiers for fake news detection. The extracted features are fed into different classifiers. We used the Logistic Regression classifier from sklearn. Each of the extracted feature sets was used in the classifier. After fitting the model, we compared the F1 score and checked the confusion matrix. After fitting all the classifiers, the two best-performing models were selected as candidate models for fake news classification. Finally, the selected model was used for fake news detection with the probability of truth. In addition, we extracted the top 50 features from our TF-IDF vectorizer to see which words are most important in each of the classes. We also used precision-recall and learning curves to see how the training and test sets perform as we increase the amount of data in our classifiers.

3.7 Logistic Regression
Logistic regression is a machine learning classification algorithm used to predict the probability of a categorical dependent variable. In logistic regression, the dependent variable is a binary variable that contains data coded as 1 (yes, success, etc.) or 0 (no, failure, etc.). In other words, the logistic regression model predicts P(Y=1) as a function of X.

4. RESULT AND CONCLUSION

In this paper, we used a Logistic Regression classifier, which serves the model and works with the user input. We presented a detection model for fake news using TF-IDF analysis through the lens of different feature extraction techniques. We investigated different feature extraction and machine learning techniques. The proposed model achieves an accuracy of roughly 92% when using TF-IDF features and a logistic regression classifier. On the test data, the result is an accuracy of 0.92 and an F1 score of 0.923.

4.1 Input

Figure 2: User entering a news item

Figure 2 shows the user interface where input values are submitted to the system.

4.2 Output

Figure 3: Result shown after prediction

Figure 3 is a snapshot of the output screen, which gives the output values to the user.
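
To make the mechanics of Sections 3.5 and 3.7 concrete, the following is a rough, dependency-free sketch rather than the code used in the paper, which relies on scikit-learn's CountVectorizer and LogisticRegression. The class `TinyCountVectorizer` is a hypothetical stand-in that mimics only the fit() and transform() steps, `train_logistic` is a plain gradient-descent fit of the model P(Y=1 | x) = 1 / (1 + exp(-(w·x + b))), and the four training headlines and their labels are invented for illustration. TF-IDF weighting and n-grams are not shown.

```python
import math
from collections import Counter


class TinyCountVectorizer:
    """Hypothetical stand-in that mimics fit()/transform() on whitespace tokens."""

    def fit(self, documents):
        words = sorted({w for doc in documents for w in doc.lower().split()})
        self.vocabulary_ = {w: i for i, w in enumerate(words)}
        return self

    def transform(self, documents):
        rows = []
        for doc in documents:
            counts = Counter(doc.lower().split())
            # One integer count per vocabulary word; unseen words are ignored.
            rows.append([counts.get(w, 0) for w in self.vocabulary_])
        return rows


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, x):
    """Return P(Y=1 | x) under the logistic model."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def train_logistic(X, y, lr=0.5, epochs=500):
    """Fit weights by batch gradient descent on the log loss."""
    weights, bias = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for x, target in zip(X, y):
            error = predict_proba(weights, bias, x) - target
            grad_b += error
            for j, xj in enumerate(x):
                grad_w[j] += error * xj
        weights = [w - lr * g / len(X) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(X)
    return weights, bias


# Invented toy data: 1 = true story, 0 = fake story.
TRAIN_DOCS = [
    "officials confirm budget report",
    "senate passes budget bill",
    "aliens secretly control senate",
    "miracle cure shocks doctors",
]
TRAIN_LABELS = [1, 1, 0, 0]

vectorizer = TinyCountVectorizer().fit(TRAIN_DOCS)
weights, bias = train_logistic(vectorizer.transform(TRAIN_DOCS), TRAIN_LABELS)

for headline in ["officials confirm bill", "aliens shocks doctors"]:
    features = vectorizer.transform([headline])[0]
    print(headline, "-> probability of truth:", round(predict_proba(weights, bias, features), 3))
```

The dense Python lists used here correspond to what toarray() would return; a real vocabulary is large enough that the sparse representation described in Section 3.5 becomes necessary. The printed probability plays the role of the "probability of truth" that the system reports to the user.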

REFERENCES

[1]. Schow A., "The 4 Types of 'Fake News'". Observer (2017). http://observer.com/2017/01/fake-news-russia-hacking-clinton-loss/.

[2]. Kai Shu, Amy Sliva, Jiliang Tang, and Huan Liu, "Fake News Detection on Social Media: A Data Mining Perspective". Computer Science & Engineering, Arizona State University, Tempe, AZ, USA; Charles River Analytics, Cambridge, MA, USA; Computer Science & Engineering, Michigan State University, East Lansing, USA.

[3]. Hadeer Ahmed, Issa Traore, and Sherif Saad, "Detection of Online Fake News Using N-Gram Analysis and Machine Learning Techniques". ECE Department, University of Victoria, Victoria, BC, Canada; School of Computer Science, University of Windsor, Windsor, ON, Canada.

[4]. Verónica Pérez-Rosas, Bennett Kleinberg, Alexandra Lefevre, and Rada Mihalcea, "Automatic Detection of Fake News", Proceedings of the 27th International Conference on Computational Linguistics, pp. 3391-3401, Santa Fe, New Mexico, USA, 2018.

[5]. E. M. Okoro, B. A. Abara, A. O. Umagba, A. A. Ajonye, and Z. S. Isa, "A Hybrid Approach to Fake News Detection on Social Media", vol. 37, no. 2, pp. 454-462, 2018.

[6]. Metz C (2016), "The bittersweet sweepstakes to build an AI that destroys fake news", Dec 2016 (Online). Available: https://www.wired.com/2016/12/bittersweet-sweepstakes-build-ai-destroys-fake-news/.

[7]. Granik M, Mesyura V (2017), "Fake news detection using naïve Bayes classifier". In: 2017 IEEE First Ukraine Conference on Electrical and Computer Engineering (UKRCON), Kiev, Ukraine.

[8]. Bhowmik D, Zargari S, Ajao O (2018), "Fake news identification on Twitter with hybrid CNN and RNN models". In: Proceedings of the 9th International Conference on Social Media and Society.

[9]. Zheng L, Zhang J, Cui Q, Li Z, Yang P S, Yang Y (2018), "TI-CNN: Convolutional neural networks for fake news detection". arXiv preprint.

[10]. Lakshmanarao A, Swathi Y, Kiran TSR (2019), "An efficient fake news detection system using machine learning". Int J Innov Technol Exploring Eng (IJITEE) 8(10).
