Identifying Fake News

Akshay Tarate, Shubham Aglawe, Aditya Hirve, Anil Rathod, Trupti Dange
(Computer Engineering, RMDSSOE, SPPU, India)

ABSTRACT:

The rapid development of the Internet allows information to spread quickly through social media and websites.
Social media now plays a vital role in the public dissemination of information about events. Because little
attention is paid to the credibility of that information, unverified or fake news spreads through social networks
and reaches thousands of users. Fake news is typically generated for commercial and political interests to mislead
and attract readers. The spread of fake news poses a serious challenge to society. Automatic credibility analysis of
news articles is a current research interest. Deep learning models are widely used for linguistic modeling. Typical
deep learning models such as Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN) can detect
complex patterns in textual data. Long Short-Term Memory (LSTM) is a gated recurrent neural network used to analyze
variable-length sequential data. A bidirectional LSTM reads a sequence both front-to-back and back-to-front. The
paper presents a fake news detection model based on the LSTM recurrent neural network. Two publicly available
datasets of unstructured news articles are used to assess the performance of the model. The results show that the
bidirectional LSTM model is more accurate than CNN, vanilla RNN and unidirectional LSTM for fake news detection.

Keywords - Deep learning; Convolutional Neural Network; Recurrent Neural Network; Long Short-Term Memory; LSTM

I. INTRODUCTION
Fake news is a form of yellow journalism that intentionally misinforms or smears, spread through both
conventional print media and modern online social media. There are several barriers to recognising fake news on
social media. Firstly, fake news data is hard to collect. In addition, manually labelling false news is difficult:
because such items are published deliberately to confuse readers, they are hard to identify from the news content
alone. Furthermore, much sharing happens through closed messaging channels on platforms such as Facebook, WhatsApp
and Twitter. Consequently, people find it difficult to accept as false the disinformation passed on by news outlets
they trust or by their friends and family. The credibility of fresh, time-bound news is also hard to check, because
not enough data about it is yet available to train a model.
The topic of disinformation on social media can be addressed in a number of ways. Statistical methods are
used to determine the relationship between different aspects of the information, analyse the information's originator,
and examine distribution patterns. Untrustworthy content is classified using machine learning algorithms, and the
accounts that post it are investigated. Various methods concentrate on the development of strategies for knowledge
authentication as well as case studies.
We define the fake news detection issue as a credibility estimation problem, in which real news items receive a
higher credibility score and unauthentic ones a lower score.

II. EXISTING SYSTEM


The majority of current fake news detection models approach the problem as a binary classification task,
which limits the model's ability to express how closely the circulated news is related to the real news. There is
therefore a need for a system that can answer a user's query with a credibility score rather than a bare label.
III. Literature Survey

| Sr. No. | Paper Name | Publication + Year | Author | Concept |
|---|---|---|---|---|
| 1 | Fake News Detection using Bi-directional LSTM-Recurrent Neural Network | ICRTAC 2020 | Pritika Bahad, Preeti Saxena, Raj Kamal | Result: Author and platform information is used to determine the authenticity of the news. Conclusion: The accuracy of Bi-directional LSTM model with CNN, vanilla RNN, and unidirectional LSTM-RNN are evaluated and compared. |
| 2 | Defending Against Neural Fake News | Arxiv.org, 2019 | Rowan Zellers, Ari Holtzman, Hannah Rashkin, Yonatan Bisk, Ali Farhadi, Franziska Roesner, Yejin Choi | Result: Author and platform information is used to determine the authenticity of the news. Conclusion: The accuracy of Bi-directional LSTM model with CNN, vanilla RNN, and unidirectional LSTM-RNN are evaluated and compared. |
| 3 | A Survey of Fake News: Fundamental Theories, Detection Methods, and Opportunities | Researchgate.net, 2019 | Xinyi Zhou, Reza Zafarani | Result: Data mining techniques have been used by applying association rules algorithms to generate course rules. Conclusion: A coverage measure was used to study the authenticity of the news. |
| 4 | Fake News, Conspiracies and Myth Debunking in Social Media | researchgate.net, 2019 | Valeryia Mosinzova, Benjamin Fabian, Tatiana Ermakova, Annika Baumann | Result: Algorithms used in Recommendation system: collaborative recommendation algorithm based on users and items studies. Conclusion: It solves the problem of improving the quality of feed recommendations. |
| 5 | Fake News Detection on Social Media using K-Nearest Neighbor Classifier | IEEE 2020 | Ankit Kesarwani, Sudakar Singh Chauhan, Anil Ramachandran Nair | Result: Features selection from the datasets is used by data mining algorithm (K-Nearest Neighbor) to classify the news article on social media. Conclusion: Provide a specific frame to predict fake news on social media. |

IV. Fake News Identifying System

a. By using URL and Keywords:

When the user asks a query to the system, Naive Bayes classifies the query and LSTM gives the probability
score.

b. Search Query and Keywords in the knowledge database:

Doc2Vec is based on the Word2Vec model and is used to preserve word order information. It extracts
Word2Vec features and adds an additional “document vector” that carries information about the entire document.

V. Algorithms

1. Naive Bayes Algorithm (Classification)
2. LSTM (Long Short-Term Memory)
3. SVM (Support Vector Machine)
4. Logistic Regression
5. Random Forest

VI. Algorithm Details


1. Naive Bayes

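The Naive Bayes model is simple to construct and is especially well suited to very large data sets. Despite
its simplicity, Naive Bayes can be competitive with far more sophisticated classification methods. Bayes' theorem
lets us calculate the posterior probability P(c|x) from the prior P(c), the evidence P(x) and the likelihood P(x|c).
Consider the following equation:

\[ P(c \mid x) = \frac{P(x \mid c)\, P(c)}{P(x)} \]

Here c is the class and x is the observed feature vector. The "naive" part is the assumption that the features are
conditionally independent given the class, so P(x|c) is the product of the per-feature likelihoods.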
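To make the use of Bayes' theorem concrete, the following is a minimal sketch of a multinomial Naive Bayes text classifier in plain Python. The toy headlines, the labels and the function names (`train_naive_bayes`, `predict`) are illustrative assumptions and are not taken from the system described in this paper. Add-one (Laplace) smoothing is also an assumption added here so that unseen words do not produce zero probabilities. P(x) is omitted because it is the same for every class and does not change which class scores highest.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(docs, labels):
    """Estimate log P(c) and per-class word counts from tokenised documents."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)
    log_prior = {c: math.log(n / len(labels)) for c, n in class_counts.items()}
    return log_prior, word_counts, vocab

def predict(model, tokens):
    """Return the class c that maximises log P(c) + sum of log P(word | c)."""
    log_prior, word_counts, vocab = model
    scores = {}
    for c in log_prior:
        total = sum(word_counts[c].values())
        score = log_prior[c]
        for t in tokens:
            if t in vocab:  # words never seen in training are ignored
                # add-one smoothing (an assumption of this sketch)
                score += math.log((word_counts[c][t] + 1) / (total + len(vocab)))
        scores[c] = score
    return max(scores, key=scores.get)

# Hypothetical toy data, for illustration only.
DOCS = [
    ["shocking", "miracle", "cure"],
    ["aliens", "shocking", "secret"],
    ["government", "publishes", "budget"],
    ["court", "publishes", "ruling"],
]
LABELS = ["fake", "fake", "real", "real"]
MODEL = train_naive_bayes(DOCS, LABELS)

print(predict(MODEL, ["shocking", "secret", "cure"]))
print(predict(MODEL, ["court", "publishes", "budget"]))
```

Working in log space avoids numerical underflow when many small probabilities are multiplied together.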
2. LSTM
LSTMs are specifically designed to cope with the long-term dependency problem. Remembering information over
long periods is their default behaviour rather than something they struggle to learn.
All recurrent neural networks are made up of a chain of repeating neural network modules. In ordinary RNNs
this repeating module has a relatively simple structure, such as a single tanh layer. LSTMs have a chain-like
structure as well, but the repeating module is different.
Instead of a single neural network layer, there are four, which interact in a specific way.
Each line in the figure above carries a full vector from the output of one node to the inputs of others. The pink
circles denote pointwise operations, such as vector addition, and the yellow boxes denote learnt neural network
layers. Lines that merge denote concatenation, whereas a line that forks denotes its content being copied, with
the copies sent to different locations.
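The four interacting layers can be sketched as a single LSTM time step in plain Python. This is an illustrative sketch of the standard LSTM cell and not the trained model used in this work. The parameter layout (a dictionary mapping the hypothetical keys 'f', 'i', 'g' and 'o' to weight and bias pairs) is an assumption made for readability.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, b, v):
    """Compute W @ v + b for a weight matrix given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + b_k for row, b_k in zip(W, b)]

def lstm_step(x, h_prev, c_prev, params):
    """One LSTM time step.

    params maps 'f' (forget gate), 'i' (input gate), 'g' (candidate values)
    and 'o' (output gate) to (W, b) pairs acting on the concatenation
    [h_prev, x]. These are the four learnt layers of the repeating module.
    """
    z = h_prev + x  # list concatenation: lines merging
    f = [sigmoid(a) for a in affine(*params["f"], z)]
    i = [sigmoid(a) for a in affine(*params["i"], z)]
    g = [math.tanh(a) for a in affine(*params["g"], z)]
    o = [sigmoid(a) for a in affine(*params["o"], z)]
    # Pointwise operations: forget part of the old cell state, add new content.
    c = [f_k * c_k + i_k * g_k for f_k, c_k, i_k, g_k in zip(f, c_prev, i, g)]
    h = [o_k * math.tanh(c_k) for o_k, c_k in zip(o, c)]
    return h, c

def zero_params(hidden, inputs):
    """All-zero parameters, used only to make the arithmetic easy to follow."""
    layer = ([[0.0] * (hidden + inputs) for _ in range(hidden)], [0.0] * hidden)
    return {name: layer for name in "figo"}

h, c = lstm_step([1.0, 2.0, 3.0], [0.0, 0.0], [1.0, -2.0], zero_params(2, 3))
print(h, c)
```

With all-zero weights every gate evaluates to 0.5 and the candidate to 0, so the new cell state is simply half the previous one. This makes the role of the forget gate easy to see.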

3. SVM
In the SVM algorithm, each data item is plotted as a point in n-dimensional space (where n is the number of
features), with the value of each feature being the value of a particular coordinate. Classification is then
performed by finding the hyper-plane that best separates the two classes (look at the below snapshot).
Support vectors are the coordinates of individual observations, in particular those lying closest to the
separating hyper-plane. The SVM classifier is the frontier (hyper-plane/line) that separates the two classes
as well as possible.
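The geometry can be illustrated with a short sketch that, for a given hyper-plane w·x + b = 0, classifies points by the sign of the decision function and reports which points lie closest to the hyper-plane. The 2-D toy points and the hyper-plane are hypothetical values chosen by hand for this illustration. No training is performed here, whereas a real SVM would learn w and b from the data.

```python
import math

def decision(w, b, x):
    """Signed score w.x + b; its sign gives the predicted class."""
    return sum(w_k * x_k for w_k, x_k in zip(w, x)) + b

def classify(w, b, x):
    return 1 if decision(w, b, x) >= 0 else -1

def support_vectors(w, b, points):
    """Return the margin and the points that lie closest to the hyper-plane."""
    norm = math.sqrt(sum(w_k * w_k for w_k in w))
    dist = [abs(decision(w, b, x)) / norm for x in points]
    margin = min(dist)
    closest = [x for x, d in zip(points, dist) if math.isclose(d, margin)]
    return margin, closest

# Hypothetical toy data: class -1 near the origin, class +1 further out.
POINTS = [[1, 1], [2, 1], [1, 2], [4, 4], [5, 4], [4, 5]]
TARGETS = [-1, -1, -1, 1, 1, 1]
W, B = [1.0, 1.0], -5.5  # the line x1 + x2 = 5.5, chosen by hand

margin, closest = support_vectors(W, B, POINTS)
print(margin, closest)
```

For this toy set the chosen line is the perpendicular bisector between the two classes, so the three closest points are equidistant from it.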

4. Logistic Regression

Logistic regression is one of the most common machine learning algorithms in the supervised learning
family. Despite its name, it is used for classification rather than regression. Logistic regression predicts a
categorical dependent variable from a set of independent variables. In the binary case the outcome is either 0
or 1: the model outputs a probability between 0 and 1, which is then thresholded to give the class.

Logistic regression can be used whenever the probability of one of two classes must be estimated, for example
whether it will rain today or not, 0 or 1, true or false, and so on.
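As a rough illustration of the rain example, the sketch below fits a one-feature logistic regression by batch gradient descent on the log-loss. The feature (a made-up humidity value between 0 and 1), the data, the learning rate and the number of steps are all assumptions chosen only to keep the example small.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=1.0, steps=2000):
    """Fit p(y=1 | x) = sigmoid(w * x + b) by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= lr * sum(e * x for e, x in zip(errors, xs)) / n
        b -= lr * sum(errors) / n
    return w, b

def predict_proba(w, b, x):
    return sigmoid(w * x + b)

# Hypothetical data: humidity readings and whether it rained (1) or not (0).
HUMIDITY = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
RAINED = [0, 0, 0, 1, 1, 1]
W, B = fit_logistic(HUMIDITY, RAINED)

for x in (0.1, 0.9):
    p = predict_proba(W, B, x)
    print(x, round(p, 3), "rain" if p >= 0.5 else "no rain")
```

The 0.5 threshold is the usual default. A different threshold can be chosen when the two kinds of error have different costs.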

5. Random Forest

Random forests, also known as random decision forests, are an ensemble learning method for classification
and other tasks. They work by building a large number of decision trees during training and then outputting
either the class that is the mode of the classes predicted by the individual trees (classification) or the mean
of their predictions (regression). Random forests generally outperform single decision trees, but their accuracy
is typically lower than that of gradient-boosted trees.
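The aggregation step of a random forest can be sketched in a few lines. The "trees" below are hypothetical hand-written rules standing in for trained decision trees, so the example shows only how the individual predictions are combined (mode for classification, mean for regression) and not how the trees are grown.

```python
from collections import Counter
from statistics import mean

def forest_classify(trees, sample):
    """Majority vote: return the most common class predicted by the trees."""
    votes = [tree(sample) for tree in trees]
    return Counter(votes).most_common(1)[0][0]

def forest_regress(trees, sample):
    """Average the numeric predictions of the individual trees."""
    return mean(tree(sample) for tree in trees)

# Hypothetical stand-ins for trained trees; each looks at one feature.
CLASS_TREES = [
    lambda s: "fake" if s["exclamation_marks"] > 2 else "real",
    lambda s: "fake" if not s["trusted_source"] else "real",
    lambda s: "fake" if s["all_caps_words"] > 5 else "real",
]
SCORE_TREES = [lambda s: 0.2, lambda s: 0.4, lambda s: 0.9]

SAMPLE = {"exclamation_marks": 4, "trusted_source": True, "all_caps_words": 9}
print(forest_classify(CLASS_TREES, SAMPLE))
print(forest_regress(SCORE_TREES, SAMPLE))
```

Because each tree sees the data differently, individual mistakes tend to be outvoted. This is why the ensemble is usually more robust than any single tree.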
VII. PROPOSED MODEL

System Architecture

● We start by collecting the data needed to train our model. The first step is data pre-processing, in
which we clean the data, i.e. remove all blank spaces, noise, etc.
● The processed data is tokenized and tagged, after which we collect the verbs and the topics to which
the news is related.
● We then vectorize the data using the Doc2Vec model, which makes the next stage easier.
● The data in vector form is passed through our classification models, which classify the news
according to topic.
● Four classifiers are used, i.e. Naive Bayes, SVM, Logistic Regression and Random Forest. All of
these models run simultaneously using pipelining.
● Whichever model gives the best and fastest result is selected, and its output is passed to the main
model, i.e. LSTM.
● Using this model, the system checks the data through a web search and scores it accordingly.
● The score is divided into 3 categories:
   1. Time Credibility: checks how much time was required for the data to be found. It accounts
      for 40% of the score. Real news does not take long to find.
   2. Website Credibility: checks whether the website URL belongs to a trusted website. It
      accounts for 40% of the score.
   3. Data Credibility: checks the headline and the data and compares them with the entered
      data. It accounts for 20% of the score.
● After this process the scores are summed up and an average score is taken.
● If the score is above 60 we treat the item as real news. If the score is less than 60 we search for
this news on the social media app Twitter.
● After getting the score from this social media module, we add it to the previously fetched web
search score and take the average. If this averaged score is above 60 we term the item real news,
and if it is below 60 we term it fake news.
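The scoring procedure described in the steps above can be sketched as follows. The text does not specify how each component is measured, so this sketch assumes each credibility component is already expressed as a value between 0 and 1, and that the 40/40/20 weights yield an overall score out of 100. The function names and the `twitter_score_fn` callback are hypothetical. A score of exactly 60 is not covered by the description, and it is treated here as not above the threshold.

```python
# Weights taken from the description: 40% time, 40% website, 20% data.
WEIGHTS = {"time": 40, "website": 40, "data": 20}
THRESHOLD = 60

def web_score(time_cred, website_cred, data_cred):
    """Weighted web-search score on a 0..100 scale.

    Each argument is assumed to lie in [0, 1] (an assumption of this sketch).
    """
    return (WEIGHTS["time"] * time_cred
            + WEIGHTS["website"] * website_cred
            + WEIGHTS["data"] * data_cred)

def verdict(web, twitter_score_fn):
    """Apply the two-stage decision: web search first, then social media."""
    if web > THRESHOLD:
        return "real"
    # Only consult the social media module when the web score is not enough.
    combined = (web + twitter_score_fn()) / 2
    return "real" if combined > THRESHOLD else "fake"

strong = web_score(1.0, 1.0, 0.0)   # fast to find, trusted site, weak text match
weak = web_score(0.5, 0.5, 0.5)     # middling on every component

print(strong, verdict(strong, lambda: 0))
print(weak, verdict(weak, lambda: 90))
print(weak, verdict(weak, lambda: 40))
```

Passing the social media lookup as a callback means it runs only when needed, which mirrors the order of the steps in the description.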
VIII. The Fake News Detection system can be divided into three major parts:

● Web Application, i.e. front end
● Processing part
● LSTM model

1. The web application is the front end, through which users can ask about any news item,
from politics and finance to general news, that they received on WhatsApp or any other
social media site. The user is then told whether the news is fake or truthful.
2. In the processing part, the gathered information is cleaned by removing noisy data. The
processed data is tokenized and tagged. Classifiers then classify the news according to
topic. Four types of classifier are used, i.e. Naive Bayes, SVM, Logistic Regression and
Random Forest, and all of these models run simultaneously using pipelining.
3. The LSTM model checks the data using a web search and scores it accordingly. The score
is divided into 3 categories: time credibility, website credibility and data credibility.
The average of these scores is taken. If the score is above 60 the item is real news, and
if the score is less than 60 it is fake news.
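One possible reading of "all these models run simultaneously" and "whichever model provides the best and fast result is selected" is sketched below: each candidate classifier is evaluated concurrently on labelled samples, and the one with the highest accuracy is chosen, with run time used only to break ties. The helper names, the stand-in classifiers and the tie-breaking rule are assumptions of this sketch and are not details given in the paper.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def evaluate(name, classify, samples, labels):
    """Run one classifier over the samples and record accuracy and run time."""
    start = time.perf_counter()
    predictions = [classify(s) for s in samples]
    elapsed = time.perf_counter() - start
    accuracy = sum(p == t for p, t in zip(predictions, labels)) / len(labels)
    return name, accuracy, elapsed

def pick_best(classifiers, samples, labels):
    """Evaluate all classifiers concurrently and return the best one's name."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(evaluate, name, fn, samples, labels)
                   for name, fn in classifiers.items()]
        results = [f.result() for f in futures]
    # Highest accuracy wins; the faster run breaks ties (an assumption).
    return min(results, key=lambda r: (-r[1], r[2]))[0]

# Hypothetical stand-ins for the four trained topic classifiers.
CLASSIFIERS = {
    "always_politics": lambda text: "politics",
    "keyword": lambda text: "finance" if "market" in text else "politics",
}
SAMPLES = ["market falls", "election results", "market rally"]
TOPICS = ["finance", "politics", "finance"]

print(pick_best(CLASSIFIERS, SAMPLES, TOPICS))
```

Threads keep the sketch short. For CPU-heavy models in CPython, a process pool would be the more usual choice for real parallelism.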

IX. CONCLUSION

In the 21st century, most tasks are carried out digitally. Newspapers that were once preferred as hard copies
are now being replaced by applications such as Facebook and Twitter and by online news articles. The growing
problem of fake news complicates matters, because it tends to change or hamper people's opinions and attitudes
towards the use of digital technology. Companies such as Google and Facebook are therefore taking action to
discourage the dissemination of false news. Our system takes a URL or an entry from an existing database as input
and marks it as real or fake. Different algorithms and machine learning techniques must be used to implement this.
The proposed model works well on both balanced and imbalanced high-dimensional news datasets.
In the future, more in-depth research will be needed to better understand how a deep learning model with attention
can aid the automated credibility analysis of news.
