Project Presentation
PRESENTATION
FAKE NEWS
DETECTION USING
ML TECHNIQUES
GROUP NO. 9
1
MOTIVATION
Fake news is a pressing problem today.
Social media websites are flooded with
misinformation, which can have serious, even fatal, consequences.
Twitter in particular struggles with the fake news problem.
However, fake news tends to follow regular patterns.
Some individuals are more likely than others to spread fake news.
We can use Machine Learning to identify such patterns and
try to predict fake news.
2
LITERATURE REVIEW
Researchers used ML models such as logistic regression, as well as
deep learning models.
Researchers describe applications of fake news detection.
They discuss vectorization techniques such as TF-IDF and BOW (bag of words)
to convert text to numeric values.
Researchers discussed the importance of addressing bias
through lexical and sentiment analysis.
They experimented with several models, such as SVM and Random
Forest.
3
DATASET USED
We used the LIAR dataset.
It contains statements along with their speakers,
the speakers' affiliations, and labels
indicating whether each statement is fake news or not.
The dataset contains 16 columns and
12788 rows.
Columns include the label, statement,
speaker, etc.
4
EDA SOME SCATTERPLOTS !!!
5
DATA PREPROCESSING
We clean the data by removing
punctuation marks, extra white space, etc.
We tokenize the statements into words.
We use TF-IDF and BOW to vectorize the
tokenized data into numeric form.
We use a word cloud to visualize the word
frequencies.
We apply a label encoder to the political party
and speaker columns.
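The cleaning, tokenization, and label-encoding steps above can be sketched in plain Python as follows. This is a minimal illustration, not the project's code: the function names (clean_text, tokenize, label_encode) and the sample rows are assumptions, and the project may well have used library helpers instead.

```python
import re
import string


def clean_text(text: str) -> str:
    """Lowercase, strip punctuation, and collapse repeated whitespace."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text: str) -> list[str]:
    """Split cleaned text into word tokens on whitespace."""
    return clean_text(text).split()


def label_encode(values: list[str]) -> tuple[list[int], dict[str, int]]:
    """Map each distinct category to an integer code (alphabetical order)."""
    mapping = {v: i for i, v in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping


# Hypothetical rows, for illustration only.
statements = ["Taxes   went UP, again!", "The budget is balanced."]
parties = ["republican", "democrat"]

tokens = [tokenize(s) for s in statements]
party_codes, party_mapping = label_encode(parties)
```

Note that integer codes impose an arbitrary order on categories; this tends to matter less for tree-based models than for linear ones.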
6
DATA PREPROCESSING - 2
We dropped 12 of the 16 columns.
The decision on which columns to keep was
based on the correlation heatmap.
Highly correlated columns were
dropped.
We build 4 variants of the data: with and
without the party and speaker columns, each
vectorized using TF-IDF and BOW.
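A rough sketch of correlation-based column dropping and of enumerating the four data variants is given below. The 0.9 threshold, the column names, and the greedy keep-first rule are assumptions made for illustration; the slides do not state how the heatmap was turned into a decision.

```python
from itertools import product
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length numeric columns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def drop_correlated(columns: dict[str, list[float]], threshold: float) -> list[str]:
    """Keep a column only if |r| with every already-kept column is below threshold."""
    kept: list[str] = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical numeric columns; "b" is an exact multiple of "a".
columns = {
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.0],
    "c": [4.0, 1.0, 3.0, 2.0],
}
kept = drop_correlated(columns, threshold=0.9)

# The four data variants: vectorizer x (with / without party and speaker).
variants = list(product(["tfidf", "bow"], [True, False]))
```

Here "b" is dropped because it is perfectly correlated with "a", while "c" is kept.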
7
MORE ON NLP !!!
We use TF-IDF and BOW in NLP
vectorisation.
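To make the two vectorizers concrete, here is a small pure-Python sketch of BOW counts and a textbook TF-IDF weighting (tf = count / document length, idf = log(N / document frequency)). The toy documents are invented, and library implementations usually differ in detail: scikit-learn's TfidfVectorizer, for instance, smooths the idf term and L2-normalizes each row by default.

```python
from collections import Counter
from math import log


def bag_of_words(docs: list[list[str]]) -> tuple[list[str], list[list[int]]]:
    """Return the sorted vocabulary and one raw-count vector per document."""
    vocab = sorted({tok for doc in docs for tok in doc})
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors


def tf_idf(docs: list[list[str]]) -> tuple[list[str], list[list[float]]]:
    """Textbook TF-IDF: (count / doc length) * log(N / document frequency)."""
    vocab, counts = bag_of_words(docs)
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    vectors = [
        [(row[j] / len(doc)) * log(n_docs / df[j]) for j in range(len(vocab))]
        for row, doc in zip(counts, docs)
    ]
    return vocab, vectors


# Toy tokenized documents, for illustration only.
docs = [["tax", "cut", "tax"], ["tax", "raise"]]
vocab, bow = bag_of_words(docs)
_, tfidf = tf_idf(docs)
```

In this toy corpus "tax" appears in every document, so its TF-IDF weight is zero even though its BOW count is the highest.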
8
PERFORMING TSNE !!
The following is the result of t-SNE on the TF-IDF vectors.
9
ML MODELS
We use Grid Search to find
the best hyperparameters
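One common way to run such a search is scikit-learn's GridSearchCV, sketched below. The synthetic data, the choice of logistic regression, the candidate values for C, and the 3-fold setting are all assumptions for illustration; the slides do not state which grids were actually searched.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the vectorized statements and their labels.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Candidate values are illustrative only.
param_grid = {"C": [0.1, 1.0, 10.0]}

search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid,
    cv=3,                # 3-fold cross-validation for each candidate
    scoring="accuracy",
)
search.fit(X, y)

best_params = search.best_params_
best_score = search.best_score_  # mean cross-validated accuracy of the best candidate
```

The same pattern applies to the other models by swapping the estimator and its parameter grid.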
10
GAUSSIAN NAIVE BAYES
The accuracy using Gaussian Naive Bayes is 58.09%
11
LOGISTIC REGRESSION
The accuracy using Logistic Regression is 61.74%
12
DECISION TREE
The accuracy using the Decision Tree is 56.91%
13
RANDOM FOREST
The accuracy using Random Forest is 59.57%
14
ADABOOST
The accuracy using Adaboost is 58.93%
15
SVM
The accuracy using SVM is 59.34%
16
MLP
The accuracy using MLP is 58.95%
17
MLP WITH PCA
The accuracy using MLP along with PCA is 57.70%
18
MLP WITH TSNE
The accuracy using MLP along with TSNE is 55.27%
19
RESULTS SUMMED UP !!
20
LIMITATIONS
The accuracies are close to 60%, which is not very
high.
This is because the problem is very hard to solve
using standard ML and NLP techniques alone.
It is hard to predict whether a news statement is fake
without knowing the ground truth at that time.
21
CONCLUSION
We can predict whether a given news statement is
fake or not with an accuracy better than a
random guess, i.e. 50%.
The best accuracy was 62.93%, using
Random Forest with BOW vectorisation, the
speaker and party columns, and Gini gain as the
splitting criterion.
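As a small illustration of comparing a model against a baseline, the sketch below computes accuracy alongside a majority-class baseline. The labels and predictions are invented. The 50% figure corresponds to random guessing on balanced binary labels; when classes are imbalanced, always predicting the most frequent class is a stricter reference point.

```python
from collections import Counter


def accuracy(y_true: list[int], y_pred: list[int]) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_baseline(y_true: list[int]) -> float:
    """Accuracy obtained by always predicting the most frequent class."""
    most_common_count = Counter(y_true).most_common(1)[0][1]
    return most_common_count / len(y_true)


# Hypothetical labels and predictions (1 = fake, 0 = not fake).
y_true = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]
y_pred = [1, 0, 1, 1, 0, 0, 0, 0, 0, 0]

model_acc = accuracy(y_true, y_pred)
baseline_acc = majority_baseline(y_true)
```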
22
FUTURE WORK
We can extend the scope by also incorporating
visual and audio content in news articles.
We will try to incorporate languages other than
English.
We can develop an interactive system where users
submit a news article as input and receive a
credibility score for that article.
23
Timeline
Week 7: Learned about SVM. Implemented the SVM model in the code.
Week 8: Learned about MLP. Implemented MLP in code. Observed the accuracy of the model.
Week 9: Learned about TSNE and PCA. Observed the accuracy of the MLP with and without TSNE.
Week 10: Documented the complete project. Did a complete analysis of the performance of all models.
Week 11: Identified the future work. Identified the limitations.
24
Work Division
Sahil Goyal: Handled the data preprocessing. Helped in implementing the ML models.
Deeptorshi Mondal: Handled the NLP part. Also helped in the NLP part.
Anshak Goel: Handled the documentation part. Also did some data preprocessing.
Vibhor Agarwal: Decided which ML models to use. Analyzed the accuracies of the models.
25
ANY
QUESTIONS ?
26
REFERENCES
https://arxiv.org/pdf/1705.00648.pdf
https://www.researchgate.net/publication/336436870_Fake_News_Detection_Using_Machine_Learning_approaches_A_systematic_Review
https://paperswithcode.com/paper/liar-liar-pants-on-fire-a-new-benchmark
https://github.com/manideep2510/siamese-BERT-fake-news-detection-LIAR
27