Machine Learning Techniques For The Classification of Fake News
2021 International Conference on Computational Intelligence and Computing Applications (ICCICA) | 978-1-6654-2040-2/21/$31.00 ©2021 IEEE | DOI: 10.1109/ICCICA52458.2021.9697267
Abstract—Social networking sites such as Twitter, Instagram, and Facebook have become an essential part of our daily lives, but social media comes with its own advantages and disadvantages. These platforms are often used to distribute fake news or incorrect information, and there is a growing demand for classifying and categorizing this type of content. We have therefore explored a novel technique for classifying fake news that incorporates machine learning methods. This paper describes the development of a method that uses a TF-IDF vectorizer to classify which news is legitimate and which is fraudulent. Implementation is performed using datasets from Kaggle. The results indicate that this method performs effectively.

Keywords—Fake News, Machine Learning, Term Frequency, Inverse Document Frequency, Vectorizer.

information (text) using machine learning techniques that differentiate between "fake" and "real" news.

The manuscript is organized as follows. The first section discusses the background of fake news detection and why it is needed. The second section briefly reviews popular research on the fake news detection problem, summarizing the findings and future opportunities for further research. The next section describes the dataset source and its properties. Implementations of popular machine learning techniques are presented in the fourth section. Finally, the conclusion section summarizes the work and outlines further opportunities for research in fake news detection.

II. LITERATURE SURVEY
Authorized licensed use limited to: Khon Kaen University provided by UniNet. Downloaded on September 29,2022 at 04:52:29 UTC from IEEE Xplore. Restrictions apply.
The authors in [5] used a simple approach to this problem: they implemented a Naive Bayes classifier in their fake news detection model. Using a dataset from BuzzFeed News, they achieved 74% accuracy. The authors noticed a resemblance between spam messages and fake news articles, and because the Naive Bayes classifier works well on spam messages, they decided to use it to classify fake news articles. In [6], the authors compared the performance of twelve classifiers in terms of their false prediction ratio on three different datasets. Based on the performance metrics, the Linear Support Vector Classifier (Linear SVC), Logistic Regression (LR), and Passive Aggressive (PA) algorithms performed best with the TF-IDF, count vectorizer (CV), and hashing vectorizer (HV) feature extraction approaches, respectively. To select the best model, the authors of [19] investigated conventional machine learning approaches to develop a supervised model that can categorize fake news as positive or negative, using Natural Language Processing (NLP) tools and Python's scikit-learn library for text analysis.
Ref. | Techniques | Dataset | Findings | Limitations / future scope
… | … | … resources, BuzzFeed and PolitiFact. | … correlation between fake news and user profiles. | … fake news or not cannot be accurately judged.
[14] | Passive aggressive classifier, logistic regression, random forest, Naive Bayes classifier | BuzzFeed News, BS Detector, LIAR dataset | The authors used various algorithms to accurately assess whether the news is true or fake. Implementation results showed an accuracy of 92.73%. | Using a web crawler and an online database, the authors hope to create their own dataset that will be kept up to date with all live and important news.
[15] | Random forest, Naïve Bayes, Multinomial, Support Vector Machine | Indonesian news dataset | 0.98 F1-score with random forest. | More datasets need to be explored and the performance compared on them.
[16] | TF-IDF bigram, PCFG | 1) OpenSources.co; 2) data gathered from Before It's News, Zero Hedge, Raw Story, etc. for fake news articles and from BBC, USA Today, Washington Post, etc. for reliable news articles; 11051 news articles in total | The authors compare their model's performance using three different feature sets (TF-IDF using syntactical structure frequency, bigram frequency, and the union of features) to find which factors are most predictive. Implementation results showed an accuracy of 77.2%. | 1) The authors used an absolute probability threshold when assessing their model, which is unreliable for models with poorly calibrated probability scores. 2) The authors used vectorized approaches that make it difficult to determine which features are more important.
[17] | Naive Bayes, SVM, NLP | Articles from Google News, Feedly, and News360, compared with the given text | The proposed model not only determines whether the information is true or not but also suggests relevant and genuine news articles. It works with 93.6% accuracy. | The proposed model cannot classify a news article as fake or real if it is too recent and not yet available in the database.
[18] | Random Forest (RF), Support Vector Machine Classifier, Naïve Bayes, AdaBoost, KNN, Multi-Layer Perceptron & Gradient Boosting | OpenSources dataset, Kaggle dataset, dataset by George McIntire | Experimental results showed that Gradient Boosting achieved the highest mean accuracy (88%) and an F1-score of 0.91. | The machine learning and Natural Language Processing approaches could be compared with a deep learning strategy for detecting false news.
In [20], the authors proposed a method for predicting fake news that combines the headline and the content of the article. They observed from their experimental results that such combinations forecast bogus news more accurately. The authors of [21] studied the deployment of deep learning approaches for false news detection to overcome the limitations of machine learning techniques in fake news detection.

dataset more quickly and easier to implement the proposed model.

TABLE II. DATASET DESCRIPTION

Corpus | Total articles | Cleaned articles | Fake news | Real news | #Features
cleaned, of which 10369 are fake news and 10349 are real news.
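The two class counts above imply a cleaned corpus of 10369 + 10349 = 20718 articles, almost exactly balanced between fake and real. The paper does not spell out its cleaning rule, so the sketch below only illustrates one plausible version under stated assumptions: hypothetical records with `title`, `text`, and `label` fields, label 1 meaning fake and 0 meaning real, and any record with missing or blank text dropped.

```python
from collections import Counter


def clean_records(records):
    """Keep records whose 'text' is a non-empty string and whose label is 0 or 1.

    Hypothetical rule: the paper does not state its exact cleaning criteria.
    """
    cleaned = []
    for rec in records:
        text = rec.get("text")
        if isinstance(text, str) and text.strip() and rec.get("label") in (0, 1):
            cleaned.append({**rec, "text": text.strip()})
    return cleaned


def class_balance(records):
    """Return (fake_count, real_count, fake_fraction), assuming label 1 = fake."""
    counts = Counter(rec["label"] for rec in records)
    fake, real = counts[1], counts[0]
    total = fake + real
    return fake, real, (fake / total if total else 0.0)


raw = [
    {"title": "A", "text": "Some article body.", "label": 1},
    {"title": "B", "text": "   ", "label": 0},   # blank text: dropped
    {"title": "C", "text": None, "label": 1},    # missing text: dropped
    {"title": "D", "text": "Another article body.", "label": 0},
    {"title": "E", "text": "Third article body.", "label": 1},
]
cleaned = clean_records(raw)
print(len(cleaned), class_balance(cleaned))  # 3 (2, 1, 0.6666666666666666)

# Size and balance of the cleaned corpus from the counts reported in the text
fake_n, real_n = 10369, 10349
print(fake_n + real_n, round(fake_n / (fake_n + real_n), 4))  # 20718 0.5005
```

Because the classes are so close to balanced, plain accuracy is a reasonable headline metric for this corpus, although precision and recall are still worth reporting.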
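$$\mathrm{tfidf}(t, d, D) = \mathrm{tf}(t, d) \cdot \mathrm{idf}(t, D) \qquad (1)$$

where $\mathrm{tf}(t, d)$ is the frequency of term $t$ in document $d$, and $\mathrm{idf}(t, D)$ is the inverse document frequency of $t$ across the corpus $D$.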
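Equation (1) can be sketched directly in a few lines. The specific forms used here are assumptions: tf(t, d) is the raw count of t in d, and idf(t, D) = ln(N / df(t)), where N is the number of documents and df(t) is the number of documents containing t. This is one common textbook variant; scikit-learn's `TfidfVectorizer` by default uses a smoothed idf, ln((1 + N) / (1 + df(t))) + 1, and then L2-normalizes each document vector.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document, following tfidf = tf * idf.

    Assumptions: tf(t, d) is the raw count of t in d, and
    idf(t, D) = ln(N / df(t)), where df(t) is the number of documents containing t.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))  # each term counted once per document
    idf = {term: math.log(n_docs / count) for term, count in df.items()}
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({term: count * idf[term] for term, count in tf.items()})
    return vectors


docs = [
    "the senate passed the bill",
    "the aliens built the pyramids",
    "the senate debated the aliens",
]
vectors = tfidf(docs)
print(round(vectors[0]["senate"], 4))  # 0.4055, i.e. ln(3/2)
print(vectors[0]["the"])               # 0.0, since "the" occurs in every document
```

Dropping the idf factor leaves the raw counts, which is what a count vectorizer produces. Because a term present in every article gets zero weight under TF-IDF, ubiquitous words stop dominating the features, which is one plausible reason TF-IDF can outperform plain counts.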
shows that, at least this time, the model is wrong, so a better model should correct this mistake.

D. Natural Language Processing: A field of Artificial Intelligence concerned with bridging computer and human language, that is, with building applications that can process a given set of texts and identify meaningful information in them [9].

Corpus dataset is used for fake news detection with the TF-IDF vectorizer and with the TF-IDF vectorizer plus NLP. The results of both feature extraction techniques are presented comparatively in Tables III, IV, and V, using the corpus discussed in the dataset section. The results in Table III show that both algorithms achieve the same F-score, precision, and recall on the dataset. The accuracy of the model does not vary significantly when NLP is used: the TF-IDF vectorizer on its own attains 92.72% accuracy, while the TF-IDF vectorizer with NLP gives 92.66%, which is almost unchanged.
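The Passive Aggressive idea described above can be made concrete: the classifier stays passive when a prediction is correct with enough margin, and updates aggressively when it is wrong. Below is a minimal from-scratch sketch of the PA-I variant (hinge loss, step size capped by C), scored with the accuracy, precision, recall, and F-score reported in the results tables. The toy headlines, the hypothetical `bow` featurizer, and C = 1.0 are illustrative assumptions; scikit-learn's `PassiveAggressiveClassifier` is the library counterpart, and this is not claimed to be the authors' code.

```python
def dot(w, x):
    """Sparse dot product between weight dict w and feature dict x."""
    return sum(w.get(f, 0.0) * v for f, v in x.items())


def bow(text):
    """Hypothetical bag-of-words featurizer: {token: count}."""
    feats = {}
    for tok in text.lower().split():
        feats[tok] = feats.get(tok, 0.0) + 1.0
    return feats


class PassiveAggressive:
    """Binary PA-I classifier on sparse feature dicts; labels are +1 (fake) / -1 (real)."""

    def __init__(self, C=1.0, epochs=5):
        self.C = C
        self.epochs = epochs
        self.w = {}

    def fit(self, X, y):
        for _ in range(self.epochs):
            for x, label in zip(X, y):
                loss = max(0.0, 1.0 - label * dot(self.w, x))  # hinge loss
                sq_norm = sum(v * v for v in x.values())
                if loss == 0.0 or sq_norm == 0.0:
                    continue  # passive: margin already >= 1 (or empty x), keep w unchanged
                # aggressive: smallest step that restores a margin of 1, capped at C (PA-I)
                tau = min(self.C, loss / sq_norm)
                for f, v in x.items():
                    self.w[f] = self.w.get(f, 0.0) + tau * label * v
        return self

    def predict(self, x):
        return 1 if dot(self.w, x) >= 0.0 else -1


def scores(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F-score, treating `positive` as the fake class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1


train = [("miracle cure shocks doctors", 1), ("secret trick they hide", 1),
         ("senate passes budget", -1), ("bank holds rates", -1)]
clf = PassiveAggressive().fit([bow(t) for t, _ in train], [lab for _, lab in train])

held_out = [("miracle trick", 1), ("senate budget", -1)]
y_pred = [clf.predict(bow(t)) for t, _ in held_out]
print(y_pred)                                        # [1, -1]
print(scores([lab for _, lab in held_out], y_pred))  # (1.0, 1.0, 1.0, 1.0)

# F-score is the harmonic mean of precision and recall, e.g. Table III's 1.00 and 0.900
print(round(2 * 1.00 * 0.900 / (1.00 + 0.900), 3))  # 0.947
```

Because each update touches only the features present in one article, PA trains online and cheaply on sparse TF-IDF or count vectors, which suits a steady stream of new news items.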
TABLE III. RESULTS OF COUNT VECTORIZER IMPLEMENTATION ON KAGGLE DATASET

Count Vectorizer | Accuracy | F-score | Precision | Recall
Multinomial | 89.78% | 0.947 | 1.00 | 0.900
Passive Aggressive | 93.62% | 0.947 | 1.00 | 0.900

TABLE IV. RESULTS OF TF-IDF VECTORIZER IMPLEMENTATION ON KAGGLE DATASET

TF-IDF Vectorizer | Accuracy | F-score | Precision | Recall
Multinomial | 89.99% | 0.947 | 1.00 | 0.900
Passive Aggressive | 95.11% | 0.947 | 1.00 | 0.900

TABLE V. RESULTS OF COUNT VECTORIZER IMPLEMENTATION ON UNIVERSITY OF VICTORIA DATASET

Count Vectorizer | Accuracy | F-score | Precision | Recall
Multinomial | 96.27% | 0.951 | 0.997 | 0.981
Passive Aggressive | 97.2% | 0.953 | 1.0 | 0.994

The passive aggressive approach outperforms the multinomial approach, as is clear from the implementation results. Moreover, the TF-IDF vectorizer performs better than the Count Vectorizer.

Fig 2. Comparison of the results obtained for TF-IDF vectorizer with and without NLP

VI. CONCLUSION AND FUTURE SCOPE

In the second section, the manuscript presents a review of popular research related to false news detection. As future work, researchers can explore machine learning techniques not only for determining whether an article is fake or real but also for many other applications. This manuscript has focused on exploring fake news detection techniques. In future, broader and larger datasets can be used to increase the accuracy of the techniques studied.

REFERENCES

[1] Rohit Kumar Kaliyar, "Fake News Detection Using A Deep Neural Network," 2018 4th International Conference on Computing Communication and Automation (ICCCA), IEEE, 2018.
[2] Kyeong-hwan Kim and Chang-sung Jeong, "Fake News Detection System using Article Abstraction," 2019 16th International Conference on Computer Science and Software Engineering (ICSSE), IEEE, 2019.
[3] Karishnu Poddar, Geraldine Bessie Amali D., and K. S. Umadevi, "Comparison of Various Machine Learning Models for Accurate Detection of Fake News," 2019 Innovations in Power and Advanced Computing Technologies (i-PACT), IEEE, 2019.
[4] Bhavika Bhutani, Neha Rastogi, Priyanshu Sehgal, and Archana Purwar, "Fake News Detection Using Sentiment Analysis," 2019 Twelfth International Conference on Contemporary Computing (IC3), IEEE, 2019.
[5] Mykhailo Granik and Volodymyr Mesyura, "Fake News Detection Using Naive Bayes Classifier," 2017 IEEE First Ukraine Conference on Electrical and Computer Engineering (UKRCON), IEEE, 2017.
[6] Sawinder Kaur, Parteek Kumar, and Ponnurangam Kumaraguru, "Automating fake news detection system using multi-level voting model," Soft Computing, vol. 24, no. 12, pp. 9049-9069, 2020.
[7] Hadeer Ahmed, Issa Traore, and Sherif Saad, "Detection of online fake news using n-gram analysis and machine learning techniques," International Conference on Intelligent, Secure, and Dependable Systems in Distributed and Cloud Environments, Springer, Cham, 2017.
[8] Julio C. S. Reis et al., "Explainable machine learning for fake news detection," Proceedings of the 10th ACM Conference on Web Science, 2019.
[9] Junaed Younus Khan et al., "A benchmark study on machine learning methods for fake news detection," arXiv preprint arXiv:1905.04749, 2019.
[10] Georgios Gravanis et al., "Behind the cues: A benchmarking study for fake news detection," Expert Systems with Applications, vol. 128, pp. 201-213, 2019.
[11] Feng Qian et al., "Neural User Response Generator: Fake News Detection with Collective User Intelligence," IJCAI, vol. 18, 2018.
[12] Shuo Yang et al., "Unsupervised fake news detection on social media: A generative approach," Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, no. 01, 2019.
[13] Kai Shu, Suhang Wang, and Huan Liu, "Understanding user profiles on social media for fake news detection," 2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR), IEEE, 2018.
[14] Uma Sharma, Siddarth Saran, and Shankar M. Patil, "Fake News Detection using Machine Learning Algorithms," International Journal of Engineering Research & Technology (IJERT), NTASU, vol. 9, no. 03, 2020.
[15] Herley Shaori Al-Ash et al., "Ensemble learning approach on Indonesian fake news classification," 2019 3rd International Conference on Informatics and Computational Sciences (ICICoS), IEEE, 2019.
[16] Lauren Dyson and Alden Golab, "Fake News Detection Exploring the Application of NLP Methods to Machine Identification of Misleading News Sources," CAPP 30255 Adv. Mach. Learn. Public Policy, 2017.
[17] Anjali Jain et al., "A smart system for fake news detection using machine learning," 2019 International Conference on Issues and Challenges in Intelligent Computing Techniques (ICICT), vol. 1, IEEE, 2019.
[18] Arvinder Pal Singh Bali et al., "Comparative performance of machine learning algorithms for fake news detection," International Conference on Advances in Computing and Data Sciences, Springer, Singapore, 2019.
[19] Z. Khanam et al., "Fake News Detection Using Machine Learning Approaches," IOP Conference Series: Materials Science and Engineering, vol. 1099, no. 1, IOP Publishing, 2021.
[20] Arun Nagaraja et al., "Fake News Detection Using Machine Learning Methods," International Conference on Data Science, E-learning and Information Systems 2021, 2021.
[21] Syed Ishfaq Manzoor and Jimmy Singla, "Fake news detection using machine learning approaches: A systematic review," 2019 3rd International Conference on Trends in Electronics and Informatics (ICOEI), IEEE, 2019.