Project Synopsis Report Format
Project
Synopsis Report (KCS 753)
On
Title of Project
Under the Supervision
of
(supervisor's name with Designation)
Submitted by:
Student Name (Uni. Roll No)
Session: 2024-25
TABLE OF CONTENTS
1. Introduction
2. Background / Relevant Work / Existing System / Literature Review
3. Proposed work
4. Methodology/Experimental Work
5. Conclusion and Future Scope
1. INTRODUCTION
News has been a source of information for centuries. Traditionally, news agencies were the
source of news, and hence reliability and confidentiality remained with the official
organizations themselves. In recent times, the internet has grown rapidly from rural to urban
areas. With this growth, more users from all over the world gained access to the internet and
the ability to spread information in their own way [1].
According to a 2019 Economic Times report, there are 627 million internet users in India, which
means India is home to the world’s second-largest internet user base [2]. However, with the
increasing popularity of social media, the internet has become an ideal breeding ground for fake
news. Research by the BBC shows that nearly 72% of Indians struggled to distinguish between fake
and real news [3]. Websites like The Onion [4], News Thump [5], The Poke News [6], and The Mash
News [7] are among the top-ranked propagators of ‘fake’ or ‘misleading’ news [8]. In response,
many online fact-checking resources such as Snopes [9], FactCheck.org [10], Factmata.com [11],
and PolitiFact.com [12] grew rapidly. Platforms such as Facebook, WhatsApp, and Google have
addressed this concern, but their efforts have contributed little to solving the issue.
II. Detection approaches based on deep learning: The two most widely implemented
paradigms in modern artificial neural networks are Recurrent Neural Networks (RNN)
and Convolutional Neural Networks (CNN) [13].
2. BACKGROUND
There are many models for fact checking and detecting fake news. PolitiFact [12] is a
fact-checking website operated by the Poynter Institute in St. Petersburg, Florida, which uses
its Truth-O-Meter to rate the truthfulness of a statement, article, event, image, or video.
However, its fact checking is limited to political news and hence fails to cover the broad
spectrum of news. According to a survey paper, fake news sources on Facebook can be identified
using BS Detector [15]. Another fact-checking website, Factmata [11], scores content on nine
signals, including hate speech and political bias, to give a deeper understanding of the
credibility and safety of any content on the web. Flock, a messenger for businesses, has
launched a fake news detector that aims to stop false and misleading information from being
introduced into its environment [16].
In India, fact-check services have recently been launched by India Today, Times of India, and
AFP India, but these resources do not provide a platform for users to check whether the news
article they are viewing is fake or real. AltNews [17] has succeeded in India in providing a
platform for users to clear their doubts, though it has yet to become more efficient and reliable.
Models like Fact Finder only check whether the news is fake or real. The AltNews website and
app, on the other hand, works on fake news and publishes viral fake news articles. Our model
performs both tasks simultaneously.
3. PROPOSED WORK
In this work, a model is built on data pre-processed with the NLTK library: all stopwords such
as “the”, “is”, and “are” are removed, keeping only those words that are distinctive and carry
relevant information. Punctuation and numbers are also removed, and the dataset is converted to
lowercase. A Count Vectorizer or TF-IDF matrix is then used, which tallies how often each word
is used in a given article in the dataset. Figure 2 depicts the process from collecting the news
articles dataset to applying the news classification algorithm. Since the problem concerns text
classification and information extraction, we use a Naïve Bayes classifier for text-based
classification. For training and testing, we use Multinomial NB and the Passive Aggressive
Classifier, with 33% of the dataset used for training. Rare words occurring in the corpus are
also removed with the help of the Count Vectorizer [18-20].
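The pre-processing steps described above can be sketched in a few lines of Python. This is only a minimal illustration, not the project's actual pipeline: it uses the standard library instead of NLTK, the stopword set is a tiny hypothetical stand-in for NLTK's English stopword corpus, and the function names `preprocess` and `term_counts` are assumptions made for this example.

```python
import re
from collections import Counter

# Assumption: a tiny illustrative stopword set. The synopsis relies on
# NLTK's English stopword corpus, which is much larger.
STOPWORDS = {"the", "is", "are", "a", "an", "of", "in", "and", "to"}

def preprocess(text):
    """Lowercase, drop punctuation and digits, then remove stopwords."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)  # keep letters and whitespace only
    return [tok for tok in text.split() if tok not in STOPWORDS]

def term_counts(tokens):
    """Count how often each remaining word occurs (one count-vector row)."""
    return Counter(tokens)

tokens = preprocess("The 2 reports are FAKE: the claim is false, the source is fake!")
counts = term_counts(tokens)
print(tokens)
print(counts.most_common(1))
```

The resulting counts play the role of one row of a count-vector matrix; a library vectorizer would build the same kind of row for every article in the corpus.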
The goal of the project is to build a website and app such that whenever the user selects a
piece of text, the app responds with a floating window showing the percentages of fake and real
news for the selected text. The advantage of this approach is that fake news is detected
without the user having to open or upload any content in the app.
Figure 3: Methodology
Figure 4: The landing page for technology news articles and its corresponding HTML structure [23]
Specific HTML tags that contain the textual content can also be targeted [24]. Hence, with
the help of libraries such as BeautifulSoup and requests, the useful content will be scraped.
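To illustrate the idea of pulling text out of specific HTML tags, the sketch below extracts the contents of `<p>` elements from a page fragment. It deliberately swaps in Python's standard-library `html.parser` for BeautifulSoup and requests so that it runs without network access or third-party packages. The class name `ParagraphExtractor` and the sample HTML are hypothetical; a real scraper would first download the page and would likely target site-specific tags.

```python
from html.parser import HTMLParser

class ParagraphExtractor(HTMLParser):
    """Collect the text found inside <p> tags, ignoring everything else."""

    def __init__(self):
        super().__init__()
        self._depth = 0        # greater than 0 while inside a <p> element
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._depth += 1
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "p" and self._depth > 0:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth > 0:
            self.paragraphs[-1] += data

# Hypothetical page fragment standing in for a downloaded news page.
SAMPLE_HTML = """
<html><body>
  <h1>Headline</h1>
  <p>First paragraph of the <b>article</b>.</p>
  <div class="ad">Buy now</div>
  <p>Second paragraph.</p>
</body></html>
"""

parser = ParagraphExtractor()
parser.feed(SAMPLE_HTML)
article_text = " ".join(p.strip() for p in parser.paragraphs)
print(article_text)
```

Only the paragraph text survives; the headline and the advertisement block are skipped because they sit outside the targeted tag.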
The collected dataset contains 6335 rows and 4 columns; the head of the dataset is shown in
Figure 5.
With the help of the TF-IDF vectorizer, the importance of each word in a given article relative
to the entire corpus is determined [25].
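A small worked example may make the TF-IDF weighting concrete. The sketch below uses the textbook form tf × log(N / df), where tf is the word's relative frequency in the article, N is the number of articles, and df is the number of articles containing the word. This is an assumption for illustration: library vectorizers typically add smoothing and normalization, so their numbers will differ. The function name `tf_idf` and the three toy documents are hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Uses the textbook form tf * log(N / df); library implementations
    often add smoothing and normalization on top of this.
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each term once per document

    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

# Toy corpus of already pre-processed articles (hypothetical data).
docs = [
    ["election", "fraud", "claim"],
    ["election", "result", "official"],
    ["miracle", "cure", "claim"],
]
weights = tf_idf(docs)
print(weights[0])
```

In the first document, "fraud" appears in only one article and so receives a higher weight than "election", which appears in two; a word present in every article would receive a weight of zero under this form.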
Figure 6: Dataset Visualization of Fake news and Real news using Seaborn
The x-axis represents the label (fake or real); the y-axis represents the index.
With Multinomial NB and the Passive Aggressive Classifier, the models were trained on 33% of the
dataset and tested on the remaining 67%. The confusion matrix is then used to identify the model
with the highest accuracy [26].
• The number of True Positives is the number of news articles correctly classified as Fake
News;
• The number of False Positives is the number of news articles incorrectly classified as
Fake News;
• The number of True Negatives is the number of news articles correctly classified as True
News;
• The number of False Negatives is the number of news articles incorrectly classified as True
News.
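The four counts can be tallied directly from the true and predicted labels. The sketch below is a minimal illustration that treats Fake News as the positive class, so an article that is fake but predicted as true counts as a false negative. The function name `confusion_counts`, the label strings "FAKE" and "REAL", and the sample labels are assumptions made for this example.

```python
def confusion_counts(y_true, y_pred, positive="FAKE"):
    """Return (tp, fp, tn, fn), treating `positive` as the Fake News class."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1   # fake article correctly flagged as fake
            else:
                fp += 1   # true article wrongly flagged as fake
        else:
            if truth == positive:
                fn += 1   # fake article wrongly passed as true
            else:
                tn += 1   # true article correctly passed as true
    return tp, fp, tn, fn

# Hypothetical labels for six articles.
y_true = ["FAKE", "FAKE", "REAL", "REAL", "FAKE", "REAL"]
y_pred = ["FAKE", "REAL", "REAL", "FAKE", "FAKE", "REAL"]
tp, fp, tn, fn = confusion_counts(y_true, y_pred)
print(tp, fp, tn, fn)
```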
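The precision of a classifier is calculated as follows:
\[ \text{precision} = \frac{tp}{tp + fp} \]
where:
tp – number of true positive examples;
fp – number of false positive examples.
The recall of a classifier is calculated as follows:
\[ \text{recall} = \frac{tp}{tp + fn} \]
where fn – number of false negative examples.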
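As a worked illustration of the two metrics, the sketch below computes precision as tp / (tp + fp) and recall as tp / (tp + fn), where fn is the number of false negative examples. The counts are hypothetical and chosen only to keep the arithmetic easy to follow. Returning 0.0 when a denominator is zero is a convention assumed here, not something the synopsis specifies.

```python
def precision(tp, fp):
    """Fraction of articles flagged as fake that really are fake."""
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp, fn):
    """Fraction of truly fake articles that the classifier flagged."""
    return tp / (tp + fn) if (tp + fn) else 0.0

# Hypothetical counts, not results from the project's dataset.
tp, fp, fn = 90, 10, 30
p = precision(tp, fp)   # 90 / (90 + 10)
r = recall(tp, fn)      # 90 / (90 + 30)
print(p, r)
```

With these counts the classifier is right about 9 in 10 of the articles it flags, but it catches only 3 in 4 of the fake articles, which shows why the two metrics are reported together.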
As depicted in Figure 7, the confusion matrix helps in evaluating the quality of the output of a
classifier, in this case Multinomial NB and the Passive Aggressive Classifier, on the fake or
real news dataset. The diagonal elements of the matrix represent the number of points where the
predicted label equals the true label, while the off-diagonal elements represent the number of
points where the model's prediction fails.
The figure shows the matrix without normalization. The results in the matrix change as the
classification models or vectorizers are changed.
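The relationship between the diagonal, the off-diagonal elements, and normalization can be sketched as follows. This is an illustrative stdlib-only version, assuming rows hold the true labels and columns the predicted labels. The function names, label strings, and sample predictions are hypothetical and do not reproduce the project's results.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Build a square matrix: rows are true labels, columns are predictions."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for truth, pred in zip(y_true, y_pred):
        matrix[index[truth]][index[pred]] += 1
    return matrix

def normalize_rows(matrix):
    """Divide each row by its total so entries become per-class rates."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([cell / total if total else 0.0 for cell in row])
    return normalized

labels = ["FAKE", "REAL"]
# Hypothetical labels for eight articles.
y_true = ["FAKE", "FAKE", "FAKE", "FAKE", "REAL", "REAL", "REAL", "REAL"]
y_pred = ["FAKE", "FAKE", "FAKE", "REAL", "REAL", "REAL", "FAKE", "FAKE"]

cm = confusion_matrix(y_true, y_pred, labels)
correct = sum(cm[i][i] for i in range(len(labels)))  # diagonal elements
wrong = sum(map(sum, cm)) - correct                  # off-diagonal elements
cm_norm = normalize_rows(cm)
print(cm, correct, wrong, cm_norm)
```

The unnormalized matrix reports raw article counts, while the row-normalized version expresses each row as the share of that true class assigned to each predicted label, which makes classes of different sizes easier to compare.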
The precision of the model is the fraction of relevant instances among the retrieved instances,
while recall is the fraction of the total number of relevant instances that were actually
retrieved.
In future, VADER, a more efficient algorithm for sentiment analysis, can be used along with a
text classification model that provides the highest accuracy. Also, existing fake news detection
models have addressed news and politics only; scope remains in stock markets, where shares rise
and fall very frequently.
REFERENCES
1. Kuriakose, Ammu, et al. "ALIKAH-A Clickbait and Fake News Detection System using Natural
Language Processing." 2019 3rd International Conference on Trends in Electronics and Informatics
(ICOEI). IEEE, 2019.
2. “India has second highest number of Internet users after China”, economictimes.com, 2019 [Online].
Available: https://economictimes.indiatimes.com
3. “Ordinary Indians are fueling the country’s fake-news crisis”, qz.com, 2018 [Online]. Available:
https://qz.com/india
8. “Top 50 Fake News Websites And Blogs on the Web in 2019”, blog.feedspot.com, 2019 [Online].
Available: https://blog.feedspot.com/fake_news_blogs/
14. “Protecting the EU Elections From Misinformation and Expanding Our Fact-Checking Program to
New Languages”, about.fb.com [Online]. Available: https://about.fb.com/news
15. “B.S. Detector - Browser extension to identify fake news sites”, bsdetector.tech, 2018 [Online].
Available: http://bsdetector.tech/
16. “Messenger platform Flock launches feature to identify fake news”, economictimes.com, 2019 [Online].
Available: https://m.economictimes.com/small-biz
17. “Alt News”, altnews.in [Online]. Available: https://www.altnews.in/
18. N. J. Conroy, V. L. Rubin, and Y. Chen, “Automatic deception detection: Methods for finding fake
news,” Proceedings of the Association for Information Science and Technology, vol. 52, no. 1, pp. 1–4,
2015.
19. S. Feng, R. Banerjee, and Y. Choi, “Syntactic stylometry for deception detection,” in Proceedings of the
50th Annual Meeting of the Association for Computational Linguistics: Short Papers-Volume 2,
Association for Computational Linguistics, 2012, pp. 171–175.
20. S. Gilda, “Evaluating Machine Learning Algorithms for Fake News Detection,” in 2017 IEEE 15th
Student Conference on Research and Development (SCOReD), 2017.
21. “Kaggle”, kaggle.com [Online]. Available: https://kaggle.com
22. “inshorts - stay informed”, inshorts.com [Online]. Available: https://inshorts.com
23. “A Practitioner's Guide to Natural Language Processing (Part I) — Processing & Understanding Text”,
towardsdatascience.com, 2019 [Online]. Available: https://towardsdatascience.com
24. M. Pagliardini, P. Gupta, and M. Jaggi, “Unsupervised learning of sentence embeddings using
compositional n-gram features,” arXiv preprint arXiv:1703.02507, 2017.
25. H. Rashkin, E. Choi, J. Y. Jang, S. Volkova, Y. Choi, and P. G. Allen, “Truth of Varying Shades:
Analyzing Language in Fake News and Political Fact-Checking,” in Proceedings of the 2017
Conference on Empirical Methods in Natural Language Processing, 2017, pp. 2931–2937.
26. M. Balmas, “When Fake News Becomes Real: Combined Exposure to Multiple News Sources and
Political Attitudes of Inefficacy, Alienation, and Cynicism,” Communic. Res., vol. 41, no. 3, pp. 430–
454, 2014.
27. “Naive Bayes classifier”, Wikipedia (n.d.) [Online]. Available:
https://en.wikipedia.org/wiki/Naive_Bayes_classifier. Accessed Feb. 6, 2017.
Remark:
Instructions for Formatting the Project Report:
1. All text fonts should be in Times New Roman.
2. The heading should be in font size = 14 with bold.
3. The text should be of Font Size=12.
4. There should be 1.5 line spacing between the Texts.
5. Figure & its caption should be center justified with font size 10.
6. The table and caption should be center justified with font size 10.
7. All the text should be Justified (Select text -> Ctrl +J) in the
synopsis.