
FAKE NEWS DETECTION USING MACHINE LEARNING

Submitted by

BHARATHI K
(Register No:713522CA804)

PROJECT REPORT
Submitted to the
FACULTY OF INFORMATION AND COMMUNICATION ENGINEERING

In partial fulfillment for the award of the degree


of

MASTER OF COMPUTER APPLICATIONS

in

SNS COLLEGE OF TECHNOLOGY, COIMBATORE-641 035


(AN AUTONOMOUS INSTITUTION)

DEPARTMENT OF COMPUTER APPLICATIONS

NOV/DEC 2023
BONAFIDE CERTIFICATE

It is certified that this project report “FAKE NEWS DETECTION USING MACHINE
LEARNING” is the bonafide work of Ms. BHARATHI K, who carried out this project work
under my supervision. Certified further that, to the best of my knowledge, the work reported
herein does not form part of any other project report or dissertation on the basis of which a
degree or award was conferred on an earlier occasion on this or any other candidate.

_______________________ ______________________
HEAD OF THE DEPARTMENT PROJECT GUIDE

Submitted for the Viva-Voce examination held at SNS COLLEGE OF TECHNOLOGY

on _______________

_____________________ ________________________

INTERNAL EXAMINER EXTERNAL EXAMINER


DECLARATION

I, BHARATHI K, declare that the project work entitled “Fake news detection using
machine learning”, submitted to SNS COLLEGE OF TECHNOLOGY (AN
AUTONOMOUS INSTITUTION), Coimbatore, in partial fulfillment of the requirements for
the award of the Degree of Master of Computer Applications, is a record of original work
done by me under the guidance of Dr. S. SUNDARARAJAN, Head of the Department of
Master of Computer Applications (PG), SNS COLLEGE OF TECHNOLOGY (AN
AUTONOMOUS INSTITUTION), Coimbatore.

Place: Coimbatore Signature of the Candidate


Date: BHARATHI K
ACKNOWLEDGEMENT

First of all, I offer my humble thanks to God Almighty for being with me and guiding me
in the right way throughout my project work.

I express my heartfelt gratitude to the management of SNS College of Technology,
Coimbatore: Dr.S.N.SUBBRAMANIAN, Chairman; Dr.S.RAJALAKSHMI,
Correspondent; Dr.S.NALIN VIMAL KUMAR, Technical Director; and
Dr.V.S.VELUSWAMY, Founder Trustee, for providing me with all sorts of support in the
completion of this project.

I extend my sincere thanks to Dr.S.CHENTHUR PANDIAN, our respected Principal,
SNS College of Technology, for the encouragement and facilities offered to complete this
project.

I express my heartfelt thanks to Dr. S. SUNDARARAJAN, Head of the Department of
Master of Computer Applications and my guide, who paved the way for me to complete my
project successfully. His instruction and presence have guided me a lot in completing this
project.

Finally, I thank my parents for their financial and moral support, and also my friends who
helped me in completing my project work successfully.
ABSTRACT

In recent years, owing to the booming development of online social networks, fake news
produced for various commercial and political purposes has appeared in large numbers and
spread widely in the online world. Deceived by misleading words, online social network users
are easily influenced by such fake news, which has already had tremendous effects on offline
society. An important goal in improving the trustworthiness of information in online social
networks is to identify fake news in a timely manner. This paper investigates the principles,
methodologies and algorithms for detecting fake news articles, creators and subjects in online
social networks and evaluates the corresponding performance. The accuracy of information
on the Internet, especially on social media, is an increasingly important concern, but the
scale of web data hampers our ability to identify, evaluate and correct such data, or so-called
"fake news", on these platforms. In this paper, we propose a method for fake news detection
and ways to apply it on Facebook, one of the most popular online social media platforms.
The method uses a Naive Bayes classification model to predict whether a post on Facebook
will be labeled as real or fake. The results may be improved by applying several techniques
discussed in the paper. The results obtained suggest that the fake news detection problem can
be addressed with machine learning methods.
TABLE OF CONTENTS

CHAPTER NO    TITLE

    ABSTRACT
    LIST OF FIGURES
1.  INTRODUCTION
    1.1 COMPANY PROFILE
    1.2 ABOUT THE PROJECT
2.  MEDIA-RICH FAKE NEWS DETECTION: A SURVEY
    2.1 WEAKLY SUPERVISED LEARNING FOR FAKE NEWS DETECTION ON TWITTER
    2.2 FAKE NEWS DETECTION IN SOCIAL MEDIA
    2.3 THE SPREAD OF FAKE NEWS BY SOCIAL BOTS
    2.4 MISLEADING ONLINE CONTENT
3.  METHODOLOGY
    3.1 EXISTING SYSTEM
    3.2 PROPOSED SYSTEM
    3.3 SOFTWARE ENVIRONMENT
        3.3.1 PYTHON
        3.3.2 PYTHON FEATURES
        3.3.3 INTERACTIVE MODE PROGRAMMING
        3.3.4 SCRIPT MODE PROGRAMMING
    3.4 FLASK FRAMEWORK
    3.5 MODULES
    3.6 ALGORITHMS
4.  RESULTS AND DISCUSSION
    4.1 REQUIREMENT ANALYSIS
    4.2 FUNCTIONAL REQUIREMENTS
    4.3 NON-FUNCTIONAL REQUIREMENTS
    4.4 SYSTEM DESIGN AND TESTING PLAN
        4.4.1 INPUT DESIGN
    4.5 TEST PROCEDURE
        4.5.1 SYSTEM TESTING
        4.5.2 UNIT TESTING
        4.5.3 INTEGRATION TESTING
        4.5.4 FUNCTIONAL TESTING
        4.5.5 WHITE BOX TESTING
        4.5.6 ACCEPTANCE TESTING
5.  CONCLUSION AND FUTURE WORK
    REFERENCES
    APPENDIX
        A. SOURCE CODE
        B. SCREENSHOTS
LIST OF FIGURES

FIGURE NO    TITLE

3.1    ARCHITECTURE DIAGRAM
3.2    LOGIN PAGE
3.3    DASHBOARD
4.1    DATA FLOW DIAGRAM
5.1    ADMIN
5.2    CHECKING STATEMENT WITH DATASET
5.3    DETECTING FAKE NEWS USING DATASET
5.4    ACCURACY LEVEL OF ALGORITHMS WITH DATASETS
CHAPTER 1

INTRODUCTION

These days, fake news creates varied issues, from satirical articles to fabricated
news and planned government propaganda in some outlets. Fake news
and a lack of trust in the media are growing problems with huge ramifications for
our society. Obviously, a purposely misleading story is "fake news", but lately
the discourse on social media is changing its definition. Some people now
use the term to dismiss facts that run counter to their preferred viewpoints.

The importance of disinformation within American political discourse was the
subject of weighty attention, particularly following the American presidential
election. The term 'fake news' became common parlance for the issue,
particularly to describe factually incorrect and misleading articles published
mostly for the purpose of making money through page views. In this paper, we
seek to produce a model that can accurately predict the likelihood that a given
article is fake news. Facebook has been at the epicentre of much critique
following media attention. It has already implemented a feature that lets users flag
fake news on the site, and it has also said publicly that it is working on
distinguishing these articles in an automated way. Certainly, this is not
an easy task. A given algorithm must be politically unbiased, since fake news
exists on both ends of the spectrum, and must also give equal weight to legitimate
news sources on either end of the spectrum. In addition, the question of
legitimacy is a difficult one. However, in order to solve this problem, it is
necessary to have an understanding of what fake news is.

1.1 COMPANY PROFILE

 Mobility Intelligence Softwares, being a start-up, takes pride in achieving
your business goals with the highest degree of excellence. We are a team of expert
software consultants and talented programmers who deliver projects on time without
compromising on quality. We can fulfil your technology needs with our experienced
resources.
 Mobility Intelligence is an India-based software company providing
exclusive projects in web design and development, mobile application development,
design services and software development.

1.2 ABOUT THE PROJECT

With the advancement of technology, digital news is more widely exposed to users
globally and contributes to the increased spread of misinformation and disinformation online.
Fake news can be found on popular platforms such as social media and the Internet. There have
been multiple solutions and efforts in the detection of fake news, some of which even work with
automated tools. However, fake news is intended to convince the reader to believe false
information, which makes these articles difficult to recognize. Digital news is produced at a
large scale and high speed, every second of every day, so it is challenging for machine learning
to detect fake news effectively.

CHAPTER 2

MEDIA-RICH FAKE NEWS DETECTION: A SURVEY


2.1. WEAKLY SUPERVISED LEARNING FOR FAKE NEWS


DETECTION ON TWITTER

The problem of automatic detection of fake news in social media, e.g., on Twitter, has
recently drawn some attention. Although, from a technical perspective, it can be regarded as a
straightforward binary classification problem, the major challenge is the collection of large
enough training corpora, since manual annotation of tweets as fake or non-fake news is an
expensive and tedious endeavour. In this paper, we discuss a weakly supervised approach,
which automatically collects a large-scale but very noisy training dataset comprising
hundreds of thousands of tweets. During collection, we automatically label tweets by their
source, i.e., trustworthy or untrustworthy source, and train a classifier on this dataset. We then
use that classifier for a different classification target, i.e., the classification of fake and
non-fake tweets. Although the labels are not accurate according to the new classification
target (not all tweets by an untrustworthy source need to be fake news, and vice versa), we
show that despite this noisy and inaccurately labeled dataset, it is possible to detect fake news
with an F1 score of up to 0.9.

2.2 FAKE NEWS DETECTION IN SOCIAL MEDIA

Fake news and hoaxes have existed since before the advent of the Internet. A widely
accepted definition of Internet fake news is: "fictitious articles deliberately fabricated to
deceive readers". Social media and news outlets publish fake news to increase readership or
as part of psychological warfare. In general, the goal is to profit through clickbait. Clickbait
lures users and entices curiosity with flashy headlines or designs, encouraging clicks on links
that increase advertising revenue. This exposition analyses the prevalence of fake news in
light of the advances in communication made possible by the emergence of social networking
sites. The purpose of the work is to come up with a solution that users can utilize to detect and
filter out sites containing false and misleading information. Simple and carefully selected
features of the title and post are used to accurately identify fake posts. The experimental
results show a 99.4% accuracy using a logistic classifier.
Automatic Online Fake News Detection Combining Content and Social Signals

The proliferation and rapid diffusion of fake news on the Internet highlight the need for
automatic hoax detection systems. In the context of social networks, machine learning (ML)
methods can be used for this purpose. Fake news detection strategies are traditionally based
either on content analysis (i.e. analysing the content of the news) or, more recently, on
social context models, such as mapping the news' diffusion pattern. In this paper, we first
propose a novel ML fake news detection method which, by combining news content and
social context features, outperforms existing methods in the literature, increasing their already
high accuracy by up to 4.8%. Second, we implement our method within a Facebook
Messenger chatbot and validate it with a real-world application, obtaining a fake news
detection accuracy of 81.7%.

In recent years, the reliability of information on the Internet has emerged as a crucial issue
of modern society. Social network sites (SNSs) have revolutionized the way in which
information is spread by allowing users to freely share content. As a consequence, SNSs are
also increasingly used as vectors for the diffusion of misinformation and hoaxes. The amount
of disseminated information and the rapidity of its diffusion make it practically impossible to
assess reliability in a timely manner, highlighting the need for automatic hoax detection
systems. As a contribution towards this objective, we show that Facebook posts can be
classified with high accuracy as hoaxes or non-hoaxes on the basis of the users who "liked"
them. We present two classification techniques, one based on logistic regression, the other on
a novel adaptation of Boolean crowdsourcing algorithms. On a dataset consisting of 15,500
Facebook posts and 909,236 users, we obtain classification accuracies exceeding 99% even
when the training set contains less than 1% of the posts. We further show that our techniques
are robust: they work even when we restrict our attention to the users who like both hoax and
non-hoax posts. These results suggest that mapping the diffusion pattern of information can
be a useful component of automatic hoax detection systems.

2.3 THE SPREAD OF FAKE NEWS BY SOCIAL BOTS

The massive spread of fake news has been identified as a major global risk and has been
alleged to influence elections and threaten democracies. Communication, cognitive, social,
and computer scientists are engaged in efforts to study the complex causes for the viral
diffusion of digital misinformation and to develop solutions, while search and social media
platforms are beginning to deploy countermeasures. However, to date, these efforts have been
mainly informed by anecdotal evidence rather than systematic data. Here we analyze 14
million messages spreading 400 thousand claims on Twitter during and following the 2016
U.S. presidential campaign and election. We find evidence that social bots play a key role in
the spread of fake news. Accounts that actively spread misinformation are significantly more
likely to be bots. Automated accounts are particularly active in the early spreading phases of
viral claims, and tend to target influential users. Humans are vulnerable to this manipulation,
retweeting bots who post false news. Successful sources of false and biased claims are
heavily supported by social bots. These results suggest that curbing social bots may be an
effective strategy for mitigating the spread of online misinformation.

2.4 MISLEADING ONLINE CONTENT

Tabloid journalism is often criticized for its propensity for exaggeration,
sensationalization, scare-mongering, and otherwise producing misleading and low-quality
news. As the news has moved online, a new form of tabloidization has emerged:
'clickbaiting.' 'Clickbait' refers to "content whose main purpose is to attract attention and
encourage visitors to click on a link to a particular web page" ['clickbait,' n.d.] and has been
implicated in the rapid spread of rumor and misinformation online. This paper examines
potential methods for the automatic detection of clickbait as a form of deception. Methods for
recognizing both textual and non-textual clickbaiting cues are surveyed, leading to the
suggestion that a hybrid approach may yield the best results.

Big Data Analytics and Deep Learning are two high-focus areas of data science. Big Data has
become important as many organizations, both public and private, have been collecting
massive amounts of domain-specific information, which can contain useful information about
problems such as national intelligence, cyber security, fraud detection, marketing, and
medical informatics. Companies such as Google and Microsoft are analysing large volumes of
data for business analysis and decisions, impacting existing and future technology. Deep
Learning algorithms extract high-level, complex abstractions as data representations through
a hierarchical learning process. Complex abstractions are learnt at a given level based on
relatively simpler abstractions formulated in the preceding level of the hierarchy. A key
benefit of Deep Learning is the analysis and learning of massive amounts of unsupervised
data, making it a valuable tool for Big Data Analytics, where raw data is largely unlabelled
and uncategorized. In the present study, we explore how Deep Learning can be utilized for
addressing some important problems in Big Data Analytics, including extracting complex
patterns from massive volumes of data, semantic indexing, data tagging, fast information
retrieval, and simplifying discriminative tasks. We also investigate some aspects of Deep
Learning research that need further exploration to incorporate specific challenges introduced
by Big Data Analytics, including streaming data, high-dimensional data, scalability of
models, and distributed computing. We conclude by presenting insights into relevant future
works by posing some questions, including defining data sampling criteria, domain adaptation
modelling, defining criteria for obtaining useful data abstractions, improving semantic
indexing, semi-supervised learning, and active learning.

CHAPTER 3
METHODOLOGY

3.1 EXISTING SYSTEM

There exists a large body of research on the topic of machine learning methods for deception
detection; most of it has focused on classifying online reviews and publicly available
social media posts. Particularly since late 2016, during the American presidential election, the
question of determining 'fake news' has also been the subject of particular attention within the
literature. Conroy, Rubin, and Chen outline several approaches that seem promising towards
the aim of correctly classifying misleading articles. They note that simple content-related
n-grams and shallow part-of-speech tagging have proven insufficient for the classification task,
often failing to account for important context information. Rather, these methods have been
shown to be useful only in tandem with more complex methods of analysis. Deep syntax
analysis using probabilistic context-free grammars has been shown to be particularly valuable
in combination with n-gram methods. Feng, Banerjee, and Choi are able to achieve 85%-91%
accuracy in deception-related classification tasks using online review corpora.
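
To make the idea of content-related n-gram features concrete, the following is a minimal
sketch (not part of the cited works) showing how unigram and bigram counts can be extracted
with scikit-learn's CountVectorizer; the two sample headlines are invented for illustration.

from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical sample headlines, purely for illustration.
headlines = [
    "scientists confirm water found on mars",
    "shocking secret the government hides about mars",
]

# ngram_range=(1, 2) extracts both single words (unigrams)
# and adjacent word pairs (bigrams) as features.
vectorizer = CountVectorizer(ngram_range=(1, 2))
X = vectorizer.fit_transform(headlines)

print(vectorizer.get_feature_names_out())  # the extracted n-gram vocabulary
print(X.toarray())                         # per-headline n-gram counts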

3.2 PROPOSED SYSTEM


In this paper, a model is built based on a count vectorizer or a TF-IDF matrix (i.e., word
tallies relative to how often they are used in other articles in the dataset). Since
this problem is a kind of text classification, implementing a Naive Bayes classifier is a good
fit, as this is standard for text-based processing. The actual goal in developing the model
lies in choosing the text transformation (count vectorizer vs. TF-IDF vectorizer) and choosing
which type of text to use (headlines vs. full text). The next step is to extract the most optimal
features for the count vectorizer or TF-IDF vectorizer. This is done by using the n most
frequent words and/or phrases, lowercasing or not, removing the stop words (common words
such as "the", "when", and "there"), and using only those words that appear at least a given
number of times in the text dataset.
SYSTEM ARCHITECTURE:

Fig:3.1 Architecture diagram

SYSTEM REQUIREMENTS
HARDWARE REQUIREMENTS:

 System - Pentium IV
 Speed - 2.4 GHz
 Hard disk - 40 GB
 Monitor - 15'' VGA color
 RAM - 512 MB

SOFTWARE REQUIREMENTS:

 Operating System - Windows XP

 Coding language - PYTHON

3.3 SOFTWARE ENVIRONMENT

3.3.1 PYTHON

Python is a high-level, interpreted, interactive and object-oriented scripting language.


Python is designed to be highly readable. It frequently uses English keywords whereas other
languages use punctuation, and it has fewer syntactical constructions than other languages.

 Python is Interpreted − Python is processed at runtime by the interpreter.

You do not need to compile your program before executing it. This is similar to PERL
and PHP.

 Python is Interactive − You can actually sit at a Python prompt and interact with the
interpreter directly to write your programs.

 Python is Object-Oriented − Python supports an object-oriented style or technique of
programming that encapsulates code within objects.

 Python is a Beginner's Language − Python is a great language for beginner-level
programmers and supports the development of a wide range of applications, from
simple text processing to WWW browsers to games.

History of Python

Python was developed by Guido van Rossum in the late eighties and early nineties at the
National Research Institute for Mathematics and Computer Science in the Netherlands.

Python is derived from many other languages, including ABC, Modula-3, C, C++,
ALGOL 68, Smalltalk, Unix shell and other scripting languages.

Python is copyrighted. Like Perl, Python source code is available under an open-source
license (the Python Software Foundation License, which is GPL-compatible).

Python is now maintained by a core development team, although Guido van
Rossum still holds a vital role in directing its progress.

3.3.2 PYTHON FEATURES

Python's features include −

 Easy-to-learn − Python has few keywords, a simple structure, and a clearly
defined syntax. This allows the student to pick up the language quickly.

 Easy-to-read − Python code is more clearly defined and visible to the eyes.
 Easy-to-maintain − Python's source code is fairly easy-to-maintain.

 A broad standard library − Python's bulk of the library is very portable and cross-
platform compatible on UNIX, Windows, and Macintosh.

 Interactive Mode − Python has support for an interactive mode which allows
interactive testing and debugging of snippets of code.

 Portable − Python can run on a wide variety of hardware platforms and has the same
interface on all platforms.

 Extendable − You can add low-level modules to the Python interpreter.
These modules enable programmers to add to or customize their tools to be more
efficient.

 Databases − Python provides interfaces to all major commercial databases.

 GUI Programming − Python supports GUI applications that can be created and
ported to many system calls, libraries and windows systems, such as Windows MFC,
Macintosh, and the X Window system of Unix.

 Scalable − Python provides a better structure and support for large programs than
shell scripting.

Apart from the above-mentioned features, Python has a big list of good features; a few are
listed below −

• It supports functional and structured programming methods as well as OOP.

• It can be used as a scripting language or can be compiled to byte-code for building
large applications.

• It provides very high-level dynamic data types and supports dynamic type checking.

• It supports automatic garbage collection.


• It can be easily integrated with C, C++, COM, ActiveX, CORBA, and Java.

Python is available on a wide variety of platforms including Linux and Mac OS X.

Let's understand how to set up our Python environment.

Getting Python

The most up-to-date and current source code, binaries, documentation, news, etc., are
available on the official website of Python: https://www.python.org/.

Windows Installation

Here are the steps to install Python on Windows machine.

[1] Open a Web browser and go to https://www.python.org/downloads/.

[2] Follow the link for the Windows installer python-XYZ.msi file, where XYZ is the
version you need to install.

[3] To use this installer python-XYZ.msi, the Windows system must support Microsoft
Installer 2.0. Save the installer file to your local machine and then run it to find out if
your machine supports MSI.

[4] Run the downloaded file. This brings up the Python install wizard, which is really easy
to use. Just accept the default settings, wait until the install is finished, and you are
done.

The Python language has many similarities to Perl, C, and Java. However, there are some
definite differences between the languages.

3.3.3 INTERACTIVE MODE PROGRAMMING


Invoking the interpreter without passing a script file as a parameter brings up the following
prompt −

$ python
Python 2.4.3 (#1, Nov 11 2010, 13:34:43)
[GCC 4.1.2 20080704 (Red Hat 4.1.2-48)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>

Type the following text at the Python prompt and press Enter −

>>> print "Hello, Python!"

If you are running a newer version of Python, you need to use the print statement with
parentheses, as in print("Hello, Python!"). However, in Python version 2.4.3, this produces
the following result −

Hello, Python!

3.3.4 SCRIPT MODE PROGRAMMING

Invoking the interpreter with a script parameter begins execution of the script and continues
until the script is finished. When the script is finished, the interpreter is no longer active.

Let us write a simple Python program in a script. Python files have extension .py.

Type the following source code in a test.py file −

print "Hello, Python!"

We assume that you have the Python interpreter set in your PATH variable. Now, try to run
this program as follows −

$ python test.py

This produces the following result −

Hello, Python!

3.4 FLASK FRAMEWORK

Flask is a web application framework written in Python. It is developed by Armin Ronacher,
who leads an international group of Python enthusiasts named Pocco. Flask is based on the
Werkzeug WSGI toolkit and the Jinja2 template engine. Both are Pocco projects.

The HTTP protocol is the foundation of data communication on the World Wide Web.
Different methods of data retrieval from a specified URL are defined in this protocol.

The following table summarizes the different HTTP methods −
The following table summarizes different http methods −

Sr.No   Method    Description

1       GET       Sends data in unencrypted form to the server. Most common method.

2       HEAD      Same as GET, but without the response body.

3       POST      Used to send HTML form data to the server. Data received by the POST
                  method is not cached by the server.

4       PUT       Replaces all current representations of the target resource with the
                  uploaded content.

5       DELETE    Removes all current representations of the target resource given by a URL.

By default, the Flask route responds to GET requests. However, this preference can be
altered by providing the methods argument to the route() decorator.

In order to demonstrate the use of the POST method in URL routing, let us first create an
HTML form and use the POST method to send form data to a URL.

Save the following script as login.html −

<html>
   <body>
      <form action="http://localhost:5000/login" method="post">
         <p>Enter Name:</p>
         <p><input type="text" name="nm" /></p>
         <p><input type="submit" value="submit" /></p>
      </form>
   </body>
</html>

Now enter the following script in the Python shell −

from flask import Flask, redirect, url_for, request

app = Flask(__name__)

@app.route('/success/<name>')
def success(name):
    return 'welcome %s' % name

@app.route('/login', methods=['POST', 'GET'])
def login():
    if request.method == 'POST':
        user = request.form['nm']
        return redirect(url_for('success', name=user))
    else:
        user = request.args.get('nm')
        return redirect(url_for('success', name=user))

if __name__ == '__main__':
    app.run(debug=True)

After the development server starts running, open login.html in the browser, enter a name in
the text field, and click Submit.

Fig:3.2 Login page

Form data is POSTed to the URL in the action clause of the form tag.

http://localhost:5000/login is mapped to the login() function. Since the server receives the
data by the POST method, the value of the 'nm' parameter obtained from the form data is
retrieved by −

user = request.form['nm']

It is passed to the '/success' URL as a variable part. The browser displays a welcome
message in the window.

Fig:3.3 Dashboard

Change the method parameter to 'GET' in login.html and open it again in the browser. The
data is now received on the server by the GET method, and the value of the 'nm' parameter
is obtained by −

user = request.args.get('nm')

Here, args is a dictionary object containing pairs of form parameters and their
corresponding values. The value corresponding to the 'nm' parameter is passed on to the
'/success' URL as before.

3.5 MODULES

A. Data Use
B. Preprocessing

C. Feature Extraction

D. Training the Classifier

MODULES DESCRIPTION
A. Data Use

So, in this project we are using different packages and to load and read the data set we are
using pandas. By using pandas, we can read the .csv file and then we can display the shape of
the dataset with that we can also display the dataset in the correct form. We will be training
and testing the data, when we use supervised learning, it means we are labelling the data. By
getting the testing and training data and labels we can perform different machine learning
algorithms but before performing the predictions and accuracies, the data is need to be
preprocessing i.e. the null values which are not readable are required to be removed from the
data set and the data is required to be converted into vectors by normalizing and tokening the
data so that it could be understood by the machine. Next step is by using this data, getting the
visual reports, which we will get by using the Mat Plot Library of Python and Sickit Learn.
This library helps us in getting the results in the form of histograms, pie charts or bar charts.
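
A minimal sketch of this loading step with pandas is shown below; the file name news.csv
and its column names are assumptions for illustration, not the actual dataset used here.

import pandas as pd

# Load the dataset; "news.csv", "text" and "label" are illustrative names.
df = pd.read_csv("news.csv")

print(df.shape)   # number of rows and columns in the dataset
print(df.head())  # display the first few records

X = df["text"]    # the news text used as input
y = df["label"]   # the supervised labels, e.g. FAKE / REAL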

B. Preprocessing

The dataset used is split into a training set and a testing set: Dataset I contains 3256
training records and 814 testing records, and Dataset II contains 1882 training records and
471 testing records. Cleaning the data is always the first step: unwanted words are removed
from the dataset, which helps in mining the useful information. Data collected online often
contains undesirable characters such as stop words and digits, which hinder detection.
Cleaning removes language-independent entities from the text and integrates logic that can
improve the accuracy of the identification task.
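
The sketch below illustrates this cleaning and splitting step under stated assumptions: the
regular expression, the stop-word list and the 80/20 split are illustrative choices, not the
exact procedure used to produce the two datasets above.

import re

import pandas as pd
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS
from sklearn.model_selection import train_test_split

def clean(text):
    # Lowercase, strip digits and punctuation, and drop stop words.
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)
    words = [w for w in text.split() if w not in ENGLISH_STOP_WORDS]
    return " ".join(words)

df = pd.read_csv("news.csv")   # hypothetical dataset file, as above
X = df["text"].apply(clean)    # cleaned news text
y = df["label"]                # FAKE / REAL labels

# Hold out 20% of the records for testing (illustrative split).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)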

C. Feature Extraction
Feature extraction is the process of selecting a subset of relevant features for use in model
construction. Feature extraction methods help create an accurate predictive model by
selecting the features that will give better accuracy. When the input data to an algorithm is
too large to be handled and is believed to be redundant, it is transformed into a reduced
representation set of features, also named feature vectors. The desired task is then performed
using this reduced representation instead of the full-size input. Feature extraction is performed
on the raw data prior to applying any machine learning algorithm, which then operates on the
transformed data in feature space.

D. Training the Classifier

In this project I am using the scikit-learn machine learning library to implement the
architecture. Scikit-learn is an open-source Python machine learning library which comes
bundled in the Anaconda distribution. It just needs importing the packages, and you can run
a command as soon as you write it; if the command does not run, you get the error
immediately. I am using 4 different algorithms and have trained these 4 models, i.e.
Naive Bayes, Support Vector Machine, K-Nearest Neighbours and Logistic Regression,
which are very popular methods for document classification problems. Once the classifiers
are trained, we can check the performance of the models on the test set: we extract the word
count vector for each document in the test set and predict its class with the trained models.
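
A condensed sketch of this training and evaluation loop is given below, reusing the splits
from the preprocessing sketch; the model settings are scikit-learn defaults, not the tuned
configuration behind the accuracies shown in the screenshots.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import LinearSVC

# Vectorize the cleaned text; fit on the training split only,
# then apply the same vocabulary to the test split.
vectorizer = TfidfVectorizer(stop_words="english")
X_train_vec = vectorizer.fit_transform(X_train)
X_test_vec = vectorizer.transform(X_test)

models = {
    "Naive Bayes": MultinomialNB(),
    "SVM": LinearSVC(),
    "KNN": KNeighborsClassifier(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}

# Train each classifier and report its accuracy on the test set.
for name, model in models.items():
    model.fit(X_train_vec, y_train)
    predictions = model.predict(X_test_vec)
    print(name, accuracy_score(y_test, predictions))
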
3.6 ALGORITHMS

Naive Bayes

• A supervised learning algorithm based on a probabilistic classification technique.
• It is a powerful and fast algorithm for predictive modelling.
• In this project, I have used the Multinomial Naive Bayes classifier.

Support Vector Machine (SVM)

• SVMs are a set of supervised learning methods used for classification and regression.
• Effective in high-dimensional spaces.
• Uses a subset of training points as support vectors, so it is also memory efficient.

Logistic Regression

• A linear model for classification rather than regression.
• The expected values of the response variable are modelled based on a combination of
values taken by the predictors.
CHAPTER 4
RESULTS AND DISCUSSION
• An algorithm's accuracy depends on the type and size of the dataset: the more data,
the better the chances of obtaining accurate results.
• Machine learning depends on the variations and relations in the data.
• Understanding what is predictable is as important as trying to predict it.
• When choosing an algorithm, speed should be a consideration.

4.1 REQUIREMENT ANALYSIS


Requirement analysis, also called requirement engineering, is the process of determining
user expectations for a new or modified product. It encompasses the tasks that determine the
need for analysing, documenting, validating and managing software or system requirements.
The requirements should be documentable, actionable, measurable, testable and traceable,
related to identified business needs or opportunities, and defined to a level of detail sufficient
for system design.

4.2 FUNCTIONAL REQUIREMENTS


These are the technical specification requirements for the software product. This is the first
step in the requirement analysis process: it lists the requirements of a particular software
system, including functional, performance and security requirements. The function of the
system depends mainly on the quality of the hardware used to run the software with the
given functionality.

Usability

It specifies how easy the system must be to use. It is easy to ask queries in any format,
whether short or long; the Porter stemming algorithm generates the desired response for the
user.

Robustness
It refers to a program that performs well not only under ordinary conditions but also
under unusual conditions. It is the ability of the system to cope with errors from irrelevant
queries during execution.

Security

Security is the state of providing protected access to resources. The system provides
good security: unauthorized users cannot access the system, thereby providing high
security.

Reliability

It is the probability of how often the software fails. The measurement is often
expressed in MTBF (Mean Time Between Failures). The requirement is needed in order to
ensure that the processes work correctly and completely without being aborted. The system
can handle any load and survive, and is even capable of working around any failure.

Compatibility

The system is supported by all recent versions of major web browsers. Using any web
server, such as localhost, gives the system a real-time experience.

Flexibility

The flexibility of the project is provided in such a way that it has the ability to run in
different environments, executed by different users.

Safety

Safety is a measure taken to prevent trouble. Every query is processed in a secured manner
without letting others know one's personal information.

4.3 NON- FUNCTIONAL REQUIREMENTS

Portability

It is the usability of the same software in different environments. The project can be
run in any operating system.
Performance

These requirements determine the resources required, time interval, throughput and
everything that deals with the performance of the system.
Accuracy

The results of the requested query are very accurate, and information retrieval is fast. The
degree of security provided by the system is high and effective.

Maintainability
Maintainability defines how easy it is to maintain the system, i.e. how easy it is to analyse,
change and test the application. Maintainability of this project is simple, as further updates
can easily be made without affecting its stability.

4.4 SYSTEM DESIGN AND TESTING PLAN

4.4.1 INPUT DESIGN

The input design is the link between the information system and the user. It comprises
developing the specification and procedures for data preparation, the steps necessary to put
transaction data into a usable form for processing. This can be achieved by instructing the
computer to read data from a written or printed document, or by having people key the data
directly into the system. The design of input focuses on controlling the amount of input
required, controlling errors, avoiding delay, avoiding extra steps and keeping the process
simple. The input is designed in such a way that it provides security and ease of use while
retaining privacy. Input design considered the following things:
[1] What data should be given as input?

[2] How should the data be arranged or coded?

[3] The dialog to guide the operating personnel in providing input.

[4] Methods for preparing input validations, and steps to follow when errors occur.

OUTPUT DESIGN

A quality output is one which meets the requirements of the end user and presents the
information clearly. In any system, the results of processing are communicated to the users
and to other systems through outputs. In output design it is determined how the information
is to be displayed for immediate need, as well as the hard copy output. It is the most
important and direct source of information to the user. Efficient and intelligent output design
improves the system's relationship with the user by supporting decision-making.

The output form of an information system should accomplish one or more of the following
objectives.

• Convey information about past activities, current status or projections of the future.

• Signal important events, opportunities, problems, or warnings.

• Trigger an action.

• Confirm an action.

SOCIAL FEASIBILITY

This aspect of the study checks the level of acceptance of the system by the user. This includes
the process of training the user to use the system efficiently. The user must not feel threatened
by the system, but must instead accept it as a necessity. The level of acceptance by the users
solely depends on the methods that are employed to educate the user about the system and to
make him familiar with it. His level of confidence must be raised so that he is also able to
make some constructive criticism, which is welcomed, as he is the final user of the system.

DATA FLOW DIAGRAM

• The DFD is also called a bubble chart. It is a simple graphical formalism that can be
used to represent a system in terms of the input data to the system, the various processing
carried out on this data, and the output data generated by this system.
• The data flow diagram (DFD) is one of the most important modelling tools. It is used
to model the system components. These components are the system process, the data
used by the process, the external entities that interact with the system and the
information flows in the system.
• The DFD shows how information moves through the system and how it is modified by
a series of transformations. It is a graphical technique that depicts information flow
and the transformations that are applied as data moves from input to output.
• A DFD may be used to represent a system at any level of abstraction. DFDs may be
partitioned into levels that represent increasing information flow and functional detail.

Fig:4.1 Data Flow Diagram

4.5 TEST PROCEDURE

4.5.1 SYSTEM TESTING

The purpose of testing is to discover errors. Testing is the process of trying to discover
every conceivable fault or weakness in a work product. It provides a way to check the
functionality of components, sub-assemblies, assemblies and/or a finished product. It is the
process of exercising software with the intent of ensuring that the software system meets its
requirements and user expectations and does not fail in an unacceptable manner. There are
various types of tests, and each test type addresses a specific testing requirement.
4.5.2 UNIT TESTING
Unit testing involves the design of test cases that validate that the internal program logic is
functioning properly and that program inputs produce valid outputs. All decision branches
and internal code flow should be validated. It is the testing of individual software units of the
application; it is done after the completion of an individual unit, before integration. This is
structural testing that relies on knowledge of the unit's construction and is invasive. Unit tests
perform basic tests at component level and test a specific business process, application, and/or
system configuration. Unit tests ensure that each unique path of a business process performs
accurately to the documented specifications and contains clearly defined inputs and expected
results.

4.5.3 INTEGRATION TESTING


Integration tests are designed to test integrated software components to determine if they
actually run as one program. Testing is event driven and is more concerned with the basic
outcome of screens or fields. Integration tests demonstrate that although the components were
individually satisfactory, as shown by successful unit testing, the combination of
components is correct and consistent. Integration testing is specifically aimed at exposing the
problems that arise from the combination of components.

4.5.4 FUNCTIONAL TESTING


Functional tests provide systematic demonstrations that functions tested are available as
specified by the business and technical requirements, system documentation, and user
manuals.
Functional testing is centred on the following items:

Valid Input: identified classes of valid input must be accepted.

Invalid Input: identified classes of invalid input must be rejected.

Function: identified functions must be exercised.

Output: identified classes of application outputs must be exercised.


Organization and preparation of functional tests is focused on requirements, key functions, or
special test cases. In addition, systematic coverage pertaining to identified business process
flows, data fields, predefined processes, and successive processes must be considered for
testing. Before functional testing is complete, additional tests are identified and the effective
value of current tests is determined.

4.5.5 WHITE BOX TESTING


White Box Testing is testing in which the software tester has knowledge of the inner
workings, structure and language of the software, or at least its purpose. It is used to test
areas that cannot be reached from a black box level. Black Box Testing is testing the software
without any knowledge of the inner workings, structure or language of the module being
tested. Black box tests, like most other kinds of tests, must be written from a definitive source
document, such as a specification or requirements document. In black box testing, the
software under test is treated as a black box: you cannot "see" into it. The test provides inputs
and responds to outputs without considering how the software works.

4.5.6 ACCEPTANCE TESTING


User Acceptance Testing is a critical phase of any project and requires significant participation by the end
user. It also ensures that the system meets the functional requirements.

CHAPTER 5

CONCLUSION AND FUTURE WORK

Many people consume news from social media instead of traditional news media. However,
social media has also been used to spread fake news, which has negative impacts on
individual people and society. In this paper, an innovative model for fake news detection
using machine learning algorithms has been presented. This model takes news events as
input and, based on Twitter reviews and classification algorithms, predicts the percentage of
the news being fake or real.

The feasibility of the project is analysed in this phase, and a business proposal is put forth
with a very general plan for the project and some cost estimates. During system analysis, the
feasibility study of the proposed system is carried out. This is to ensure that the proposed
system is not a burden to the company. For feasibility analysis, some understanding of the
major requirements for the system is essential. This study is carried out to check the
economic impact that the system will have on the organization. The amount of funds that the
company can pour into the research and development of the system is limited, and the
expenditures must be justified. Thus, the developed system was well within the budget, and
this was achieved because most of the technologies used are freely available; only the
customized products had to be purchased.

REFERENCES

[1]. Parikh, S. B., & Atrey, P. K. (2018, April). Media-Rich Fake News Detection: A
Survey. In 2018 IEEE Conference on Multimedia Information Processing and Retrieval
(MIPR) (pp. 436-441). IEEE.

[2]. Conroy, N. J., Rubin, V. L., & Chen, Y. (2015, November). Automatic deception
detection: Methods for finding fake news. In Proceedings of the 78th ASIS&T Annual
Meeting: Information Science with Impact: Research in and for the Community (p.
82). American Society for Information Science.

[3]. Helmstetter, S., & Paulheim, H. (2018, August). Weakly supervised learning for fake
news detection on Twitter. In 2018 IEEE/ACM International Conference on Advances in
Social Networks Analysis and Mining (ASONAM) (pp. 274-277). IEEE.
[4]. Stahl, K. (2018). Fake news detection in social media.

[5]. Della Vedova, M. L., Tacchini, E., Moret, S., Ballarin, G., DiPierro, M., & de Alfaro, L.
(2018, May). Automatic Online Fake News Detection Combining Content and Social Signals.
In 2018 22nd Conference of Open Innovations Association (FRUCT) (pp. 272-279). IEEE.

[6]. Tacchini, E., Ballarin, G., Della Vedova, M. L., Moret, S., & de Alfaro, L. (2017). Some
like it hoax: Automated fake news detection in social networks. arXiv preprint
arXiv:1704.07506.

[7]. Shao, C., Ciampaglia, G. L., Varol, O., Flammini, A., & Menczer, F. (2017). The spread
of fake news by social bots. arXiv preprint arXiv:1707.07592, 96-104.

[8]. Chen, Y., Conroy, N. J., & Rubin, V. L. (2015, November). Misleading online content:
Recognizing clickbait as false news. In Proceedings of the 2015 ACM on Workshop on
Multimodal Deception Detection (pp. 15-19). ACM.
[9]. Najafabadi, M. M., Villanustre, F., Khoshgoftaar, T. M., Seliya, N., Wald, R., &
Muharemagic, E. (2015). Deep learning applications and challenges in big data analytics.
Journal of Big Data, 2(1), 1.

[10]. Haiden, L., & Althuis, J. (2018). The Definitional Challenges of Fake News.

APPENDIX
A) SOURCE CODE

from PyQt5 import QtCore, QtGui, QtWidgets
from Admin import Ui_Admin
import pandas as pd
import sys


class Ui_Dialog(object):
    def admin(self, event):
        # Open the admin dialog when the admin image is clicked.
        try:
            self.admn = QtWidgets.QDialog()
            self.ui = Ui_Admin(self.admn)
            self.ui.setupUi(self.admn)
            self.admn.show()
        except Exception as e:
            print(e.args[0])
            tb = sys.exc_info()[2]
            print(tb.tb_lineno)
        event.accept()

    def setupUi(self, Dialog):
        Dialog.setObjectName("Dialog")
        Dialog.resize(702, 435)
        Dialog.setStyleSheet("background-color: rgb(0, 85, 127);")
        self.label = QtWidgets.QLabel(Dialog)
        self.label.setGeometry(QtCore.QRect(60, 60, 601, 41))
        self.label.setStyleSheet("color: rgb(255, 255, 255);\n"
                                 "font: 75 18pt \"Tahoma\";")
        self.label.setObjectName("label")
        self.label_2 = QtWidgets.QLabel(Dialog)
        self.label_2.setGeometry(QtCore.QRect(200, 150, 261, 181))
        self.label_2.setStyleSheet("image: url(../N-Grams/images/admin.png);")
        self.label_2.setText("")
        self.label_2.setObjectName("label_2")
        self.label_2.mousePressEvent = self.admin

        self.retranslateUi(Dialog)
        QtCore.QMetaObject.connectSlotsByName(Dialog)

    def retranslateUi(self, Dialog):
        _translate = QtCore.QCoreApplication.translate
        Dialog.setWindowTitle(_translate("Dialog", "Online Fake News"))
        self.label.setText(_translate("Dialog",
            "Detection of Online Fake News Using N-Gram Analysis"))


if __name__ == "__main__":
    app = QtWidgets.QApplication(sys.argv)
    Dialog = QtWidgets.QDialog()
    ui = Ui_Dialog()
    ui.setupUi(Dialog)
    Dialog.show()
    sys.exit(app.exec_())

B) SCREENSHOTS

Fig: 5.1 Admin Page

Fig: 5.2 Checking statement with Dataset

Fig: 5.3 Detecting Fake News using Dataset

Fig: 5.4 Accuracy level of Algorithms with dataset
