Spammer Detect Project Document

This document discusses spammer detection and fake user identification on social networks. It provides an introduction to the topic, noting that spam detection is important for maintaining security on social networks. It then gives an overview of previous research conducted, and outlines the objectives and features of the presented study, which is to identify different approaches to spam detection on Twitter and classify them in a taxonomy. The study aims to be a useful resource for researchers on recent developments in this area.

SPAMMER DETECTION AND FAKE USER IDENTIFICATION ON SOCIAL NETWORKS

CHAPTER-1
INTRODUCTION

1.1 ABOUT THE PROJECT

Recently, the detection of spam on social networking sites has attracted the attention of
researchers. Spam detection is a difficult task in maintaining the security of social networks. It is essential
to recognize spam on online social network (OSN) sites to protect users from various kinds of malicious
attacks and to preserve their security and privacy. The hazardous manoeuvres adopted by spammers cause
massive damage to communities in the real world. Twitter spammers have various objectives, such as
spreading invalid information, fake news, rumours, and spontaneous messages. Spammers achieve their
malicious objectives through advertisements and several other means, where they support different mailing
lists and subsequently dispatch spam messages randomly to broadcast their interests. These activities
disturb the original users, who are known as non-spammers. In addition, they damage the reputation of the
OSN platforms. Therefore, it is essential to design a scheme to spot spammers so that corrective efforts can
be taken to counter their malicious activities.

1.2 OVERVIEW

Several research works have been carried out in the domain of Twitter spam detection. To
cover the existing state of the art, a few surveys have also been carried out on fake user identification
on Twitter. One such survey presents a comparative study of the current approaches. On the other hand,
the authors in [5] conducted a survey on different behaviours exhibited by spammers on the Twitter social
network. The study also provides a literature review that recognizes the existence of spammers on the
Twitter social network. Despite all the existing studies, there is still a gap in the literature. Therefore, to
bridge the gap, we review the state of the art in spammer detection and fake user identification on Twitter.
Moreover, this survey presents a taxonomy of the Twitter spam detection approaches and attempts to offer
a detailed description of recent developments in the domain.

1.3 AIM

The aim of this paper is to identify different approaches to spam detection on Twitter and to
present a taxonomy by classifying these approaches into several categories. For classification, we have
identified four means of reporting spammers that can be helpful in identifying fake user identities.
Spammers can be identified based on: (i) fake content, (ii) URL-based spam detection, (iii) detecting spam in
trending topics, and (iv) fake user identification. Table 1 provides a comparison of existing techniques and
helps users to recognize the significance and effectiveness of the proposed methodologies in addition to
providing a comparison of their goals and results. Table 2 compares different features that are used for
identifying spam on Twitter. We anticipate that this survey will help readers find diverse information on
spammer detection techniques at a single point.

BAPATLA WOMEN’S ENGINEERING COLLEGE

1.4 OBJECTIVES
We introduce SIGPID, a malware detection system based on permission usage analysis to cope
with the rapid increase in the number of Android malware. Instead of extracting and analyzing all Android
permissions, we develop three levels of pruning by mining the permission data to identify the most significant
permissions that can be effective in distinguishing between benign and malicious apps. SIGPID then utilizes
machine-learning-based classification methods to classify different families of malware and benign apps.
Our evaluation finds that only 22 permissions are significant. We then compare the performance of our
approach, using only 22 permissions, against a baseline approach that analyzes all permissions. The results
indicate that when a Support Vector Machine (SVM) is used as the classifier, we can achieve over 90%
precision, recall, accuracy, and F-measure, which are about the same as those produced by the baseline
approach, while incurring analysis times that are 4 to 32 times shorter than those of using all permissions.
Compared against other state-of-the-art approaches, SIGPID is more effective, detecting 93.62% of the
malware in the data set and 91.4% of unknown/new malware samples.

1.5 FEATURE
We are hopeful that the presented study will be a useful resource for researchers to find the
highlights of recent developments in Twitter spam detection on a single platform.

CHAPTER-2
LITERATURE SURVEY

A literature survey is the most important step in the software development process. Before developing the
tool, it is necessary to determine the time factor, economy and company strength. Once these things are
satisfied, the next steps are to determine which operating system and language can be used for developing
the tool. Once the programmers start building the tool, they need a lot of external support.

This support can be obtained from senior programmers, from books or from websites. Before building
the system, we have to know the concepts below for developing the proposed system.

C. Chen et al. proposed "Statistical features-based real-time detection of drifted Twitter spam". Twitter
spam has become a major problem nowadays. Recent works focus on applying machine learning techniques
to Twitter spam detection, making use of the statistical features of tweets. In their labelled tweet data set,
however, the authors observe that the statistical properties of spam tweets vary over time, and thus the
performance of existing machine learning-based classifiers decreases. This problem is referred to as
"Twitter Spam Drift". To tackle it, they first carry out a deep analysis of the statistical features of more
than one million spam and non-spam tweets, and then propose a new Lfun scheme. The proposed scheme
discovers changed spam tweets among unlabelled tweets and incorporates them into the classifier's training
process. A number of experiments are performed to evaluate the proposed scheme. The results show that
the Lfun scheme can significantly improve spam detection accuracy in real-world scenarios. [9]

C. Buntain and J. Golbeck proposed "Automatically identifying fake news in popular Twitter threads".
Information quality in social media is an increasingly important issue, but web-scale data hinders experts'
ability to assess and correct much of the inaccurate content, or "fake news," present on these platforms.
The paper develops a method for automating fake news detection on Twitter by learning to predict accuracy
assessments in two credibility-focused Twitter datasets: CREDBANK, a crowdsourced dataset of accuracy
assessments for events on Twitter, and PHEME, which contains a set of rumours and non-rumours. The
authors apply this method to Twitter content taken from BuzzFeed's fake news dataset and show that models
trained against crowdsourced workers outperform models based on journalists' assessments and models
trained on a pooled dataset of both crowdsourced workers and journalists. All three datasets, aligned into
a uniform format, are also publicly available. A feature analysis then identifies the features that are most
predictive of crowdsourced and journalistic accuracy assessments, results which are consistent with
previous work. [10]

C. Chen et al. performed "A performance evaluation of machine learning-based streaming spam tweets
detection". The popularity of Twitter attracts more and more spammers. Spammers send unwanted tweets
to Twitter users to promote websites or services, which is harmful to normal users. In order to stop
spammers, researchers have proposed a number of mechanisms. Recent works focus on applying machine
learning techniques to Twitter spam detection. However, tweets are retrieved in a streaming way, and
Twitter provides the Streaming API for developers and researchers to access public tweets in real time.
A performance evaluation of existing machine learning-based streaming spam detection methods has been
lacking. The authors bridge this gap by carrying out a performance evaluation from three different aspects:
data, feature, and model. For real-time spam detection, they extract 12 lightweight features for tweet
representation. Spam detection is then transformed into a binary classification problem in the feature space
and can be solved by conventional machine learning algorithms. They evaluate the impact of different
factors on spam detection performance, including the spam to non-spam ratio, feature discretization,
training data size, time-related data, data sampling, and machine learning algorithms. The results show that
streaming spam tweet detection is still a big challenge and that a robust detection technique should take
into account the three aspects of data, feature, and model. [11]

F. Fathaliani and M. Bouguessa proposed "A model-based approach for identifying spammers in social
networks". In this paper, the authors view the task of identifying spammers in social networks from a
mixture modelling perspective, based on which they devise a principled unsupervised approach to detect
spammers. In their approach, each user of the social network is first represented by a feature vector that
reflects the user's behaviour and interactions with other participants. The proposed approach can
automatically discriminate between spammers and legitimate users, whereas existing unsupervised
approaches require human intervention to set informal threshold parameters for detecting spammers.
Furthermore, the approach is general in the sense that it can be applied to different online social sites. To
demonstrate the suitability of the proposed method, the authors conducted experiments on real data
extracted from Instagram and Twitter. [15]

2.1 EXISTING SYSTEM

Twitter has rapidly become an online source for acquiring real-time information about users. When a
user tweets something, it is instantly conveyed to his/her followers, allowing them to spread the received
information at a much broader level. With the evolution of OSNs, the need to study and analyse users'
behaviours on online social platforms has intensified. Many people who do not have much information
regarding OSNs can easily be tricked by fraudsters. There is also a demand to combat and control the
people who use OSNs only for advertisements and thus spam other people.

2.2 DISADVANTAGES OF EXISTING SYSTEM

• The existing system has no accurate spam detection mechanism, which is why many spam accounts could
not be identified and a large amount of corrupted data entered the social network.

2.3 PROPOSED SYSTEM

In this paper, we perform a review of techniques used for detecting spammers on Twitter.
Moreover, a taxonomy of the Twitter spam detection approaches is presented that classifies the techniques
based on their ability to detect: (i) fake content, (ii) spam based on URL, (iii) spam in trending topics, and
(iv) fake users. The presented techniques are also compared based on various features, such as user
features, content features, graph features, structure features, and time features.

2.4 ADVANTAGES OF PROPOSED SYSTEM

• We are hopeful that the presented study will be a useful resource for researchers to find the highlights
of recent developments in Twitter spam detection on a single platform.

CHAPTER-3
SYSTEM ANALYSIS
3.1 FUNCTIONAL REQUIREMENT:
A functional requirement defines a function of a software system and how the system must behave
when presented with specific inputs or conditions. These may include calculations, data manipulation and
processing, and other specific functionality. The functional requirements of this system are as follows.

The application should not display inappropriate messages for valid conditions. The application must not
stop working even when kept running for a long time. The application should process information for any
kind of input case. The application should generate the output for a given input test case.

3.2 NON-FUNCTIONAL REQUIREMENT:


Non-functional requirements are the requirements which are not directly concerned with the specific
function delivered by the system. They specify the criteria that can be used to judge the operation of a
system rather than specific behaviours. Given below are the non-functional requirements:

• Product requirements
• Basic operational requirements
• Organizational requirements

3.3 SYSTEM REQUIREMENTS


3.3.1. SOFTWARE REQUIREMENTS
• Operating System : Windows Family
• Front End : Python

3.3.2. HARDWARE REQUIREMENTS

• Processor - Intel Core2 Duo
• Speed - 2.4 GHz
• RAM - 2 GB (minimum)
• Hard Disk - 180 GB

3.4 SYSTEM STUDY


The feasibility of the project is analysed in this phase, and a business proposal is put forth with a
very general plan for the project and some cost estimates. During system analysis, the feasibility study of
the proposed system is carried out. This is to ensure that the proposed system is not a burden to the
company. For feasibility analysis, some understanding of the major requirements for the system is
essential.

Types of feasibility
Three key considerations are involved in the feasibility analysis:
• ECONOMICAL FEASIBILITY
• TECHNICAL FEASIBILITY
• SOCIAL FEASIBILITY

ECONOMICAL FEASIBILITY: This study is carried out to check the economic impact that the system will
have on the organization. The amount of funds that the company can pour into the research and
development of the system is limited. The expenditures must be justified. The developed system is well
within the budget, and this was achieved because most of the technologies used are freely available. Only
the customized products had to be purchased.

TECHNICAL FEASIBILITY: This study is carried out to check the technical feasibility, that is, the technical
requirements of the system. Any system developed must not place a high demand on the available technical
resources, as this would lead to high demands being placed on the client. The developed system must have
modest requirements, as only minimal or no changes are required for implementing this system.

SOCIAL FEASIBILITY: This aspect of the study checks the level of acceptance of the system by the user.
This includes the process of training the user to use the system efficiently. The user must not feel
threatened by the system, but must instead accept it as a necessity. The level of acceptance by the users
depends solely on the methods that are employed to educate the users about the system and to make them
familiar with it. Their level of confidence must be raised so that they are also able to offer constructive
criticism, which is welcomed, as they are the final users of the system.

CHAPTER-4
DESIGN ANALYSIS
4.1 ARCHITECTURE OF SYSTEM

Fig 4.1: System Model

4.2 MODULE DESCRIPTION


This section describes four techniques for detecting whether a tweet is spam or normal.
The presented techniques are also compared based on various features, such as user features (retweets,
tweets, followers, etc.) and content features (tweet content messages).

1) Fake Content: If the number of followers is low in comparison with the number of followings, the credibility
of an account is low and the possibility that the account is spam is relatively high. Likewise, content-based
features include tweet reputation, HTTP links, mentions and replies, and trending topics. For the time
feature, if many tweets are sent by a user account in a certain time interval, the account is treated as a
spam account.
2) Spam URL Detection: The user-based features are identified through various objects such as account age
and number of user favorites, lists, and tweets. The identified user-based features are parsed from the JSON
structure. On the other hand, the tweet-based features include the number of (i) retweets, (ii) hashtags, (iii)
user mentions, and (iv) URLs. Using a machine learning algorithm called Naïve Bayes, we check whether
a tweet contains a spam URL or not.
3) Detecting Spam in Trending Topics: In this technique, tweet content is classified using the Naïve Bayes
algorithm to check whether a tweet contains spam or non-spam words. The algorithm checks for spam
URLs, adult-content words and duplicate tweets. If Naïve Bayes detects a tweet as spam, it returns 1;
if no spam content is detected, it returns 0.
4) Fake User Identification: These attributes include the number of followers and following, account age, etc.
Alternatively, content features are linked to the tweets posted by users, as spam bots post a huge number of
duplicate contents, in contrast to non-spammers, who do not post duplicate tweets. In this technique,
features (following, followers, and tweet contents) are extracted from tweets and classified with the Naïve
Bayes algorithm as spam or non-spam. These features are then used to train a random forest algorithm to
determine whether an account is fake or not. All extracted features are saved in the features.txt file. The
Naïve Bayes classifier is saved in the ‘model’ folder.

Using the above techniques, we can detect whether tweets contain normal or spam messages. Detecting
and removing such spam messages helps social networks maintain a good reputation in the market. If a
social network does not remove spam messages, its popularity will decrease. Nowadays users depend
heavily on social networks for current news, business information and news of relatives, so protecting a
network from spammers helps it to maintain its reputation.
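
The user-based and tweet-based features named in the modules above could be collected roughly as follows. This is a minimal sketch, not the project's actual code: the function name `extract_features` is hypothetical, and the dictionary keys (`followers_count`, `friends_count`, `entities`, and so on) are assumed to follow the layout of a tweet parsed from JSON.

```python
# Illustrative sketch only: the field names below are assumptions about the
# tweet JSON layout, and extract_features is a hypothetical helper.

def extract_features(tweet):
    """Turn one tweet (a dict parsed from JSON) into a flat feature dict."""
    user = tweet.get("user", {})
    entities = tweet.get("entities", {})
    followers = user.get("followers_count", 0)
    following = user.get("friends_count", 0)
    return {
        "followers": followers,
        "following": following,
        # Few followers relative to followings suggests low credibility.
        "follower_ratio": followers / (following + 1),
        "retweets": tweet.get("retweet_count", 0),
        "hashtags": len(entities.get("hashtags", [])),
        "mentions": len(entities.get("user_mentions", [])),
        "urls": len(entities.get("urls", [])),
    }


sample = {
    "text": "Win a free phone http://example.com #free #win",
    "retweet_count": 0,
    "user": {"followers_count": 3, "friends_count": 999},
    "entities": {
        "hashtags": [{"text": "free"}, {"text": "win"}],
        "user_mentions": [],
        "urls": [{"expanded_url": "http://example.com"}],
    },
}
features = extract_features(sample)
print(features)
```

A feature dictionary of this kind can then be written to a file or passed to a classifier.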

4.3 ALGORITHMS

NAÏVE BAYES

Naive Bayes is a simple technique for constructing classifiers: models that assign class labels to
problem instances, represented as vectors of feature values, where the class labels are drawn from some
finite set. There is not a single algorithm for training such classifiers, but a family of algorithms based on a
common principle: all naive Bayes classifiers assume that the value of a particular feature is independent of
the value of any other feature, given the class variable.
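
Written out, the independence assumption gives the following decision rule. The notation is introduced here for illustration only: $C_k$ denotes a class label (for example spam or non-spam) and $x_1, \dots, x_n$ the feature values of one instance.

```latex
P(C_k \mid x_1, \dots, x_n) \;\propto\; P(C_k) \prod_{i=1}^{n} P(x_i \mid C_k),
\qquad
\hat{y} = \arg\max_{k} \; P(C_k) \prod_{i=1}^{n} P(x_i \mid C_k)
```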

In many practical applications, parameter estimation for naive Bayes models uses the method of maximum
likelihood; in other words, one can work with the naive Bayes model without accepting Bayesian
probability or using any Bayesian methods.

An advantage of naive Bayes is that it only requires a small amount of training data to estimate the
parameters necessary for classification.
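
As a rough illustration of how such a classifier can label tweets as spam (1) or non-spam (0), the sketch below implements a small multinomial naive Bayes with Laplace smoothing using only the standard library. It is not the project's implementation: the function names and the four training sentences are made up for the example.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(samples):
    """samples: list of (text, label) pairs, label 1 = spam, 0 = non-spam."""
    class_counts = Counter()            # number of training texts per class
    word_counts = defaultdict(Counter)  # word frequencies per class
    vocabulary = set()
    for text, label in samples:
        class_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocabulary.add(word)
    return class_counts, word_counts, vocabulary


def predict(model, text):
    """Return the class with the highest log posterior score."""
    class_counts, word_counts, vocabulary = model
    total = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total)  # log prior
        n_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # ignore words never seen during training
            # Laplace (add-one) smoothing avoids zero probabilities.
            count = word_counts[label][word] + 1
            score += math.log(count / (n_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up training data for illustration only.
training = [
    ("win free money now", 1),
    ("free prize click link", 1),
    ("meeting at noon tomorrow", 0),
    ("lunch with the project team", 0),
]
model = train_naive_bayes(training)
print(predict(model, "click for free money"))
print(predict(model, "team meeting tomorrow"))
```

Working in log space keeps the product of many small probabilities from underflowing.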

RANDOM FOREST

Random forest is a supervised machine learning algorithm that is widely used for classification and
regression. It builds decision trees on different samples of the data and takes their majority vote for
classification and their average for regression. One of the most important features of the random forest
algorithm is that it can handle data sets containing continuous variables, as in the case of regression,
and categorical variables, as in the case of classification. It generally gives better results on
classification problems.
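
The idea of building trees on different samples and taking a majority vote can be sketched as below. This is a deliberately simplified stand-in for a random forest, assuming one-level trees (decision stumps) and one randomly chosen feature per tree; a real random forest grows deeper trees and samples features at each split. All names and the account data are hypothetical.

```python
import random
from collections import Counter


def fit_stump(rows, labels, feature):
    """Find the threshold on one feature that misclassifies the fewest rows."""
    best = None
    for threshold in sorted({row[feature] for row in rows}):
        for above in (0, 1):  # label predicted when value > threshold
            errors = sum(
                (above if row[feature] > threshold else 1 - above) != label
                for row, label in zip(rows, labels)
            )
            if best is None or errors < best[0]:
                best = (errors, feature, threshold, above)
    return best[1:]


def fit_forest(rows, labels, n_trees=25, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Bootstrap sample: draw as many rows as the data set, with replacement.
        idx = [rng.randrange(len(rows)) for _ in rows]
        feature = rng.randrange(len(rows[0]))  # one random feature per stump
        forest.append(
            fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feature)
        )
    return forest


def predict_forest(forest, row):
    """Majority vote over all stumps."""
    votes = Counter(
        above if row[feature] > threshold else 1 - above
        for feature, threshold, above in forest
    )
    return votes.most_common(1)[0][0]


# Hypothetical accounts: (followers, following, tweets per hour); 1 = fake.
rows = [
    (5, 900, 40), (12, 1500, 55), (8, 2000, 35), (20, 1200, 60),
    (300, 150, 2), (850, 400, 1), (120, 180, 3), (560, 90, 4),
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
forest = fit_forest(rows, labels)
print(predict_forest(forest, (3, 1800, 50)))
print(predict_forest(forest, (400, 80, 1)))
```

Because each stump sees a different bootstrap sample, individual stumps may disagree; the vote smooths out those differences.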

4.4 UML DIAGRAMS


UML stands for Unified Modeling Language. UML is a standardized general-purpose modeling
language in the field of object-oriented software engineering. The standard is managed, and was created,
by the Object Management Group. The goal is for UML to become a common language for creating models
of object-oriented computer software. In its current form, UML comprises two major components: a
meta-model and a notation. In the future, some form of method or process may also be added to, or
associated with, UML.

• The Unified Modeling Language is a standard language for specifying, visualizing,
constructing and documenting the artifacts of a software system, as well as for business
modeling and other non-software systems.

• The UML represents a collection of best engineering practices that have proven
successful in the modeling of large and complex systems.

• The UML is a very important part of developing object-oriented software and the software
development process. The UML uses mostly graphical notations to express the
design of software projects.

GOALS
The primary goals in the design of the UML are as follows:
1. Provide users with a ready-to-use, expressive visual modeling language.
2. Provide extensibility and specialization mechanisms to extend the core concepts.
3. Be independent of particular programming languages and development processes.
4. Provide a formal basis for understanding the modeling language.
5. Encourage the growth of the OO tools market.
6. Support higher-level development concepts such as collaborations and frameworks.

4.4.1 USE CASE DIAGRAM
A use case diagram in the Unified Modeling Language (UML) is a type of behavioral diagram
defined by and created from a use-case analysis. Its purpose is to present a graphical overview of the
functionality provided by a system in terms of actors, their goals (represented as use cases), and any
dependencies between those use cases. The main purpose of a use case diagram is to show which system
functions are performed for which actor. The roles of the actors in the system can be depicted.

Fig 4.4.1: Use case diagram


4.4.2 CLASS DIAGRAM
In software engineering, a class diagram in the Unified Modeling Language (UML)
is a type of static structure diagram that describes the structure of a system by showing the system's
classes, their attributes, operations (or methods), and the relationships among the classes. It
explains which class contains which information.

Fig 4.4.2: Class Diagram

4.4.3. SEQUENCE DIAGRAM


A sequence diagram in the Unified Modeling Language (UML) is a kind of interaction diagram that
shows how processes operate with one another and in what order. It is a construct of a
Message Sequence Chart.
Sequence diagrams are sometimes called event diagrams, event scenarios, and timing diagrams.

Fig 4.4.3: Sequence Diagram

CHAPTER-5
SYSTEM IMPLEMENTATION
5.1 SYSTEM MODEL
Here we collect 89 queries issued by the subjects and name them “UserQ”. As this approach might
induce a bias towards topics in which lists are more useful than general web queries, we further randomly
sample another set of 105 English queries from a query log of a commercial search engine, and name this set
of queries “RandQ”. We first ask a subject to manually create facets and add items that are covered by the
query, based on his/her knowledge after a deep survey of any related resources (such as Wikipedia,
Freebase, or official web sites related to the query).

PYTHON
Python is a general-purpose, interpreted, interactive, object-oriented, high-level programming
language. As an interpreted language, Python has a design philosophy that emphasizes code readability
(notably using whitespace indentation to delimit code blocks rather than curly brackets or
keywords), and a syntax that allows programmers to express concepts in fewer lines of code
than might be used in languages such as C++ or Java. It provides constructs that enable clear programming
on both small and large scales. Python interpreters are available for many operating systems.
CPython, the reference implementation of Python, is open-source software and has a community-based
development model, as do nearly all of its variant implementations.

CPython is managed by the non-profit Python Software Foundation. Python features a dynamic
type system and automatic memory management. It supports multiple programming paradigms,
including object-oriented, imperative, functional and procedural, and has a large and comprehensive
standard library.

5.2 SOFTWARE ENVIRONMENT

Interactive Mode Programming

Invoking the interpreter without passing a script file as a parameter brings up the following prompt −

$ python
Python 2.4.3 (#1, Nov 11 2010, 13:34:43)
[GCC 4.1.2 20080704 (Red Hat 4.1.2-48)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>

Type the following text at the Python prompt and press Enter −

>>> print "Hello, Python!"

If you are running a newer version of Python, you need to use the print function with parentheses, as in
print("Hello, Python!"). In Python version 2.4.3, however, this produces the following result −

Hello, Python!

Script Mode Programming

Invoking the interpreter with a script parameter begins execution of the script and continues until the
script is finished. When the script is finished, the interpreter is no longer active.

#!/usr/bin/python

print "Hello, Python!"

We assume that you have the Python interpreter available in the /usr/bin directory. Now, try to run
this program as follows −

$ chmod +x test.py     # This is to make file executable
$ ./test.py

This produces the following result −

Hello, Python!

Python Identifiers

A Python identifier is a name used to identify a variable, function, class, module or other object. An
identifier starts with a letter A to Z or a to z or an underscore (_) followed by zero or more
letters, underscores and digits (0 to 9).

Python does not allow punctuation characters such as @, $, and % within identifiers. Python is a
case-sensitive programming language. Thus, Manpower and manpower are two different identifiers in Python.

Here are naming conventions for Python identifiers −

Class names start with an uppercase letter. All other identifiers start with a lowercase letter.

Starting an identifier with a single leading underscore indicates that the identifier is private.

Starting an identifier with two leading underscores indicates a strongly private identifier.

If the identifier also ends with two trailing underscores, the identifier is a language-defined
special name.

Reserved Words

The following list shows the Python keywords. These are reserved words and you cannot use them as
constant or variable or any other identifier names. All the Python keywords contain lowercase letters
only.

elif      in        while
else      is        with
except    lambda    yield
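
The identifier and keyword rules above can be checked programmatically. The sketch below assumes Python 3, where `str.isidentifier` and the standard `keyword` module are available; the helper name `is_valid_identifier` is hypothetical. Note that Python 3 also accepts non-ASCII letters in identifiers, which goes beyond the A to Z and a to z rule stated above.

```python
import keyword


def is_valid_identifier(name):
    """True if name is a legal identifier and not a reserved word."""
    return name.isidentifier() and not keyword.iskeyword(name)


for candidate in ["Manpower", "manpower", "_private", "2fast", "price$", "while"]:
    print(candidate, is_valid_identifier(candidate))
```

Names that start with a digit, contain punctuation such as `$`, or collide with a keyword such as `while` are rejected.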

Lines and Indentation

Python provides no braces to indicate blocks of code for class and function definitions or flow control.
Blocks of code are denoted by line indentation, which is rigidly enforced.

The number of spaces in the indentation is variable, but all statements within the block must be indented the
same amount. For example −

if True:
    print "True"
else:
    print "False"

However, the following block generates an error −

if True:
    print "Answer"
    print "True"
else:
    print "Answer"
  print "False"

Thus, in Python all the continuous lines indented with same number of spaces would form a block. The
following example has various statement blocks −

Note − Do not try to understand the logic at this point of time. Just make sure you understood various blocks
even if they are without braces.
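
A small illustrative function with several nested blocks is sketched below. The name `classify_counts` is hypothetical and the code uses Python 3 syntax; each deeper indentation level opens a new block, and returning to a shallower level closes it.

```python
def classify_counts(numbers):
    # Everything indented under "def" is the function body.
    evens, odds = 0, 0
    for n in numbers:
        # Everything indented under "for" is the loop body.
        if n % 2 == 0:
            evens += 1  # body of the if branch
        else:
            odds += 1   # body of the else branch
    # Back at the function-body level: runs once, after the loop ends.
    return evens, odds


print(classify_counts([1, 2, 3, 4, 5]))
```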

Statements contained within the [], {}, or () brackets do not need to use the line continuation character. For
example −

days = ['Monday', 'Tuesday', 'Wednesday',
        'Thursday', 'Friday']

Quotation in Python

Python accepts single ('), double (") and triple (''' or """) quotes to denote string literals, as long as the same
type of quote starts and ends the string.

The triple quotes are used to span the string across multiple lines. For example, all the following are legal −

word = 'word'

sentence = "This is a sentence."

paragraph = """This is a paragraph. It is

made up of multiple lines and sentences."""

Comments in Python

A hash sign (#) that is not inside a string literal begins a comment. All characters after the # and up to the
end of the physical line are part of the comment and the Python interpreter ignores them.

#!/usr/bin/python3

# First comment
print("Hello, Python!")  # second comment

This produces the following result −

Hello, Python!

You can type a comment on the same line after a statement or expression −

name = "Madisetti" # This is again comment

You can comment multiple lines as follows −

# This is a comment.

# This is a comment, too.

# This is a comment, too.

# I said that already.

A triple-quoted string that is not assigned to anything is evaluated and then discarded by the Python interpreter, so it can also serve as a multiline comment.
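A short sketch of a triple-quoted string used as a comment is given here. The function greet is hypothetical. Note that when such a string is the first statement of a function it becomes the docstring rather than being discarded.

```python
def greet(name):
    """Return a greeting (this first string becomes greet.__doc__)."""
    '''
    This bare triple-quoted string acts as a block comment.
    It is evaluated and thrown away, so it has no effect.
    '''
    return "Hello, " + name


print(greet("Python"))
print(greet.__doc__)
```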

PYTHON DEVELOPMENT STEPS

Guido van Rossum published the first version of Python code (version 0.9.0) at alt.sources in February 1991. This release already included exception handling, functions, and the core data types list, dict, str and others. Python version 1.0 was released in January 1994. The major new features included in this release were the functional programming tools lambda, map, filter and reduce, which Guido van Rossum never liked. Six and a half years later, in October 2000, Python 2.0 was introduced. This release included list comprehensions, a full garbage collector and Unicode support. Python flourished for another 8 years in the 2.x versions before the next major release, Python 3.0 (also known as "Python 3000" and "Py3K"), came out. Python 3 is not backwards compatible with Python 2.x. The emphasis in Python 3 was on the removal of duplicate programming constructs and modules, thus fulfilling or coming close to fulfilling the 13th aphorism of the Zen of Python: "There should be one -- and preferably only one -- obvious way to do it."

Some changes in Python 3:

• print is now a function.
• Views and iterators are returned instead of lists.
• The rules for ordering comparisons have been simplified. E.g., a heterogeneous list cannot be sorted, because all the elements of a list must be comparable to each other.
• There is only one integer type left, i.e., int. The old long type is now int as well.
• The division of two integers returns a float instead of an integer. "//" can be used to get the "old" behaviour.
• Text vs. data instead of Unicode vs. 8-bit.
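The changes listed above can be checked with a few lines run under Python 3. This is a minimal sketch; the variable names are chosen only for illustration.

```python
# print is a function and accepts keyword arguments
print("print is a function", end="\n")

# dict.keys() returns a live view, not a list
d = {"a": 1}
keys = d.keys()
d["b"] = 2
print(list(keys))          # the view reflects the key added afterwards

# "/" is true division, "//" keeps the old integer behaviour
print(7 / 2, 7 // 2)

# values of unrelated types can no longer be ordered
try:
    sorted([1, "two"])
except TypeError as exc:
    print("cannot sort:", exc)

# str holds text, bytes holds binary data
text = "naïve"
data = text.encode("utf-8")
print(type(text).__name__, len(text), type(data).__name__, len(data))
```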

PURPOSE

We demonstrated that our approach enables successful segmentation of intra-retinal layers, even with low-quality images containing speckle noise, low contrast, and different intensity ranges throughout, with the assistance of the ANIS feature.

PYTHON

Python is an interpreted, high-level programming language for general-purpose programming. Created by Guido van Rossum and first released in 1991, Python has a design philosophy that emphasizes code readability, notably using significant whitespace.

Python features a dynamic type system and automatic memory management. It supports multiple programming paradigms, including object-oriented, imperative, functional and procedural, and has a large and comprehensive standard library.

• Python is Interpreted: Python is processed at runtime by the interpreter. You do not need to compile your program before executing it. This is similar to Perl and PHP.
• Python is Interactive: You can sit at a Python prompt and interact with the interpreter directly to write your programs.

Python also acknowledges that speed of development is important. Readable and terse code is part of this, and so is access to powerful constructs that avoid tedious repetition of code. Maintainability also ties into this: the number of lines of code may be an all but useless metric, but it does say something about how much code you have to scan, read and/or understand to troubleshoot problems or tweak behaviours. This speed of development, the ease with which a programmer of other languages can pick up basic Python skills, and the huge standard library are key to another area where Python excels. All its tools have been quick to implement, saved a lot of time, and several of them have later been patched and updated by people with no Python background, without breaking.

5.3 TECHNOLOGIES USED IN PROJECT

TENSORFLOW

TensorFlow is a free and open-source software library for dataflow and differentiable programming across a range of tasks. It is a symbolic math library, and is also used for machine learning applications such as neural networks. It is used for both research and production at Google.

TensorFlow was developed by the Google Brain team for internal Google use. It was released under the Apache 2.0 open-source license on November 9, 2015.

5.3.1 NUMPY

NumPy is a general-purpose array-processing package. It provides a high-performance multidimensional array object, and tools for working with these arrays.

It is the fundamental package for scientific computing with Python. It contains various features including these important ones:

• A powerful N-dimensional array object
• Sophisticated (broadcasting) functions
• Tools for integrating C/C++ and Fortran code
• Useful linear algebra, Fourier transform, and random number capabilities

Besides its obvious scientific uses, NumPy can also be used as an efficient multi-dimensional container of generic data. Arbitrary data types can be defined using NumPy, which allows NumPy to seamlessly and speedily integrate with a wide variety of databases.
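The features listed above can be sketched in a few lines. The follower and following counts are made-up sample values, not data from this project.

```python
import numpy as np

# N-dimensional array: one row per account, columns = (followers, following)
counts = np.array([[120, 30], [15, 900], [50, 50]])

# Element-wise arithmetic on whole columns
ratio = counts[:, 0] / counts[:, 1]

# Broadcasting: a (3, 2) array minus a (2,) vector of column means
centered = counts - counts.mean(axis=0)

# Linear algebra: 2x2 matrix product of the centered data with itself
gram = centered.T @ centered

# Random number generation with a fixed seed
rng = np.random.default_rng(0)
noise = rng.normal(size=3)

print(ratio, centered.shape, gram.shape, noise.shape)
```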

5.3.2 PANDAS

Pandas is an open-source Python library providing high-performance data manipulation and analysis tools built on its powerful data structures. Before Pandas, Python was mostly used for data munging and preparation and contributed very little to data analysis; Pandas solved this problem. Using Pandas, we can accomplish five typical steps in the processing and analysis of data, regardless of the origin of the data: load, prepare, manipulate, model, and analyze. Python with Pandas is used in a wide range of academic and commercial domains, including finance, economics, statistics and analytics.
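The load, prepare, manipulate and analyze steps can be sketched on a tiny in-memory table. The column names follow the feature file written by the project code, but the rows are made-up sample values.

```python
import io

import pandas as pd

csv_text = """Favourites,Retweets,Following,Followers,Fake
10,2,500,20,1
3,0,80,400,0
7,5,900,15,1
"""

df = pd.read_csv(io.StringIO(csv_text))             # load
df["Ratio"] = df["Followers"] / df["Following"]     # prepare / manipulate
summary = df.groupby("Fake")["Retweets"].mean()     # analyze: mean retweets per label

print(df)
print(summary)
```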

5.3.3 MATPLOTLIB

Matplotlib is a Python 2D plotting library which produces publication-quality figures in a variety of hardcopy formats and interactive environments across platforms. Matplotlib can be used in Python scripts, the Python and IPython shells, the Jupyter Notebook, web application servers, and four graphical user interface toolkits. Matplotlib tries to make easy things easy and hard things possible. You can generate plots, histograms, power spectra, bar charts, error charts, scatter plots, etc., with just a few lines of code. For examples, see the sample plots and thumbnail gallery on the Matplotlib website.

For simple plotting, the pyplot module provides a MATLAB-like interface, particularly when combined with IPython. For the power user, you have full control of line styles, font properties, axes properties, etc., via an object-oriented interface or via a set of functions familiar to MATLAB users.
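A bar chart of the kind used for the detection graph can be sketched with the object-oriented interface. The counts are hypothetical, and the Agg backend is an assumption made so that the sketch runs without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt

labels = ["Total tweets", "Fake accounts", "Spam tweets"]
counts = [100, 35, 20]  # hypothetical values, for illustration only

fig, ax = plt.subplots()
bars = ax.bar(labels, counts)
ax.set_ylabel("Count")
ax.set_title("Detection summary")

buffer = io.BytesIO()
fig.savefig(buffer, format="png")  # pass a file name instead to write to disk
```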

5.3.4 SCIKIT-LEARN

Scikit-learn provides a range of supervised and unsupervised learning algorithms via a consistent interface in Python. It is licensed under a permissive simplified BSD license and is distributed under many Linux distributions, encouraging academic and commercial use.
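The consistent fit/predict interface can be sketched with a random forest on synthetic data. The two features and the labelling rule are assumptions made for illustration and are not the project's dataset.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic rows: (followers, following); label 1 when followers < following
X = [[f, 100 - f] for f in range(100)]
y = [1 if followers < following else 0 for followers, following in X]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = RandomForestClassifier(n_estimators=10, random_state=0)
model.fit(X_train, y_train)

acc = accuracy_score(y_test, model.predict(X_test))  # fraction between 0 and 1
print("held-out accuracy:", acc)
```

Other scikit-learn estimators expose the same fit and predict methods, so the classifier can be swapped without changing the rest of the code.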


5.4 INSTALL PYTHON STEP-BY-STEP IN WINDOWS AND MAC:

Python is a versatile programming language, but it does not come pre-installed on Windows. It was first released in 1991 and remains a very popular high-level programming language. Its design philosophy emphasizes code readability, notably through its use of significant whitespace. The object-oriented approach and language constructs provided by Python enable programmers to write clear, logical code for projects.

How to Install Python on Windows and Mac:

There have been several updates to Python over the years. Installation can be confusing for a beginner, and the steps below walk through it. At the time of writing, the latest version of Python 3 is 3.7.4.
Note: Python 3.7.4 cannot be used on Windows XP or earlier.

Before you start the installation, you need to know your system requirements. You must download the Python version that matches your system type, i.e. your operating system and processor. My system type is a Windows 64-bit operating system, so the steps below install Python 3.7.4 on a Windows 7 device. The steps to install Python on Windows 10, 8 and 7 are divided into 4 parts for clarity.

Download the Correct version into the system

Step 1: Go to the official site to download Python using Google Chrome or any other web browser: https://www.python.org

Now, check for the latest and the correct version for your operating system.

Step 2: Click on the Downloads tab.

Step 3: You can either select the yellow Download Python 3.7.4 button for Windows, or scroll further down and click on the download link for the version you want. Here, we are downloading the most recent Python version for Windows, 3.7.4.

Step 4: Scroll down the page until you find the Files section.

Step 5: Here you see the different builds of Python listed by operating system.

• To download 32-bit Python for Windows, select any one of the three options: Windows x86 embeddable zip file, Windows x86 executable installer or Windows x86 web-based installer.
• To download 64-bit Python for Windows, select any one of the three options: Windows x86-64 embeddable zip file, Windows x86-64 executable installer or Windows x86-64 web-based installer.

Here we will install the Windows x86-64 web-based installer. This completes the first part, choosing which version of Python to download. Now we move on to the second part, installation.
Note: To see the changes or updates made in this version, click on the Release Notes option.
5.2.4 Installation of Python

Step 1: Go to Downloads and open the downloaded Python installer to carry out the installation process.

Step 2: Before you click on Install Now, make sure to tick Add Python 3.7 to PATH.

Step 3: Click on Install Now. After the installation is successful, click on Close.

With these three steps, you have installed Python. Now it is time to verify the installation.
Note: The installation process might take a couple of minutes.

Verify the Python Installation

Step 1: Click on Start.
Step 2: In the Windows Run command, type "cmd".
Step 3: Open the Command Prompt option.
Step 4: Test whether Python is correctly installed. Type python -V and press Enter.
Step 5: You will get the answer as Python 3.7.4.

Note: If an earlier version of Python is already installed, uninstall it first and then install the new one.
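The installed version can also be checked from inside Python itself. This is a small sketch using only the standard library; the 3.7 threshold is taken from the version installed in this section.

```python
import platform
import sys

# Full version string of the running interpreter, e.g. "3.7.4"
print(platform.python_version())

# Compare against the minimum version assumed in this document
if sys.version_info < (3, 7):
    print("This interpreter is older than Python 3.7")
else:
    print("Python 3.7 or newer is installed")
```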

Check how the Python IDLE works

Step 1: Click on Start.
Step 2: In the Windows Run command, type "python idle".
Step 3: Click on IDLE (Python 3.7 64-bit) and launch the program.
Step 4: To go ahead with working in IDLE you must first save the file. Click on File, then click on Save.
Step 5: Name the file; the save-as type should be Python files. Click on Save. Here I have named the file Hey World.
Step 6: Now, for example, enter print("Hey World") and press Enter.

You will see that the command is executed. With this, we end our tutorial on how to install Python. You have learned how to download and install Python on Windows.
Note: Unlike Java, Python does not need semicolons at the end of statements.
Django – Design Philosophies

Django comes with the following design philosophies:

• Loosely Coupled: Django aims to make each element of its stack independent of the others.
• Less Coding: Less code, and in turn quicker development.
• Don't Repeat Yourself (DRY): Everything should be developed in exactly one place instead of being repeated again and again.
• Fast Development: Django's philosophy is to do all it can to facilitate hyper-fast development.
• Clean Design: Django strictly maintains a clean design throughout its own code and makes it easy to follow best web-development practices.

Advantages of Django

Here are a few advantages of using Django:

• Object-Relational Mapping (ORM) Support: Django provides a bridge between the data model and the database engine, and supports a large set of database systems including MySQL, Oracle, Postgres, etc. Django also supports NoSQL databases through the Django-nonrel fork. For now, the only NoSQL databases supported are MongoDB and Google App Engine.
• Multilingual Support: Django supports multilingual websites through its built-in internationalization system, so you can develop a website that supports multiple languages.
• Framework Support: Django has built-in support for Ajax, RSS, caching and various other frameworks.
• Administration GUI: Django provides a ready-to-use user interface for administrative activities.
• Development Environment: Django comes with a lightweight web server to facilitate end-to-end application development and testing.

As you already know, Django is a Python web framework. Like most modern frameworks, Django supports the MVC pattern. First let's see what the Model-View-Controller (MVC) pattern is, and then we will look at Django's variant, the Model-View-Template (MVT) pattern.

MVC Pattern

When talking about applications that provide a UI (web or desktop), we usually talk about MVC architecture. As the name suggests, the MVC pattern is based on three components: Model, View, and Controller.

5.5 SOURCE CODE


from tkinter import messagebox
from tkinter import *
from tkinter import simpledialog
import tkinter
from tkinter import filedialog
import matplotlib.pyplot as plt
from tkinter.filedialog import askopenfilename
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.ensemble import RandomForestClassifier
import json
import os
import re
import string  # needed by process_text
from nltk.corpus import stopwords  # needed by process_text; requires the NLTK stopwords corpus
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
import pickle as cpickle

main = tkinter.Tk()
main.title("Spammer Detection") # designing main screen
main.geometry("1300x1200")

# Module-level state shared between the button callbacks
global filename
global classifier
global cvv
global total,fake_acc,spam_acc

def process_text(text):
    # Remove punctuation, then drop English stop words
    nopunc = [char for char in text if char not in string.punctuation]
    nopunc = ''.join(nopunc)
    clean_words = [word for word in nopunc.split() if word.lower() not in stopwords.words('english')]
    return clean_words

def upload(): # function to upload the Twitter profile folder
    global filename
    filename = filedialog.askdirectory(initialdir=".")
    pathlabel.config(text=filename)
    text.delete('1.0', END)
    text.insert(END, filename + " loaded\n")

def naiveBayes():
    # Load the pre-trained Naive Bayes model and its vocabulary from disk
    global classifier
    global cvv
    text.delete('1.0', END)
    classifier = cpickle.load(open('model/naiveBayes.pkl', 'rb'))
    cv = CountVectorizer(decode_error="replace", vocabulary=cpickle.load(open("model/feature.pkl", "rb")))
    cvv = CountVectorizer(vocabulary=cv.get_feature_names(), stop_words="english", lowercase=True)
    text.insert(END, "Naive Bayes Classifier loaded\n")

def fakeDetection():
    # Extract features from every tweet file and label each tweet and account
    global total, fake_acc, spam_acc
    total = 0
    fake_acc = 0
    spam_acc = 0
    text.delete('1.0', END)
    dataset = 'Favourites,Retweets,Following,Followers,Reputation,Hashtag,Fake,class\n'
    for root, dirs, files in os.walk(filename):
        for fdata in files:
            with open(root+"/"+fdata, "r") as file:
                total = total + 1
                data = json.load(file)
                textdata = data['text'].strip('\n')
                textdata = textdata.replace("\n", " ")
                textdata = re.sub(r'\W+', ' ', textdata)
                retweet = data['retweet_count']
                followers = data['user']['followers_count']
                density = data['user']['listed_count']       # shown as "Reputation"
                following = data['user']['friends_count']
                replies = data['user']['favourites_count']   # written to the "Favourites" column
                hashtag = data['user']['statuses_count']     # shown as "Hashtag"
                username = data['user']['screen_name']
                words = textdata.split(" ")
                text.insert(END, "Username : " + username + "\n")
                text.insert(END, "Tweet Text : " + textdata + "\n")
                text.insert(END, "Retweet Count : " + str(retweet) + "\n")
                text.insert(END, "Following : " + str(following) + "\n")
                text.insert(END, "Followers : " + str(followers) + "\n")
                text.insert(END, "Reputation : " + str(density) + "\n")
                text.insert(END, "Hashtag : " + str(hashtag) + "\n")
                text.insert(END, "Tweet Words Length : " + str(len(words)) + "\n")
                # Classify the tweet text with the loaded Naive Bayes model
                test = cvv.fit_transform([textdata])
                spam = classifier.predict(test)
                cname = 0
                fake = 0
                if spam == 0:
                    text.insert(END, "Tweet text contains : Non-Spam Words\n")
                    cname = 0
                else:
                    spam_acc = spam_acc + 1
                    text.insert(END, "Tweet text contains : Spam Words\n")
                    cname = 1
                # An account that follows more users than follow it is marked as fake
                if followers < following:
                    text.insert(END, "Twitter Account is Fake\n")
                    fake = 1
                    fake_acc = fake_acc + 1
                else:
                    text.insert(END, "Twitter Account is Genuine\n")
                    fake = 0
                text.insert(END, "\n")
                value = (str(replies)+","+str(retweet)+","+str(following)+","+str(followers)+","+str(density)
                         +","+str(hashtag)+","+str(fake)+","+str(cname)+"\n")
                dataset += value
    # Save the extracted features for the random forest step
    with open("features.txt", "w") as f:
        f.write(dataset)

def prediction(X_test, cls): # prediction done here
    y_pred = cls.predict(X_test)
    for i in range(len(X_test)):
        print("X=%s, Predicted=%s" % (X_test[i], y_pred[i]))
    return y_pred

# Function to calculate accuracy as a percentage
def cal_accuracy(y_test, y_pred, details):
    accuracy = accuracy_score(y_test, y_pred) * 100
    text.insert(END, details + "\n\n")
    text.insert(END, "Accuracy : " + str(accuracy) + "\n\n")
    return accuracy

def machineLearning():
    text.delete('1.0', END)
    train = pd.read_csv("features.txt")
    X = train.values[:, 0:7]   # feature columns, Favourites through Fake
    Y = train.values[:, 7]     # target column, class
    X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.2, random_state=0)
    cls = RandomForestClassifier(n_estimators=10, max_depth=10, random_state=None)
    cls.fit(X_train, y_train)
    text.insert(END, "Prediction Results\n\n")
    prediction_data = prediction(X_test, cls)
    random_acc = cal_accuracy(y_test, prediction_data, 'Random Forest Algorithm Accuracy & Confusion Matrix')

def graph():
    # Bar chart of total tweets, fake accounts and spam tweets
    height = [total, fake_acc, spam_acc]
    bars = ('Total Twitter Accounts', 'Fake Accounts', 'Spam Content Tweets')
    y_pos = np.arange(len(bars))
    plt.bar(y_pos, height)
    plt.xticks(y_pos, bars)
    plt.show()

font = ('times', 16, 'bold')

title = Label(main, text='Spammer Detection and Fake User Identification on Social Networks')

title.config(bg='brown', fg='white')

title.config(font=font)

title.config(height=3, width=120)

title.place(x=0,y=5)

font1 = ('times', 14, 'bold')

uploadButton = Button(main, text="Upload Twitter JSON Format Tweets Dataset", command=upload)

uploadButton.place(x=50,y=100)

uploadButton.config(font=font1)


pathlabel = Label(main)

pathlabel.config(bg='brown', fg='white')

pathlabel.config(font=font1)

pathlabel.place(x=470,y=100)

fakeButton = Button(main, text="Load Naive Bayes To Analyse Tweet Text or URL",


command=naiveBayes)

fakeButton.place(x=50,y=150)

fakeButton.config(font=font1)

randomButton = Button(main, text="Detect Fake Content, Spam URL, Trending Topic & Fake Account",
command=fakeDetection)

randomButton.place(x=520,y=150)

randomButton.config(font=font1)

detectButton = Button(main, text="Run Random Forest For Fake Account",


command=machineLearning)

detectButton.place(x=50,y=200)

detectButton.config(font=font1)

exitButton = Button(main, text="Detection Graph", command=graph)

exitButton.place(x=520,y=200)

exitButton.config(font=font1)

font1 = ('times', 12, 'bold')

text=Text(main,height=30,width=150)

scroll=Scrollbar(text)


text.configure(yscrollcommand=scroll.set)

text.place(x=10,y=250)

text.config(font=font1)

main.config(bg='brown')

main.mainloop()

CHAPTER-6

SYSTEM TESTING

6.1 INTRODUCTION

Testing and debugging are among the most critical aspects of computer programming; without a program that works, the system would never produce the output for which it was designed. Testing is best performed when users are asked to assist in identifying all errors and bugs. Sample data are used for testing. It is not the quantity but the quality of the data used that matters in testing. Testing is aimed at ensuring that the system works accurately and efficiently before live operation commences.
Testing objectives: The main objective of testing is to uncover a host of errors, systematically and with minimum effort and time. Stated formally, testing is a process of executing a program with the intent of finding an error.
1. A successful test is one that uncovers an as yet undiscovered error.
2. A good test case is one that has a high probability of finding an error, if it exists.
3. A test can be inadequate to detect errors that may be present.
4. The software more or less conforms to the quality and reliability standards.
6.2 TYPES OF TESTING
UNIT TESTING
In unit testing, we test each module individually and then integrate it with the overall system. Unit testing focuses verification efforts on the smallest unit of software design, the module. It is also known as module testing.
Each module of the system is tested separately. For example, a validation check is performed on the input given by the user to verify the validity of the data entered. This makes it very easy to find errors and debug the system. Every module can be tested using the following two strategies: Black Box Testing and White Box Testing.
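A unit test for one small piece of logic might look like the sketch below. The helper is_fake_account is hypothetical: it isolates the rule followers < following used in the source code of section 5.5 so that it can be tested on its own with the standard unittest module.

```python
import unittest


def is_fake_account(followers, following):
    """Hypothetical helper: an account is flagged when followers < following."""
    return followers < following


class IsFakeAccountTest(unittest.TestCase):
    def test_fewer_followers_is_flagged(self):
        self.assertTrue(is_fake_account(10, 500))

    def test_more_followers_is_genuine(self):
        self.assertFalse(is_fake_account(500, 10))

    def test_equal_counts_is_genuine(self):
        self.assertFalse(is_fake_account(50, 50))


# Run the three test cases and collect the outcome
suite = unittest.defaultTestLoader.loadTestsFromTestCase(IsFakeAccountTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print("all passed:", result.wasSuccessful())
```

Keeping the rule in a separate function is a design choice that lets it be checked without starting the GUI or loading any tweets.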
Integration Testing
Integration testing is a level of software testing where individual units are combined and tested as a group. The purpose of this level of testing is to expose faults in the interaction between integrated units. Test drivers and test stubs are used to assist in integration testing.
Functional testing: Functional testing is a type of software testing whereby the system is tested against the functional requirements/specifications. Functions (or features) are tested by feeding them input and examining the output. Functional testing ensures that the requirements are properly satisfied by the application.
BLACK BOX TESTING
Black box testing is a software testing technique in which the functionality of the software under test (SUT) is tested without looking at the internal code structure, implementation details or knowledge of the internal paths of the software. The software under test can be any package you want to check: for example, an operating system like Windows, a website like Google, a database like Oracle or even your own custom application. Under black box testing, you can test these applications by focusing only on the inputs and outputs, without knowing their internal code implementation.
Types of Black Box Testing
There are many types of black box testing, but the following are the prominent ones.
• Functional testing: This type of black box testing is related to the functional requirements of a system; it is done by software testers.
• Non-functional testing: This type of black box testing is not related to testing of a specific functionality, but to non-functional requirements such as performance, scalability and usability.
• Regression testing: Regression testing is done after code fixes, upgrades or any other system maintenance to check that the new code has not affected the existing code.
WHITE BOX TESTING
White box testing is the testing of a software solution's internal coding and infrastructure. It focuses primarily on strengthening security, the flow of inputs and outputs through the application, and improving design and usability. White box testing is also known as clear, open, structural, and glass box testing. It is one of two parts of the "box testing" approach to software testing.
System Testing:
Once the individual module testing is completed, modules are assembled and integrated to perform as a system. Top-down testing, which proceeds from higher-level to lower-level modules, was carried out to check whether the entire system is performing satisfactorily. There are three main kinds of system testing: Alpha Testing, Beta Testing and Acceptance Testing.
Alpha Testing: This refers to the system testing that is carried out by the test team within the organization.
Beta Testing: This refers to the system testing that is performed by a select group of friendly customers.
Acceptance Testing: This refers to the system testing that is performed by the customer to determine whether or not to accept the delivery of the system.

CHAPTER-7

SCREEN SHOTS

7.1 HOME PAGE

In the screen above, click on the 'Upload Twitter JSON Format Tweets Dataset' button and upload the tweets folder.

In the screen above, I am uploading the 'tweets' folder, which contains tweets from various users in JSON format. Now click the open button to start reading tweets.

In the screen above, we can see that all tweets from all users are loaded. Now click on the 'Load Naive Bayes To Analyse Tweet Text or URL' button to load the Naive Bayes classifier.

In the screen above, the Naive Bayes classifier is loaded. Now click on 'Detect Fake Content, Spam URL, Trending Topic & Fake Account' to analyse each tweet for fake content, spam URLs and fake accounts using the Naive Bayes classifier and the other techniques mentioned above.

7.2 UPLOADING THE TWEETS INTO THE SCREEN

In the screen above, all features are extracted from the tweets dataset and then analysed to identify whether each tweet is spam or not. In the text area, records are separated by an empty line, and each tweet record displays values such as TWEET TEXT, FOLLOWERS and FOLLOWING, together with whether the account is fake or genuine and whether the tweet text contains spam or non-spam words. Now click on the 'Run Random Forest For Fake Account' button to train a random forest classifier on the extracted tweet features; this model will be used to predict fake or spam accounts for future tweets. Scroll down the text area to view details.

In the screen above, we got a random forest prediction accuracy of 92%. Now click on the 'Detection Graph' button to see a graph of total tweets, fake accounts and spam tweets.

7.3 GRAPH DETECTION FOR THE MODEL

In the graph above, the x-axis represents total tweets, fake accounts and tweets with spam content, and the y-axis represents their counts.

7.4 RESULTS

The purpose of testing is to discover errors. Testing is the process of trying to discover every conceivable fault or weakness in a work product. It provides a way to check the functionality of components, subassemblies, assemblies and/or a finished product. It is the process of exercising software with the intent of ensuring that the software system meets its requirements and user expectations and does not fail in an unacceptable manner. There are various types of tests. Each test type addresses a specific testing requirement.


CHAPTER-8

CONCLUSION
This paper implements and reviews the techniques used to detect spammers on Twitter. We also
presented a taxonomy of Twitter spam detection approaches, categorized as fake content detection,
URL-based spam detection, spam detection in trending topics, and fake user detection techniques. We
compared the presented techniques on the basis of several features, such as user features, content features,
graph features, structure features, and time features. The techniques were also compared in terms of their
specified goals and the datasets used. It is anticipated that this review will help researchers find
information on state-of-the-art Twitter spam detection techniques in a consolidated form. Despite the
development of efficient and effective approaches for spam detection and fake user identification on
Twitter, there are still open areas that require considerable attention from researchers.

BAPATLA WOMEN’S ENGINEERING COLLEGE 43


SPAMMER DETECTION AND FAKE USER IDENTIFICATION ON SOCIAL NETWORKS

REFERENCES

B. Erşahin, Ö. Aktaş, D. Kilinç, and C. Akyol, “Twitter fake account detection,” in Proc. Int. Conf.
Comput. Sci. Eng. (UBMK), Oct. 2017, pp. 388-392.

F. Benevenuto, G. Magno, T. Rodrigues, and V. Almeida, “Detecting spammers on Twitter,” in Proc.
Collaboration, Electron. Messaging, Anti-Abuse Spam Conf. (CEAS), vol. 6, Jul. 2010, p. 12.

S. Gharge and M. Chavan, “An integrated approach for malicious tweets detection using NLP,” in
Proc. Int. Conf. Inventive Commun. Comput. Technol. (ICICCT), Mar. 2017, pp. 435-438.

T. Wu, S. Wen, Y. Xiang, and W. Zhou, “Twitter spam detection: Survey of new approaches and
comparative study,” Comput. Secur., vol. 76, pp. 265-284, Jul. 2018.
