Downloaded by Kumaran Kutty ([email protected])

DETECTING PHISHING ATTACKS USING NATURAL LANGUAGE


PROCESSING AND MACHINE LEARNING

A PROJECT REPORT SUBMITTED TO


SRM INSTITUTE OF SCIENCE & TECHNOLOGY

IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE AWARD OF THE


DEGREE OF

BACHELOR OF COMPUTER APPLICATIONS

BY

ARUNESH M
REG. NO: RA1731241040018

UNDER THE GUIDANCE OF

Dr. J. PADMAVATHI
Associate Professor & Head
Department of Computer Science and Applications

DEPARTMENT OF COMPUTER SCIENCE AND APPLICATIONS


FACULTY OF SCIENCE AND HUMANITIES

SRM INSTITUTE OF SCIENCE AND TECHNOLOGY


JAWAHARLAL NEHRU RD, VADAPALANI, CHENNAI - 600026

JUNE 2020


BONAFIDE CERTIFICATE

Certified that this project report titled "Detecting Phishing Attacks Using Natural
Language Processing and Machine Learning" is the bonafide work of ARUNESH M
(Reg. No: RA1731241040018), who carried out the project under my supervision. Certified
further that, to the best of my knowledge, the work reported herein does not form part of any
other project report or dissertation on the basis of which a degree or award was conferred on
an earlier occasion to this or any other candidate.

Signature of the Guide Signature of the HOD


Dr. J. PADMAVATHI, Ph.D., Dr. J. PADMAVATHI, Ph.D.,

Associate Professor & Head Associate Professor & Head


Department of Computer Science and Department of Computer Science and
Applications, FSH, Applications, FSH,
SRM Institute of Science and Technology, SRM Institute of Science and Technology,
Vadapalani- 600026 Vadapalani- 600026

Submitted for Project Work Viva-voce Examination held on_____________________

Place: VADAPALANI
Date :

INTERNAL EXAMINER EXTERNAL EXAMINER


ACKNOWLEDGEMENT

First and foremost, I would like to express my heartfelt and deep sense of gratitude to the
Management of SRM Institute of Science & Technology for their constant support and
endorsement.

I wish to express my sincere gratitude to our Dean-in-Charge, Dr. Ananthapadmanaban,
Faculty of Science & Humanities, for his constant support and encouragement.

I express my gratitude to my guide, Dr. J. Padmavathi, Associate Professor and Head,
Department of Computer Science and Applications, SRM Institute of Science &
Technology, for permitting me to do my work in the department and for providing the
necessary computational and laboratory facilities to all of us.

I extend my sincere gratitude to our Coordinator, Ms. K. Kowsalya, Assistant Professor,
Department of Computer Science and Applications, Faculty of Science & Humanities,
SRMIST, for her stimulating guidance.

I am highly indebted to Dr. J. Padmavathi, Associate Professor and Head, Department of
Computer Science and Applications, who generously accepted me under her valuable
guidance and for the endless help and inspiration provided to me during the tenure of
the project.

Special thanks go to our Class Coordinator, Mr. M. Ramesh, Assistant Professor, for his
constant support and guidance to all of us throughout the project.

Finally, my gratefulness goes to my parents, who were my strength and driving force in
the completion of my project report.


ABSTRACT

Cloud computing has revolutionized the IT industry over the past decade and is
still developing creative ways to solve current problems. Innovations such as cloud storage
and non-native applications were unimaginable 10 years ago but are a reality nowadays.
Companies and research institutes are steadily moving to the cloud to address their computing
needs. Services and applications are also commonly hosted in the cloud today.

The cloud takes advantage of a centralized system and updates information in
real time. Businesses with time-sensitive data are quick to grab this opportunity and harness
the efficiency of the cloud. For example, medical research that used to need months of in-house
number crunching has moved to distributed systems, significantly reducing computing time and
expenses. Thus, analysing how to reduce time consumption in cloud computing with a
minimum-maximum scheduling algorithm can enable more energy-efficient use of the computing
environment.


LIST OF FIGURES

S.NO.  FIG.NO.  TITLE OF THE FIGURE                                                        PAGE NO.
1.     4.3.1    .Net Framework Architecture                                                11
2.     5.1      System Architecture                                                        12
3.     5.2.1    Data Flow Diagram Level 0                                                  13
4.     5.3      Class Diagram                                                              14
5.     6.1      Module Description                                                         15
6.     9.2.1    Cloud Controller                                                           51
7.     9.2.2    Client 1                                                                   52
8.     9.2.3    Client 2                                                                   53
9.     9.2.4    Client 3                                                                   54
10.    9.2.5    Cloud Controller started to access the files from clients                  55
11.    9.2.6    Client 2 sending file to the cloud                                         56
12.    9.2.7    Cloud controller started to access the data from clients with file size    57
13.    9.2.8    Client 1 sending data continuously                                         58
14.    9.2.9    Client 2 continuously sending data to the cloud                            59
15.    9.2.10   Client 3 continuously sending data to the cloud                            60
16.    9.2.11   Cloud controller accesses the data with the same execution time
                from all clients                                                           61
17.    9.2.12   Performance Analysis                                                       62


TABLE OF CONTENTS

CHAPTER  TITLE                                                                    PAGE NO.
         BONAFIDE CERTIFICATE                                                     i
         ACKNOWLEDGEMENT                                                          ii
         ABSTRACT                                                                 v
         LIST OF FIGURES                                                          vi
1        INTRODUCTION                                                             1
1.1      Overview                                                                 1
2        LITERATURE SURVEY                                                        3
2.1      The Survey and Future Evolution of Green Computing                       3
2.2      A Study On Green Computing: The Future Computing
         And Eco-Friendly Technology                                              3
2.3      Cloud Load Balancing Techniques: A Step Towards Green Computing          3
2.4      Holistic Approach to Cloud Service Computing: Balancing
         Energy in Processing, and Transport                                      4
2.5                                                                               4
3        SYSTEM ANALYSIS                                                          6
3.1      OBJECTIVE                                                                6
3.2      EXISTING SYSTEM                                                          6
3.2.1    Disadvantage Of Existing System                                          6
3.3      PROPOSED SYSTEM                                                          6
3.3.1    Advantage Of Proposed System                                             7
3.4      FEASIBILITY STUDY                                                        7
3.4.1    Operational Feasibility                                                  7
3.4.2    Technical Feasibility                                                    7
3.4.3    Economical Feasibility                                                   7
4        SYSTEM REQUIREMENTS                                                      8
4.1      Hardware Requirements                                                    8
4.2      Software Requirements                                                    8
4.3      Software Description                                                     8
5        SYSTEM DESIGN                                                            12
5.1      System Architecture                                                      12

5.2      Data Flow Diagram                                                        13
5.2.1    DFD Level-0 Login
5.2.2    Level-1 User
5.3      Class Diagram                                                            14
6        IMPLEMENTATION                                                           15
6.1      Module Description                                                       15
6.2      Techniques                                                               16
7        SYSTEM TESTING                                                           18
8        CONCLUSION AND FUTURE ENHANCEMENT
9        REFERENCES                                                               19
10       APPENDICES                                                               20
10.1     Appendix – A (Coding)                                                    20
10.2     Appendix – B (Screen Shots)                                              50


CHAPTER 1
INTRODUCTION

1. INTRODUCTION

While the Internet has brought unprecedented convenience to many people for managing
their finances and investments, it also provides opportunities for conducting fraud on a
massive scale at little cost to the fraudsters. Fraudsters can manipulate users instead of
hardware/software systems, where the barriers to technological compromise have increased
significantly. Phishing is one of the most widely practiced Internet frauds. It focuses on the
theft of sensitive personal information such as passwords and credit card details. Phishing
attacks take two forms:

• Attempts to deceive victims into revealing their secrets by pretending to be
trustworthy entities with a real need for such information.
• Attempts to obtain secrets by planting malware onto victims' machines.

The specific malware used in phishing attacks is the subject of research by the virus and malware
community and is not addressed in this thesis. Phishing attacks that proceed by deceiving
users are the research focus of this thesis, and the term 'phishing attack' will be used to refer
to this type of attack.


CHAPTER 2
WORKING ENVIRONMENT

HARDWARE REQUIREMENTS:

• System : Pentium Dual Core
• Hard Disk : 120 GB
• Monitor : 15" LED
• Input Devices : Keyboard, Mouse
• RAM : 1 GB

SOFTWARE REQUIREMENTS:

• Operating system : Windows 7
• Coding Language : Python 2.7
• Tool : Anaconda Navigator
• Database : MySQL

SYSTEM SOFTWARE:

• Basic utilities
• Web browser


CHAPTER 3
SYSTEM ANALYSIS

Feasibility Study

People often purchase products online and make payments through e-banking, and there
are many e-banking phishing websites. In order to detect an e-banking phishing website, our
system uses an effective heuristic algorithm. An e-banking phishing website can be detected
based on some important characteristics, such as URL and domain identity, and security and
encryption criteria.

Malicious websites largely promote the growth of Internet criminal activities and constrain
the development of Web services. As a result, there has been strong motivation to develop a
systematic solution to stop users from visiting such websites. We propose a learning-based
approach to classifying websites into 3 classes: Benign, Spam and Malicious. Our
mechanism analyzes only the Uniform Resource Locator (URL) itself, without accessing the
content of the websites. Thus, it eliminates run-time latency and the possibility of exposing
users to browser-based vulnerabilities. By employing learning algorithms, our scheme
achieves better performance in generality and coverage compared with blacklisting services.

URLs of the websites are separated into 3 classes:

• Benign: Safe websites with normal services.
• Spam: Websites that attempt to flood the user with advertising, or sites such as fake
surveys and online dating.
• Malware: Websites created by attackers to disrupt computer operation, gather sensitive
information, or gain access to private computer systems.

Existing System:

A poorly structured NN model may cause the model to underfit the training dataset. On
the other hand, exaggeration in restructuring the model to suit every single item in the
training dataset may cause it to be overfitted. One possible solution to avoid the
overfitting problem is to restructure the NN model by tuning some parameters,
adding new neurons to the hidden layer, or sometimes adding a new layer to the

network. An ANN with a small number of hidden neurons may not have satisfactory
representational power to model the complexity and diversity inherent in the data. On the
other hand, networks with too many hidden neurons could overfit the data. However, at a
certain stage the model can no longer be improved, and the structuring process should
therefore be terminated. Hence, an acceptable error rate should be specified when creating any NN
model, which is itself a problem, since it is difficult to determine an acceptable
error rate a priori. For instance, the model designer may set the acceptable error rate to a
value that is unreachable, which causes the model to get stuck in local minima, or may set
the acceptable error rate to a value that could be improved further.

Disadvantage:

1. It takes time to load the entire dataset.

2. The process is not accurate.

3. It analyzes slowly.

Proposed System:

Lexical features are based on the observation that the URLs of many illegal sites look
different from those of legitimate sites. Analyzing lexical features enables us to capture this
property for classification purposes. We first distinguish the two parts of a URL: the hostname
and the path, from which we extract a bag of words (strings delimited by '/', '?', '.', '=', '-'
and '_').

We find that phishing websites tend to have longer URLs, more levels (delimited by
dots), more tokens in the domain and path, and longer tokens. Besides, phishing and malware
websites may pretend to be benign by containing popular brand names as tokens other than
those in the second-level domain. Phishing and malware websites may also use an
IP address directly so as to cover the suspicious URL, which is very rare in the benign case. Also,
phishing URLs are found to contain several suggestive word tokens (confirm, account,
banking, secure, ebayisapi, webscr, login, signin), so we check for the presence of these
security-sensitive words and include the binary value in our features. Intuitively, malicious sites
are usually less popular than benign ones. For this reason, site popularity can be considered an
important feature. The traffic rank feature is acquired from Alexa.com. Host-based features are
based on the observation that malicious sites are usually registered in less reputable hosting
centres or regions.


Advantage:

1. All URLs in the dataset are labelled.

2. We use two supervised learning algorithms, Random Forest and Support Vector
Machine, trained using the scikit-learn library.

Scope of the Project:

Though there are many phishing detection approaches, the scope of this project is limited to
feature-based phishing detection techniques. It extracts discriminative features from websites
which help in identifying the website's class. In this process, rules play an important role, as
they are easily understood by humans. The rules take the form IF condition
THEN class category, where the class category represents the category to which a class
belongs. This rule induction helps facilitate the decision-making process, which ensures
reliability and completeness.


CHAPTER 4
SYSTEM DESIGN

UML DIAGRAMS

UML stands for Unified Modeling Language. UML is a standardized, general-purpose
modeling language in the field of object-oriented software engineering. The standard is
managed, and was created, by the Object Management Group.

The goal is for UML to become a common language for creating models of object-oriented
computer software. In its current form, UML comprises two major components:
a meta-model and a notation. In the future, some form of method or process may also be
added to, or associated with, UML.

The Unified Modeling Language is a standard language for specifying, visualizing,
constructing and documenting the artifacts of a software system, as well as for business
modeling and other non-software systems.

The UML represents a collection of best engineering practices that have proven successful
in the modeling of large and complex systems.

The UML is a very important part of developing object-oriented software and the
software development process. The UML mostly uses graphical notations to express the
design of software projects.

GOALS:

The Primary goals in the design of the UML are as follows:


1. Provide users a ready-to-use, expressive visual modeling Language so that they can
develop and exchange meaningful models.
2. Provide extendibility and specialization mechanisms to extend the core concepts.
3. Be independent of particular programming languages and development process.
4. Provide a formal basis for understanding the modeling language.
5. Encourage the growth of OO tools market.
6. Support higher level development concepts such as collaborations, frameworks,
patterns and components.
7. Integrate best practices.


USE CASE DIAGRAM

A use case diagram in the Unified Modeling Language (UML) is a type of behavioral
diagram defined by and created from a Use-case analysis. Its purpose is to present a graphical
overview of the functionality provided by a system in terms of actors, their goals (represented
as use cases), and any dependencies between those use cases. The main purpose of a use case
diagram is to show what system functions are performed for which actor. Roles of the actors
in the system can be depicted.


CLASS DIAGRAM

In software engineering, a class diagram in the Unified Modeling Language (UML) is a
type of static structure diagram that describes the structure of a system by showing the
system's classes, their attributes, operations (or methods), and the relationships among the
classes. It explains which class contains which information.


SEQUENCE DIAGRAM

A sequence diagram in the Unified Modeling Language (UML) is a kind of interaction
diagram that shows how processes operate with one another and in what order. It is a
construct of a Message Sequence Chart. Sequence diagrams are sometimes called event
diagrams, event scenarios, and timing diagrams.


CHAPTER 5
DATA FLOW DIAGRAM

LEVEL 0

[Figure: DFD Level 0 – Dataset Collection → Pre-processing → Random Forest → Trained & Testing dataset]


LEVEL 1

[Figure: DFD Level 1 – Dataset Collection → Pre-processing → Feature Extraction → Apply Algorithm]

LEVEL 2

[Figure: DFD Level 2 – Classify the URLs → Accuracy → Detection of malicious URLs → Find possibility]

CHAPTER 6
IMPLEMENTATION

OBJECTIVE

The main objective of this project is to detect Benign, Malicious and Malware URLs with
the use of NLP.

Modules

1. Phishing Websites Features

One of the challenges faced by our research was the unavailability of reliable training
datasets. In fact, this challenge faces any researcher in the field. However, although plenty of
articles about predicting phishing websites using data mining techniques have been
disseminated, no reliable training dataset has been published publicly, perhaps because there
is no agreement in the literature on the definitive features that characterize phishing websites;
hence it is difficult to shape a dataset that covers all possible features.

In this article, we shed light on the important features that have proved to be sound and
effective in predicting phishing websites. In addition, we propose some new features,
experimentally assign new rules to some well-known features and update some other features.

2. Address Bar based Features

Using the IP Address: If an IP address is used as an alternative to the domain name in the
URL, such as "http://125.98.3.123/fake.html", users can be sure that someone is trying to
steal their personal information. Sometimes, the IP address is even transformed into
hexadecimal code, as shown in the following link:
"http://0x58.0xCC.0xCA.0x62/2/paypal.ca/index.html".

Rule: IF the domain part has an IP address → Phishing; Otherwise → Legitimate


3. Using Pop-up Window

It is unusual to find a legitimate website asking users to submit their personal information
through a pop-up window. On the other hand, this feature has been used on some legitimate
websites, where its main goal is to warn users about fraudulent activities or broadcast a
welcome announcement, though no personal information is asked to be filled in through these
pop-up windows.

4. Classification

To ensure that our approach works well irrespective of the underlying classifier chosen for
the task, we performed the experiments using two different classifiers: Random Forest and
Support Vector Machine, as these are among the most commonly used classifiers for the task
of text-data classification. The scikit-learn implementations of these classifiers with their
default parameter settings are used for our experiments. The tf-idf feature is used to represent
each URL in the database.
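The classification setup described above can be sketched with scikit-learn as follows; the four labelled URLs are made up purely for illustration, while `TfidfVectorizer`, `RandomForestClassifier` and `SVC` are the scikit-learn components named in the text:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC

# Made-up labelled URLs for illustration only.
urls = [
    "http://125.98.3.123/paypal.ca/webscr/login",
    "http://secure-banking-confirm.example/signin",
    "https://www.wikipedia.org/wiki/Phishing",
    "https://www.python.org/downloads/",
]
labels = ["malicious", "malicious", "benign", "benign"]

# Represent each URL by tf-idf over its word tokens.
vectorizer = TfidfVectorizer(token_pattern=r"[A-Za-z0-9]+")
X = vectorizer.fit_transform(urls)

# Both classifiers are used with their default parameter settings, as in the text.
rf = RandomForestClassifier(random_state=0).fit(X, labels)
svm = SVC().fit(X, labels)

unseen = vectorizer.transform(["http://0x58.0xCC.0xCA.0x62/webscr/confirm"])
print(rf.predict(unseen)[0], svm.predict(unseen)[0])
```

In practice the training set would be the full labelled URL dataset rather than four hand-written examples.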


CHAPTER 7
SYSTEM TESTING

Testing

The various levels of testing are:

1. White Box Testing
2. Black Box Testing
3. Unit Testing
4. Functional Testing
5. Performance Testing
6. Integration Testing
7. Validation Testing
8. System Testing
9. Structure Testing
10. Output Testing
11. User Acceptance Testing

White Box Testing

White-box testing (also known as clear box testing, glass box testing,
transparent box testing, and structural testing) is a method of testing software that tests
internal structures or workings of an application, as opposed to its functionality (i.e. black-
box testing). In white-box testing an internal perspective of the system, as well as
programming skills, are used to design test cases. The tester chooses inputs to exercise paths
through the code and determine the appropriate outputs. This is analogous to testing nodes in
a circuit, e.g. in-circuit testing (ICT).


While white-box testing can be applied at the unit, integration and system levels of the
software testing process, it is usually done at the unit level. It can test paths within a unit,
paths between units during integration, and between subsystems during a system-level test.
Though this method of test design can uncover many errors or problems, it might not detect
unimplemented parts of the specification or missing requirements.

White-box test design techniques include:

• Control flow testing


• Data flow testing
• Branch testing
• Path testing
• Statement coverage
• Decision coverage
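As a small generic illustration of statement and branch coverage (not code from this project), the function below has two decisions, and one test input per path exercises every branch:

```python
def clamp(x, lo, hi):
    """Clamp x into the inclusive range [lo, hi]."""
    if x < lo:       # first decision
        return lo
    if x > hi:       # second decision
        return hi
    return x         # fall-through statement

# Branch testing: one input per outcome of each decision.
assert clamp(-5, 0, 10) == 0     # x < lo taken
assert clamp(50, 0, 10) == 10    # x > hi taken
assert clamp(7, 0, 10) == 7      # both decisions false
```

Together, the three inputs achieve full statement and branch coverage of `clamp`.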

White-box testing is a method of testing the application at the level of the source code.
The test cases are derived through the use of the design techniques mentioned above: control
flow testing, data flow testing, branch testing, path testing, statement coverage and decision
coverage as well as modified condition/decision coverage. White-box testing is the use of
these techniques as guidelines to create an error free environment by examining any fragile
code.

These White-box testing techniques are the building blocks of white-box testing, whose
essence is the careful testing of the application at the source code level to prevent any hidden
errors later on. These different techniques exercise every visible path of the source code to
minimize errors and create an error-free environment. The whole point of white-box testing is
the ability to know which line of the code is being executed and being able to identify what
the correct output should be.


Levels

1. Unit testing. White-box testing is done during unit testing to ensure that the code is
working as intended, before any integration happens with previously tested code.
White-box testing during unit testing catches any defects early on and aids in any
defects that happen later on after the code is integrated with the rest of the application
and therefore prevents any type of errors later on.

2. Integration testing. White-box tests at this level are written to test the interactions of
interfaces with each other. Unit-level testing made sure that each piece of code was
tested and working accordingly in an isolated environment, and integration examines
the correctness of the behavior in an open environment through the use of white-box
testing for any interactions of interfaces that are known to the programmer.

3. Regression testing. White-box testing during regression testing is the use of recycled
white-box test cases at the unit and integration testing levels.

White-box testing's basic procedures involve understanding the source code you are
testing at a deep level. The programmer must have a deep understanding of the application
to know what kinds of test cases to create, so that every visible path is exercised for testing.
Once the source code is understood, it can be analyzed for test cases to be created. These
are the three basic steps that white-box testing takes in order to create test cases:

1. Input: different types of requirements, functional specifications, detailed design
documents, proper source code and security specifications. This is the preparation
stage of white-box testing, laying out all of the basic information.
2. Processing: performing risk analysis to guide the whole testing process, preparing a
proper test plan, executing test cases and communicating results. This is the phase of
building test cases to make sure they thoroughly test the application, with the given
results recorded accordingly.
3. Output: preparing a final report that encompasses all of the above preparations and
results.


Black Box Testing

Black-box testing is a method of software testing that examines the functionality of
an application (e.g. what the software does) without peering into its internal structures or
workings (see white-box testing). This method of test can be applied to virtually
every level of software testing: unit, integration, system and acceptance. It typically
comprises most, if not all, higher-level testing, but can also dominate unit testing as well.

Test procedures

Specific knowledge of the application's code/internal structure and programming
knowledge in general is not required. The tester is aware of what the software is supposed
to do but is not aware of how it does it. For instance, the tester is aware that a particular input
returns a certain, invariable output but is not aware of how the software produces that output
in the first place.

Test cases
Test cases are built around specifications and requirements, i.e., what the application
is supposed to do. Test cases are generally derived from external descriptions of the software,
including specifications, requirements and design parameters.

Although the tests used are primarily functional in nature, non- functional tests may
also be used. The test designer selects both valid and invalid inputs and determines the correct
output without any knowledge of the test object's internal structure.

Test design techniques


Typical black-box test design techniques include:

• Decision table testing


• All-pairs testing
• State transition tables
• Equivalence partitioning
• Boundary value analysis
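Boundary value analysis, the last technique listed, can be illustrated with a generic example (not taken from this project): for a check that accepts ages 18 to 65 inclusive, the chosen test inputs sit at and just outside each boundary.

```python
def is_eligible(age):
    """Accept ages in the inclusive range 18..65."""
    return 18 <= age <= 65

# Boundary value analysis: probe at and just beyond each boundary.
cases = [(17, False), (18, True), (65, True), (66, False)]
for age, expected in cases:
    assert is_eligible(age) == expected
```

A single off-by-one error (e.g. writing `<` instead of `<=`) would be caught by the inputs at 18 or 65, which is exactly what the technique is designed to expose.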


Unit testing

In computer programming, unit testing is a method by which individual units of
source code (sets of one or more computer program modules together with associated control
data, usage procedures, and operating procedures) are tested to determine if they are fit for
use. Intuitively, one can view a unit as the smallest testable part of an application. In
procedural programming, a unit could be an entire module, but is more commonly an
individual function or procedure. In object-oriented programming, a unit is often an entire
interface, such as a class, but could be an individual method. Unit tests are created by
programmers or occasionally by white-box testers during the development process.

Ideally, each test case is independent from the others. Substitutes such as method
stubs, mock objects, fakes, and test harnesses can be used to assist testing a module in
isolation. Unit tests are typically written and run by software developers to ensure that code
meets its design and behaves as intended. Its implementation can vary from being very
manual (pencil and paper) to being formalized as part of build automation.
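Testing a module in isolation with a stub, as described above, might look like this generic sketch (all names are invented for illustration):

```python
class StubGateway:
    """Test double standing in for a real payment gateway collaborator."""
    def charge(self, amount):
        self.charged = amount   # record the interaction instead of calling a real service
        return True

def checkout(gateway, cart_total):
    """Unit under test: charges the gateway and reports the outcome."""
    return "paid" if gateway.charge(cart_total) else "failed"

stub = StubGateway()
assert checkout(stub, 42.0) == "paid"   # behavior of the unit
assert stub.charged == 42.0             # interaction with the collaborator
```

The stub lets the `checkout` unit be exercised without any real payment infrastructure, which is the point of substitutes such as stubs, mocks and fakes.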

Testing will not catch every error in the program, since it cannot evaluate every
execution path in any but the most trivial programs. The same is true for unit testing.
Additionally, unit testing by definition only tests the functionality of the units themselves.
Therefore, it will not catch integration errors or broader system-level errors (such as functions
performed across multiple units, or non- functional test areas such as performance).

Unit testing should be done in conjunction with other software testing activities, as they
can only show the presence or absence of particular errors; they cannot prove a complete
absence of errors. In order to guarantee correct behavior for every execution path and every
possible input, and ensure the absence of errors, other techniques are required, namely the
application of formal methods to proving that a software component has no unexpected
behavior.

Software testing is a combinatorial problem. For example, every Boolean decision
statement requires at least two tests: one with an outcome of "true" and one with an outcome
of "false". As a result, for every line of code written, programmers often need 3 to 5 lines of
test code.


This obviously takes time, and the investment may not be worth the effort. There are
also many problems that cannot easily be tested at all, for example those that are
nondeterministic or involve multiple threads. In addition, code for a unit test is likely to be at
least as buggy as the code it is testing. Fred Brooks in The Mythical Man-Month quotes:
"Never take two chronometers to sea. Always take one or three." Meaning, if two chronometers
contradict, how do you know which one is correct?
Another challenge related to writing the unit tests is the difficulty of setting up
realistic and useful tests. It is necessary to create relevant initial conditions so the part of the
application being tested behaves like part of the complete system. If these initial conditions
are not set correctly, the test will not be exercising the code in a realistic context, which
diminishes the value and accuracy of unit test results.
To obtain the intended benefits from unit testing, rigorous discipline is needed
throughout the software development process. It is essential to keep careful records not only
of the tests that have been performed, but also of all changes that have been made to the
source code of this or any other unit in the software. Use of a version control system is
essential. If a later version of the unit fails a particular test that it had previously passed, the
version-control software can provide a list of the source code changes (if any) that have been
applied to the unit since that time.

It is also essential to implement a sustainable process for ensuring that test case failures
are reviewed daily and addressed immediately. If such a process is not implemented and
ingrained into the team's workflow, the application will evolve out of sync with the unit test
suite, increasing false positives and reducing the effectiveness of the test suite.

Unit testing embedded system software presents a unique challenge: since the
software is being developed on a different platform than the one it will eventually run on, you
cannot readily run a test program in the actual deployment environment, as is possible with
desktop programs.


Functional testing

Functional testing is a quality assurance (QA) process and a type of black box testing
that bases its test cases on the specifications of the software component under test. Functions
are tested by feeding them input and examining the output, and internal program structure is
rarely considered (unlike in white-box testing). Functional testing usually describes what
the system does.

Functional testing differs from system testing in that functional testing "verifies a program
by checking it against ... design document(s) or specification(s)", while system testing
"validates a program by checking it against the published user or system requirements" (Kane,
Falk, Nguyen 1999, p. 52).
Functional testing typically involves five steps:

1. The identification of functions that the software is expected to perform
2. The creation of input data based on the function's specifications
3. The determination of output based on the function's specifications
4. The execution of the test case
5. The comparison of actual and expected outputs

Performance testing
In software engineering, performance testing is in general testing performed to determine
how a system performs in terms of responsiveness and stability under a particular
workload. It can also serve to investigate, measure, validate or verify other quality
attributes of the system, such as scalability, reliability and resource usage.

Performance testing is a subset of performance engineering, an emerging computer
science practice which strives to build performance into the implementation, design and
architecture of a system.

Testing types

Load testing
Load testing is the simplest form of performance testing. A load test is usually
conducted to understand the behavior of the system under a specific expected load. This load
can be the expected concurrent number of users on the application performing a specific
number of transactions within the set duration.


This test will report the response times of all the important business-critical
transactions. If the database, application server, etc., are also monitored, then this simple test
can itself point towards bottlenecks in the application software.
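The idea can be sketched as follows. Here `handle_request` is a hypothetical stand-in for one business transaction; a real load test would drive the deployed application (for example over HTTP) instead:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def handle_request(n):
    """Hypothetical stand-in for one business transaction."""
    time.sleep(0.01)  # simulated processing time
    return n

def load_test(concurrent_users=20, transactions=100):
    """Run `transactions` requests with `concurrent_users` workers and
    collect per-request response times."""
    timings = []
    def timed_call(n):
        start = time.perf_counter()
        handle_request(n)
        timings.append(time.perf_counter() - start)
    with ThreadPoolExecutor(max_workers=concurrent_users) as pool:
        list(pool.map(timed_call, range(transactions)))
    timings.sort()
    avg = sum(timings) / len(timings)
    p95 = timings[int(0.95 * len(timings)) - 1]
    return avg, p95

avg, p95 = load_test()
print(f"average={avg:.4f}s  95th percentile={p95:.4f}s")
```

Reporting a high percentile alongside the average matters, because a small fraction of slow transactions is exactly the kind of bottleneck a load test is meant to surface.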

Stress testing
Stress testing is normally used to understand the upper limits of capacity within the
system. This kind of test is done to determine the system's robustness in terms of extreme
load and helps application administrators to determine if the system will perform sufficiently
if the current load goes well above the expected maximum.

Soak testing
Soak testing, also known as endurance testing, is usually done to determine if the
system can sustain the continuous expected load. During soak tests, memory utilization is
monitored to detect potential leaks. Also important, but often overlooked, is performance
degradation: that is, ensuring that the throughput and/or response times after some long
period of sustained activity are as good as or better than at the beginning of the test. Soak
testing essentially involves applying a significant load to a system for an extended period
of time. The goal is to discover how the system behaves under sustained use.
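The memory-monitoring part can be sketched with Python's tracemalloc. The `do_work` function here is a deliberately leaky, hypothetical workload (not part of this project), standing in for hours of sustained use:

```python
import tracemalloc

retained = []  # a deliberate leak: the workload keeps every payload alive

def do_work(payload):
    """Hypothetical transaction; a real soak test would drive the application."""
    retained.append(payload * 100)

tracemalloc.start()
before, _ = tracemalloc.get_traced_memory()

for _ in range(10_000):  # sustained load standing in for hours of use
    do_work("x")

after, _ = tracemalloc.get_traced_memory()
growth = after - before
print(f"memory growth under sustained load: {growth} bytes")
tracemalloc.stop()
```

A steadily climbing figure between the start and end of a long run is the signature of the leaks and degradation that soak testing is designed to catch.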

Spike testing
Spike testing is done by suddenly increasing the number of users, or the load generated
by them, by a very large amount and observing the behavior of the system. The goal is to
determine whether performance will suffer, the system will fail, or it will be able to handle
dramatic changes in load.

Configuration testing
Rather than testing for performance from the perspective of load, tests are created to
determine the effects of configuration changes to the system's components on the system's
performance and behavior. A common example would be experimenting with different
methods of load-balancing.

Isolation testing
Isolation testing is not unique to performance testing but involves repeating a test
execution that resulted in a system problem. Often used to isolate and confirm the fault
domain.


Integration testing

Integration testing (sometimes called integration and testing, abbreviated I&T) is
the phase in software testing in which individual software modules are combined and tested
as a group. It occurs after unit testing and before validation testing. Integration testing takes
as its input modules that have been unit tested, groups them in larger aggregates, applies tests
defined in an integration test plan to those aggregates, and delivers as its output the integrated
system ready for system testing.

Purpose

The purpose of integration testing is to verify the functional, performance, and reliability
requirements placed on major design items. These "design items", i.e. assemblages (or groups
of units), are exercised through their interfaces using black-box testing, with success and error
cases being simulated via appropriate parameter and data inputs. Simulated usage of
shared data areas and inter-process communication is tested, and individual subsystems are
exercised through their input interface.
Test cases are constructed to test whether all the components within assemblages
interact correctly, for example across procedure calls or process activations, and this is done
after testing individual modules, i.e. unit testing. The overall idea is a "building block"
approach, in which verified assemblages are added to a verified base which is then used to
support the integration testing of further assemblages.

Some different types of integration testing are big bang, top-down, and bottom-up.
Other Integration Patterns are: Collaboration Integration, Backbone Integration, Layer
Integration, Client/Server Integration, Distributed Services Integration and High-frequency
Integration.


Big Bang

In this approach, all or most of the developed modules are coupled together to form a
complete software system or major part of the system and then used for integration testing.
The Big Bang method is very effective for saving time in the integration testing process.
However, if the test cases and their results are not recorded properly, the entire integration
process will be more complicated and may prevent the testing team from achieving the goal of
integration testing.
A type of Big Bang Integration testing is called Usage Model testing. Usage Model
Testing can be used in both software and hardware integration testing. The basis behind this
type of integration testing is to run user-like workloads in integrated user-like environments.
In doing the testing in this manner, the environment is proofed, while the individual
components are proofed indirectly through their use.

Usage Model testing takes an optimistic approach to testing, because it expects to
have few problems with the individual components. The strategy relies heavily on the
component developers to do the isolated unit testing for their product. The goal of the
strategy is to avoid redoing the testing done by the developers, and instead to flush out
problems caused by the interaction of the components in the environment.

For integration testing, Usage Model testing can be more efficient and provides better
test coverage than traditional focused functional integration testing. To be more efficient and
accurate, care must be taken in defining the user-like workloads for creating realistic
scenarios in exercising the environment. This gives confidence that the integrated
environment will work as expected for the target customers.


Top-down and Bottom-up


Bottom Up Testing is an approach to integrated testing where the lowest level
components are tested first, then used to facilitate the testing of higher level components. The
process is repeated until the component at the top of the hierarchy is tested.
All the bottom or low-level modules, procedures or functions are integrated and then
tested. After the integration testing of lower level integrated modules, the next level of
modules will be formed and can be used for integration testing. This approach is helpful only
when all or most of the modules of the same development level are ready. This method also
helps to determine the levels of software developed and makes it easier to report testing
progress in the form of a percentage.
Top Down Testing is an approach to integrated testing where the top integrated
modules are tested and the branch of the module is tested step by step until the end of the
related module.
Sandwich Testing is an approach to combine top down testing with bottom up
testing.
The main advantage of the Bottom-Up approach is that bugs are more easily found. With
Top-Down, it is easier to find a missing branch link.
Verification and validation

Verification and validation are independent procedures that are used together for
checking that a product, service, or system meets requirements and specifications and that it
fulfils its intended purpose. These are critical components of a quality management system
such as ISO 9000. The words "verification" and "validation" are sometimes preceded with
"independent" (as in IV&V), indicating that the verification and validation is to be performed
by a disinterested third party.
It is sometimes said that validation can be expressed by the query "Are you
building the right thing?" and verification by "Are you building it right?" In practice, the usage
of these terms varies. Sometimes they are even used interchangeably.

The PMBOK Guide, an IEEE standard, defines them as follows in its 4th edition:

• "Validation. The assurance that a product, service, or system meets the needs of the
customer and other identified stakeholders. It often involves acceptance and suitability
with external customers. Contrast with verification."


• "Verification. The evaluation of whether or not a product, service, or system complies
with a regulation, requirement, specification, or imposed condition. It is often an internal
process. Contrast with validation."

• Verification is intended to check that a product, service, or system (or portion thereof,
or set thereof) meets a set of initial design specifications. In the development phase,
verification procedures involve performing special tests to model or simulate a portion,
or the entirety, of a product, service or system, then performing a review or analysis of
the modeling results. In the post-development phase, verification procedures involve
regularly repeating tests devised specifically to ensure that the product, service, or system
continues to meet the initial design requirements, specifications, and regulations as time
progresses.
• Validation is intended to check that development and verification procedures for a
product, service, or system (or portion thereof, or set thereof) result in a product,
service, or system (or portion thereof, or set thereof) that meets initial requirements.
For a new development flow or verification flow, validation procedures may involve
modeling either flow and using simulations to predict faults or gaps that might lead to
invalid or incomplete verification or development of a product, service, or system(or
portion thereof, or set thereof). A set of validation requirements, specifications, and
regulations may then be used as a basis for qualifying a development flow or
verification flow for a product, service, or system(or portion thereof, or set thereof).
Additional validation procedures also include those that are designed specifically to
ensure that modifications made to an existing qualified development flow or
verification flow will have the effect of producing a product, service, or system (or
portion thereof, or set thereof) that meets the initial design requirements,
specifications, and regulations; these validations help to keep the flow qualified.
• It is a process of establishing evidence that provides a high degree of assurance that a
product, service, or system accomplishes its intended requirements. This often
involves acceptance of fitness for purpose with end users and other product
stakeholders. This is often an external process.
• It is sometimes said that validation can be expressed by the query" Are you building
the right thing?" and verification by "Are you building it right?". "Building the right
thing" refers back to the user's needs, while "building it right" checks that the
specifications are correctly implemented by the system. In some contexts, it is
required to have written requirements for both as well as formal procedures or
protocols for determining compliance.


• It is entirely possible that a product passes when verified but fails when validated.
This can happen when, say, a product is built as per the specifications but the
specifications themselves fail to address the user's needs.

Activities

Verification of machinery and equipment usually consists of design qualification
(DQ), installation qualification (IQ), operational qualification (OQ), and performance
qualification (PQ). DQ is usually the vendor's job. However, DQ can also be performed by the
user, by confirming through review and testing that the equipment meets the written
acquisition specification. If the relevant documents or manuals of the machinery/equipment are
provided by vendors, the latter three qualifications need to be thoroughly performed by users who
work in an industrial regulatory environment. Otherwise, the process of IQ, OQ and PQ is the
task of validation. A typical example of such a case could be the loss or absence of the vendor's
documentation for legacy equipment or do-it-yourself (DIY) assemblies (e.g., cars, computers
etc.); in that situation, users should endeavor to acquire the DQ document beforehand.

Templates for DQ, IQ, OQ and PQ can usually be found on the internet, whereas the
DIY qualifications of machinery/equipment can be assisted either by the vendor's training
course materials and tutorials, or by published guidance books, such as step-by-step series,
if the acquisition of the machinery/equipment is not bundled with on-site qualification
services. This kind of DIY approach is also applicable to the qualification of software,
computer operating systems and manufacturing processes. The most important and critical
task, as the last step of the activity, is to generate and archive machinery/equipment
qualification reports for auditing purposes, if regulatory compliance is mandatory.
Qualification of machinery/equipment is venue dependent, in particular for items that are
shock sensitive and require balancing or calibration, and re-qualification needs to be
conducted once the objects are relocated.


The full scales of some equipment qualifications are even time dependent as
consumables are used up (i.e. filters) or springs stretch out, requiring recalibration; hence
re-certification is necessary when a specified due time lapses. Re-qualification of
machinery/equipment should also be conducted when replacement of parts, coupling with
another device, installing new application software, or restructuring of the computer
affects the pre-settings, such as the BIOS, registry, disk drive partition table,
dynamically-linked (shared) libraries, or an ini file. In such a
situation, the specifications of the parts/devices/software and restructuring proposals should
be appended to the qualification document, whether the parts/devices/software are genuine or
not.
Torres and Hyman have discussed the suitability of non-genuine parts for clinical use
and provided guidelines for equipment users to select appropriate substitutes that are
capable of avoiding adverse effects. In the case when genuine parts/devices/software are
demanded by regulatory requirements, re-qualification does not need to be
conducted on the non-genuine assemblies; instead, the asset has to be recycled for non-
regulatory purposes.

When machinery/equipment qualification is conducted by a standard-endorsed third
party, such as by an ISO standard accredited company for a particular division, the process is
called certification. Currently, the coverage of ISO/IEC 15408 certification by an ISO/IEC
27001 accredited organization is limited; the scheme requires a fair amount of effort to become
popularized.

System testing

System testing of software or hardware is testing conducted on a complete, integrated
system to evaluate the system's compliance with its specified requirements. System testing
falls within the scope of black-box testing, and as such, should require no knowledge of the
inner design of the code or logic. As a rule, system testing takes, as its input, all of the
"integrated" software components that have passed integration testing, and also the software
system itself integrated with any applicable hardware system(s). The purpose of integration
testing is to detect any inconsistencies between the software units that are integrated together
(called assemblages) or between any of the assemblages and the hardware. System testing is
a more limited type of testing; it seeks to detect defects both within the "inter-assemblages"
and also within the system as a whole.


System testing is performed on the entire system in the context of a Functional
Requirement Specification(s) (FRS) and/or a System Requirement Specification (SRS).
System testing tests not only the design, but also the behavior and even the believed
expectations of the customer. It is also intended to test up to and beyond the bounds defined
in the software/hardware requirements specification.

Types of tests to include in system testing

The following examples are different types of testing that should be considered during
System testing:

o Graphical user interface testing


o Usability testing
o Software performance testing
o Compatibility testing
o Exception handling
o Load testing
o Volume testing
o Stress testing
o Security testing
o Scalability testing

o Sanity testing
o Smoke testing
o Exploratory testing
o Ad hoc testing
o Regression testing
o Installation testing
o Maintenance testing
o Recovery testing and failover testing
o Accessibility testing, including compliance with:
o Americans with Disabilities Act of 1990
o Section 508 Amendment to the Rehabilitation Act of 1973
o Web Accessibility Initiative (WAI) of the World Wide Web Consortium (W3C)

Although different testing organizations may prescribe different tests as part of System
testing, this list serves as a general framework or foundation to begin with.


Structure Testing:

It is concerned with exercising the internal logic of a program and traversing
particular execution paths.

Output Testing:

• Output of test cases compared with the expected results created during design of test
cases.
• Asking the user about the format they require tests the output generated or
displayed by the system under consideration.
• Here, the output format is considered in two ways: one is the on-screen format and
the other is the printed format.
• The output on the screen is found to be correct, as the format was designed in the
system design phase according to user needs.
• The printed output likewise comes out as per the user's specified requirements.

User acceptance Testing:

• Final stage, before handing over to the customer, which is usually carried out by the
customer, where the test cases are executed with actual data.
• The system under consideration is tested for user acceptance by constantly keeping in
touch with the prospective system users at the time of developing and making changes
whenever required.
• It involves planning and execution of various types of tests in order to demonstrate that
the implemented software system satisfies the requirements stated in the requirement
document.

Two sets of acceptance tests are to be run:

1. Those developed by the quality assurance group.
2. Those developed by the customer.


CHAPTER 8
SAMPLE CODING

Decision_tree

import itertools
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import confusion_matrix, accuracy_score

# Load the labelled URL dataset, drop the index columns and shuffle the rows
urls_data = pd.read_csv("url_dataset.csv")
urls_data.head(10)
urls = urls_data.drop(urls_data.columns[[0, 1, 2]], axis=1)
urls = urls.sample(frac=1).reset_index(drop=True)

# Separate the features from the 'statistical_report' label column
urls_without_labels = urls.drop('statistical_report', axis=1)
labels = urls['statistical_report']

# Split the data 80/20 into training and test sets
data_train, data_test, labels_train, labels_test = train_test_split(
    urls_without_labels, labels, test_size=0.20, random_state=100)
train_test = len(data_train), len(data_test), len(labels_train), len(labels_test)

# Bar chart of the sizes of the four splits
objects = ('X_train', 'X_test', 'Y_train', 'Y_test')
y_pos = np.arange(len(objects))
plt.bar(y_pos, train_test, align='center', alpha=0.5)
plt.xticks(y_pos, objects)
plt.ylabel('Data Values')
plt.title('Splitted Data as X and Y')
plt.show()

# Class distribution of the training labels
train_data = labels_train.value_counts()
objects = ('Yes', 'No')
y_pos = np.arange(len(objects))
plt.bar(y_pos, train_data, align='center', alpha=0.5)
plt.xticks(y_pos, objects)
plt.ylabel('Data Values')
plt.title('Train Data')
plt.show()

# Class distribution of the test labels
test_data = labels_test.value_counts()
plt.bar(y_pos, test_data, align='center', alpha=0.5)
plt.xticks(y_pos, objects)
plt.ylabel('Data Values')
plt.title('Test Data')
plt.show()

# Train a decision tree classifier and predict on the held-out test set
model = DecisionTreeClassifier()
model.fit(data_train, labels_train)
pred_label = model.predict(data_test)

def plot_confusion_matrix(cm, classes, normalize=False,
                          title='Confusion matrix', cmap=plt.cm.Blues):
    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title)
    plt.colorbar()
    tick_marks = np.arange(len(classes))
    plt.xticks(tick_marks, classes, rotation=45)
    plt.yticks(tick_marks, classes)
    if normalize:
        cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
        print("Normalized confusion matrix")
    else:
        print('Confusion matrix, without normalization')
    thresh = cm.max() / 2.
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        plt.text(j, i, cm[i, j], horizontalalignment="center",
                 color="white" if cm[i, j] > thresh else "black")
    plt.tight_layout()
    plt.ylabel('True label')
    plt.xlabel('Predicted label')

# Confusion matrix and accuracy on the test set
cm = confusion_matrix(labels_test, pred_label)
print(cm)
print(accuracy_score(labels_test, pred_label))

# Heatmap of the confusion matrix
class_labels = [0, 1]
sns.heatmap(cm, annot=True, cmap="YlGnBu", fmt=".3f",
            xticklabels=class_labels, yticklabels=class_labels)
plt.show()

Feature_Extraction

import re
import numpy as np
import pandas as pd

# Load the raw URLs, one per line, into a single-column DataFrame
raw_data = pd.read_csv('1000.txt', sep="delimeter", header=None, engine='python')
raw_data.head()
raw_data.columns = ["websites"]
raw_data.head()

# Split each URL into protocol, domain name and address parts
seperation_of_protocol = raw_data['websites'].str.split("://", expand=True)
seperation_of_protocol.head()
seperation_of_protocol.columns = ["Protocol", "domain_name", "address"]
seperation_domain_name = seperation_of_protocol['domain_name'].str.split("/", 1, expand=True)
seperation_domain_name.columns = ["Domain_name", "Address"]
seperation_domain_name.head()

# Recombine the pieces into one DataFrame and drop incomplete rows
splitted_data = pd.concat([seperation_of_protocol['Protocol'], seperation_domain_name], axis=1)
splitted_data.columns = ['protocol', 'domain_name', 'address']
splitted_data.head()
splitted_data.isnull().sum()
splitted_data = splitted_data.dropna()

def long_url(l):
    """Differentiate websites based on the length of the URL."""
    if len(l) < 54:
        return 0
    elif 54 <= len(l) <= 75:
        return 2
    return 1

splitted_data['long_url'] = splitted_data['address'].apply(long_url)

def have_at_symbol(l):
    """Check whether the URL contains an '@' symbol."""
    if "@" in l:
        return 1
    return 0

splitted_data['having_@_symbol'] = splitted_data['address'].apply(have_at_symbol)

def redirection(l):
    """If the URL has the symbol '//' after the protocol, classify it as phishing."""
    if "//" in l:
        return 1
    return 0

splitted_data['redirection_//_symbol'] = splitted_data['domain_name'].apply(redirection)

def sub_domains(l):
    """Count the dots in the domain name to flag excessive sub-domains."""
    if l.count('.') < 3:
        return 0
    elif l.count('.') == 3:
        return 2
    return 1

splitted_data['sub_domains'] = splitted_data['domain_name'].apply(sub_domains)

def having_ip_address(url):
    """Check whether the URL uses a raw IP address in place of a domain name."""
    match = re.search(
        '(([01]?\\d\\d?|2[0-4]\\d|25[0-5])\\.([01]?\\d\\d?|2[0-4]\\d|25[0-5])\\.'
        '([01]?\\d\\d?|2[0-4]\\d|25[0-5])\\.([01]?\\d\\d?|2[0-4]\\d|25[0-5])\\/)|'  # IPv4
        '((0x[0-9a-fA-F]{1,2})\\.(0x[0-9a-fA-F]{1,2})\\.(0x[0-9a-fA-F]{1,2})\\.'
        '(0x[0-9a-fA-F]{1,2})\\/)|'  # IPv4 in hexadecimal
        '(?:[a-fA-F0-9]{1,4}:){7}[a-fA-F0-9]{1,4}', url)  # IPv6
    if match:
        return 1
    return 0


CHAPTER 9
OUTPUT SCREENS



[Output screen: confusion matrix plot for the decision tree classifier, with per-cell counts and axes labelled "True label" and "Predicted label"]



CHAPTER 10

CONCLUSION AND FUTURE ENHANCEMENT

Conclusion

Finally, phishing attacks are a major problem. It is important that they are countered. The
work reported in this thesis indicates how understanding of the nature of phishing may be
increased and provides a method to identify phishing problems in systems. It also contains a
prototype of a system that catches those phishing attacks that evaded other defences, i.e. those
attacks that have "slipped through the net". An original contribution has been made in this
important field, and the work reported here has the potential to make the internet world a
safer place for a significant number of people.

Future Work

In the future, we plan to provide technical solutions that improve the efficiency of spam
filters, so that more mails are classified correctly and properly. With this, legitimate users can
surf the internet with less fear. The user-phishing interaction model was derived from the
application of cognitive walkthroughs. A large-scale controlled user study and follow-on
interviews could be carried out to provide a more rigorous conclusion. The current model
describes neither irrational decision making nor influence from other external factors such as
emotion, pressure, and other human factors. It would be very useful to expand the model to
accommodate these factors. We have theoretically and experimentally evaluated Phish
Limiter. We have evaluated the trustworthiness of each SDN flow to identify potential
hazards based on deep packet inspection. Likewise, we have observed how the proposed
inspection approach of the two SF and FI modes within Phish Limiter detects and mitigates
phishing attacks before they reach end users if the flow has been determined untrustworthy.
Using our real-world experimental evaluation on GENI and a phishing dataset, we have
demonstrated that Phish Limiter is an effective and efficient solution to detect and mitigate
phishing attacks, with an accuracy of 98.39%.


CHAPTER 11
BIBLIOGRAPHY AND REFERENCES

[1] APAC. (Dec. 2018). Fishing Website Processing Bulletin. Accessed: 2019. [Online].
Available: http://www.apac.cn/gzdt/; M. Khonji, Y. Iraqi, and A. Jones, "Phishing
detection: A literature survey," IEEE Commun. Surveys Tuts., vol. 15, no. 4,
pp. 2091-2121, 4th Quart., 2013.
[2] A. Acquisti, I. Adjerid, R. Balebako, L. Brandimarte, L. F. Cranor, S. Komanduri,
P. G. Leon, N. Sadeh, F. Schaub, M. Sleeper, Y. Wang, and S. Wilson, "Nudges for
privacy and security: Understanding and assisting users' choices online," ACM
Comput. Surv., vol. 50, no. 3, 2017, Art. no. 44.
[3] M. M. Moreno-Fernández, F. Blanco, P. Garaizar, and H. Matute, "Fishing for
phishers. Improving Internet users' sensitivity to visual deception cues to prevent
electronic fraud," Comput. Hum. Behav., vol. 69, pp. 421-436, Apr. 2017.
[4] M. Junger, L. Montoya, and F.-J. Overink, "Priming and warnings are not effective to
prevent social engineering attacks," Comput. Hum. Behav., vol. 66, pp. 75-87,
Jan. 2017.
[5] E.-S. M. El-Alfy, "Detection of phishing websites based on probabilistic neural
networks and K-medoids clustering," Comput. J., vol. 60, no. 12, pp. 1745-1759, 2017.
[6] C. Huang, S. Hao, L. Invernizzi, Y. Fang, C. Kruegel, and G. Vigna, "Gossip:
Automatically identifying malicious domains from mailing list discussions," in Proc.
ACM Asia Conf. Comput. Commun. Secur. (ASIA CCS), Abu Dhabi, United Arab
Emirates, Apr. 2017, pp. 494-505.
[7] F. Vanhoenshoven, G. Nápoles, R. Falcon, K. Vanhoof, and M. Köppen, "Detecting
malicious URLs using machine learning techniques," in Proc. IEEE Symp. Ser.
Comput. Intell. (SSCI), Dec. 2016, pp. 1-8.
[8] J. Saxe, R. Harang, C. Wild, and H. Sanders, "A deep learning approach to fast,
format-agnostic detection of malicious Web content," in Proc. IEEE Symp. Secur.
Privacy Workshops (SPW), San Francisco, CA, USA, Aug. 2018, pp. 8-14.
[9] L. Wu, X. Du, and J. Wu, "Effective defense schemes for phishing attacks on mobile
computing platforms," IEEE Trans. Veh. Technol., vol. 65, no. 8, pp. 6678-6691,
Aug. 2016.
