Phani Project

This document is a project report submitted to Jawaharlal Nehru Technological University for a Master of Technology degree in Computer Science and Engineering. It describes a project on "A Bi-objective Hyper-heuristic Support Vector Machines for Big Data Cyber Security" carried out by Phani Krishna Yadlapati under the guidance of Dr. C. Prakasa Rao. The report includes an introduction, literature review, system analysis, design, implementation details, testing procedures, screenshots and conclusions.


A Major Project Report On

“A Bi-objective Hyper-heuristic Support Vector Machines for Big Data Cyber Security”

A project report submitted to Jawaharlal Nehru Technological University, Kakinada,
in partial fulfillment of the requirements for the award of the degree of

MASTER OF TECHNOLOGY
IN
COMPUTER SCIENCE AND ENGINEERING

Submitted By:

PHANI KRISHNA YADLAPATI (19F91D5808)

Under the noble guidance of

Dr. C. PRAKASA RAO, M.Tech., Ph.D.

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

PRAKASAM ENGINEERING COLLEGE

(An ISO 9001:2008 and NAAC Certified Institution)

(Affiliated to Jawaharlal Nehru Technological University, Kakinada)

O.V. ROAD, KANDUKUR-523105, A.P.

2019-2021

PRAKASAM ENGINEERING COLLEGE
(An ISO 9001:2008 and NAAC Certified Institution)
(Affiliated to Jawaharlal Nehru Technological University, Kakinada)

O.V. ROAD, KANDUKUR-523105, A.P.

DEPARTMENT
OF
COMPUTER SCIENCE AND ENGINEERING

BONAFIDE CERTIFICATE

This is to certify that the project entitled “A Bi-objective Hyper-heuristic Support
Vector Machines for Big Data Cyber Security” is a bonafide work carried out by
PHANI KRISHNA Y. (19F91D5808) in partial fulfillment of the requirements for the
award of the degree of Master of Technology in COMPUTER SCIENCE AND ENGINEERING
for the academic year 2020-2021. This work was done under my supervision and guidance.

Signature of the Guide
Dr. C. PRAKASA RAO, M.Tech., Ph.D.

Signature of the HOD
Mr. M. RAYUDU, M.Tech., Ph.D.
HOD of CSE

Signature of the External Examiner

DECLARATION

I hereby declare that the project work entitled “A Bi-objective Hyper-heuristic
Support Vector Machines for Big Data Cyber Security” is a genuine work carried out
by me under the guidance of Dr. C. PRAKASA RAO, M.Tech., Ph.D., in partial
fulfillment of the requirements for the award of the degree of "Master of Technology
in COMPUTER SCIENCE AND ENGINEERING" of Jawaharlal Nehru Technological University,
Kakinada.

PHANI KRISHNA Y. (19F91D5808)

ACKNOWLEDGEMENT

I would like to express my sincere gratitude to the following distinguished
people, who extended a helping hand to me in completing my project work.

I am very grateful to Dr. K. RAMAIAH, Secretary & Correspondent of PRAKASAM
ENGINEERING COLLEGE, and extend my sincere thanks to him for giving me this
opportunity.

I express my regards and gratitude to our PRINCIPAL, Dr. M. LAKSHMAN RAO, for
giving me the opportunity to do this thesis as a part of our course.

I express my deep sense of gratitude to Mr. M. RAYUDU, M.Tech., Ph.D., Head of
the Department of CSE.

I would like to express my sincere thanks to my internal guide, Dr. C. PRAKASA
RAO, M.Tech., Ph.D., for having shown keen interest at every stage of the
development of this thesis and for guiding me in every aspect.

I would like to thank all the faculty members of Prakasam Engineering College for
their constant encouragement and for being a great group of knowledgeable and
cooperative people to work with.

I would also like to thank all my friends in the college for their constant
encouragement, and my parents for providing the necessary facilities and
financial support and for being as cooperative as friends.

Table of Contents

Abstract
1. INTRODUCTION
2. LITERATURE SURVEY
2.1 Novel feature extraction, selection and fusion for effective malware family classification
2.2 Efficient string matching: An aid to bibliographic search
2.3 A meta-learning approach to automatic kernel selection for support vector machines
2.4 Automatic model selection for the optimization of support vector machine kernels
2.5 A PSO and pattern search based memetic algorithm for SVMs parameters optimization
3. SYSTEM ANALYSIS
3.1 Existing System
3.2 Proposed System
3.3 System Requirements
3.4 System Study
4. SYSTEM DESIGN
4.1 System Architecture
4.2 UML Diagrams
4.3 ER Diagram
4.4 Implementation
5. SOFTWARE ENVIRONMENT
5.1 Modules Used in Project
6. SYSTEM TESTS
6.1 Types of Tests
SOURCE CODE
7. SCREEN SHOTS
8. CONCLUSIONS
9. REFERENCE

LIST OF FIGURES

4.1 Proposed Methodology
4.2 Use Case Diagram
4.3 Class Diagram
4.4 Sequence Diagram
4.5 Activity Diagram
4.6 ER User Diagram
4.7 ER Admin Diagram

LIST OF TABLES

Table 1
Table 2

LIST OF SCREENSHOTS

7.1 Screenshot 1
7.2 Screenshot 2
7.3 Screenshot 3
7.4 Screenshot 4
7.5 Screenshot 5
7.6 Screenshot 6
7.7 Screenshot 7

ABSTRACT

Cyber security in the context of big data is known to be a critical problem and
presents a great challenge to the research community. Machine learning
algorithms have been suggested as candidates for handling big data security
problems. Among these algorithms, support vector machines (SVMs) have
achieved remarkable success on various classification problems. However, to
establish an effective SVM, the user needs to define the proper SVM
configuration in advance, which is a challenging task that requires expert
knowledge and a large amount of manual effort for trial and error. In this work,
we formulate the SVM configuration process as a bi-objective optimization
problem in which accuracy and model complexity are considered as two
conflicting objectives. We propose a novel hyper-heuristic framework for
bi-objective optimization that is independent of the problem domain. This is the
first time that a hyper-heuristic has been developed for this problem. The
proposed hyper-heuristic framework consists of a high-level strategy and low-
level heuristics. The high-level strategy uses the search performance to control
the selection of which low-level heuristic should be used to generate a new
SVM configuration. The low-level heuristics each use different rules to
effectively explore the SVM configuration search space. To address bi-objective
optimization, the proposed framework adaptively integrates the strengths of
decomposition- and Pareto-based approaches to approximate the Pareto set of
SVM configurations. The effectiveness of the proposed framework has been
evaluated on two cyber security problems: Microsoft malware big data
classification and anomaly intrusion detection. The obtained results demonstrate
that the proposed framework is very effective, if not superior, compared with its
counterparts and other algorithms.
Index Terms—Hyper-heuristics, Big data, Cyber security, Optimization.

CHAPTER 1

INTRODUCTION
1.1 MOTIVATION
This work presents a novel bi-objective hyper-heuristic framework for SVM
configuration optimization. Hyper-heuristics are well suited to this task because
they are independent of the particular problem at hand and can often obtain
highly competitive configurations.
1.2 PROBLEM DEFINITION
Designing an effective detection method using machine learning algorithms is a
challenging task due to the large number of possible design options and the lack
of an intelligent way to choose and/or combine existing options. This work
addresses these challenges by proposing a hyper-heuristic framework that searches
the space of the design options and their values, and iteratively combines and
adapts different options for different problem instances.
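To make "the space of the design options and their values" concrete, the following is a minimal Python sketch, assuming a configuration made of a kernel type, the soft-margin parameter C, and the kernel parameters gamma and degree. The kernel list, the log2 ranges and the names `SVMConfig` and `random_config` are hypothetical choices for illustration, not values taken from this project; `as_params()` simply returns keyword arguments that scikit-learn's `SVC` accepts.

```python
import random
from dataclasses import dataclass

# Hypothetical search space; the ranges are illustrative assumptions.
KERNELS = ("linear", "rbf", "poly", "sigmoid")
LOG2_C_RANGE = (-5.0, 15.0)      # C = 2**x
LOG2_GAMMA_RANGE = (-15.0, 3.0)  # gamma = 2**x
DEGREE_RANGE = (2, 5)            # only used by the polynomial kernel


@dataclass(frozen=True)
class SVMConfig:
    kernel: str
    C: float
    gamma: float
    degree: int

    def as_params(self) -> dict:
        """Return only the keyword arguments that matter for this kernel."""
        params = {"kernel": self.kernel, "C": self.C}
        if self.kernel != "linear":
            params["gamma"] = self.gamma
        if self.kernel == "poly":
            params["degree"] = self.degree
        return params


def random_config(rng: random.Random) -> SVMConfig:
    """Sample one configuration uniformly in the log2 ranges above."""
    return SVMConfig(
        kernel=rng.choice(KERNELS),
        C=2.0 ** rng.uniform(*LOG2_C_RANGE),
        gamma=2.0 ** rng.uniform(*LOG2_GAMMA_RANGE),
        degree=rng.randint(*DEGREE_RANGE),
    )


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        print(random_config(rng).as_params())
```

Sampling C and gamma on a log scale is a common design choice because useful values of both span several orders of magnitude.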
1.3 OBJECTIVE OF PROJECT
To address the bi-objective optimization problem, we propose a population-
based hyper-heuristic framework that operates on a population of solutions and
uses an archive to save the non-dominated solutions. The proposed framework
combines the strengths of decomposition- and Pareto (dominance)-based
approaches to effectively approximate the Pareto set of SVM configurations.
Our idea is to combine the diversity ability of the decomposition approach with
the convergence power of the dominance approach. The decomposition
approach operates on the population of solutions, whereas the dominance
approach uses the archive. The hyper-heuristic framework generates a new
population of solutions using either the old population, the archive, or both the
old population and the archive. This allows the search to achieve a proper
balance between convergence and diversity.
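Both mechanisms can be illustrated on objective vectors that are minimized, such as (error rate, number of support vectors). The sketch below assumes plain tuples for solutions and shows a Pareto (dominance) archive update together with the weighted Tchebycheff function, one common scalarizing function in decomposition-based methods. The report does not state which scalarizing function the framework uses, so Tchebycheff and the function names here are assumptions.

```python
from typing import List, Sequence, Tuple

Objectives = Tuple[float, ...]  # e.g. (error_rate, n_support_vectors), both minimized


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def update_archive(archive: List[Objectives], candidate: Objectives) -> List[Objectives]:
    """Pareto side: keep only non-dominated solutions in the archive."""
    if any(dominates(a, candidate) or a == candidate for a in archive):
        return archive  # dominated or duplicate: archive unchanged
    # Drop members the candidate dominates, then add the candidate.
    return [a for a in archive if not dominates(candidate, a)] + [candidate]


def tchebycheff(f: Sequence[float], weights: Sequence[float],
                ideal: Sequence[float]) -> float:
    """Decomposition side: weighted Tchebycheff value of f w.r.t. the ideal point."""
    return max(w * abs(fi - zi) for fi, w, zi in zip(f, weights, ideal))


if __name__ == "__main__":
    archive: List[Objectives] = []
    for point in [(0.10, 120.0), (0.08, 150.0), (0.12, 90.0),
                  (0.09, 160.0), (0.07, 100.0)]:
        archive = update_archive(archive, point)
    print(sorted(archive))  # → [(0.07, 100.0), (0.12, 90.0)]

    # Each weight vector defines one scalar subproblem for the population.
    print(tchebycheff((0.12, 90.0), (0.5, 0.5), (0.07, 90.0)))
```

In practice the objectives would normally be normalized before scalarizing, because error rates and support-vector counts differ by orders of magnitude.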

CHAPTER 2
LITERATURE SURVEY

2.1 Novel feature extraction, selection and fusion for effective malware
family classification

AUTHORS: Mansour Ahmadi, Dmitry Ulyanov, Stanislav Semenov, Mikhail Trofimov, and Giorgio Giacinto.
Modern malware is designed with mutation characteristics, namely
polymorphism and metamorphism, which cause an enormous growth in the
number of variants of malware samples. Categorization of malware samples
on the basis of their behaviors is essential for the computer security
community, because it receives a huge number of malware samples every day, and
the signature extraction process is usually based on malicious parts
characterizing malware families. Microsoft released a malware classification
challenge in 2015 with a huge dataset of nearly 0.5 terabytes of data,
containing more than 20K malware samples. The analysis of this dataset
inspired the development of a novel paradigm that is effective in
categorizing malware variants into their actual family groups. This paradigm
is presented and discussed in the present paper, where emphasis has been
given to the phases related to the extraction and selection of a set of novel
features for the effective representation of malware samples. Features can be
grouped according to different characteristics of malware behavior, and their
fusion is performed according to a per-class weighting paradigm. The
proposed method achieved a very high accuracy (≈ 0.998) on the
Microsoft Malware Challenge dataset.

2.2 Efficient string matching: An aid to bibliographic search
AUTHORS: Alfred V. Aho and Margaret J. Corasick.
This paper describes a simple, efficient algorithm to locate all
occurrences of any of a finite number of keywords in a string of text. The
algorithm consists of constructing a finite state pattern matching machine
from the keywords and then using the pattern matching machine to
process the text string in a single pass. Construction of the pattern
matching machine takes time proportional to the sum of the lengths of
the keywords. The number of state transitions made by the pattern
matching machine in processing the text string is independent of the
number of keywords. The algorithm has been used to improve the speed
of a library bibliographic search program by a factor of 5 to 10.
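The goto, failure and output functions described in this paper can be sketched in Python as follows. This is a compact illustrative version written for this report, not the authors' original pseudocode; the names `build_automaton` and `search` are our own.

```python
from collections import deque
from typing import Dict, List, Tuple


def build_automaton(keywords: List[str]):
    """Build the goto, failure and output functions of an Aho-Corasick machine."""
    goto: List[Dict[str, int]] = [{}]  # state -> {character: next state}
    fail: List[int] = [0]
    output: List[List[str]] = [[]]

    # 1. Goto function: a trie of all keywords (time proportional to total length).
    for word in keywords:
        state = 0
        for ch in word:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                output.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        output[state].append(word)

    # 2. Failure function, computed breadth-first (depth-1 states fail to the root).
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # A state also reports every keyword its failure state reports.
            output[nxt].extend(output[fail[nxt]])
    return goto, fail, output


def search(text: str, keywords: List[str]) -> List[Tuple[int, str]]:
    """Return (start index, keyword) for every occurrence, in one pass over text."""
    goto, fail, output = build_automaton(keywords)
    state, hits = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for word in output[state]:
            hits.append((i - len(word) + 1, word))
    return hits


if __name__ == "__main__":
    print(search("ushers", ["he", "she", "his", "hers"]))
```

For example, `search("ushers", ["he", "she", "his", "hers"])` returns `[(1, 'she'), (2, 'he'), (2, 'hers')]`. The failure links let the scan continue after a mismatch without moving back in the text, which is why the text is processed in a single pass.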

2.3 A meta-learning approach to automatic kernel selection for support vector machines.

AUTHORS: Shawkat Ali and Kate A. Smith-Miles.


Appropriate choice of a kernel is the most important ingredient of
kernel-based learning methods such as the support vector machine (SVM). Automatic
kernel selection is a key issue given the number of kernels available and the
current trial-and-error nature of selecting the best kernel for a given problem.
This paper introduces a new method for automatic kernel selection, with
empirical results based on classification. The empirical study has been
conducted among five kernels with 112 different classification problems,
using the popular kernel-based statistical learning algorithm SVM. We
evaluate the kernels’ performance in terms of accuracy measures. We then
focus on answering the question: which kernel is best suited to which type of
classification problem? Our meta-learning methodology involves measuring
the problem characteristics using classical, distance- and distribution-based
statistical information. We then combine these measures with the empirical
results to present a rule-based method to select the most appropriate kernel
for a classification problem. The rules are generated by the decision tree
algorithm C5.0 and are evaluated with 10-fold cross-validation. All generated
rules offer high accuracy ratings.
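A small-scale sketch of the meta-learning idea follows, assuming scikit-learn is available. Several assumptions differ from the paper: synthetic problems from `make_classification` stand in for the 112 benchmark problems, only four kernels are compared, the meta-features are a small hypothetical subset of classical and distribution-based measures, and scikit-learn's CART `DecisionTreeClassifier` replaces C5.0, which scikit-learn does not provide.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

KERNELS = ["linear", "poly", "rbf", "sigmoid"]


def meta_features(X: np.ndarray, y: np.ndarray) -> list:
    """A few simple problem characteristics (hypothetical subset)."""
    n, d = X.shape
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    class_entropy = float(-(p * np.log2(p)).sum())
    z = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)
    mean_abs_skew = float(np.abs((z ** 3).mean(axis=0)).mean())
    return [n, d, len(counts), class_entropy, mean_abs_skew]


def best_kernel(X: np.ndarray, y: np.ndarray, cv: int = 3) -> str:
    """Label a problem with the kernel that has the highest mean CV accuracy."""
    scores = {
        k: cross_val_score(make_pipeline(StandardScaler(), SVC(kernel=k)),
                           X, y, cv=cv).mean()
        for k in KERNELS
    }
    return max(scores, key=scores.get)


def build_meta_dataset(n_problems: int = 8, seed: int = 0):
    """One row of meta-features and one best-kernel label per synthetic problem."""
    rng = np.random.RandomState(seed)
    rows, labels = [], []
    for i in range(n_problems):
        X, y = make_classification(
            n_samples=int(rng.randint(80, 200)),
            n_features=int(rng.randint(4, 12)),
            n_informative=2, n_redundant=0,
            class_sep=float(rng.uniform(0.5, 2.0)),
            random_state=seed + i,
        )
        rows.append(meta_features(X, y))
        labels.append(best_kernel(X, y))
    return np.array(rows), np.array(labels)


if __name__ == "__main__":
    M, kernels = build_meta_dataset()
    tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(M, kernels)
    X_new, y_new = make_classification(n_samples=150, n_features=6, random_state=99)
    print(tree.predict([meta_features(X_new, y_new)])[0])
```

The learned tree plays the role of the paper's rules: it maps measured problem characteristics to a recommended kernel without training every kernel on the new problem.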

2.4 Automatic model selection for the optimization of support vector machine kernels.
AUTHORS: Nedjem-Eddine Ayat, Mohamed Cheriet, and Ching Y. Suen
This approach aims to optimize the kernel parameters and to efficiently reduce
the number of support vectors, so that the generalization error can be reduced
drastically. The proposed methodology suggests the use of a new model
selection criterion based on the estimation of the probability of error of the
SVM classifier. For comparison, we considered two more model selection
criteria: GACV (‘Generalized Approximate Cross-Validation’) and VC
(‘Vapnik-Chervonenkis’) dimension. These criteria are algebraic estimates of
upper bounds of the expected error. For the former, we also propose a new
minimization scheme. The experiments conducted on a bi-class problem show
that we can adequately choose the SVM hyper-parameters using the empirical
error criterion. Moreover, it turns out that the criterion produces a less complex
model with fewer support vectors. For multi-class data, the optimization
strategy is adapted to the one-against-one data partitioning. The approach is
then evaluated on images of handwritten digits from the USPS database.
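The idea of choosing SVM hyper-parameters by an empirical error criterion while preferring models with fewer support vectors can be sketched with scikit-learn. This is a simplification under stated assumptions: it uses a plain held-out error rate rather than the paper's probabilistic error estimate, GACV or VC bounds, and it uses scikit-learn's bundled digits data (`load_digits`), which is not the USPS database. scikit-learn's `SVC` already handles multi-class data with a one-against-one scheme, and its `n_support_` attribute gives the support-vector count per class.

```python
from itertools import product

from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC


def select_rbf_model(X_train, y_train, X_val, y_val, Cs, gammas):
    """Pick (C, gamma) with the lowest validation error; ties go to fewer SVs."""
    best = None
    for C, gamma in product(Cs, gammas):
        model = SVC(kernel="rbf", C=C, gamma=gamma).fit(X_train, y_train)
        err = 1.0 - model.score(X_val, y_val)
        nsv = int(model.n_support_.sum())  # total support vectors over all classes
        key = (err, nsv)                   # lexicographic: error first, then size
        if best is None or key < best[0]:
            best = (key, C, gamma)
    (err, nsv), C, gamma = best
    return {"C": C, "gamma": gamma, "val_error": err, "n_support_vectors": nsv}


if __name__ == "__main__":
    X, y = load_digits(return_X_y=True)
    X = X / 16.0  # pixel values range from 0 to 16
    X_tr, X_val, y_tr, y_val = train_test_split(
        X, y, test_size=0.3, random_state=0, stratify=y)
    print(select_rbf_model(X_tr, y_tr, X_val, y_val,
                           Cs=[0.1, 1, 10], gammas=[0.01, 0.1, 1]))
```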

2.5 A PSO and Pattern Search based Memetic Algorithm for SVMs Parameters Optimization

AUTHORS: Yukun Bao, Zhongyi Hu, and Tao Xiong.


Addressing the issue of SVMs parameters optimization, this study proposes an
efficient memetic algorithm based on Particle Swarm Optimization algorithm
(PSO) and Pattern Search (PS). In the proposed memetic algorithm, PSO is
responsible for exploration of the search space and the detection of the potential
regions with optimum solutions, while pattern search (PS) is used to produce an
effective exploitation on the potential regions obtained by PSO. Moreover, a
novel probabilistic selection strategy is proposed to select the appropriate
individuals among the current population to undergo local refinement, keeping a
good balance between exploration and exploitation. Experimental results
confirm that the local refinement with PS and our proposed selection strategy
are effective, and demonstrate the effectiveness and robustness of the
proposed PSO-PS based MA for SVMs parameters optimization.
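A minimal PSO-plus-pattern-search memetic loop can be sketched in pure Python. As a simplification of the method summarized above, only the final global best is refined by a compass-style pattern search; the paper's probabilistic selection of which individuals to refine is not reproduced, and the parameter values (`w`, `c1`, `c2`, swarm size) are illustrative assumptions.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

Vector = List[float]
Bounds = Sequence[Tuple[float, float]]


def clip(v: float, lo: float, hi: float) -> float:
    return min(max(v, lo), hi)


def pattern_search(f: Callable[[Vector], float], x: Vector, step: float = 1.0,
                   min_step: float = 1e-4, bounds: Optional[Bounds] = None):
    """Compass search: poll +/- step on each axis; halve the step if nothing improves."""
    x = list(x)
    fx = f(x)
    while step > min_step:
        improved = False
        for i in range(len(x)):
            for delta in (step, -step):
                y = list(x)
                y[i] += delta
                if bounds is not None:
                    y[i] = clip(y[i], *bounds[i])
                fy = f(y)
                if fy < fx:
                    x, fx, improved = y, fy, True
                    break
        if not improved:
            step /= 2.0
    return x, fx


def pso_ps(f: Callable[[Vector], float], bounds: Bounds, n_particles: int = 20,
           iters: int = 50, w: float = 0.7, c1: float = 1.5, c2: float = 1.5,
           seed: int = 0):
    """PSO explores the box; pattern search then exploits around the best point."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [list(p) for p in pos]
    pbest_f = [f(p) for p in pos]
    g = min(range(n_particles), key=pbest_f.__getitem__)
    gbest, gbest_f = list(pbest[g]), pbest_f[g]

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = clip(pos[i][d] + vel[i][d], *bounds[d])
            fi = f(pos[i])
            if fi < pbest_f[i]:
                pbest[i], pbest_f[i] = list(pos[i]), fi
                if fi < gbest_f:
                    gbest, gbest_f = list(pos[i]), fi

    # Local refinement (exploitation) of the region found by PSO.
    width = max(hi - lo for lo, hi in bounds)
    return pattern_search(f, gbest, step=width / 20.0, bounds=bounds)
```

For SVM tuning, `f` would map a point such as (log2 C, log2 gamma) to a cross-validation error; that wiring is omitted so the sketch stays self-contained. For instance, `pso_ps(lambda v: (v[0] - 1) ** 2 + (v[1] + 2) ** 2, [(-5, 5), (-5, 5)])` should end close to (1, -2).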

CHAPTER 3

SYSTEM ANALYSIS

3.1 EXISTING SYSTEM:


❖ At each iteration, the hyper-heuristic selects a low-level heuristic from
the existing pool of low-level heuristics, applies it to the current solution
to produce a new solution and then decides whether to accept the new
solution. The low-level heuristics constitute a set of problem-specific
heuristics that operate directly on the solution space of a given problem.
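The select, apply and accept cycle described above can be sketched as a single-point selection hyper-heuristic in Python. The three low-level heuristics, the uniform random selection and the "accept if not worse" rule are hypothetical placeholders chosen for brevity; an actual framework would use problem-specific heuristics and usually a smarter selection and acceptance strategy.

```python
import random
from typing import Callable, List

Solution = List[float]
Heuristic = Callable[[Solution, random.Random], Solution]


# Hypothetical low-level heuristics: each returns a modified copy of the solution.
def small_step(x: Solution, rng: random.Random) -> Solution:
    y = list(x)
    y[rng.randrange(len(y))] += rng.gauss(0.0, 0.1)
    return y


def large_step(x: Solution, rng: random.Random) -> Solution:
    y = list(x)
    y[rng.randrange(len(y))] += rng.gauss(0.0, 1.0)
    return y


def reset_coordinate(x: Solution, rng: random.Random) -> Solution:
    y = list(x)
    y[rng.randrange(len(y))] = rng.uniform(-5.0, 5.0)
    return y


def selection_hyper_heuristic(f: Callable[[Solution], float], x0: Solution,
                              heuristics: List[Heuristic], iters: int = 2000,
                              seed: int = 0):
    """Single-point loop: select a heuristic, apply it, accept or reject the result."""
    rng = random.Random(seed)
    current, f_current = list(x0), f(x0)
    for _ in range(iters):
        h = rng.choice(heuristics)     # selection (uniform random here)
        candidate = h(current, rng)    # apply to the current solution
        f_candidate = f(candidate)
        if f_candidate <= f_current:   # acceptance: improving or equal moves
            current, f_current = candidate, f_candidate
    return current, f_current


def sphere(v: Solution) -> float:
    return (v[0] - 2.0) ** 2 + (v[1] + 1.0) ** 2


if __name__ == "__main__":
    best, value = selection_hyper_heuristic(
        sphere, [0.0, 0.0], [small_step, large_step, reset_coordinate])
    print(best, value)
```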

3.2 PROPOSED SYSTEM:


❖ Our proposed hyper-heuristic framework integrates several key
components that differentiate it from existing works to find an effective
SVM configuration for big data cyber security. First, the framework
considers a bi-objective formulation of the SVM configuration problem,
in which the accuracy and model complexity are treated as two
conflicting objectives. Second, the framework controls the selection of
both the kernel type and kernel parameters as well as the soft margin
parameter. Third, the hyper-heuristic framework combines the strengths
of decomposition- and Pareto-based approaches in an adaptive manner to
find an approximate Pareto set of SVM configurations. The performance
of the proposed framework is validated and compared with that of state-
of-the-art algorithms on two cyber security problems: Microsoft malware
big data classification and anomaly intrusion detection. The empirical
results fully demonstrate the effectiveness of the proposed framework on
both problems.
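One hypothetical way to realize the adaptive combination described above is to treat "generate from the population", "generate from the archive" and "generate from both" as three options and to pick among them with probabilities that track how often each recently produced useful offspring. The probability-matching rule, the decay factor, the minimum probability and all names below are assumptions for illustration; the report does not specify the adaptation mechanism.

```python
import random
from typing import Dict, List

SOURCES = ("population", "archive", "both")


class AdaptiveSourceSelector:
    """Probability matching over the three parent sources, with a minimum probability."""

    def __init__(self, p_min: float = 0.1, decay: float = 0.8, seed: int = 0):
        assert p_min * len(SOURCES) < 1.0
        self.p_min, self.decay = p_min, decay
        self.credit: Dict[str, float] = {s: 1.0 for s in SOURCES}
        self.rng = random.Random(seed)

    def probabilities(self) -> Dict[str, float]:
        total = sum(self.credit.values())
        spare = 1.0 - self.p_min * len(SOURCES)
        return {s: self.p_min + spare * c / total for s, c in self.credit.items()}

    def choose(self) -> str:
        probs = self.probabilities()
        return self.rng.choices(list(probs), weights=list(probs.values()))[0]

    def reward(self, source: str, success: bool) -> None:
        # Exponentially decayed credit; the small floor keeps the total above zero.
        updated = self.decay * self.credit[source] + (1.0 if success else 0.0)
        self.credit[source] = max(updated, 1e-6)


def parent_pool(source: str, population: List, archive: List) -> List:
    """Solutions that new offspring are generated from, for the chosen source."""
    if source == "population":
        return list(population)
    if source == "archive":
        return list(archive) or list(population)  # fall back while archive is empty
    return list(population) + list(archive)


if __name__ == "__main__":
    selector = AdaptiveSourceSelector()
    population, archive = ["p1", "p2"], ["a1"]
    for _ in range(20):
        source = selector.choose()
        parents = parent_pool(source, population, archive)
        # ... apply a low-level heuristic to `parents` and update the archive ...
        entered_archive = source == "archive"  # placeholder outcome
        selector.reward(source, entered_archive)
    print(selector.probabilities())
```

Keeping a minimum probability for every source means none of the three options is ever switched off completely, which helps the search keep a balance between convergence and diversity.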

3.3 SYSTEM REQUIREMENTS:
HARDWARE REQUIREMENTS:

• System : Pentium IV 2.4 GHz.
• Hard Disk : 40 GB.
• Floppy Drive : 1.44 MB.
• RAM : 512 MB.

SOFTWARE REQUIREMENTS:

• Operating System : Windows
• Coding Language : Python 3.7
• Database : MySQL

3.4 SYSTEM STUDY

FEASIBILITY STUDY

The feasibility of the project is analyzed in this phase, and a business proposal
is put forth with a very general plan for the project and some cost estimates.
During system analysis, the feasibility study of the proposed system is
carried out. This is to ensure that the proposed system is not a burden to the
company. For feasibility analysis, some understanding of the major
requirements for the system is essential.

Three key considerations involved in the feasibility analysis are:

• ECONOMICAL FEASIBILITY
• TECHNICAL FEASIBILITY
• SOCIAL FEASIBILITY

ECONOMICAL FEASIBILITY

This study is carried out to check the economic impact that the system
will have on the organization. The amount of funds that the company can pour
into the research and development of the system is limited, so the expenditures
must be justified. The developed system was kept well within the budget, which
was achieved because most of the technologies used are freely available. Only
the customized products had to be purchased.

TECHNICAL FEASIBILITY

This study is carried out to check the technical feasibility, that is, the
technical requirements of the system. Any system developed must not place a
high demand on the available technical resources, since this would in turn
place high demands on the client. The developed system must have modest
requirements, as only minimal or no changes are required for implementing this
system.

SOCIAL FEASIBILITY

This aspect of the study checks the level of acceptance of the system by
the user. This includes the process of training the user to use the system
efficiently. The user must not feel threatened by the system, but must instead
accept it as a necessity. The level of acceptance by the users depends solely on
the methods that are employed to educate the user about the system and to make
him familiar with it. His level of confidence must be raised so that he is also
able to offer some constructive criticism, which is welcomed, as he is the final
user of the system.

CHAPTER 4

SYSTEM DESIGN

4.1 SYSTEM ARCHITECTURE:

The flowchart of the proposed methodology (abbreviated as HH-SVM) is
depicted in Fig 4.1. The methodology has two parts: the SVM and the
hyper-heuristic framework. The main role of the hyper-heuristic framework is to
generate a configuration (C, kernel type and kernel parameters) and send it to
the SVM. The SVM uses the generated configuration to solve a given problem
instance and then sends the cost function (the mean values of the error rate,
err, and the number of support vectors, NSV) back to the hyper-heuristic
framework. This process is repeated for a certain number of iterations. In the
following subsections, we discuss the proposed hyper-heuristic framework along
with its main components.
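The SVM side of this loop, which turns a configuration into the two objective values, can be sketched with scikit-learn as below. The use of stratified k-fold cross-validation, feature standardization and the function name `evaluate_config` are assumptions; the text above only states that the mean error (err) and the mean number of support vectors (NSV) are returned to the hyper-heuristic framework.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def evaluate_config(params: dict, X: np.ndarray, y: np.ndarray,
                    n_splits: int = 5, seed: int = 0):
    """Return (mean error rate, mean number of support vectors) over k folds."""
    errs, nsvs = [], []
    folds = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for train_idx, test_idx in folds.split(X, y):
        # Fit the scaler on the training fold only, to avoid leaking test data.
        scaler = StandardScaler().fit(X[train_idx])
        model = SVC(**params).fit(scaler.transform(X[train_idx]), y[train_idx])
        errs.append(1.0 - model.score(scaler.transform(X[test_idx]), y[test_idx]))
        nsvs.append(model.n_support_.sum())
    return float(np.mean(errs)), float(np.mean(nsvs))


if __name__ == "__main__":
    X, y = make_classification(n_samples=300, n_features=10, random_state=0)
    print(evaluate_config({"kernel": "rbf", "C": 1.0, "gamma": "scale"}, X, y))
```

Both returned values are to be minimized, so the pair can be fed directly to a dominance test or an archive of non-dominated configurations.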

Fig 4.1:- The Proposed Methodology


4.2 UML DIAGRAMS

UML stands for Unified Modeling Language. UML is a standardized
general-purpose modeling language in the field of object-oriented software
engineering. The standard was created by, and is managed by, the Object
Management Group.
The goal is for UML to become a common language for creating models
of object-oriented computer software. In its current form, UML consists of
two major components: a meta-model and a notation. In the future, some form
of method or process may also be added to, or associated with, UML.
The Unified Modeling Language is a standard language for specifying,
visualizing, constructing and documenting the artifacts of software systems,
as well as for business modeling and other non-software systems.
The UML represents a collection of best engineering practices that have
proven successful in the modeling of large and complex systems.
The UML is a very important part of developing object-oriented
software and of the software development process. The UML uses mostly
graphical notations to express the design of software projects.

GOALS:
The Primary goals in the design of the UML are as follows:
1. Provide users a ready-to-use, expressive visual modeling language so
that they can develop and exchange meaningful models.
2. Provide extensibility and specialization mechanisms to extend the core
concepts.
3. Be independent of particular programming languages and development
processes.
4. Provide a formal basis for understanding the modeling language.
5. Encourage the growth of the OO tools market.
6. Support higher level development concepts such as collaborations,
frameworks, patterns and components.
7. Integrate best practices.

USE CASE DIAGRAM:
A use case diagram in the Unified Modeling Language (UML) is a type
of behavioral diagram defined by and created from a Use-case analysis. Its
purpose is to present a graphical overview of the functionality provided by a
system in terms of actors, their goals (represented as use cases), and any
dependencies between those use cases. The main purpose of a use case diagram
is to show what system functions are performed for which actor. Roles of the
actors in the system can be depicted.

Fig 4.2:- Use Case Diagram

CLASS DIAGRAM:
In software engineering, a class diagram in the Unified Modeling Language
(UML) is a type of static structure diagram that describes the structure of a
system by showing the system's classes, their attributes, operations (or
methods), and the relationships among the classes. It explains which class
contains information.

Fig 4.3:- Class Diagram

SEQUENCE DIAGRAM:
A sequence diagram in Unified Modeling Language (UML) is a kind of
interaction diagram that shows how processes operate with one another and in
what order. It is a construct of a Message Sequence Chart. Sequence diagrams
are sometimes called event diagrams, event scenarios, and timing diagrams.
Fig 4.4:- Sequence Diagram

ACTIVITY DIAGRAM:
Activity diagrams are graphical representations of workflows of stepwise
activities and actions with support for choice, iteration and concurrency. In the
Unified Modeling Language, activity diagrams can be used to describe the
business and operational step-by-step workflows of components in a system. An
activity diagram shows the overall flow of control.

Fig 4.5:- Activity Diagram

4.3 ER Diagram

a. User

Fig 4.6:- User ER Diagram

b. Admin

Fig 4.7:- Admin ER Diagram


4.4 IMPLEMENTATION:
MODULES:

CYBER ANALYSIS
Cyber threat analysis is a process in which knowledge of the internal and external
information vulnerabilities pertinent to a particular organization is matched against real-world
cyber-attacks. With respect to cyber security, this threat-oriented approach to combating
cyber-attacks represents a smooth transition from a state of reactive security to a proactive one.
Moreover, the desired result of a threat assessment is to give best practices on how to maximize
the protective instruments with respect to availability, confidentiality and integrity, without
sacrificing usability and functionality. A threat could be anything that leads to interruption,
meddling or destruction of any valuable service or item existing in the firm’s repertoire. Whether
of “human” or “nonhuman” origin, the analysis must scrutinize each element that may bring about
a conceivable security risk.
DATASET MODIFICATION

If a dataset in your dashboard contains many dataset objects, you can hide specific dataset
objects from display in the Datasets panel. For example, if you decide to import a large amount
of data from a file but do not remove every unwanted data column before importing the data into
Web, you can hide the unwanted attributes and metrics.

DATA REDUCTION

Storage efficiency can be improved through data reduction techniques and capacity optimization
using data deduplication, compression, snapshots and thin provisioning. Data reduction by
simply deleting unwanted or unneeded data is the most effective way to reduce the volume of
stored data.
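Two of the techniques named above, deduplication and compression, can be illustrated with the Python standard library. The sketch assumes fixed-size blocks identified by SHA-256 digests, with each unique block stored once and compressed with zlib; the block size is an arbitrary illustrative choice, and snapshots and thin provisioning are storage-system features that are not shown.

```python
import hashlib
import zlib
from typing import Dict, List, Tuple


def deduplicate(data: bytes, block_size: int = 4096) -> Tuple[Dict[str, bytes], List[str]]:
    """Split data into fixed-size blocks and store each distinct block once, compressed."""
    store: Dict[str, bytes] = {}  # digest -> compressed block
    recipe: List[str] = []        # ordered digests needed to rebuild the data
    for i in range(0, len(data), block_size):
        block = data[i:i + block_size]
        digest = hashlib.sha256(block).hexdigest()
        if digest not in store:   # only unique blocks are compressed and stored
            store[digest] = zlib.compress(block)
        recipe.append(digest)
    return store, recipe


def restore(store: Dict[str, bytes], recipe: List[str]) -> bytes:
    """Rebuild the original bytes from the block store and the recipe."""
    return b"".join(zlib.decompress(store[d]) for d in recipe)


if __name__ == "__main__":
    data = (b"A" * 4096 + b"B" * 4096) * 50  # highly redundant sample input
    store, recipe = deduplicate(data)
    stored = sum(len(v) for v in store.values())
    print(len(data), len(store), stored)
    print(restore(store, recipe) == data)
```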

RISKY USER DETECTION

• False alarm immunity to prevent customer embarrassment.
• High detection rate to protect all kinds of goods from theft.
• Wide-exit coverage that offers greater flexibility for entrance/exit layouts.
• A wide range of attractive designs to complement any store décor.
• Sophisticated digital controller technology for optimum system performance.

CHAPTER 5
SOFTWARE ENVIRONMENT

5.1 Modules Used in Project :-

TensorFlow

TensorFlow is a free and open-source software library for dataflow and
differentiable programming across a range of tasks. It is a symbolic math
library, and is also used for machine learning applications such as neural
networks. It is used for both research and production at Google. TensorFlow
was developed by the Google Brain team for internal Google use. It was
released under the Apache 2.0 open-source license on November 9, 2015.

NumPy

NumPy is a general-purpose array-processing package. It provides a
high-performance multidimensional array object, and tools for working with
these arrays.

It is the fundamental package for scientific computing with Python. It
contains various features including these important ones:

▪ A powerful N-dimensional array object
▪ Sophisticated (broadcasting) functions
▪ Tools for integrating C/C++ and Fortran code
▪ Useful linear algebra, Fourier transform, and random number capabilities

Besides its obvious scientific uses, NumPy can also be used as an efficient
multi-dimensional container of generic data. Arbitrary data types can be
defined using NumPy, which allows NumPy to seamlessly and speedily
integrate with a wide variety of databases.
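A short example of the features listed above (the N-dimensional array, broadcasting, linear algebra, random numbers and the Fourier transform) follows. The array contents are arbitrary illustrative values.

```python
import numpy as np

# Rows are samples, columns are features.
X = np.arange(12, dtype=float).reshape(3, 4)

# Broadcasting: the shape-(4,) vector of column means is applied to every row.
col_means = X.mean(axis=0)
centered = X - col_means

# Linear algebra, random numbers and the FFT live in the same package.
gram = X @ X.T                                   # (3, 3) matrix of row inner products
noisy = X + np.random.default_rng(0).normal(scale=0.01, size=X.shape)
spectrum = np.abs(np.fft.fft(X[0]))              # spectrum[0] is the sum of row 0

print(col_means)              # [4. 5. 6. 7.]
print(centered.mean(axis=0))
print(gram.shape)             # (3, 3)
```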

Pandas

Pandas is an open-source Python library providing high-performance data
manipulation and analysis tools using its powerful data structures. Before
Pandas, Python was mainly used for data munging and preparation and
contributed very little to data analysis. Pandas solved this problem. Using
Pandas, we can accomplish five typical steps in the processing and analysis
of data, regardless of the origin of the data: load, prepare, manipulate,
model, and analyze. Python with Pandas is used in a wide range of academic
and commercial domains, including finance, economics, statistics,
analytics, etc.
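The load, prepare, manipulate and analyze steps can be illustrated with a few lines of Pandas. The in-memory CSV and its column names are hypothetical, loosely modeled on network-connection records; the modeling step is omitted here because it would normally be handed to a library such as scikit-learn.

```python
import io

import pandas as pd

# Load: a tiny in-memory CSV stands in for a real dataset file (hypothetical columns).
csv = io.StringIO(
    "src_bytes,dst_bytes,protocol,label\n"
    "100,2000,tcp,normal\n"
    "0,0,icmp,attack\n"
    "250,,udp,normal\n"
    "0,0,icmp,attack\n"
)
df = pd.read_csv(csv)

# Prepare: fill the missing value, then drop exact duplicate rows.
df["dst_bytes"] = df["dst_bytes"].fillna(0)
df = df.drop_duplicates()

# Manipulate: derive a new column from existing ones.
df["total_bytes"] = df["src_bytes"] + df["dst_bytes"]

# Analyze: mean traffic volume per class.
summary = df.groupby("label")["total_bytes"].mean()
print(summary)
```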
Matplotlib

Matplotlib is a Python 2D plotting library which produces publication


quality figures in a variety of hardcopy formats and interactive environments
across platforms. Matplotlib can be used in Python scripts, the Python
and IPython shells, the Jupyter Notebook, web application servers, and four
graphical user interface toolkits. Matplotlib tries to make easy things easy
and hard things possible. You can generate plots, histograms, power spectra,
bar charts, error charts, scatter plots, etc., with just a few lines of code. For
examples, see the sample plots and thumbnail gallery.

For simple plotting, the pyplot module provides a MATLAB-like interface, particularly when combined with IPython. Power users have full control of line styles, font properties, axes properties, etc., via an object-oriented interface or via a set of functions familiar to MATLAB users.
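A minimal sketch combining both interfaces described above: pyplot creates the figure, and methods on the Axes object set the labels. It assumes the non-interactive Agg backend so that it runs without a display and writes a PNG into memory; in a notebook or desktop session you would call plt.show() instead. The month and alert values are placeholders, not project results.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render without opening a window
import matplotlib.pyplot as plt

months = ["Jan", "Feb", "Mar"]
alerts = [4, 7, 3]  # placeholder values for illustration only

# pyplot (MATLAB-like) entry point: a figure with one Axes.
fig, ax = plt.subplots(figsize=(4, 3))
ax.bar(months, alerts, color="tab:blue")

# Object-oriented interface: fine-grained control through the Axes object.
ax.set_xlabel("Month")
ax.set_ylabel("Alerts")
ax.set_title("Alerts per month")

# Save as PNG into an in-memory buffer instead of a file on disk.
buffer = io.BytesIO()
fig.savefig(buffer, format="png", dpi=100)
plt.close(fig)
```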

Scikit-learn

Scikit-learn provides a range of supervised and unsupervised learning algorithms via a consistent interface in Python. It is licensed under a permissive simplified BSD license and is distributed with many Linux distributions, encouraging academic and commercial use.

Python

Python is an interpreted, high-level programming language for general-purpose programming. Created by Guido van Rossum and first released in 1991, Python has a design philosophy that emphasizes code readability, notably through its use of significant whitespace.

Python features a dynamic type system and automatic memory management. It supports multiple programming paradigms, including object-oriented, imperative, functional and procedural, and has a large and comprehensive standard library.
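To make the paragraph above concrete, the following sketch shows dynamic typing, a small object-oriented class, and the same total computed in imperative and in functional style. The Transaction class and its fields are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from functools import reduce

# Dynamic typing: the same name can be rebound to a value of another type.
value = 42
value = "forty-two"


# Object-oriented style: @dataclass generates __init__, __repr__ and __eq__.
@dataclass
class Transaction:
    account: str
    amount: float


# Imperative/procedural style: an explicit loop with a running total.
def total_imperative(transactions):
    total = 0.0
    for t in transactions:
        total += t.amount
    return total


# Functional style: the same computation expressed with map and reduce.
def total_functional(transactions):
    return reduce(lambda acc, amt: acc + amt,
                  map(lambda t: t.amount, transactions), 0.0)


batch = [Transaction("A-1", 100.0), Transaction("B-2", 250.5)]
print(total_imperative(batch), total_functional(batch))  # 350.5 350.5
```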

• Python is Interpreted − Python is processed at runtime by the interpreter. You do not need to compile your program before executing it. This is similar to Perl and PHP.
• Python is Interactive − You can sit at a Python prompt and interact with the interpreter directly to write your programs.
Python also acknowledges that speed of development is important. Readable and terse code is part of this, and so is access to powerful constructs that avoid tedious repetition of code. Maintainability also ties into this: the amount of code may be an all but useless metric on its own, but it does say something about how much code you have to scan, read and/or understand to troubleshoot problems or tweak behaviors. This speed of development, the ease with which programmers of other languages can pick up basic Python skills, and the huge standard library are key to another area where Python excels. Tools written in Python have been quick to implement and have saved a lot of time, and several of them have later been patched and updated by people with no Python background without breaking.

Install Python Step-by-Step on Windows

Python, a versatile programming language, does not come pre-installed on Windows. First released in 1991, it is still a very popular high-level programming language. Its design philosophy emphasizes code readability through significant whitespace, and its object-oriented approach and language constructs enable programmers to write clear and logical code for their projects.

How to Install Python on Windows

Python has been updated many times over the years, and choosing a version can be confusing for beginners who want to start learning Python; this tutorial walks through the process. At the time of writing, the latest version of Python was 3.7.4, a Python 3 release.
Note: Python 3.7.4 cannot be used on Windows XP or earlier.

Before you start the installation, check your system requirements: download the Python version that matches your operating system and processor. The system used here runs 64-bit Windows, so the steps below install Python 3.7.4 on a 64-bit Windows 7 machine. The steps for installing Python on Windows 10, 8 and 7 are divided into four parts.

Download the Correct Version onto the System

Step 1: Open Google Chrome or any other web browser and go to the official Python website: https://www.python.org

Now, check for the latest and the correct version for your operating system.

Step 2: Click on the Download Tab.

Step 3: Either click the yellow Download Python 3.7.4 button for Windows, or scroll further down and click Download next to the version you want. Here, we download the most recent Windows version, 3.7.4.

Step 4: Scroll down the page until you find the Files option.

Step 5: Here you will see the different versions of Python available for each operating system.

• To download 32-bit Python for Windows, select any one of the three options: Windows x86 embeddable zip file, Windows x86 executable installer or Windows x86 web-based installer.
• To download 64-bit Python for Windows, select any one of the three options: Windows x86-64 embeddable zip file, Windows x86-64 executable installer or Windows x86-64 web-based installer.
Here we will use the Windows x86-64 web-based installer. This completes the first part, choosing which version of Python to download. Next comes the second part: installation.
Note: To see the changes or updates made in a version, click the Release Notes option.
Installation of Python
Step 1: Open your Downloads folder and run the downloaded Python installer to start the installation.

Step 2: Before you click Install Now, make sure to tick Add Python 3.7 to PATH.

Step 3: Click Install Now. After the installation succeeds, click Close.

With the three steps above, Python has been installed. Now it is time to verify the installation.
Note: The installation process might take a couple of minutes.

Verify the Python Installation
Step 1: Click on Start

Step 2: In the Windows Run Command, type “cmd”.

Step 3: Open the Command prompt option.

Step 4: Test whether Python is correctly installed: type python -V and press Enter.

Step 5: The output should be Python 3.7.4.
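The same check can also be made from inside Python, which is useful in scripts that depend on a particular interpreter. This is a sketch: describe_interpreter and meets_minimum are hypothetical helper names, and the (3, 7) default simply mirrors the version installed above. The bit check also confirms whether the 64-bit (x86-64) or 32-bit (x86) build was installed.

```python
import platform
import sys


def describe_interpreter():
    """Return the interpreter version as "X.Y.Z" and whether it is a 32- or 64-bit build."""
    version = ".".join(str(part) for part in sys.version_info[:3])
    bits = 64 if sys.maxsize > 2**32 else 32
    return version, bits


def meets_minimum(minimum=(3, 7)):
    """True if the running interpreter is at least the given (major, minor) version."""
    return sys.version_info[:2] >= minimum


version, bits = describe_interpreter()
print(f"Python {version} ({bits}-bit) on {platform.system()}")
print("Meets 3.7 minimum:", meets_minimum())
```

Saved as, for example, check_python.py, it can be run from the Command Prompt with python check_python.py.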

Note: If an earlier version of Python is already installed, you must first uninstall it and then install the new one.

Check how the Python IDLE works


Step 1: Click on Start
Step 2: In the Windows Run command, type “python idle”.

Step 3: Click on IDLE (Python 3.7 64-bit) and launch the program

Step 4: To work in IDLE, you must first save the file: click File > Save.

Step 5: Name the file, set Save as type to Python files, and click Save. Here the file is named Hey World.

Step 6: Now, for example, enter print

CHAPTER 6

SYSTEM TEST
The purpose of testing is to discover errors. Testing is the process of trying to discover every conceivable fault or weakness in a work product. It provides a way to check the functionality of components, subassemblies, assemblies and/or a finished product. It is the process of exercising software with the intent of ensuring that the software system meets its requirements and user expectations and does not fail in an unacceptable manner. There are various types of tests, and each type addresses a specific testing requirement.
6.1 TYPES OF TESTS

Unit testing
Unit testing involves the design of test cases that validate that the internal program logic is functioning properly and that program inputs produce valid outputs. All decision branches and internal code flow should be validated. It is the testing of individual software units of the application, done after an individual unit is complete and before integration. It is structural testing that relies on knowledge of the unit's construction and is invasive. Unit tests perform basic tests at the component level and test a specific business process, application, and/or system configuration. Unit tests ensure that each unique path of a business process performs accurately to the documented specifications and contains clearly defined inputs and expected results.
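As an illustration of a unit test with clearly defined inputs and expected results, the sketch below uses Python's built-in unittest module to test a small date-splitting helper. split_date is hypothetical: it mirrors the date.split("-") call in the giver_transaction view listed in the source code of this report and assumes the day-month-year order that the view's indexing implies. Valid input must be accepted and each malformed input rejected with ValueError.

```python
import unittest


def split_date(value):
    """Hypothetical helper mirroring date.split("-") in giver_transaction().

    Assumes day-month-year order (the view stores sd[0], sd[1], sd[2] as
    day, month, year) and raises ValueError for anything malformed.
    """
    parts = value.split("-")
    if len(parts) != 3 or not all(part.isdigit() for part in parts):
        raise ValueError(f"expected DD-MM-YYYY, got {value!r}")
    day, month, year = (int(part) for part in parts)
    if not (1 <= day <= 31 and 1 <= month <= 12):
        raise ValueError(f"day or month out of range in {value!r}")
    return day, month, year


class SplitDateTests(unittest.TestCase):
    def test_valid_date_is_split(self):
        self.assertEqual(split_date("05-11-2020"), (5, 11, 2020))

    def test_missing_part_is_rejected(self):
        with self.assertRaises(ValueError):
            split_date("05-2020")

    def test_non_numeric_part_is_rejected(self):
        with self.assertRaises(ValueError):
            split_date("05-Nov-2020")

    def test_out_of_range_month_is_rejected(self):
        with self.assertRaises(ValueError):
            split_date("05-13-2020")


if __name__ == "__main__":
    # exit=False keeps the interpreter running after the test report is printed.
    unittest.main(argv=["split_date_tests"], exit=False)
```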
Integration testing
Integration tests are designed to test integrated software components to determine whether they actually run as one program. Testing is event driven and is more concerned with the basic outcome of screens or fields. Integration tests demonstrate that although the components were individually satisfactory, as shown by successful unit testing, the combination of components is correct and consistent. Integration testing is specifically aimed at exposing the problems that arise from the combination of components.
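The sketch below illustrates an integration test in the sense described above: two components that could each pass their own unit tests, a registration function and a login function, are exercised together against one shared in-memory SQLite database. The table is a subset of users_userregister_model (Table 2), and the functions are hypothetical stand-ins for the project's registration and login views. Unlike those views, which compare plain-text passwords, this sketch stores a salted PBKDF2 hash, so the password column is widened beyond varchar(10); the UNIQUE constraint on name is also an added assumption.

```python
import hashlib
import hmac
import os
import sqlite3

ITERATIONS = 100_000  # PBKDF2 work factor; an illustrative value


def create_schema(conn):
    # Subset of users_userregister_model; UNIQUE(name) is an assumption so
    # that login can look a user up by name.
    conn.execute(
        "CREATE TABLE users_userregister_model ("
        " id INTEGER PRIMARY KEY,"
        " name VARCHAR(50) NOT NULL UNIQUE,"
        " email VARCHAR(30) NOT NULL,"
        " password VARCHAR(200) NOT NULL)"
    )


def hash_password(password, salt=None):
    # Returns "salt_hex$digest_hex"; a fresh random salt is used when none is given.
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt.hex() + "$" + digest.hex()


def register(conn, name, email, password):
    # Component 1: store a new user with a salted password hash.
    conn.execute(
        "INSERT INTO users_userregister_model (name, email, password) VALUES (?, ?, ?)",
        (name, email, hash_password(password)),
    )


def login(conn, name, password):
    # Component 2: return the user's id if the credentials match, otherwise None.
    row = conn.execute(
        "SELECT id, password FROM users_userregister_model WHERE name = ?", (name,)
    ).fetchone()
    if row is None:
        return None
    user_id, stored = row
    salt = bytes.fromhex(stored.split("$", 1)[0])
    if hmac.compare_digest(hash_password(password, salt), stored):
        return user_id
    return None


# Integration test: both components share one in-memory database.
conn = sqlite3.connect(":memory:")
create_schema(conn)
register(conn, "alice", "alice@example.com", "s3cret")
assert login(conn, "alice", "s3cret") == 1
assert login(conn, "alice", "wrong") is None
assert login(conn, "nobody", "s3cret") is None
```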
Functional test
Functional tests provide systematic demonstrations that the functions tested are available as specified by the business and technical requirements, system documentation, and user manuals.
Functional testing is centered on the following items:
Valid Input: identified classes of valid input must be accepted.
Invalid Input: identified classes of invalid input must be rejected.
Functions: identified functions must be exercised.
Output: identified classes of application outputs must be exercised.
Systems/Procedures: interfacing systems or procedures must be invoked.
Organization and preparation of functional tests is focused on requirements, key functions, or special test cases. In addition, systematic coverage of identified business process flows, data fields, predefined processes, and successive processes must be considered for testing. Before functional testing is complete, additional tests are identified and the effective value of current tests is determined.
System Test
System testing ensures that the entire integrated software system
meets requirements. It tests a configuration to ensure known and predictable
results. An example of system testing is the configuration-oriented system
integration test. System testing is based on process descriptions and flows,
emphasizing pre-driven process links and integration points.
White Box Testing
White Box Testing is testing in which the software tester has knowledge of the inner workings, structure and language of the software, or at least its purpose. It is used to test areas that cannot be reached from a black-box level.
Black Box Testing
Black Box Testing is testing the software without any knowledge of the inner workings, structure or language of the module being tested. Black box tests, like most other kinds of tests, must be written from a definitive source document, such as a specification or requirements document. The software under test is treated as a black box: you cannot “see” into it. The test provides inputs and responds to outputs without considering how the software works.
Unit Testing
Unit testing is usually conducted as part of a combined code and
unit test phase of the software lifecycle, although it is not uncommon for coding
and unit testing to be conducted as two distinct phases.
Test strategy and approach
Field testing will be performed manually and functional tests
will be written in detail.
Test objectives
• All field entries must work properly.
• Pages must be activated from the identified link.
• The entry screen, messages and responses must not be delayed.

Features to be tested
• Verify that the entries are of the correct format
• No duplicate entries should be allowed
• All links should take the user to the correct page.
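The first two features above, correct entry format and no duplicate entries, can be checked with a small validator such as the one sketched below. The rules are assumptions for illustration: the maximum lengths are taken from the users_userregister_model column sizes in Table 2, while the e-mail and phone patterns are deliberately simple and are not the project's actual validation.

```python
import re

# Hypothetical rules: lengths follow the Table 2 column sizes.
MAX_LENGTHS = {"name": 50, "email": 30, "phoneno": 15}
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
PHONE_RE = re.compile(r"^\+?\d{7,14}$")


def validate_entry(entry, existing_emails):
    """Return a list of error messages; an empty list means the entry is accepted.

    existing_emails is assumed to hold already-registered addresses in lower case.
    """
    errors = []
    for field, limit in MAX_LENGTHS.items():
        value = entry.get(field, "")
        if not value:
            errors.append(f"{field} is required")
        elif len(value) > limit:
            errors.append(f"{field} is longer than {limit} characters")
    if entry.get("email") and not EMAIL_RE.match(entry["email"]):
        errors.append("email is not in a valid format")
    if entry.get("phoneno") and not PHONE_RE.match(entry["phoneno"]):
        errors.append("phoneno must be 7 to 14 digits, optionally prefixed with +")
    if entry.get("email", "").lower() in existing_emails:
        errors.append("duplicate entry: email already registered")
    return errors


existing = {"alice@example.com"}
print(validate_entry({"name": "Bob", "email": "bob@example.com", "phoneno": "9876543210"}, existing))  # []
print(validate_entry({"name": "Alice", "email": "ALICE@example.com", "phoneno": "12ab"}, existing))
```

Returning every message instead of stopping at the first error lets a form report all problems with an entry at once.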
Integration Testing
Software integration testing is the incremental integration testing of two or more integrated software components on a single platform, intended to expose failures caused by interface defects.
The task of the integration test is to check that components or software
applications, e.g. components in a software system or – one step up – software
applications at the company level – interact without error.
Test Results: All the test cases mentioned above passed successfully. No defects were encountered.
Acceptance Testing
User Acceptance Testing is a critical phase of any project and requires
significant participation by the end user. It also ensures that the system meets
the functional requirements.
Test Results: All the test cases mentioned above passed successfully. No defects were encountered.
Table 1: users_useradd_model
Columns (9)
Field           Type
id              int(11) NOT NULL
cname           varchar(100) NOT NULL
dept            varchar(10000) NOT NULL
description     varchar(1000) NOT NULL
website         varchar(1000) NOT NULL
method          varchar(100) NOT NULL
record          varchar(500) NOT NULL
attackresult    varchar(500) NOT NULL
uregid_id       int(11) NOT NULL

Table 2: users_userregister_model
Columns (7)
Field           Type
id              int(11) NOT NULL
name            varchar(50) NOT NULL
email           varchar(30) NOT NULL
password        varchar(10) NOT NULL
phoneno         varchar(15) NOT NULL
address         varchar(500) NOT NULL
location        varchar(20) NOT NULL
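To make the two tables concrete, the sketch below expresses them as SQL and creates them in an in-memory SQLite database. SQLite is used only so the example runs anywhere; the int(11)/varchar(n) types and the MySQLdb import in the source code suggest the project itself uses MySQL. Treating uregid_id as a foreign key to users_userregister_model.id is an assumption based on Django's convention of naming ForeignKey columns <field>_id.

```python
import sqlite3

# Both tables as DDL; SQLite accepts VARCHAR(n) but does not enforce the length.
DDL = """
CREATE TABLE users_userregister_model (
    id       INTEGER PRIMARY KEY,
    name     VARCHAR(50)  NOT NULL,
    email    VARCHAR(30)  NOT NULL,
    password VARCHAR(10)  NOT NULL,
    phoneno  VARCHAR(15)  NOT NULL,
    address  VARCHAR(500) NOT NULL,
    location VARCHAR(20)  NOT NULL
);
CREATE TABLE users_useradd_model (
    id           INTEGER PRIMARY KEY,
    cname        VARCHAR(100)   NOT NULL,
    dept         VARCHAR(10000) NOT NULL,
    description  VARCHAR(1000)  NOT NULL,
    website      VARCHAR(1000)  NOT NULL,
    method       VARCHAR(100)   NOT NULL,
    record       VARCHAR(500)   NOT NULL,
    attackresult VARCHAR(500)   NOT NULL,
    -- assumed foreign key to the registering user
    uregid_id    INTEGER NOT NULL REFERENCES users_userregister_model(id)
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript(DDL)


def fk_is_enforced(conn):
    """Try to insert a users_useradd_model row for a user id that does not exist."""
    try:
        conn.execute(
            "INSERT INTO users_useradd_model (cname, dept, description, website,"
            " method, record, attackresult, uregid_id)"
            " VALUES ('c', 'd', 'desc', 'w', 'm', 'r', 'a', 999)"
        )
    except sqlite3.IntegrityError:
        return True
    return False
```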

SOURCE CODE

from django.contrib import messages
from django.db.models import Count
from django.shortcuts import render, redirect, get_object_or_404

from admins.models import Sendquery
from cyber_alert.forms import AdminForm, GiverForm
from cyber_alert.models import GiverTransaction, AdminRegister

# Views for the admin side of the cyber_alert application.

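def admin_login(request):
    if request.method == "POST":
        name = request.POST.get('name')
        password = request.POST.get('password')
        try:
            # Looks up the admin by name and plain-text password, as stored by the model.
            check = AdminRegister.objects.get(name=name, password=password)
        except (AdminRegister.DoesNotExist, AdminRegister.MultipleObjectsReturned):
            messages.error(request, 'Invalid name or password')
        else:
            # Remember the logged-in admin's primary key in the session.
            request.session['name'] = check.id
            return redirect('giver_transaction')

    return render(request, "admin_login.html")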

def admin_register(request):
    if request.method == "POST":
        form = AdminForm(request.POST)
        if form.is_valid():
            form.save()
            messages.success(request, 'You have been successfully registered')
            return redirect('admin_login')
    else:
        form = AdminForm()

    # An invalid POST falls through and re-renders the bound form with its errors.
    return render(request, 'admin_register.html', {'form': form})

def giver_transaction(request):
    sd = ''
    q = ''
    admin_id = request.session.get('name')
    if admin_id is None:
        # Not logged in: go to the login page instead of raising KeyError.
        return redirect('admin_login')
    obj = AdminRegister.objects.get(id=admin_id)
    if request.method == "POST":
        data = request.POST
        date = data.get('date', '')
        sd = date.split("-")
        if len(sd) != 3:
            messages.error(request, 'Please enter a complete date')
        else:
            # Assumes the date arrives in day-month-year order, as the
            # day/month/year assignment below implies.
            GiverTransaction.objects.create(
                userid=obj,
                day=sd[0], month=sd[1], year=sd[2],
                name=data.get('name'),
                aadharno=data.get('aadharno'),
                address=data.get('address'),
                mobileno=data.get('mobileno'),
                bankname=data.get('bankname'),
                accountno=data.get('accountno'),
                branchname=data.get('branchname'),
                amount=data.get('amount'),
                ifsccode=data.get('ifsccode'),
                micrcode=data.get('micrcode'),
                date=date,
                time=data.get('time'),
                # 'transationid' (sic) must match the model's field name.
                transationid=data.get('transactionid'),
            )

    return render(request, 'giver_transaction.html', {'form': sd, 'we': q})

def analyze_page(request):
    admin_id = request.session.get('name')
    if admin_id is None:
        return redirect('admin_login')
    admin_obj = AdminRegister.objects.get(id=admin_id)
    # Transactions recorded under the logged-in admin's name.
    obj = GiverTransaction.objects.filter(name=admin_obj.name)

    return render(request, 'analyze_page.html', {'objv': obj})

def viewer(request, chart_type):
    # One row per month with the number of transactions in 'dcount'.
    chart = GiverTransaction.objects.values('month').annotate(dcount=Count('month'))

    return render(request, "viewer.html", {'form': chart, 'chart_type': chart_type})

def update(request):
    admin_id = request.session.get('name')
    if admin_id is None:
        return redirect('admin_login')
    obj = get_object_or_404(AdminRegister, id=admin_id)
    if request.method == "POST":
        obj.adminid = request.POST.get('adminid', '')
        obj.name = request.POST.get('name', '')
        obj.email = request.POST.get('email', '')
        obj.password = request.POST.get('password', '')
        obj.phoneno = request.POST.get('phoneno', '')
        obj.address = request.POST.get('address', '')
        obj.save(update_fields=["adminid", "name", "email", "password",
                                "phoneno", "address"])
        # Send the admin back to log in with the updated details.
        return redirect('admin_login')
    return render(request, 'update.html', {'objc': obj})

def logout_page(request):
    # Clear the session so the admin is actually logged out.
    request.session.flush()
    return redirect('admin_login')

def mydetails(request):
    admin_id = request.session.get('name')
    if admin_id is None:
        return redirect('admin_login')
    # Read-only view of the logged-in admin's details; edits are saved by update().
    obj = get_object_or_404(AdminRegister, id=admin_id)
    return render(request, 'mydetails.html', {'objc': obj})

def show(request):
    return render(request, 'show.html')


def receivealert(request):
    admin_id = request.session.get('name')
    if admin_id is None:
        return redirect('admin_login')
    admin_obj = AdminRegister.objects.get(id=admin_id)
    # Alerts (queries) sent to the logged-in admin, matched by name.
    obj = Sendquery.objects.filter(name=admin_obj.name)

    return render(request, 'receivealert.html', {'de': obj})

CHAPTER-7
SCREENSHOTS

SCREENSHOT 1

SCREENSHOT 2

SCREENSHOT 3

SCREENSHOT 4

SCREENSHOT 5

SCREENSHOT 6

SCREENSHOT 7

CHAPTER-8

CONCLUSION

In this work, we proposed a hyper-heuristic SVM optimization framework for big data cyber security problems. We formulated the SVM configuration process as a bi-objective optimization problem in which accuracy and model complexity are treated as two conflicting objectives. This bi-objective optimization problem can be solved using the proposed hyper-heuristic framework. The framework integrates the strengths of decomposition-based and Pareto-based approaches to approximate the Pareto set of configurations. Our framework has been tested on two benchmark cyber security problem instances: Microsoft malware big data classification and anomaly intrusion detection. The experimental results demonstrate the effectiveness and potential of the proposed framework in achieving competitive, if not superior, results compared with other algorithms.
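The bi-objective formulation can be illustrated with a much simpler procedure than the proposed hyper-heuristic: evaluate a small grid of RBF-kernel SVM configurations with scikit-learn, score each on accuracy (to maximize) and model complexity (to minimize), and keep only the non-dominated configurations, which approximate the Pareto set. This is a sketch, not the framework described above: it uses synthetic data from make_classification rather than the Microsoft malware or intrusion detection datasets, and it measures complexity as the number of support vectors, which is one common choice but may differ from the measure used in this work.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC


def evaluate(config, X, y):
    """Return (accuracy, complexity) for one SVM configuration.

    Complexity is approximated by the number of support vectors (an assumption).
    """
    model = SVC(kernel="rbf", C=config["C"], gamma=config["gamma"])
    accuracy = cross_val_score(model, X, y, cv=3).mean()
    complexity = int(model.fit(X, y).n_support_.sum())
    return accuracy, complexity


def dominates(a, b):
    """True if a is no worse than b on both objectives and strictly better on one.

    Objectives: maximise accuracy (index 0), minimise complexity (index 1).
    """
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    better = a[0] > b[0] or a[1] < b[1]
    return no_worse and better


def pareto_front(scored):
    """Keep the configurations that no other configuration dominates."""
    return [
        (cfg, obj) for cfg, obj in scored
        if not any(dominates(other, obj) for _, other in scored)
    ]


X, y = make_classification(n_samples=300, n_features=10, random_state=0)
grid = [{"C": c, "gamma": g} for c in (0.1, 1.0, 10.0) for g in (0.01, 0.1, 1.0)]
scored = [(cfg, evaluate(cfg, X, y)) for cfg in grid]

for cfg, (acc, n_sv) in pareto_front(scored):
    print(f"C={cfg['C']:<5} gamma={cfg['gamma']:<5} accuracy={acc:.3f} support_vectors={n_sv}")
```

An exhaustive grid like this grows multiplicatively with every added hyperparameter, which is where a guided search over configurations becomes attractive.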

CHAPTER-9

REFERENCES

[1] Mansour Ahmadi, Dmitry Ulyanov, Stanislav Semenov, Mikhail Trofimov, and Giorgio Giacinto. Novel feature extraction, selection and fusion for effective malware family classification. In Proceedings of the Sixth ACM Conference on Data and Application Security and Privacy, pages 183–194. ACM, 2016.
[2] Alfred V. Aho and Margaret J. Corasick. Efficient string matching: an aid to bibliographic search. Communications of the ACM, 18(6):333–340, 1975.
[3] Shawkat Ali and Kate A. Smith-Miles. A meta-learning approach to automatic kernel selection for support vector machines. Neurocomputing, 70(1):173–186, 2006.
[4] Nedjem-Eddine Ayat, Mohamed Cheriet, and Ching Y. Suen. Automatic model selection for the optimization of support vector machine kernels. Pattern Recognition, 38(10):1733–1745, 2005.
[5] Yukun Bao, Zhongyi Hu, and Tao Xiong. A particle swarm optimization and pattern search based memetic algorithm for SVMs parameters optimization. Neurocomputing, 117:98–106, 2013.
[6] Rodrigo C. Barros, Márcio P. Basgalupp, André C. P. L. F. de Carvalho, and Alex A. Freitas. A hyper-heuristic evolutionary algorithm for automatically designing decision-tree algorithms. In Proceedings of the 14th Annual Conference on Genetic and Evolutionary Computation, pages 1237–1244. ACM, 2012.
[7] Márcio P. Basgalupp, Rodrigo C. Barros, Tiago S. da Silva, and André C. P. L. F. de Carvalho. Software effort prediction: a hyper-heuristic decision-tree based approach. In Proceedings of the 28th Annual ACM Symposium on Applied Computing, pages 1109–1116. ACM, 2013.
[8] Márcio P. Basgalupp, Rodrigo C. Barros, and Vili Podgorelec. Evolving decision-tree induction algorithms with a multi-objective hyper-heuristic. In Proceedings of the 30th Annual ACM Symposium on Applied Computing, pages 110–117. ACM, 2015.
[9] Asa Ben-Hur and Jason Weston. A user's guide to support vector machines. Data Mining Techniques for the Life Sciences, pages 223–239, 2010.
[10] David Brumley, Cody Hartwig, Zhenkai Liang, James Newsome, Dawn Song, and Heng Yin. Automatically identifying trigger-based behavior in malware. Botnet Detection, pages 65–88, 2008.
[11] Edmund K. Burke, Matthew Hyde, Graham Kendall, Gabriela Ochoa, Ender Özcan, and John R. Woodward. A classification of hyper-heuristic approaches. In Handbook of Metaheuristics, pages 449–468. Springer, 2010.
[12] Athanassia Chalimourda, Bernhard Schölkopf, and Alex J. Smola. Experimentally optimal ν in support vector regression for different noise models and parameter settings. Neural Networks, 17(1):127–141, 2004.
[13] Chih-Chung Chang and Chih-Jen Lin. LIBSVM: a library for support vector machines. ACM Transactions on Intelligent Systems and Technology (TIST), 2(3):27, 2011.
[14] Min Chen, Shiwen Mao, and Yunhao Liu. Big data: A survey. Mobile Networks and Applications, 19(2):171–209, 2014.
[15] Nello Cristianini and John Shawe-Taylor. An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. Cambridge University Press, 2000.
