Web Application Vulnerability Prediction Using Machine Learning
ISSN 2229-5518
International Journal of Scientific & Engineering Research, Volume 8, Issue 5, May-2017
IJSER © 2017, http://www.ijser.org

In this proposed system, hybrid (static + dynamic) program attributes are used to characterize input validation and sanitization code patterns, which act as a significant indicator of web application vulnerabilities. Current vulnerability prediction techniques rely on the availability of data labeled with vulnerability information for training. For most web applications, past vulnerability data is often not available, or at least not complete. Hence, this approach can be used both where labeled past data is fully available and where it is not. The web program is sliced into small sinks, and input validation and sanitization attributes are generated using dynamic and static program analysis.

Keywords – Input validation, input sanitization, static program analysis, dynamic program analysis, machine learning.

I INTRODUCTION

Web applications play an important role in many of our daily activities such as social networking, email, banking, shopping, registrations, and so on. Input validation and input sanitization are two secure coding techniques that developers can adopt to protect their programs from common web vulnerabilities. Input validation typically checks an input against required properties like data length, range, type, and sign. Input sanitization, in general, cleanses an input string by accepting only pre-defined characters and rejecting others, including characters with special meaning to the interpreter under consideration. Intuitively, an application is vulnerable if the developers failed to implement these techniques correctly or to a sufficient degree.

The code attributes that characterize the validation and sanitization code implemented in a program could be used to predict web application vulnerabilities. Based on this hypothesis, we propose a set of code attributes called input validation and sanitization (IVS) attributes, from which we build vulnerability predictors that are fine-grained, accurate, and scalable. The approach is fine-grained because it identifies vulnerabilities at the program statement level. We use both static and dynamic program analysis techniques to extract IVS attributes. Static analysis can help assess general properties of a program. Yet, dynamic analysis can focus on more
specific code characteristics that are complementary to the information obtained with static analysis. We use dynamic analysis only to infer the possible types of input validation and sanitization code, rather than to precisely prove their correctness, and apply machine learning on these inferences for vulnerability prediction. Therefore, we mitigate the scalability issue typically associated with dynamic analysis. Thus, our proposed IVS attributes reflect relevant properties of the implementations of input validation and input sanitization methods in web programs and are expected to help predict vulnerabilities in an accurate and scalable manner. Furthermore, both supervised learning and semi-supervised learning methods are used to build vulnerability predictors from IVS attributes, so that our method can also be used in contexts where there is limited vulnerability data for training.

II RELATED WORKS

N. Jovanovic, C. Kruegel, and E. Kirda [1] proposed “Pixy: A static analysis tool for detecting web application vulnerabilities.” Pixy is the first open-source tool for statically detecting XSS vulnerabilities in PHP code by means of data flow analysis. It implements a flow-sensitive, interprocedural, and context-sensitive data flow analysis for PHP, targeted at detecting taint-style vulnerabilities. This analysis had to overcome significant conceptual challenges due to the untyped nature of PHP. Additional literal analysis and alias analysis are the steps that lead to more comprehensive and precise results than those provided by previous approaches. Pixy, the system that implements the proposed analysis technique, is written in Java and licensed under the GPL. A straightforward approach to detecting taint-style vulnerabilities would be to immediately conduct a taint analysis on the intermediate three-address code representation generated by the front-end. This taint analysis would identify points where tainted data can enter the program, propagate taint values along assignments and similar constructs, and inform the user of every sensitive sink that receives tainted input. Pixy also performs an alias analysis to provide information about alias relationships. Moreover, it is very beneficial for the taint analysis to know about the literal values that variables and constants may hold at each program point; this task is performed by literal analysis.

Y. Xie and A. Aiken [2], in “Static detection of security vulnerabilities in scripting languages,” apply static analysis to finding security vulnerabilities in PHP. The goal is to develop a bug detection tool that automatically finds serious vulnerabilities with high confidence. An interprocedural static analysis algorithm for PHP is proposed. A language as dynamic as PHP presents unique challenges for static analysis: language constructs that allow dynamic inclusion of program code, variables whose types change during execution, operations with semantics that depend on the runtime types of the operands, and pervasive use of hash tables and regular expression matching are just some features that must be modelled well to produce useful results. The proposed static analysis algorithm is used to find SQL injection vulnerabilities. Once configured, the analysis is fully automatic. Although the system focuses on SQL injection, the same techniques can be applied to detecting other vulnerabilities such as cross-site scripting (XSS) and code injection in web applications. The PHP source code is parsed into abstract syntax trees (ASTs); the parser is based on the standard open-source implementation of PHP 5.0.5. Each PHP source file contains a main section and zero or more user-defined functions. The user-defined functions are stored in the environment and the analysis starts from the main function. For each function in the program, the analysis performs a standard
conversion from the abstract syntax tree (AST) of the function body into a control flow graph (CFG). The nodes of the CFG are basic blocks: maximal single-entry, single-exit sequences of statements. The edges of the CFG are the jump relationships between blocks. For conditional jumps, the corresponding CFG edge is labelled with the branch predicate. Each basic block is simulated using symbolic execution. The goal is to understand the collective effects of the statements in a block on the global state of the program and summarize those effects into a concise block summary. After computing a summary for each basic block, a standard reachability analysis combines the block summaries into a function summary, which describes the pre- and postconditions of the function.

D. Balzarotti, M. Cova, V. Felmetsger, N. Jovanovic, E. Kirda, C. Kruegel, and G. Vigna [3], in “Saner: Composing static and dynamic analysis to validate sanitization in web applications,” introduce a novel approach to analyzing the correctness of the sanitization process. The approach combines two complementary techniques to model the sanitization process and to verify its thoroughness. More precisely, the first technique, based on static analysis, models how an application modifies its inputs along the paths to a sink, using precise modelling of string manipulation routines; this helps to identify cases where the sanitization is incorrect or incomplete. Because it uses a conservative model of string operations, it might produce false positives. Therefore, a second technique, based on dynamic analysis, is devised. It works bottom-up from the sinks and reconstructs the code the application uses to modify its inputs; this code is then executed on a large set of malicious input values to identify exploitable flaws in the sanitization process. The two techniques are composed to leverage their advantages and mitigate their disadvantages. The authors implemented this approach and evaluated the system on a set of real-world applications, identifying a number of previously unknown vulnerabilities in the sanitization routines of the analyzed programs.

L. K. Shar and H. B. K. Tan [4], in “Predicting SQL injection and cross site scripting vulnerabilities through mining input sanitization patterns,” observe that an application that accesses a database via SQL is vulnerable if an unrestricted input is used to build the query string, because an attacker might craft the input value to gain unauthorized access to the database and perform malicious actions; this security issue is called SQLI vulnerability. An application that sends HTTP response data to a web client is vulnerable if an unrestricted input is included in the response data, because an attacker might inject malicious JavaScript code in the input value; the injected code, when executed by the client’s browser, could perform malicious actions against the client. This security issue is called XSS vulnerability. Web developers generally implement input sanitization schemes to prevent these two vulnerabilities, and the work proposes input sanitization code attributes that can be statically collected. From these attributes, the aim is to build SQLI and XSS vulnerability predictors that provide high recall and low false alarm rates, so that the predictors can be used alternatively or in combination with existing taint-based approaches. Compared to current vulnerability prediction approaches, only static code attributes are used, and vulnerable code is targeted at the statement level.
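The dynamic step described for Saner, executing reconstructed sanitization code on malicious inputs, can be illustrated with a small Python sketch. Both sanitizers and the attack strings below are invented for illustration; the oracle simply checks whether any dangerous character survives sanitization.

```python
import html
import re

def weak_sanitizer(s):
    # Hypothetical sanitizer: strips <script> tags only.
    return re.sub(r'(?i)</?script[^>]*>', '', s)

def strong_sanitizer(s):
    # Hypothetical sanitizer: HTML-encodes all special characters.
    return html.escape(s, quote=True)

ATTACKS = [
    '<script>alert(1)</script>',
    '<img src=x onerror=alert(1)>',
    '"><svg/onload=alert(1)>',
]
DANGEROUS = re.compile(r'[<>"\']')

def is_faulty(sanitizer):
    """Saner-style oracle: faulty if any attack survives with
    dangerous characters intact after sanitization."""
    return any(DANGEROUS.search(sanitizer(a)) for a in ATTACKS)

print(is_faulty(weak_sanitizer))    # True: tag stripping misses attribute payloads
print(is_faulty(strong_sanitizer))  # False
```

A tag-stripping sanitizer passes naive tests but fails on attribute-based payloads, which is exactly the kind of incomplete sanitization this dynamic testing aims to expose.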
A further related work, on how to build and evaluate fault prediction models, describes a study performed in an industrial setting that attempts to build predictive models to identify parts of a Java system with a high fault probability. The system under consideration is constantly evolving, as several releases a year are shipped to customers. Developers usually have limited resources for their testing and would like to devote extra resources to faulty system parts. The main research focus is to systematically assess three aspects of how to build and evaluate fault-proneness models in the context of this large Java legacy system development project: (1) comparing many data mining and machine learning techniques for building fault-proneness models, (2) assessing the impact of using different metric sets such as source code structural measures and change/fault history (process measures), and (3) comparing several alternative ways of assessing the performance of the models, in terms of (i) confusion matrix criteria such as accuracy and precision/recall, (ii) ranking ability, using the area under the receiver operating characteristic curve (ROC), and (iii) a proposed cost-effectiveness measure (CE).

III EXISTING SYSTEM

To address such security threats, many web vulnerability detection approaches, such as static taint analysis, dynamic taint analysis, model checking, and symbolic and concolic testing, have been proposed. Static taint analysis approaches are scalable in general, but are ineffective in practice due to high false positive rates. Dynamic taint analysis, model checking, and symbolic and concolic testing techniques can be highly accurate, as they are able to generate real attack values, but they have scalability issues for large systems due to the path explosion problem. There are also scalable vulnerability prediction methods, but the granularity of current prediction approaches is coarse: they identify vulnerabilities at the level of software modules or components.

IV PROPOSED SYSTEM

Input validation and input sanitization are two secure coding techniques that developers can adopt to protect their programs from such common vulnerabilities.
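The false-positive problem of purely static taint analysis can be made concrete with a toy sketch. The three-address-style program below is invented; because the analysis cannot reason about the sanitizer's semantics, it conservatively keeps taint through the sanitize step and flags the safe sink as well as the unsafe one.

```python
# Naive static taint analysis over a toy three-address program.
# Each statement is (dest, op, args): sources taint their results,
# and taint propagates through assignments and concatenations.
PROGRAM = [
    ("a", "source",   []),          # a = $_GET['id']   (tainted input)
    ("b", "sanitize", ["a"]),       # b = htmlspecialchars(a)
    ("c", "concat",   ["b", "a"]),  # c = b . a
    ("_", "sink",     ["b"]),       # echo b  -> flagged: false positive
    ("_", "sink",     ["c"]),       # echo c  -> flagged: true positive
]

def analyze(program):
    tainted, findings = set(), []
    for i, (dest, op, args) in enumerate(program):
        if op == "source":
            tainted.add(dest)
        elif op in ("concat", "sanitize"):
            # A purely static analysis that ignores sanitizer
            # semantics must conservatively keep the taint.
            if any(a in tainted for a in args):
                tainted.add(dest)
        elif op == "sink":
            if any(a in tainted for a in args):
                findings.append(i)
    return findings

print(analyze(PROGRAM))  # [3, 4]: the sanitized sink at index 3 is a false positive
```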
The predictors built from these attributes are fine-grained, accurate, and scalable. The approach is fine-grained because it identifies vulnerabilities at the program statement level. We use both static and dynamic program analysis techniques to extract IVS attributes. Static analysis can help assess general properties of a program. Yet, dynamic analysis can focus on more specific code characteristics that are complementary to the information obtained with static analysis. We use dynamic analysis only to infer the possible types of input validation and sanitization code, rather than to precisely prove their correctness, and apply machine learning on these inferences for vulnerability prediction. Therefore, we mitigate the scalability issue typically associated with dynamic analysis. Thus, our proposed IVS attributes reflect relevant properties of the implementations of input validation and input sanitization methods in web programs and are expected to help predict vulnerabilities in an accurate and scalable manner. Furthermore, we use both supervised learning and semi-supervised learning methods to build vulnerability predictors from IVS attributes, so that our method can also be used in contexts where there is limited vulnerability data for training.

The main modules of the proposed system are:

1. Static and dynamic program analysis
2. Backward slicing
3. Construction of PDG and SDG
4. Slicing of each sink
5. Static and dynamic analysis on each slice
6. Classification of path in each slice
7. IVS attributes
8. Building vulnerability prediction model
   A. Data representation
   B. Data processing
9. Supervised learning
10. Semi-supervised learning
11. Final predictor

1. STATIC AND DYNAMIC PROGRAM ANALYSIS

Both static and dynamic program analysis techniques are used to extract IVS attributes. Static analysis can help assess general properties of a program. Yet, dynamic analysis can focus on more specific code characteristics that are complementary to the information obtained with static analysis. The dynamic analysis is used only to infer the possible types of input validation and sanitization code, rather than to precisely prove their correctness.

2. BACKWARD SLICING

Program slicing is a program analysis and transformation technique that decomposes programs by analyzing their data and control flow.
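The backward slice computation just described can be sketched as a transitive closure over a dependence graph. The statement labels and dependence edges below are invented; a real implementation would derive them from the PDG/SDG.

```python
# Backward static slice over a toy dependence graph: deps[s] is the
# set of statements that statement s is data- or control-dependent on.
deps = {
    "s1": set(),          # x = $_GET['q']
    "s2": set(),          # y = 42
    "s3": {"s1"},         # z = sanitize(x)
    "s4": {"s2"},         # w = y + 1
    "s5": {"s3", "s1"},   # echo z . x   (the sink)
}

def backward_slice(sink, deps):
    """All statements that may affect the sink (including the sink)."""
    slice_, work = set(), [sink]
    while work:
        s = work.pop()
        if s not in slice_:
            slice_.add(s)
            work.extend(deps[s])
    return slice_

print(sorted(backward_slice("s5", deps)))  # ['s1', 's3', 's5']
```

Statements s2 and s4 cannot affect the sink, so they are excluded; only the slice is subjected to the later hybrid analysis.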
A sink is a node in a CFG that uses variables defined from input sources and thus may be vulnerable to input manipulation attacks. This allows us to predict vulnerabilities at the statement level. Input nodes are the nodes at which data from the external environment are accessed. A variable is tainted if it is defined from input nodes. As described earlier, the first step of the approach is to compute a backward static program slice for each sink and the set of tainted variables it uses. A backward static slice with respect to a slicing criterion consists of all nodes (including predicates) in the CFG that may affect the values of the subset of variables used at the criterion.

3. CONSTRUCTION OF PDG AND SDG

We first construct the PDG for the main method of a web application program, and also construct PDGs for the methods called from the main method, according to the algorithm given by Ferrante et al. We then construct the SDG. A PDG models a program procedure as a graph in which the nodes represent program statements and the edges represent data or control dependences between statements. The SDG extends the PDG by modeling interprocedural relations between the main program and its subprograms.

4. SLICING OF EACH SINK

In security analysis, it is important to first identify all the input sources. The reason for classifying the inputs into different types is that each class of inputs causes different types of vulnerabilities, and different security defense schemes may be required to secure these different classes of inputs.

6. CLASSIFICATION OF PATH IN EACH SLICE

For each sink, a backward static program slice is computed with respect to the sink statement and the variables used in the sink. Each path in the slice is analyzed using hybrid (static and dynamic) analysis to extract its validation and sanitization effects on those variables. The path is then classified according to the input validation and sanitization effects inferred by the hybrid analysis.

7. IVS ATTRIBUTES

These attributes characterize the various types of program functions and operations that are commonly used as input validation and sanitization procedures to defend against web application vulnerabilities.
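The classification of a sliced path can be pictured as mapping each function call on the path to a security-related category. The function-to-category table below is a small, hypothetical subset of the IVS scheme; in the full approach, unknown custom functions would be resolved with dynamic analysis.

```python
# Classify the calls on one sliced path into IVS-style categories.
# The table is a hypothetical subset; unknown custom functions would
# be resolved with dynamic analysis in the full approach.
CATEGORY = {
    "mysql_real_escape_string": "String-delimiter",
    "htmlspecialchars":         "Encode",
    "intval":                   "Numeric",
    "substr":                   "Propagate",
}

def classify_path(calls):
    summary = {}
    for fn in calls:
        cat = CATEGORY.get(fn, "Unknown")  # statics alone are insufficient
        summary[cat] = summary.get(cat, 0) + 1
    return summary

path = ["intval", "substr", "htmlspecialchars", "my_custom_filter"]
print(classify_path(path))
# {'Numeric': 1, 'Propagate': 1, 'Encode': 1, 'Unknown': 1}
```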
Using these attributes, functions and operations are classified according to their security-related properties. Hybrid analysis-based attributes are attributes extracted by combining static analysis and dynamic analysis. The reason for including input sources in our classification scheme is that most of the common vulnerabilities arise from the misidentification of inputs. That is, developers may implement adequate input validation and sanitization methods and yet fail to recognize all the data that could be manipulated by external users, thereby missing some of the inputs for validation. Therefore, in security analysis, it is important to first identify all the input sources.

This hybrid analysis-based classification is applied to validation and sanitization methods implemented using both standard security functions and nonstandard security functions. If there are only standard security functions to be classified, we classify them based on their security-related information; otherwise, dynamic analysis is used. Various input validation and sanitization processes may be implemented using language built-in functions and/or custom functions. Since inputs to web applications are naturally strings, string replacement/matching functions or string manipulation procedures like escaping are generally used to implement custom input validation and sanitization procedures. A good security function generally consists of a set of string functions that accept safe strings or reject unsafe strings. These functions are important indicators of vulnerabilities, but we need to analyze the purpose of each validation and sanitization function, since different defense methods are generally required to prevent different types of vulnerabilities. It is important to classify these methods implemented in a program path into different types because, together with their associated vulnerability data, our vulnerability predictors can learn this vulnerability information and then predict future vulnerabilities.

8. BUILDING VULNERABILITY PREDICTION MODEL

Many machine learning techniques can be used to build vulnerability predictors. Regardless of the specific technique used, the goal is to learn and generalize patterns in the data associated with sinks, which can then be efficiently used for predicting vulnerability for new sinks. As more sophisticated security attacks are discovered, it is important for a vulnerability analysis approach to be able to adapt. With machine learning, it is possible to adapt to new vulnerability patterns via re-training.

A. DATA REPRESENTATION

Our unit of measurement, an instance in machine learning terminology, is a path in the slice of a sink, and we characterize each path with IVS attributes. The attribute values may range from zero to an upper bound that depends on the number of program operations or functions. Since 33 IVS attributes are proposed, each path is represented by a 33-dimensional attribute vector.

B. DATA PROCESSING

In most of our datasets, the proportion of vulnerable sinks to non-vulnerable ones is small. This is an imbalanced data problem and should be expected in many such vulnerability datasets. Prior studies have clearly shown that imbalanced data can significantly affect the performance of machine learning classifiers, because some of the data might go unlearned by the classifier due to their lack of representation, leading to induction rules that tend to explain the majority class data and favour its predictive accuracy. Since, for our problem, the minority class data capture the ‘vulnerable’ instances, we need high predictive accuracy for this class, as missing a vulnerability is far more critical than
reporting a false alarm. To address this problem, we use a sampling method called adaptive synthetic oversampling. It balances the imbalanced data by generating synthetic, artificial data for the minority class instances, thus reducing the bias introduced by the class imbalance problem. It does not require modification of standard classifiers and thus can be conveniently added as an additional data pre-processing step.

9. SUPERVISED LEARNING

Classification is a type of supervised learning method, because the class label of each training instance has to be provided. In this study, we build logistic regression (LR) and Random Forest (RF) models from the proposed attributes. LR is a type of statistical classification model. It can be used to predict the outcome (class label) of a dependent attribute based on one or more predictor attributes; the probabilities describing the possible outcomes of a given instance are modelled. Logistic regression analysis is flexible in terms of the types of monotonic relationships it can model between the probability of vulnerability and the predictor attributes. RF is an ensemble learning method for classification that consists of a collection of tree-structured classifiers. In many cases the predictive accuracy is greatly enhanced because the final prediction output comes from an ensemble of learners rather than a single learner. Given an input sample, each tree casts a vote (classification) and the forest outputs the classification having the majority vote among the trees.

10. SEMI-SUPERVISED LEARNING

As ensemble learning works by combining individual classifiers, it typically requires significant amounts of labeled data for training. In certain industrial contexts, relevant labeled data available for learning may be limited. Semi-supervised methods [39] use, for training, a small amount of labeled data together with a much larger amount of unlabeled data. A method that exploits unlabeled data can enable ensemble learning even when there are very few labeled data. Combining semi-supervised learning with ensembles has many advantages: unlabeled data is exploited to help enrich the labeled training samples, allowing ensemble learning, and each individual learner is improved with unlabeled data labeled by the ensemble consisting of all the other learners. A few different types of semi-supervised methods, such as EM-based, clustering-based, and disagreement-based learning, have been proposed in the literature, but none of these techniques has been explored for vulnerability prediction so far. Hence, based on these motivations, we explore the use of an algorithm called CoForest (Co-trained Random Forest, CF), which applies semi-supervised learning to RF. It is a disagreement-based, semi-supervised learner. CF uses multiple, diverse learners and combines them to exploit unlabeled data (semi-supervised learning), while maintaining a large disagreement between the learners to promote the learning process.

11. FINAL PREDICTOR

A qualified web application vulnerability predictor can be built with the help of the input validation and sanitization attributes and the machine learning techniques. Using the above attributes, we are able to generate a web application vulnerability predictor which is highly accurate, fine-grained, and scalable.

V IMPLEMENTATION

A. DERIVATION OF IVS ATTRIBUTES

The code attributes that characterize the validation and sanitization code implemented in a program could be used to predict web application vulnerabilities. Based on this hypothesis, we propose a set of code attributes called input validation and sanitization (IVS) attributes, from which we build vulnerability predictors that are fine-grained, accurate, and scalable.
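A derived path is ultimately encoded as a fixed-length count vector over the IVS attributes. The sketch below uses a small invented function-to-attribute mapping and a five-attribute subset; the full scheme has 33 dimensions.

```python
# Encode one path as a count vector over a subset of the IVS
# attributes. The full approach uses 33 attributes; 5 shown here.
ATTRIBUTES = ["Session", "Numeric", "String-delimiter", "Encode", "Limit-length"]
FN_TO_ATTR = {                      # hypothetical mapping
    "intval":           "Numeric",
    "addslashes":       "String-delimiter",
    "htmlspecialchars": "Encode",
    "substr":           "Limit-length",
}

def to_vector(path_calls):
    index = {name: i for i, name in enumerate(ATTRIBUTES)}
    vec = [0] * len(ATTRIBUTES)
    for fn in path_calls:
        attr = FN_TO_ATTR.get(fn)
        if attr in index:
            vec[index[attr]] += 1
    return vec

print(to_vector(["intval", "htmlspecialchars", "intval"]))  # [0, 2, 0, 1, 0]
```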
3. Text-database - Text-based input accessed from a database
4. Numeric-database - Numeric-based input accessed from a database
5. Session - Input accessed from a persistent data object such as an HTTP session
6. Uninit - Un-initialized program variable
7. Un-taint - Function that returns predefined information, or information not influenced by external users
8. Known-vuln-user - Custom function that has caused security issues in the past
9. Known-vuln-std - Language built-in function that has caused security issues in the past
10. Propagate - Function or operation that propagates the partial or complete value of a string
11. Numeric - Function or operation that converts a string into a numeric value
12. DB-operator - Function that filters query operators such as ( = )
13. DB-comment-delimiter - Function that filters query comment delimiters such as (--)
14. DB-special - Function that filters other database special characters different from the above, such as (\x00) and (\x1a)
15. String-delimiter - Function that filters string delimiters such as (‘) and (“)
25. Slash - Function that filters slash (/)
26. Newline - Function that filters newline (\n)
27. Colon - Function that filters colon (:) or semi-colon (;)
28. Other-special - Function that filters any other special characters different from the above
29. Encode - Function that encodes a string into a different format
30. Canonicalize - Function that converts a string into its most standard, simplest form
31. Path - Function that filters directory paths or URLs
32. Limit-length - Function or operation that limits a string to a specific length

Fig. 5.1 Implementation of vulnerability predictor
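The final predictor aggregates individual classifiers by majority vote, as in Random Forest. The sketch below shows only the voting mechanism over invented decision stumps on IVS attribute values; a real RF would learn many randomized trees from bootstrap samples of the labeled paths.

```python
# Majority voting over an ensemble, the aggregation rule used by
# Random Forest. Each "tree" is an invented decision stump over one
# IVS attribute index; real trees would be learned from data.
def make_stump(attr_index, threshold):
    return lambda x: 1 if x[attr_index] > threshold else 0  # 1 = vulnerable

forest = [make_stump(0, 1), make_stump(1, 0), make_stump(2, 2)]

def predict(forest, x):
    votes = sum(tree(x) for tree in forest)
    return 1 if votes > len(forest) / 2 else 0

print(predict(forest, [2, 1, 0]))  # stumps vote 1, 1, 0 -> majority: vulnerable (1)
print(predict(forest, [0, 0, 0]))  # stumps vote 0, 0, 0 -> not vulnerable (0)
```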
VI CONCLUSION

The approach constructs the program dependence graph and system dependence graph of a web program. The input validation and sanitization attributes act as the building blocks for the web application vulnerability predictor.
REFERENCES
[1] N. Jovanovic, C. Kruegel, and E. Kirda, “Pixy: A static analysis tool for detecting web application vulnerabilities,” in Proc. IEEE Symp. Security and Privacy, 2006, pp. 258–263.