It follows the anomaly-based approach, so both known and unknown attacks can be
detected. The system relies on an XML file to classify incoming requests as
normal or anomalous. The XML file, built from normal traffic only, contains a
statistical characterization of the normal behavior of the target web
application. Any request that deviates from this normal behavior is considered
an attack. The system has been applied to protect a real web application,
trained with an increasing number of requests. Experiments show that once the
XML file contains enough information to closely characterize the normal
behavior of the target web application, a very high detection rate is reached
while the false alarm rate remains very low.
1. Introduction
Web applications are becoming increasingly popular and complex in all sorts of
environments, ranging from e-commerce to banking. Consequently, web
applications are subject to all sorts of attacks. In addition, they handle
large amounts of sensitive data, which makes them even more attractive to
malicious users. The consequences of many attacks can be devastating: identity
impersonation, theft of sensitive data, access to unauthorized information,
modification of web page content, command execution, etc. It is therefore
essential to protect web applications and to adopt suitable security measures.
Anomaly-based and signature-based detection can be compared as follows:

Anomaly-based: as long as the definition of valid requests is complete and
accurate, the administrative overhead is low. Signature-based: the
administrative work is high, since signatures have to be updated to cover new
attacks and new variations of existing attacks.

Anomaly-based: defining normal traffic is not an easy task in large and complex
web applications. Signature-based: signatures are easy to develop and
understand if the behavior to be identified is known.

Anomaly-based: once the normal behavior is defined, there is no need to define
a signature for every attack and its variations. Signature-based: a signature
has to be defined for every attack and its variations.

Anomaly-based: it works well against self-modifying attacks. Signature-based:
it usually does not work accurately against attacks with self-modifying
behavior; such attacks are typically generated by humans and polymorphic worms.

Anomaly-based: it is not easy to know exactly which issue caused the alert.
Signature-based: the events generated by a signature-based IDS can report very
precisely what caused the alert, which makes it easier to investigate the
underlying issue.
A simple and effective anomaly-based WAF is presented. This system relies on an
XML file that describes the normal behavior of the web application; any
behavior that deviates from it is flagged as intrusive. The XML file must be
tailored to every target application to be protected.
Web applications are usually divided into three logical tiers: presentation,
application, and storage. Typically, a web server is the first tier (presentation), an
engine using some web content technology is the middle tier (application logic)
and finally, a database is the third tier (storage). Some examples can be
cited: IIS and Apache are popular web servers, WebLogic Server and Apache
Tomcat are well-known application servers, and Oracle and MySQL are frequently
used databases. Separating the presentation tier from the storage tier
facilitates the design and maintenance of the web site. The web server sends
requests to the middle tier, which services them by making queries and updates
against the database and generates a user interface.
Most web content is dynamic. Dynamic pages require access to the back-end
database where the application information is stored; hence, attacks against
these pages usually aim at the data stored in the database.
2.2 Web Attacks
Web attacks can be classified as static or dynamic, depending on whether they are
common to all web applications hosted on the same platform or depend on the
specific application.
Static web attacks look for security vulnerabilities in the web application platform:
web server, application server, database server, firewall, operating system, and
third-party components, such as shopping carts, cryptographic modules, payment
gateways, etc.
These security pitfalls comprise well-known vulnerabilities and erroneous
configurations. There are both commercial and free automated tools capable of
scanning a server in search of such known vulnerabilities and configuration errors.
A common feature of all these attacks is that they request pages, file
extensions, or elements that are not part of the web application as intended
for the end user. Therefore, it is very easy to detect suspicious behavior
whenever a resource outside the application visible to the user is requested.
Dynamic web attacks, by contrast, request only legal pages of the application
but subvert the expected parameters. Manipulation of input arguments can lead
to several attacks with different consequences: disclosure of information
about the platform, theft of other users' information, command execution, etc.
System Overview
Our WAF analyzes HTTP requests sent by a client browser trying to get
access to a web server. The analysis takes place exclusively at the
application layer. The system follows the anomaly-based approach,
detecting both known and unknown web attacks, in contrast with existing
signature-based WAFs such as the popular ModSecurity.
In our architecture, the system operates as a proxy located between the
client and the web server. Likewise, the system might be embedded as a
module within the server. However, the first approach enjoys the
advantage of being independent of the web platform. This proxy
analyzes all the traffic sent by the client. The input of the detection
process consists of a collection of HTTP requests {r1, r2, . . . rn}. The
output is a single bit ai for each input request ri, which indicates
whether the request is normal or anomalous. The proxy is able to work
in two different modes of operation: as an IDS or as an IPS. In detection
mode, the proxy simply analyzes the incoming packets and tries to find
suspicious patterns. If a suspicious request is detected, the proxy
launches an alert; otherwise, it remains inactive. In any case, the
request will reach the web server. When operating in detection mode,
attacks could succeed, whereas false positives do not limit the system
functionality. In prevention mode, the proxy receives requests from
clients and analyzes them. If the request is valid, the proxy routes it to
the server, and sends back the received response to the client. If not,
the proxy blocks the request, and sends back a generic denied access
page to the client. Thus, the communication between proxy and server
is established only when the request is deemed valid.
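The proxy's two modes of operation can be sketched as follows (a minimal
illustration; the function names and the placeholder detection rule are
assumptions, not the paper's actual implementation):

```python
# Hypothetical sketch of the proxy's IDS/IPS dispatch; is_anomalous()
# stands in for the XML-based normality check described in Section 3.2.
def is_anomalous(request: dict) -> bool:
    # Placeholder rule: only the verbs the application needs are allowed.
    return request.get("verb") not in {"GET", "POST", "HEAD"}

def handle(request: dict, mode: str) -> str:
    """Route a request in 'detection' (IDS) or 'prevention' (IPS) mode."""
    if is_anomalous(request):
        if mode == "detection":
            print("ALERT: suspicious request")  # alert, but still forward
            return "forwarded"
        return "blocked"  # prevention mode: send back a generic denial page
    return "forwarded"
```

In detection mode every request reaches the server, so false positives do not
limit functionality; in prevention mode anomalous requests never do.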
3.2 Normal Behavior Description
Prior to the detection process, the system needs a precise picture of
what the normal behavior is in a specific web application. For this
purpose, our system relies on an XML file which contains a thorough
description of the web application’s normal behavior. Once a request is
received, the system compares it with the normal behavior model. If
the difference exceeds the given thresholds, then the request is flagged
as an attack and an alert is launched. The XML file contains rules
regarding the correctness of HTTP verbs, HTTP headers, accessed
resources (files), arguments, and argument values.
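Such an XML description might look like the fragment below. The paper does not
give the exact schema, so every element and attribute name here (except
requiredField and special, which are mentioned in the text) is illustrative:

```xml
<!-- Hypothetical sketch of the normal-behavior description file. -->
<application>
  <verbs>GET POST HEAD</verbs>
  <headers>
    <header name="Host">www.example.com</header>
  </headers>
  <resources>
    <resource path="/login.php">
      <argument name="user" requiredField="true"
                minLength="3" maxLength="20"
                minLetters="0.5" maxSpecial="0.1" special="._-"/>
    </resource>
  </resources>
</application>
```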
1. Verb check. The verb must be present in the XML file, otherwise the request is
rejected. For example, in the applications in which only GET, POST and HEAD are
required to work correctly, the XML file could be configured accordingly, thus
rejecting requests that use any other verb.
2. Headers check. If the header appears in the XML file, its value must be included
too. Different values will not be accepted, thus preventing attacks embedded in
these elements.
3. Resource test. The system checks whether the requested resource is valid. For
this purpose, the XML configuration file contains a complete list of all files that are
allowed to be served. If the requested resource is not present in the list, a web
attack is assumed.
4. Arguments test. If the request has any argument, the following aspects are
checked:
a) It is checked that all arguments are allowed for the resource. If the
request includes arguments not listed in the XML file, the request is rejected.
b) It is confirmed that all mandatory arguments are present in the request. If
any mandatory argument (requiredField="true") is not present in the request,
it is rejected.
c) Argument values are checked. An incoming request is allowed only if all
parameter values are identified as normal. Argument values are decoded before
being checked. If any property of an argument falls outside the corresponding
interval or contains a forbidden special character, the request is rejected.
These steps allow the detection of both static attacks, which request resources
that do not belong to the application, and dynamic attacks, which manipulate the
arguments of the request.
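The checks above can be sketched as follows, assuming the XML file has already
been parsed into the nested dictionary shown (the structure, names, and the
length-only value check are illustrative simplifications):

```python
# Hypothetical in-memory form of the parsed XML profile.
profile = {
    "verbs": {"GET", "POST", "HEAD"},
    "headers": {"Host": "www.example.com"},
    "resources": {
        "/login.php": {
            "user": {"required": True, "min_len": 3, "max_len": 20},
            "lang": {"required": False, "min_len": 2, "max_len": 5},
        }
    },
}

def is_normal(req: dict) -> bool:
    # 1. Verb check: the verb must appear in the profile.
    if req["verb"] not in profile["verbs"]:
        return False
    # 2. Headers check: a listed header must carry its listed value.
    for name, value in req.get("headers", {}).items():
        if name in profile["headers"] and profile["headers"][name] != value:
            return False
    # 3. Resource test: the requested file must be in the allowed list.
    spec = profile["resources"].get(req["path"])
    if spec is None:
        return False
    args = req.get("args", {})
    # 4a. No arguments outside the profile.
    if any(a not in spec for a in args):
        return False
    # 4b. All mandatory arguments present.
    if any(p["required"] and a not in args for a, p in spec.items()):
        return False
    # 4c. Value properties inside their intervals (length only, here).
    return all(spec[a]["min_len"] <= len(v) <= spec[a]["max_len"]
               for a, v in args.items())
```

A request failing any single check is flagged as an attack.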
4.3 Artificial Traffic Generation
In our approach, normal and anomalous request databases are generated
artificially with the help of dictionaries.
Dictionaries
Dictionaries are data files which contain real data to fill the different arguments
used in the target application. Names, surnames, addresses, etc., are examples of
dictionaries used. A set of dictionaries containing only allowed values is used to
generate the normal request database. A different set of dictionaries is used to
generate the anomalous request database. The latter dictionaries contain both
known attacks and illegal values with no malicious intention.
Normal Traffic Generation. Allowed HTTP requests are generated for each
page in the web application. Arguments and cookies in the page, if any, are also
filled in with values from the normal dictionaries. The result is a normal request
database (Normal DB). Some requests from Normal DB will be used in the training
phase and some others will be used in the test phase.
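The dictionary-driven generation can be sketched as follows. The dictionary
contents and names below are hypothetical; the paper's dictionaries contain
real names, surnames, addresses, etc., and its anomalous dictionaries mix
known attack payloads with merely illegal values:

```python
import random

# Hypothetical dictionaries for a search page with two arguments.
normal_dicts = {"user": ["alice", "bob"], "city": ["Madrid", "Leganes"]}
attack_dicts = {"user": ["' OR '1'='1", "<script>alert(1)</script>"],
                "city": ["A" * 5000]}  # illegal oversized value, not malicious

def generate(page, dicts, n, seed=0):
    """Build n GET requests for `page`, filling each argument from dicts."""
    rng = random.Random(seed)
    return [{"verb": "GET", "path": page,
             "args": {arg: rng.choice(vals) for arg, vals in dicts.items()}}
            for _ in range(n)]

normal_db = generate("/search.php", normal_dicts, 1000)
anomalous_db = generate("/search.php", attack_dicts, 1000)
```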
Attacks that manipulate parameters are fenced off by the proper definition of
the statistical intervals. In the case of buffer overflows, the length
property is of paramount importance. Many attacks make use of special
characters (typically other than letters and digits) to perform malicious
actions. This is the case, for instance, of SQL injection, which uses
characters with special meaning in SQL to get queries or commands executed
unexpectedly. For this reason, the minimum and maximum percentages of letters,
digits, and special characters are crucial for recognizing these attacks.
Moreover, any special character present in an input argument is not allowed
unless it is included in the property called "special". The interval checks
help to frustrate attacks exploiting vulnerabilities such as CRLF injection,
invalid parameters, command injection, XSS, SQL injection, buffer overflow,
broken authentication and session management, etc.
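The per-argument statistical checks can be sketched as below. The thresholds,
the allowed special-character set, and the function name are illustrative
assumptions, not values from the paper:

```python
# Hypothetical per-value check: length interval, forbidden special
# characters, and the maximum proportion of special characters.
def value_is_normal(value: str,
                    min_len=1, max_len=32,
                    max_special_ratio=0.1,
                    allowed_special=frozenset("._-@")) -> bool:
    if not (min_len <= len(value) <= max_len):
        return False  # e.g. buffer-overflow payloads fail the length interval
    specials = [c for c in value if not c.isalnum()]
    if any(c not in allowed_special for c in specials):
        return False  # e.g. SQL-injection metacharacters like ' or ;
    return len(specials) / len(value) <= max_special_ratio
```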
4.7 Performance measurement
The performance of the detector is then measured by Receiver Operating
Characteristic (ROC) curves. A ROC curve plots the attack detection rate (true
positives, TP) against the false alarm rate (false positives, FP).
The parameter of the ROC curve is the number of requests used in the training
phase.
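The two ROC coordinates can be computed from labelled test traffic as in this
minimal sketch (names are hypothetical):

```python
# Each result pair records whether the request was an attack and
# whether the detector flagged it.
def roc_point(results):
    """results: list of (is_attack, flagged) booleans; returns (TP, FP) rates."""
    tp = sum(1 for attack, flagged in results if attack and flagged)
    fp = sum(1 for attack, flagged in results if not attack and flagged)
    attacks = sum(1 for attack, _ in results if attack)
    normals = len(results) - attacks
    return tp / attacks, fp / normals  # (detection rate, false-alarm rate)
```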
4.8 Results
Several experiments have been performed using an increasing number of requests
in the training phase. For each experiment, the proxy received 1000 normal
requests and 1000 attacks during the test phase. As can be seen in Fig. 5(a),
very satisfactory results are obtained: the false alarm rate is close to 0
whereas the detection rate is close to 1. As shown in Fig. 5(b), at the
beginning, with few training requests, the proxy rejects almost all requests
(both normal and attacks). As a consequence, the detection rate is perfect (1)
but the false positive rate is high. As training progresses, the false alarm
rate decreases quickly while the detection rate remains reasonably high.
Therefore, this WAF is adequate for protecting against web attacks thanks to
its capacity to detect different attacks while generating very few false
alarms. It is important to notice that when the XML file closely characterizes
the web application's normal behavior, the different kinds of attacks can be
detected and few false alarms are raised.
Conclusions
We presented a simple and efficient web attack detection system, or
Web Application Firewall (WAF). As the system is based on the
anomaly-detection methodology, it proved able to protect web applications
from both known and unknown attacks. The system analyzes input
requests and decides whether or not they are anomalous. For this
decision, the WAF relies on an XML file which specifies the web
application's normal behavior. The experiments show that as long as the XML
file correctly defines normality for a given target application, near-perfect
results are obtained. Thus, the main challenge is how to create an
accurate XML file in a fully automated manner for any web application.