Finding Security Vulnerabilities in Java Applications

with Static Analysis

V. Benjamin Livshits and Monica S. Lam


Computer Science Department
Stanford University

{livshits, lam}@cs.stanford.edu

Abstract

This paper proposes a static analysis technique for detecting many recently discovered application vulnerabilities such as SQL injections, cross-site scripting, and HTTP splitting attacks. These vulnerabilities stem from unchecked input, which is widely recognized as the most common source of security vulnerabilities in Web applications. We propose a static analysis approach based on a scalable and precise points-to analysis. In our system, user-provided specifications of vulnerabilities are automatically translated into static analyzers. Our approach finds all vulnerabilities matching a specification in the statically analyzed code. Results of our static analysis are presented to the user for assessment in an auditing interface integrated within Eclipse, a popular Java development environment.

Our static analysis found 29 security vulnerabilities in nine large, popular open-source applications, with two of the vulnerabilities residing in widely-used Java libraries. In fact, all but one application in our benchmark suite had at least one vulnerability. Context sensitivity, combined with improved object naming, proved instrumental in keeping the number of false positives low. Our approach yielded very few false positives in our experiments: in fact, only one of our benchmarks suffered from false alarms.

1 Introduction

The security of Web applications has become increasingly important in the last decade. More and more Web-based enterprise applications deal with sensitive financial and medical data, which, if compromised, can mean millions of dollars in damages in addition to downtime. It is crucial to protect these applications from hacker attacks.

However, the current state of application security leaves much to be desired. The 2002 Computer Crime and Security Survey conducted by the Computer Security Institute and the FBI revealed that, on a yearly basis, over half of all databases experience at least one security breach and an average episode results in close to $4 million in losses [10]. A recent penetration testing study performed by the Imperva Application Defense Center included more than 250 Web applications from e-commerce, online banking, enterprise collaboration, and supply chain management sites [54]. Their vulnerability assessment concluded that at least 92% of Web applications are vulnerable to some form of hacker attacks. Security compliance of application vendors is especially important in light of recent U.S. industry regulations such as the Sarbanes-Oxley Act pertaining to information security [4, 19].

A great deal of attention has been given to network-level attacks such as port scanning, even though about 75% of all attacks against Web servers target Web-based applications, according to a recent survey [24]. Traditional defense strategies such as firewalls do not protect against Web application attacks, as these attacks rely solely on HTTP traffic, which is usually allowed to pass through firewalls unhindered. Thus, attackers typically have a direct line to Web applications.

Many projects in the past focused on guarding against problems caused by the unsafe nature of C, such as buffer overruns and format string vulnerabilities [12, 45, 51]. However, in recent years, Java has emerged as the language of choice for building large complex Web-based systems, in part because of language safety features that disallow direct memory access and eliminate problems such as buffer overruns. Platforms such as J2EE (Java 2 Enterprise Edition) also promoted the adoption of Java as a language for implementing e-commerce applications such as Web stores, banking sites, etc.

A typical Web application accepts input from the user browser and interacts with a back-end database to serve user requests; J2EE libraries make these common tasks easy to code. However, despite the Java language's safety, it is possible to make logical programming errors that lead to vulnerabilities such as SQL injections [1, 2, 14] and cross-site scripting attacks [7, 22, 46]. A simple programming mistake can leave a Web application vulnerable to unauthorized data access, unauthorized updates or deletion of data, and application crashes leading to denial-of-service attacks.

1.1 Causes of Vulnerabilities

Of all vulnerabilities identified in Web applications, problems caused by unchecked input are recognized as being the most common [41]. To exploit unchecked input, an attacker needs to achieve two goals:

Inject malicious data into Web applications. Common methods used include:

• Parameter tampering: pass specially crafted malicious values in fields of HTML forms.
• URL manipulation: use specially crafted parameters to be submitted to the Web application as part of the URL.
• Hidden field manipulation: set hidden fields of HTML forms in Web pages to malicious values.
• HTTP header tampering: manipulate parts of HTTP requests sent to the application.
• Cookie poisoning: place malicious data in cookies, small files sent to Web-based applications.

Manipulate applications using malicious data. Common methods used include:

• SQL injection: pass input containing SQL commands to a database server for execution.
• Cross-site scripting: exploit applications that output unchecked input verbatim to trick the user into executing malicious scripts.
• HTTP response splitting: exploit applications that output input verbatim to perform Web page defacements or Web cache poisoning attacks.
• Path traversal: exploit unchecked user input to control which files are accessed on the server.
• Command injection: exploit user input to execute shell commands.

These kinds of vulnerabilities are widespread in today's Web applications. A recent empirical study of vulnerabilities found that parameter tampering, SQL injection, and cross-site scripting attacks account for more than a third of all reported Web application vulnerabilities [49]. While different on the surface, all types of attacks listed above are made possible by user input that has not been (properly) validated. This set of problems is similar to those handled dynamically by the taint mode in Perl [52], even though our approach is considerably more extensible. We refer to this class of vulnerabilities as the tainted object propagation problem.

1.2 Code Auditing for Security

Many attacks described in the previous section can be detected with code auditing. Code reviews pinpoint potential vulnerabilities before an application is run. In fact, most Web application development methodologies recommend a security assessment or review step as a separate development phase after testing and before application deployment [40, 41].

Code reviews, while recognized as one of the most effective defense strategies [21], are time-consuming and costly, and are therefore performed infrequently. Security auditing requires security expertise that most developers do not possess, so security reviews are often carried out by external security consultants, thus adding to the cost. In addition to this, because new security errors are often introduced as old ones are corrected, double-audits (auditing the code twice) are highly recommended. The current situation calls for better tools that help developers avoid introducing vulnerabilities during the development cycle.

1.3 Static Analysis

This paper proposes a tool based on a static analysis for finding vulnerabilities caused by unchecked input. Users of the tool can describe vulnerability patterns of interest succinctly in PQL [35], which is an easy-to-use program query language with a Java-like syntax. Our tool, as shown in Figure 1, applies user-specified queries to Java bytecode and finds all potential matches statically. The results of the analysis are integrated into Eclipse, a popular open-source Java development environment [13], making the potential vulnerabilities easy to examine and fix as part of the development process.

The advantage of static analysis is that it can find all potential security violations without executing the application. The use of bytecode-level analysis obviates the need for the source code to be accessible. This is especially important since libraries whose source is unavailable are used extensively in Java applications. Our approach can be applied to other forms of bytecode such as MSIL, thereby enabling the analysis of C# code [37].

Our tool is distinctive in that it is based on a precise context-sensitive pointer analysis that has been shown to scale to large applications [55]. This combination of scalability and precision enables our analysis to find all vulnerabilities matching a specification within the portion of the code that is analyzed statically. In contrast, previous practical tools are typically unsound [6, 20]. Without a precise analysis, these tools would find too many potential errors, so they only report a subset of errors that are likely to be real problems. As a result, they can miss important vulnerabilities in programs.
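
Section 1.1 observes that every attack in its list is made possible by input that has not been validated. As a minimal illustration for one entry in that list, path traversal, the Java sketch below shows one possible server-side check that a user-supplied relative path stays inside an intended directory. The class name `PathCheckSketch`, the method `staysInside`, and the example directory are hypothetical names chosen for this illustration; the check is not a technique described in the paper.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper: validates a user-controlled path before it is used. */
public class PathCheckSketch {

    /**
     * Returns true if userSupplied, resolved against baseDir, still lies
     * inside baseDir after "." and ".." segments have been collapsed.
     */
    public static boolean staysInside(String baseDir, String userSupplied) {
        Path base = Paths.get(baseDir).toAbsolutePath().normalize();
        // resolve() returns userSupplied itself when it is absolute,
        // so absolute inputs outside baseDir are rejected as well.
        Path resolved = base.resolve(userSupplied).normalize();
        // Path.startsWith compares whole name elements, not string prefixes.
        return resolved.startsWith(base);
    }

    public static void main(String[] args) {
        System.out.println(staysInside("/srv/themes", "blue/style.css"));
        System.out.println(staysInside("/srv/themes", "../../etc/passwd"));
    }
}
```

A check of this kind has to run on the server, since, as the paper argues later, client-side filtering can be bypassed by editing the request.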
Figure 1: Architecture of our static analysis framework.

1.4 Contributions

A unified analysis framework. We unify multiple, seemingly diverse, recently discovered categories of security vulnerabilities in Web applications and propose an extensible tool for detecting these vulnerabilities using a sound yet practical static analysis for Java.

A powerful static analysis. Our tool is the first practical static security analysis that utilizes fully context-sensitive pointer analysis results. We improve the state of the art in pointer analysis by improving the object-naming scheme. The precision of the analysis is effective in reducing the number of false positives issued by our tool.

A simple user interface. Users of our tool can find a variety of vulnerabilities involving tainted objects by specifying them using PQL [35]. Our system provides a GUI auditing interface implemented on top of Eclipse, thus allowing users to perform security audits quickly during program development.

Experimental validation. We present a detailed experimental evaluation of our system and the static analysis approach on a set of large, widely-used open-source Java applications. We found a total of 29 security errors, including two important vulnerabilities in widely-used libraries. Eight out of nine of our benchmark applications had at least one vulnerability, and our analysis produced only 12 false positives.

1.5 Paper Organization

The rest of the paper is organized as follows. Section 2 presents a detailed overview of application-level security vulnerabilities we address. Section 3 describes our static analysis approach. Section 4 describes improvements that increase analysis precision and coverage. Section 5 describes the auditing environment our system provides. Section 6 summarizes our experimental findings. Section 7 describes related work, and Section 8 concludes.

2 Overview of Vulnerabilities

In this section we focus on a variety of security vulnerabilities in Web applications that are caused by unchecked input. According to an influential survey performed by the Open Web Application Security Project [41], unvalidated input is the number one security problem in Web applications. Many such security vulnerabilities have recently been appearing on specialized vulnerability tracking sites such as SecurityFocus and were widely publicized in the technical press [39, 41]. Recent reports include SQL injections in Oracle products [31] and cross-site scripting vulnerabilities in Mozilla Firefox [30].

2.1 SQL Injection Example

Let us start with a discussion of SQL injections, one of the most well-known kinds of security vulnerabilities found in Web applications. SQL injections are caused by unchecked user input being passed to a back-end database for execution [1, 2, 14, 29, 32, 47]. The hacker may embed SQL commands into the data he sends to the application, leading to unintended actions performed on the back-end database. When exploited, a SQL injection may cause unauthorized access to sensitive data, updates or deletions from the database, and even shell command execution.

Example 1. A simple example of a SQL injection is shown below:

    HttpServletRequest request = ...;
    String userName = request.getParameter("name");
    Connection con = ...;
    String query = "SELECT * FROM Users " +
                   " WHERE name = '" + userName + "'";
    con.execute(query);

This code snippet obtains a user name (userName) by invoking request.getParameter("name") and uses it to construct a query to be passed to a database for execution (con.execute(query)). This seemingly innocent piece of code may allow an attacker to gain access to unauthorized information: if an attacker has full control of string userName obtained from an HTTP request, he can for example set it to ' OR 1 = 1; --. Two dashes are used to indicate comments in the Oracle dialect of SQL, so the WHERE clause of the query effectively becomes the tautology name = '' OR 1 = 1. This allows the attacker to circumvent the name check and get access to all user records in the database. □

SQL injection is but one of the vulnerabilities that can be formulated as tainted object propagation problems. In this case, the input variable userName is considered tainted. If a tainted object (the source or any other object derived from it) is passed as a parameter to
con.execute (the sink), then there is a vulnerability. As discussed above, such an attack typically consists of two parts: (1) injecting malicious data into the application and (2) using the data to manipulate the application. The former corresponds to the sources of a tainted object propagation problem and the latter to the sinks. The rest of this section presents attack techniques and examples of how exploits may be created in practice.

2.2 Injecting Malicious Data

Protecting Web applications against unchecked input vulnerabilities is difficult because applications can obtain information from the user in a variety of different ways. One must check all sources of user-controlled data such as form parameters, HTTP headers, and cookie values systematically. While commonly used, client-side filtering of malicious values is not an effective defense strategy. For example, a banking application may present the user with a form containing a choice of only two account numbers; however, this restriction can be easily circumvented by saving the HTML page, editing the values in the list, and resubmitting the form. Therefore, inputs must be filtered by the Web application on the server. Note that many attacks are relatively easy to mount: an attacker needs little more than a standard Web browser to attack Web applications in most cases.

2.2.1 Parameter Tampering

The most common way for a Web application to accept parameters is through HTML forms. When a form is submitted, parameters are sent as part of an HTTP request. An attacker can easily tamper with parameters passed to a Web application by entering maliciously crafted values into text fields of HTML forms.

2.2.2 URL Tampering

For HTML forms that are submitted using the HTTP GET method, form parameters as well as their values appear as part of the URL that is accessed after the form is submitted. An attacker may directly edit the URL string, embed malicious data in it, and then access this new URL to submit malicious data to the application.

Example 2. Consider a Web page at a bank site that allows an authenticated user to select one of her accounts from a list and debit $100 from the account. When the submit button is pressed in the Web browser, the following URL is requested:

    http://www.mybank.com/myaccount?accountnumber=341948&debit_amount=100

However, if no additional precautions are taken by the Web application receiving this request, accessing

    http://www.mybank.com/myaccount?accountnumber=341948&debit_amount=-5000

may in fact increase the account balance. □

2.2.3 Hidden Field Manipulation

Because HTTP is stateless, many Web applications use hidden fields to emulate persistence. Hidden fields are just form fields made invisible to the end-user. For example, consider an order form that includes a hidden field to store the price of items in the shopping cart:

    <input type="hidden" name="total_price" value="25.00">

A typical Web site using multiple forms, such as an online store, will likely rely on hidden fields to transfer state information between pages. Unlike regular fields, hidden fields cannot be modified directly by typing values into an HTML form. However, since the hidden field is part of the page source, saving the HTML page, editing the hidden field value, and reloading the page will cause the Web application to receive the newly updated value of the hidden field.

2.2.4 HTTP Header Manipulation

HTTP headers typically remain invisible to the user and are used only by the browser and the Web server. However, some Web applications do process these headers, and attackers can inject malicious data into applications through them. While a normal Web browser will not allow forging the outgoing headers, multiple freely available tools allow a hacker to craft an HTTP request leading to an exploit [9]. Consider, for example, the Referer field, which contains the URL indicating where the request comes from. This field is commonly trusted by the Web application, but can be easily forged by an attacker. It is possible to manipulate the Referer field's value used in an error page or for redirection to mount cross-site scripting or HTTP response splitting attacks.

2.2.5 Cookie Poisoning

Cookie poisoning attacks consist of modifying a cookie, which is a small file accessible to Web applications stored on the user's computer [27]. Many Web applications use cookies to store information such as user login/password pairs and user identifiers. This information is often created and stored on the user's computer after the initial interaction with the Web application, such as visiting the application login page. Cookie poisoning is a variation of header manipulation: malicious input can be passed into applications through values stored within cookies. Because cookies are supposedly invisible to the user, cookie poisoning is often more dangerous in practice than other forms of parameter or header manipulation attacks.

2.2.6 Non-Web Input Sources

Malicious data can also be passed in as command-line parameters. This problem is not as important because typically only administrators are allowed to execute components of Web-based applications directly
from the command line. However, by examining our benchmarks, we discovered that command-line utilities are often used to perform critical tasks such as initializing, cleaning, or validating a back-end database or migrating the data. Therefore, attacks against these important utilities can still be dangerous.

2.3 Exploiting Unchecked Input

Once malicious data is injected into an application, an attacker may use one of many techniques to take advantage of this data, as described below.

2.3.1 SQL Injections

SQL injections, first described in Section 2.1, are caused by unchecked user input being passed to a back-end database for execution. When exploited, a SQL injection may cause a variety of consequences, from leaking the structure of the back-end database to adding new users, mailing passwords to the hacker, or even executing arbitrary shell commands.

Many SQL injections can be avoided relatively easily with the use of better APIs. J2EE provides the PreparedStatement class, which allows specifying a SQL statement template with ?'s indicating statement parameters. Prepared SQL statements are precompiled, and expanded parameters never become part of executable SQL. However, not using or improperly using prepared statements still leaves plenty of room for errors.

2.3.2 Cross-site Scripting Vulnerabilities

Cross-site scripting occurs when dynamically generated Web pages display input that has not been properly validated [7, 11, 22, 46]. An attacker may embed malicious JavaScript code into dynamically generated pages of trusted sites. When executed on the machine of a user who views the page, these scripts may hijack the user account credentials, change user settings, steal cookies, or insert unwanted content (such as ads) into the page. At the application level, echoing the application input back to the browser verbatim enables cross-site scripting.

2.3.3 HTTP Response Splitting

HTTP response splitting is a general technique that enables various new attacks including Web cache poisoning, cross-user defacement, sensitive page hijacking, as well as cross-site scripting [28]. By supplying unexpected line break CR and LF characters, an attacker can cause two HTTP responses to be generated for one maliciously constructed HTTP request. The second HTTP response may be erroneously matched with the next HTTP request. By controlling the second response, an attacker can generate a variety of issues, such as forging or poisoning Web pages on a caching proxy server. Because the proxy cache is typically shared by many users, this makes the effects of defacing a page or constructing a spoofed page to collect user data even more devastating. For HTTP splitting to be possible, the application must include unchecked input as part of the response headers sent back to the client. For example, applications that embed unchecked data in HTTP Location headers returned back to users are often vulnerable.

2.3.4 Path Traversal

Path-traversal vulnerabilities allow a hacker to access or control files outside of the intended file access path. Path-traversal attacks are normally carried out via unchecked URL input parameters, cookies, and HTTP request headers. Many Java Web applications use files to maintain an ad-hoc database and store application resources such as visual themes, images, and so on.

If an attacker has control over the specification of these file locations, then he may be able to read or remove files with sensitive data or mount a denial-of-service attack by trying to write to read-only files. Using Java security policies allows the developer to restrict access to the file system (similar to using a chroot jail in Unix). However, missing or incorrect policy configuration still leaves room for errors. When used carelessly, IO operations in Java may lead to path-traversal attacks.

2.3.5 Command Injection

Command injection involves passing shell commands into the application for execution. This attack technique enables a hacker to attack the server using access rights of the application. While relatively uncommon in Web applications, especially those written in Java, this attack technique is still possible when applications carelessly use functions that execute shell commands or load dynamic libraries.

3 Static Analysis

In this section we present a static analysis that addresses the tainted object propagation problem described in Section 2.

3.1 Tainted Object Propagation

We start by defining the terminology that was informally introduced in Example 1. We define an access path as a sequence of field accesses, array index operations, or method calls separated by dots. For instance, the result of applying access path f.g to variable v is v.f.g. We denote the empty access path by $\epsilon$; array indexing operations are indicated by [].

A tainted object propagation problem consists of a set of source descriptors, sink descriptors, and derivation descriptors:

• Source descriptors of the form $\langle m, n, p \rangle$ specify ways in which user-provided data can enter the program. They consist of a source method m, parameter number n and an access path p to be applied to
argument n to obtain the user-provided input. We use argument number $-1$ to denote the return result of a method call.
• Sink descriptors of the form $\langle m, n, p \rangle$ specify unsafe ways in which data may be used in the program. They consist of a sink method m, argument number n, and an access path p applied to that argument.
• Derivation descriptors of the form $\langle m, n_s, p_s, n_d, p_d \rangle$ specify how data propagates between objects in the program. They consist of a derivation method m, a source object given by argument number $n_s$ and access path $p_s$, and a destination object given by argument number $n_d$ and access path $p_d$. This derivation descriptor specifies that at a call to method m, the object obtained by applying $p_d$ to argument $n_d$ is derived from the object obtained by applying $p_s$ to argument $n_s$.

In the absence of derived objects, to detect potential vulnerabilities we only need to know if a source object is used at a sink. Derivation descriptors are introduced to handle the semantics of strings in Java. Because Strings are immutable Java objects, string manipulation routines such as concatenation create brand new String objects, whose contents are based on the original String objects. Derivation descriptors are used to specify the behavior of string manipulation routines, so that taint can be explicitly passed among the String objects.

Most Java programs use built-in String libraries and can share the same set of derivation descriptors as a result. However, some Web applications use multiple String encodings such as Unicode, UTF-8, and URL encoding. If encoding and decoding routines propagate taint and are implemented using native method calls or character-level string manipulation, they also need to be specified as derivation descriptors. Sanitization routines that validate input are often implemented using character-level string manipulation. Since taint does not propagate through such routines, they should not be included in the list of derivation descriptors.

It is possible to obviate the need for manual specification with a static analysis that determines the relationship between strings passed into and returned by low-level string manipulation routines. However, such an analysis must be performed not just on the Java bytecode but on all the relevant native methods as well.

Example 3. We can formulate the problem of detecting parameter tampering attacks that result in a SQL injection as follows: the source descriptor for obtaining parameters from an HTTP request is:

    $\langle \texttt{HttpServletRequest.getParameter(String)}, -1, \epsilon \rangle$

The sink descriptor for SQL query execution is:

    $\langle \texttt{Connection.executeQuery(String)}, 1, \epsilon \rangle$.

To allow the use of string concatenation in the construction of query strings, we use derivation descriptors:

    $\langle \texttt{StringBuffer.append(String)}, 1, \epsilon, -1, \epsilon \rangle$, and
    $\langle \texttt{StringBuffer.toString()}, 0, \epsilon, -1, \epsilon \rangle$

Due to space limitations, we show only a few descriptors here; more information about the descriptors in our experiments is available in our technical report [34]. □

Below we formally define a security violation:

Definition 3.1 A source object for a source descriptor $\langle m, n, p \rangle$ is an object obtained by applying access path p to argument n of a call to m.

Definition 3.2 A sink object for a sink descriptor $\langle m, n, p \rangle$ is an object obtained by applying access path p to argument n of a call to method m.

Definition 3.3 Object $o_2$ is derived from object $o_1$, written $\mathit{derived}(o_1, o_2)$, based on a derivation descriptor $\langle m, n_s, p_s, n_d, p_d \rangle$, if $o_1$ is obtained by applying $p_s$ to argument $n_s$ and $o_2$ is obtained by applying $p_d$ to argument $n_d$ at a call to method m.

Definition 3.4 An object is tainted if it is obtained by applying relation derived to a source object zero or more times.

Definition 3.5 A security violation occurs if a sink object is tainted. A security violation consists of a sequence of objects $o_1 \ldots o_k$ such that $o_1$ is a source object and $o_k$ is a sink object and each object is derived from the previous one:

    $$\forall\, 1 \le i < k : \mathit{derived}(o_i, o_{i+1}).$$

We refer to object pair $\langle o_1, o_k \rangle$ as a source-sink pair.

3.2 Specifications Completeness

The problem of obtaining a complete specification for a tainted object propagation problem is an important one. If a specification is incomplete, important errors will be missed even if we use a sound analysis that finds all vulnerabilities matching a specification. To come up with a list of source and sink descriptors for vulnerabilities in our experiments, we used the documentation of the relevant J2EE APIs.

Since it is relatively easy to miss relevant descriptors in the specification, we used several techniques to make our problem specification more complete. For example, to find some of the missing source methods, we instrumented the applications to find places where application code is called by the application server.

We also used a static analysis to identify tainted objects that have no other objects derived from them, and examined methods into which these objects are passed. In our experience, some of these methods turned out to be obscure derivation and sink methods missing from our initial specification, which we subsequently added.
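
Definitions 3.3 through 3.5 in Section 3.1 can be read as a reachability problem: an object is tainted if it is reachable from a source object through derived facts, and a violation is a chain that ends at a sink object. The Java sketch below illustrates that reading only. It is an assumption-laden toy, not the paper's implementation: objects are represented by plain string labels, the derived facts are supplied by hand rather than computed from descriptors and points-to results, and the names `TaintSketch`, `addSource`, `addSink`, `addDerived`, and `findViolation` are hypothetical.

```java
import java.util.*;

/** Toy model of Definitions 3.3-3.5; objects are hypothetical string labels. */
public class TaintSketch {
    // derived(o1, o2) facts: o1 maps to every o2 derived from it.
    private final Map<String, List<String>> derived = new HashMap<>();
    private final Set<String> sources = new LinkedHashSet<>();
    private final Set<String> sinks = new HashSet<>();

    public void addSource(String o) { sources.add(o); }
    public void addSink(String o) { sinks.add(o); }
    public void addDerived(String from, String to) {
        derived.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
    }

    /**
     * Returns one chain o1 ... ok from a source object to a sink object,
     * or an empty list if no sink object is tainted.
     */
    public List<String> findViolation() {
        // Maps each tainted object to the object it was derived from
        // (null for source objects, which are tainted in zero steps).
        Map<String, String> parent = new HashMap<>();
        Deque<String> work = new ArrayDeque<>();
        for (String s : sources) {
            parent.put(s, null);
            work.add(s);
        }
        while (!work.isEmpty()) {
            String o = work.remove();
            if (sinks.contains(o)) {
                // Walk the parent links back to the source to rebuild the chain.
                LinkedList<String> chain = new LinkedList<>();
                for (String cur = o; cur != null; cur = parent.get(cur)) {
                    chain.addFirst(cur);
                }
                return chain;
            }
            for (String next : derived.getOrDefault(o, Collections.emptyList())) {
                if (!parent.containsKey(next)) {
                    parent.put(next, o);
                    work.add(next);
                }
            }
        }
        return Collections.emptyList();
    }
}
```

For the descriptors of Example 3, one might label the result of getParameter as a source, record one derived fact for the append call and one for the toString call, and label the argument of executeQuery as a sink; findViolation would then return the chain linking the first and last objects, whose endpoints form the source-sink pair.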
3.3 Static Analysis

Our approach is to use a sound static analysis to find all potential violations matching a vulnerability specification given by its source, sink, and derivation descriptors. To find security violations statically, it is necessary to know what objects these descriptors may refer to, a general problem known as pointer or points-to analysis.

3.3.1 Role of Pointer Information

To illustrate the need for points-to information, we consider the task of auditing a piece of Java code for SQL injections caused by parameter tampering, as described in Example 1.

Example 4. In the code below, string param is tainted because it is returned from a source method getParameter. So is buf1, because it is derived from param in the call to append on line 6. Finally, string query is passed to sink method executeQuery.

    1 String param = req.getParameter("user");
    2
    3 StringBuffer buf1;
    4 StringBuffer buf2;
    5 ...
    6 buf1.append(param);
    7 String query = buf2.toString();
    8 con.executeQuery(query);

Unless we know that variables buf1 and buf2 may never refer to the same object, we would have to conservatively assume that they may. Since buf1 is tainted, variable query may also refer to a tainted object. Thus a conservative tool that lacks additional information about pointers will flag the call to executeQuery on line 8 as potentially unsafe. □

An unbounded number of objects may be allocated by the program at run time, so, to compute a finite answer, the pointer analysis statically approximates dynamic program objects with a finite set of static object "names". A common approximation approach is to name an object by its allocation site, which is the line of code that allocates the object.

3.3.2 Finding Violations Statically

Points-to information enables us to find security violations statically. Points-to analysis results are represented as the relation pointsto(v, h), where v is a program vari-

[...]

plying access path p to argument n in a call to method m for a sink descriptor $\langle m, n, p \rangle$.

3. There exist variables $v_1, \ldots, v_k$ such that

    $$\forall\, 1 \le i < k : \mathit{pointsto}(v_i, h_i) \wedge \mathit{pointsto}(v_{i+1}, h_{i+1}),$$

where variable $v_i$ corresponds to applying $p_s$ to argument $n_s$ and $v_{i+1}$ corresponds to applying $p_d$ to argument $n_d$ in a call to method m for a derivation descriptor $\langle m, n_s, p_s, n_d, p_d \rangle$.

Our static analysis is based on a context-sensitive Java points-to analysis developed by Whaley and Lam [55]. Their algorithm uses binary decision diagrams (BDDs) to efficiently represent and manipulate points-to results for exponentially many contexts in a program. They have developed a tool called bddbddb (BDD-Based Deductive DataBase) that automatically translates program analyses expressed in terms of Datalog [50] (a language used in deductive databases) into highly efficient BDD-based implementations. The results of their points-to analysis can also be accessed easily using Datalog queries. Notice that in the absence of derived objects, finding security violations can be easily done with pointer analysis alone, because pointer analysis tracks objects as they are passed into or returned from methods.

However, it is relatively easy to implement the tainted object propagation analysis using bddbddb. Constraints of a specification as given by Definition 3.6 can be translated into Datalog queries straightforwardly. Facts such as "variable v is parameter n of a call to method m" map directly into Datalog relations representing the internal representation of the Java program. The points-to results used by the constraints are also readily available as Datalog relations after pointer analysis has been run.

Because Java supports dynamic loading and classes can be dynamically generated on the fly and called reflectively, we can find vulnerabilities only in the code available to the static analysis. For reflective calls, we use a simple analysis that handles common uses of reflection to increase the size of the analyzed call graph [34].

3.3.3 Role of Pointer Analysis Precision

Pointer analysis has been the subject of much compiler
research over the last two decades. Because determining
able and h is an allocation site in the program.
what heap objects a given program variable may point to
Definition 3.6 A static security violation is a sequence during program execution is undecidable, sound analy-
of heap allocation sites h1 . . . hk such that ses compute conservative approximations of the solution.
Previous points-to approaches typically trade scalability
1. There exists a variable v1 such that for precision, ranging from highly scalable but imprecise
pointsto(v1 , h1 ), where v1 corresponds to ac- techniques [48] to precise approaches that have not been
cess path p applied to argument n of a call to shown to scale [43].
method m for a source descriptor hm, n, pi. In the absence of precise information about pointers, a
2. There exists a variable vk such that sound tool would conclude that many objects are tainted
pointsto(vk , hk ), where vk corresponds to ap- and hence report many false positives. Therefore, many
practical tools use an unsound approach to pointers, assuming that pointers are unaliased unless proven otherwise [6, 20]. Such an approach, however, may miss important vulnerabilities.

Having precise points-to information can significantly reduce the number of false positives. Context sensitivity refers to the ability of an analysis to keep information from different invocation contexts of a method separate and is known to be an important feature contributing to precision. The effect of context sensitivity on analysis precision is illustrated by the example below.

     1 class DataSource {
     2     String url;
     3     DataSource(String url) {
     4         this.url = url;
     5     }
     6     String getUrl(){
     7         return this.url;
     8     }
     9     ...
    10 }
    11 String passedUrl = request.getParameter("...");
    12 DataSource ds1 = new DataSource(passedUrl);
    13 String localUrl = "http://localhost/";
    14 DataSource ds2 = new DataSource(localUrl);
    15
    16 String s1 = ds1.getUrl();
    17 String s2 = ds2.getUrl();

Figure 2: Example showing the importance of context sensitivity.

Example 5. Consider the code snippet in Figure 2. The class DataSource acts as a wrapper for a URL string. The code creates two DataSource objects and calls getUrl on both objects. A context-insensitive analysis would merge information for calls of getUrl on lines 16 and 17. The reference this, which is considered to be argument 0 of the call, points to the objects allocated on lines 12 and 14, so this.url points to either the object returned on line 11 or "http://localhost/" on line 13. As a result, both s1 and s2 will be considered tainted if we rely on context-insensitive points-to results. With a context-sensitive analysis, however, only s1 will be considered tainted. □

While many points-to analysis approaches exist, until recently, we did not have a scalable analysis that gives a conservative yet precise answer. The context-sensitive, inclusion-based points-to analysis by Whaley and Lam is both precise and scalable [55]. It achieves scalability by using BDDs to exploit the similarities across the exponentially many calling contexts.

A call graph is a static approximation of what methods may be invoked at all method calls in the program. While there are exponentially many acyclic call paths through the call graph of a program, the compression achieved by BDDs makes it possible to efficiently represent as many as 10^14 contexts. The framework we propose in this paper is the first practical static analysis tool for security to leverage the BDD-based approach. The use of BDDs has allowed us to scale our framework to programs consisting of almost 1,000 classes.

3.4 Specifying Taint Problems in PQL

While a useful formalism, source, sink, and derivation descriptors as defined in Section 3.1 are not a user-friendly way to describe security vulnerabilities. Datalog queries, while giving the user complete control, expose too much of the program's internal representation to be practical. Instead, we use PQL, a program query language. PQL serves as syntactic sugar for Datalog queries, allowing users to express vulnerability patterns in a familiar Java-like syntax; translation of tainted object propagation queries from PQL into Datalog is straightforward. PQL is a general query language capable of expressing a variety of questions about program execution. However, we only use a limited form of PQL queries to formulate tainted object propagation problems.

Due to space limitations, we summarize only the most important features of PQL here; interested readers are referred to [35] for a detailed description. A PQL query is a pattern describing a sequence of dynamic events that involves variables referring to dynamic object instances. The uses clause declares all object variables the query refers to. The matches clause specifies the sequence of events on object variables that must occur for a match. Finally, the returns clause specifies the objects returned by the query whenever a set of object instances participating in the events in the matches clause is found.

    query main()
    returns
        object Object sourceObj, sinkObj;
    matches {
        sourceObj := source();
        sinkObj := derived*(sourceObj);
        sinkObj := sink();
    }

Figure 3: Main query for finding source-sink pairs.

Source-sink object pairs corresponding to static security violations for a given tainted object propagation problem are computed by query main in Figure 3. This query uses auxiliary queries source and sink, which define source and sink objects, as well as query derived* shown in Figure 4, which captures a transitive derivation relation. Object sourceObj in main is returned by sub-

    query derived*(object Object x)
    returns
        object Object y;
    uses
        object Object temp;
    matches {
        y := x |
        temp := derived(x); y := derived*(temp);
    }

Figure 4: Transitive derived relation derived*.
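The recursive definition of derived* in Figure 4 amounts to a reflexive-transitive closure over the one-step derived relation. The following is a minimal sketch of that idea on a small in-memory graph, not the way PQL or bddbddb evaluate the query (they operate on BDD-encoded Datalog relations). The class DerivedClosure, the method derivedStar, and the string object names are hypothetical and chosen only for illustration.

```java
import java.util.*;

public class DerivedClosure {
    // Returns every object reachable from start by zero or more "derived" steps.
    // The map plays the role of the one-step derived relation (x -> objects derived from x).
    static Set<String> derivedStar(Map<String, List<String>> derived, String start) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> work = new ArrayDeque<>();
        seen.add(start);      // zero steps: the alternative "y := x"
        work.add(start);
        while (!work.isEmpty()) {
            String x = work.remove();
            // one more step: "temp := derived(x); y := derived*(temp)"
            for (String y : derived.getOrDefault(x, Collections.emptyList())) {
                if (seen.add(y)) {
                    work.add(y);  // visit each object once, so cycles terminate
                }
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        // Hypothetical facts mirroring Example 4: param -> buf -> query.
        Map<String, List<String>> derived = new HashMap<>();
        derived.put("param", Arrays.asList("buf"));
        derived.put("buf", Arrays.asList("query"));
        System.out.println(derivedStar(derived, "param")); // prints [param, buf, query]
    }
}
```

A worklist is used here instead of literal recursion so that cyclic derivations (for example, a buffer appended to itself) do not cause unbounded recursion; the set of results is the same.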
query source. Object sinkObj is the result of sub-query derived* with sourceObj used as a sub-query parameter and is also the result of sub-query sink. Therefore, sinkObj returned by query main matches all tainted objects that are also sink objects.

Semicolons are used in PQL to indicate a sequence of events that must occur in order. Sub-query derived* defines a transitive derived relation: object y is transitively derived from object x by applying sub-query derived zero or more times. This query takes advantage of PQL's sub-query mechanism to define a transitive closure recursively. Sub-queries source, sink, and derived are specific to a particular tainted object propagation problem, as shown in the example below.

Example 6. This example describes sub-queries source, sink, and derived shown in Figure 5 that can be used to match SQL injections, such as the one described in Example 1. Usually these sub-queries are structured as a series of alternatives separated by |. The wildcard character _ is used instead of a variable name if the identity of the object to be matched is irrelevant.

Query source is structured as an alternation: sourceObj can be returned from a call to req.getParameter or req.getHeader for an object req of type HttpServletRequest; sourceObj may also be obtained by indexing into an array returned by a call to req.getParameterValues, etc. Query sink defines sink objects used as parameters of sink methods such as java.sql.Statement.executeQuery, etc. Query derived determines when data propagates from object x to object y. It consists of the ways in which Java strings can be derived from one another, including string concatenation, substring computation, etc. □

    query source()
    returns
        object Object sourceObj;
    uses
        object String[] sourceArray;
        object HttpServletRequest req;
    matches {
        sourceObj = req.getParameter(_)
      | sourceObj = req.getHeader(_)
      | sourceArray = req.getParameterValues(_);
        sourceObj = sourceArray[]
      | ...
    }

    query sink()
    returns
        object Object sinkObj;
    uses
        object java.sql.Statement stmt;
        object java.sql.Connection con;
    matches {
        stmt.executeQuery(sinkObj)
      | stmt.execute(sinkObj)
      | con.prepareStatement(sinkObj)
      | ...
    }

    query derived(object Object x)
    returns
        object Object y;
    matches {
        y.append(x)
      | y = _.append(x)
      | y = new String(x)
      | y = new StringBuffer(x)
      | y = x.toString()
      | y = x.substring(_ ,_)
      | y = x.toString(_)
      | ...
    }

Figure 5: PQL sub-queries for finding SQL injections.

As can be seen from this example, sub-queries source, sink, and derived map to source, sink, and derivation descriptors for the tainted object propagation problem. However, instead of descriptor notation for method parameters and return values, natural Java-like method invocation syntax is used.

4 Precision Improvements

This section describes improvements we made to the object-naming scheme used in the original points-to analysis [55]. These improvements greatly increase the precision of the points-to results and reduce the number of false positives produced by our analysis.

4.1 Handling of Containers

Containers such as hash maps, vectors, lists, and others are a common source of imprecision in the original pointer analysis algorithm. The imprecision is due to the fact that objects are often stored in a data structure allocated inside the container class definition. As a result, the analysis cannot statically distinguish between objects stored in different containers.

     1 class Vector {
     2     Object[] table = new Object[1024];
     3
     4     void add(Object value){
     5         int i = ...;
     6         // optional resizing ...
     7         table[i] = value;
     8     }
     9
    10     Object getFirst(){
    11         Object value = table[0];
    12         return value;
    13     }
    14 }
    15 String s1 = "...";
    16 Vector v1 = new Vector();
    17 v1.add(s1);
    18 Vector v2 = new Vector();
    19 String s2 = (String) v2.getFirst();

Figure 6: Typical container definition and usage.

Example 7. The abbreviated vector class in Figure 6 allocates an array called table on line 2 and vectors v1 and v2 share that array. As a result, the original analysis
will conclude that the String object referred to by s2 retrieved from vector v2 may be the same as the String object s1 deposited in vector v1. □

To alleviate this problem and improve the precision of the results, we create a new object name for the internally allocated data structure for every allocation site of the external container. This new name is associated with the allocation site of the underlying container object. As a result, the type of imprecision described above is eliminated and objects deposited in a container can only be retrieved from a container created at the same allocation site. In our implementation, we have applied this improved object naming to standard Java container classes including HashMap, Hashtable, and LinkedList.

4.2 Handling of String Routines

Another set of methods that requires better object naming is Java string manipulation routines. Methods such as String.toLowerCase() allocate String objects that are subsequently returned. With the default object-naming scheme, all the allocated strings are considered tainted if such a method is ever invoked on a tainted string.

We alleviate this problem by giving unique names to results returned by string manipulation routines at different call sites. We currently apply this object naming improvement to Java standard libraries only. As explained in Section 6.4, imprecise object naming was responsible for all 12 false positives produced by our analysis.

5 Auditing Environment

The static analysis described in the previous two sections forms the basis of our security-auditing tool for Java applications. The tool allows a user to specify security patterns to detect. User-provided specifications are expressed as PQL queries, as described in Section 3.4. These queries are automatically translated into Datalog queries, which are subsequently resolved using bddbddb.

To help the user with the task of examining violation reports, our tool provides an intuitive GUI. The interface is built on top of Eclipse, a popular open-source Java development environment. As a result, a Java programmer can assess the security of an application, often without leaving the development environment used to create the application in the first place.

A typical auditing session involves applying the analysis to the application and then exporting the results into Eclipse for review. Our Eclipse plugin allows the user to easily examine each vulnerability by navigating among the objects involved in it. Clicking on each object allows the user to navigate through the code displayed in the text editor in the top portion of the screen.

Figure 7: Tracking a SQL injection vulnerability in the Eclipse GUI plugin. Objects involved in the vulnerability trace are shown at the bottom.

Example 8. An example of using the Eclipse GUI is shown in Figure 7. The bottom portion of the screen lists all potential security vulnerabilities reported by our analysis. One of them, a SQL injection caused by non-Web input, is expanded to show all the objects involved in the vulnerability. The source object on line 76 of JDBCDatabaseExport.java is passed to derived objects using derivation methods StringBuffer.append and StringBuffer.toString
until it reaches the sink object constructed and used on line 170 of the same file. Line 170, which contains a call to Connection.prepareStatement, is highlighted in the Java text editor shown on top of the screen. □

6 Experimental Results

In this section we summarize the experiments we performed and describe the security violations we found. We start out by describing our benchmark applications and experimental setup, describe some representative vulnerabilities found by our analysis, and analyze the impact of analysis features on precision.

6.1 Benchmark Applications

While there is a fair number of commercial and open-source tools available for testing Web application security, there are no established benchmarks for comparing tools' effectiveness. The task of finding suitable benchmarks for our experiments was especially complicated by the fact that most Web-based applications are proprietary software, whose vendors are understandably reluctant to reveal their code, not to mention the vulnerabilities found. At the same time, we did not want to focus on artificial micro-benchmarks or student projects that lack the complexities inherent in real applications. We focused on a set of large, representative open-source Web-based J2EE applications, most of which are available on SourceForge.

The benchmark applications are briefly described below. jboard, blueblog, blojsom, personalblog, snipsnap, pebble, and roller are Web-based bulletin board and blogging applications. webgoat is a J2EE application designed by the Open Web Application Security Project [40, 41] as a test case and a teaching tool for Web application security. Finally, road2hibernate is a test program developed for hibernate, a popular object persistence library.

Applications were selected from among J2EE-based open-source projects on SourceForge solely on the basis of their size and popularity. Other than webgoat, which we knew had intentional security flaws, we had no prior knowledge as to whether the applications had security vulnerabilities. Most of our benchmark applications are used widely: roller is used on dozens of sites including prominent ones such as blogs.sun.com. snipsnap has more than 50,000 downloads according to its authors. road2hibernate is a wrapper around hibernate, a highly popular object persistence library that is used by multiple large projects, including a news aggregator and a portal. personalblog has more than 3,000 downloads according to SourceForge statistics. Finally, blojsom was adopted as a blogging solution for the Apple Tiger Weblog Server.

    Benchmark        Version       File     Line      Analyzed
                     number        count    count     classes
    jboard           0.30             90    17,542       264
    blueblog         1.0              32     4,191       306
    webgoat          0.9              77    19,440       349
    blojsom          1.9.6            61    14,448       428
    personalblog     1.2.6            39     5,591       611
    snipsnap         1.0-BETA-1      445    36,745       653
    road2hibernate   2.1.4             2       140       867
    pebble           1.6-beta1       333    36,544       889
    roller           0.9.9           276    52,089       989
    Total                          1,355   186,730     5,356

Figure 8: Summary of information about the benchmarks. Applications are sorted by the total number of analyzed classes.

Figure 8 summarizes information about our benchmark applications. Notice that the traditional lines-of-code metric is somewhat misleading in the case of applications that use large libraries. Many of these benchmarks depend on massive libraries, so, while the application code may be small, the full size of the application executed at runtime is quite large. An extreme case is road2hibernate, which is a small 140-line stub program designed to exercise the hibernate object persistence library; however, the total number of analyzed classes for road2hibernate exceeded 800. A better measure is given in the last column of Figure 8, which shows the total number of classes in each application's call graph.

6.2 Experimental Setup

The implementation of our system is based on the joeq Java compiler and analysis framework. In our system we use a translator from PQL to Datalog [35] and the bddbddb program analysis tool [55] to find security violations. We applied static analysis to look for all tainted object propagation problems described in this paper, and we used a total of 28 source, 18 sink, and 29 derivation descriptors in our experiments. The derivation descriptors correspond to methods in classes such as String, StringBuffer, StringTokenizer, etc. Source and sink descriptors correspond to methods declared in 19 different J2EE classes, as is further described in [34].

    Context sensitivity    X X X X
    Improved naming        X X X X

    Benchmark        Preprocessing    Points-to analysis       Taint analysis
    jboard                 37           8    7   12   10      14   12   16   14
    blueblog               39          13    8   15   10      17   14   21   16
    webgoat                57          45   30  118   90      69   66  106  101
    blojsom                60          18   13   25   16      24   21   30   27
    personalblog          173         107   28  303   32      62   50   19   59
    snipsnap              193          58   33  142   47     194  154  160  105
    road2hibernate        247         186   40  268   43      73   44  161   58
    pebble                177          58   35  117   49     150  140  136  100
    roller                362         226   55  733  103     196   83  338  129

Figure 9: Summary of times, in seconds, it takes to perform preprocessing, points-to, and taint analysis for each analysis variation. Analysis variations have either context sensitivity or improved object naming enabled, as indicated by X signs in the header row.

We used four different variations of our static analysis, obtained by either enabling or disabling context sensitivity and improved object naming. Analysis times for the variations are listed in Figure 9. Running times shown in the table are obtained on an Opteron 150 machine with 4 GB of memory running Linux. The first section of
the table shows the times to pre-process the application to create relations accepted by the pointer analysis; the second shows points-to analysis times; the last presents times for the tainted object propagation analysis.

It should be noted that the taint analysis times often decrease as the analysis precision increases. Contrary to intuition, we actually pay less for a more precise analysis. Imprecise answers are big and therefore take a long time to compute and represent. In fact, the context-insensitive analysis with default object naming runs significantly slower on the largest benchmarks than the most precise analysis. The most precise analysis version takes a total of less than 10 minutes on the largest application; we believe that this is acceptable given the quality of the results the analysis produces.

    Context sensitivity        X X X X X X
    Improved object naming     X X X X X X

    Benchmark       Sources Sinks   Tainted objects              Reported warnings        False positives          Errors
    jboard               1      6      268     23      2      2       0    0      0   0       0    0      0   0       0
    blueblog             6     12       17     17     17     17       1    1      1   1       0    0      0   0       1
    webgoat             13     59    1,166    201    903    157      51    7     51   6      45    1     45   0       6
    blojsom             27     18      368    203    197    112      48    4     26   2      46    2     24   0       2
    personalblog        25     31    2,066  1,023  1,685    426     460  275    370   2     458  273    368   0       2
    snipsnap           155    100    1,168    791    897    456     732   93    513  27     717   78    498  12      15
    road2hibernate       1     33    2,150    843  1,641    385      18   12     16   1      17   11     15   0       1
    pebble             132     70    1,403    621    957    255     427  211    193   1     426  210    192   0       1
    roller              32     64    2,367    504  1,923    151     378   12    261   1     377   11    260   0       1
    Total              392    393   10,973  4,226  8,222  1,961   2,115  615  1,431  41   2,086  586  1,402  12      29

[Figure 10(b): bar chart of the number of tainted objects (vertical axis, 0 to 2,500) for each benchmark application (horizontal axis), with one bar per analysis version: context-insensitive with default naming, context-insensitive with improved naming, context-sensitive with default naming, and context-sensitive with improved naming.]

Figure 10: (a) Summary of data on the number of tainted objects, reported security violations, and false positives for each analysis version. Enabled analysis features are indicated by X signs in the header row. (b) Comparison of the number of tainted objects for each version of the analysis.

6.3 Vulnerabilities Discovered

The static analysis described in this paper reports a total of 41 potential security violations in our nine benchmarks, out of which 29 turn out to be security errors, while 12 are false positives. All but one of the benchmarks had at least one security vulnerability. Moreover, except for errors in webgoat and one HTTP splitting vulnerability in snipsnap [16], none of these security errors had been reported before.

6.3.1 Validating the Errors We Found

Not all security errors found by static analysis or code reviews are necessarily exploitable in practice. The error may not correspond to a path that can be taken dynamically, or it may not be possible to construct meaningful malicious input. Exploits may also be ruled out because of the particular configuration of the application, but configurations may change over time, potentially making exploits possible. For example, a SQL injection that may not work on one database may become exploitable when the application is deployed with a database system that does not perform sufficient input checking. Furthermore, virtually all static errors we found can be fixed easily by modifying several lines of Java source code, so there is generally no reason not to fix them in practice.

After we ran our analysis, we manually examined all the errors reported to make sure they represent security errors. Since our knowledge of the applications was not sufficient to ascertain that the errors we found were exploitable, to gain additional assurance, we reported the errors to program maintainers. We reported to application maintainers only those errors found in the application code rather than in general libraries over which the maintainers had no control. Almost all errors we reported to program maintainers were confirmed, resulting in more than a dozen code fixes.

Because webgoat is an artificial application designed to contain bugs, we did not report the errors we found in it. Instead, we dynamically confirmed some of the statically detected errors by running webgoat, as well as a few other benchmarks, on a local server and creating actual exploits.

It is important to point out that our current analysis ignores control flow. Without analyzing the predicates, our analysis may not realize that a program has checked its input, so some of the reported vulnerabilities may turn
out to be false positives. However, our analysis shows all the steps involved in propagating taint from a source to a sink, thus allowing the user to check if the vulnerabilities found are exploitable.

Many Web-based applications perform some form of input checking. However, as in the case of the vulnerabilities we found in snipsnap, it is common that some checks are missed. It is surprising that our analysis did not generate any false warnings due to the lack of predicate analysis, even though many of the applications we analyze include checks on user input. Two security errors in blojsom flagged by our analysis deserve special mention. The user-provided input was in fact checked, but the validation checks were too lax, leaving room for exploits. Since the sanitization routine in blojsom was implemented using string operations as opposed to direct character manipulation, our analysis detected the flow of taint from the routine's input to its output. To prove the vulnerability to the application maintainer, we created an exploit that circumvented all the checks in the validation routine, thus making path-traversal vulnerabilities possible. Note that if the sanitization had been properly implemented, our analysis would have generated some false positives in this case.

6.3.2 Classification of Errors

This section presents a classification of all the errors we found. A summary of our experimental results is presented in Figure 10(a). Columns 2 and 3 list the number of source and sink objects for each benchmark. It should be noted that the number of sources and sinks for all of these applications is quite large, which suggests that security auditing these applications is time-consuming, because the time a manual security code review takes is roughly proportional to the number of sources and sinks that need to be considered. The table also shows the number of vulnerability reports, the number of false positives, and the number of errors for each analysis version.

Figure 11 presents a classification of the 29 security vulnerabilities we found grouped by the type of the source in the table rows and the sink in table columns. For example, the cell in row 4, column 1 indicates that there were 2 potential SQL injection attacks caused by non-Web sources, one in snipsnap and another in road2hibernate.

    Sources \ Sinks     SQL injections                        HTTP splitting   Cross-site scripting                                      Path traversal   Total
    Header manip.       0                                     snipsnap = 6     blueblog: 1, webgoat: 1, pebble: 1, roller: 1 = 4         0                10
    Parameter manip.    webgoat: 4, personalblog: 2 = 6       snipsnap = 5     0                                                         blojsom = 2      13
    Cookie poisoning    webgoat = 1                           0                0                                                         0                 1
    Non-Web inputs      snipsnap: 1, road2hibernate: 1 = 2    0                0                                                         snipsnap = 3      5
    Total               9                                     11               4                                                         5                29

Figure 11: Classification of vulnerabilities we found. Each cell corresponds to a combination of a source type (in rows) and sink type (in columns).

Overall, parameter manipulation was the most common technique to inject malicious data (13 cases) and HTTP splitting was the most popular exploitation technique (11 cases). Many HTTP splitting vulnerabilities are due to an unsafe programming idiom where the application redirects the user's browser to a page whose URL is user-provided, as the following example from snipsnap demonstrates:

    response.sendRedirect(
        request.getParameter("referer"));

Most of the vulnerabilities we discovered are in application code as opposed to libraries. While errors in application code may result from simple coding mistakes made by programmers unaware of security issues, one would expect library code to generally be better tested and more secure. Errors in libraries expose all applications using the library to attack. Despite this fact, we have managed to find two attack vectors in libraries: one in a commonly used Java library, hibernate, and another in the J2EE implementation. While a total of 29 security errors were found, because the same vulnerability vector in J2EE is present in four different benchmarks, they actually corresponded to 26 unique vulnerabilities.

6.3.3 SQL Injection Vector in hibernate

We start by describing a vulnerability vector found in hibernate, an open-source object-persistence library commonly used in Java applications as a lightweight back-end database. hibernate provides the functionality of saving program data structures to disk and loading them at a later time. It also allows applications to search through the data stored in a hibernate database. Three programs in our benchmark suite, personalblog, road2hibernate, and snipsnap, use hibernate to store user data.

We have discovered an attack vector in code pertaining to the search functionality in hibernate, version 2.1.4. The implementation of method Session.find retrieves objects from a hibernate database by passing its input string argument through a sequence of calls to a SQL execute statement. As a result, all invocations of Session.find with unsafe data, such as the two errors we found in personalblog, may suffer from SQL injections. A few other public methods such as iterate and delete also turn out to be attack vectors. Our findings highlight the importance of securing commonly used software components in order to protect their clients.
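To see why passing unchecked strings to a query method such as Session.find is dangerous, it helps to look at how a query assembled by string concatenation changes shape under hostile input. The sketch below is self-contained and deliberately does not call hibernate; the class ConcatenatedQuery, the method buildQuery, and the Post/author names are hypothetical and stand in for whatever query text a vulnerable caller might build.

```java
public class ConcatenatedQuery {
    // Hypothetical helper: splices user input directly into the query text,
    // which is the pattern the taint analysis flags when the result reaches a sink.
    static String buildQuery(String user) {
        return "from Post as p where p.author = '" + user + "'";
    }

    public static void main(String[] args) {
        // Benign input yields the intended single-condition query.
        System.out.println(buildQuery("alice"));
        // Hostile input closes the string literal and adds an always-true condition.
        System.out.println(buildQuery("x' or '1'='1"));
    }
}
```

With the second input the resulting text is `from Post as p where p.author = 'x' or '1'='1'`, so the filter no longer restricts the result. Binding user-supplied values as query parameters, rather than concatenating them into the query text, keeps the input from being interpreted as query syntax.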
6.3.4 Cross-site Tracing Attacks

Analysis of webgoat and several other applications revealed a previously unknown vulnerability in core J2EE libraries, which are used by thousands of Java applications. This vulnerability pertains to the TRACE method specified in the HTTP protocol. TRACE is used to echo the contents of an HTTP request back to the client for debugging purposes. However, the contents of user-provided headers are sent back verbatim, thus enabling cross-site scripting attacks.

In fact, this variation of cross-site scripting, caused by a vulnerability in the HTTP protocol specification, had been discovered before, although the fact that it was present in J2EE was not previously announced. This type of attack has been dubbed cross-site tracing and it is responsible for CERT vulnerabilities 244729, 711843, and 728563. Because this behavior is specified by the HTTP protocol, there is no easy way to fix this problem at the source level. General recommendations for avoiding cross-site tracing include disabling TRACE functionality on the server or disabling client-side scripting [18].

6.4 Analysis Features and False Positives

The version of our analysis that employs both context sensitivity and improved object naming described in Section 4 achieves very precise results, as measured by the number of false positives. In this section we examine the contribution of each feature of our static analysis approach to the precision of our results. We also explain the causes of the remaining 12 false positives reported by the most precise analysis version. To analyze the importance of each analysis feature, we examined the number of false positives as well as the number of tainted objects reported by each variation of the analysis. Just like false positives, tainted objects provide a useful metric for analysis precision: as the analysis becomes more precise, the number of objects deemed to be tainted decreases.

Figure 10(a) summarizes the results for the four different analysis versions. The first part of the table shows the number of tainted objects reported by the analysis. The second part of the table shows the number of reported security violations. The third part of the table summarizes the number of false positives. Finally, the last column provides the number of real errors detected for each benchmark. Figure 10(b) provides a graphical representation of the number of tainted objects for different analysis variations. Below we summarize our observations.

Context sensitivity combined with improved object naming achieves a very low number of false positives. In fact, the number of false positives was 0 for all applications but snipsnap. For snipsnap, the number of false positives was reduced more than 50-fold compared to the context-insensitive analysis version with no naming improvements. Similarly, not counting the small program jboard, the most precise version on average reported 5 times fewer tainted objects than the least precise. Moreover, the number of tainted objects dropped more than 15-fold in the case of roller, our largest benchmark.

To achieve a low false-positive rate, both context sensitivity and improved object naming are necessary. The number of false positives remains high for most programs when only one of these analysis features is used. One way to interpret the importance of context sensitivity is that the right selection of object “names” in pointer analysis allows context sensitivity to produce precise results. While it is widely recognized in the compiler community that special treatment of containers is necessary for precision, improved object naming alone is not generally sufficient to completely eliminate the false positives.

All 12 of the false positives reported by the most precise version of our analysis were located in snipsnap and were caused by insufficient precision of the default allocation site-based object-naming scheme. The default naming caused an allocation site in snipsnap to be conservatively considered tainted because a tainted object could propagate to that allocation site. The allocation site in question is located within StringWriter.toString(), a JDK function similar to String.toLowerCase() that returns a tainted String only if the underlying StringWriter is constructed from a tainted string. Our analysis conservatively concluded that the return result of this method may be tainted, causing a vulnerability to be reported where none can occur at runtime. We should mention that all the false positives in snipsnap are eliminated by creating a new object name at every call to StringWriter.toString(), which is achieved with a one-line change to the pointer analysis specification.

7 Related Work

In this section, we first discuss penetration testing and runtime monitoring, two of the most commonly used approaches for finding vulnerabilities besides manual code reviews. We also review the relevant literature on static analysis for improving software security.

7.1 Penetration Testing

Current practical solutions for detecting Web application security problems generally fall into the realm of penetration testing [3, 5, 15, 36, 44]. Penetration testing involves attempting to exploit vulnerabilities in a Web application or crashing it by coming up with a set of appropriate malicious input values. Penetration reports usually include a list of identified vulnerabilities [25]. However, this approach is incomplete. A penetration test can usually reveal only a small sample of all possible security risks in a system without identifying the parts of the system that have not been adequately tested. Generally, there are no standards that define which tests to run and which inputs to try. In most cases this approach is not effective and considerable program knowledge is needed to find application-level security errors successfully.

7.2 Runtime Monitoring

A variety of both free and commercial runtime monitoring tools for evaluating Web application security are available. Proxies intercept HTTP and HTTPS data between the server and the client, so that data, including cookies and form fields, can be examined, modified, and resubmitted to the application [9, 42]. Commercial application-level firewalls available from NetContinuum, Imperva, Watchfire, and other companies take this concept further by creating a model of valid interactions between the user and the application and warning about violations of this model. Some application-level firewalls are based on signatures that guard against known types of attacks. The white-listing approach specifies what the valid inputs are; however, maintaining the rules for white-listing is challenging. In contrast, our technique can prevent security errors before they have a chance to manifest themselves.

7.3 Static Analysis Approaches

A good overview of static analysis approaches applied to security problems is provided in [8]. Simple lexical approaches employed by scanning tools such as ITS4 and RATS use a set of predefined patterns to identify potentially dangerous areas of a program [56]. Although a significant improvement on Unix grep, these tools have no knowledge of how data propagates throughout the program and cannot be used to automatically and fully solve taint-style problems.

A few projects use path-sensitive analysis to find errors in C and C++ programs [6, 20, 33]. While capable of addressing taint-style problems, these tools rely on an unsound approach to pointers and may therefore miss some errors. The WebSSARI project uses combined unsound static and dynamic analysis in the context of analyzing PHP programs [23]. WebSSARI has successfully been applied to find many SQL injection and cross-site scripting vulnerabilities in PHP code.

An analysis approach that uses type qualifiers has proven successful in finding security errors in C for the problems of detecting format string violations and user/kernel bugs [26, 45]. Context sensitivity significantly reduces the rate of false positives encountered with this technique; however, it is unclear how scalable the context-sensitive approach is.

Much of the work in information-flow analysis uses a type-checking approach, as exemplified by JFlow [38]. The compiler reads a program containing labeled types and, in checking the types, ensures that the program cannot contain improper information flow at runtime. The security type system in such a language enforces information-flow policies. The annotation effort, however, may be prohibitively expensive in practice. In addition to the explicit information flows that our approach addresses, JFlow also deals with implicit information flows.

Static analysis has been applied to analyzing SQL statements constructed in Java programs that may lead to SQL injection vulnerabilities [17, 53]. That work analyzes strings that represent SQL statements to check for potential type violations and tautologies. This approach assumes that a flow graph representing how string values can propagate through the program has been constructed a priori from points-to analysis results. However, since accurate pointer information is necessary to construct an accurate flow graph, it is unclear whether this technique can achieve the scalability and precision needed to detect errors in large systems.

8 Conclusions

In this paper we showed how a general class of security errors in Java applications can be formulated as instances of the general tainted object propagation problem, which involves finding all sink objects derivable from source objects via a set of given derivation rules. We developed a precise and scalable analysis for this problem based on a precise context-sensitive pointer alias analysis and introduced extensions to the handling of strings and containers to further improve the precision. Our approach finds all vulnerabilities matching the specification within the statically analyzed code. Note, however, that errors may be missed if the user-provided specification is incomplete.

We formulated a variety of widespread vulnerabilities including SQL injections, cross-site scripting, HTTP splitting attacks, and other types of vulnerabilities as tainted object propagation problems. Our experimental results showed that our analysis is an effective practical tool for finding security vulnerabilities. We were able to find a total of 29 security errors, and all but one of our nine large real-life benchmark applications were vulnerable. Two vulnerabilities were located in commonly used libraries, thus subjecting applications using the libraries to potential vulnerabilities. Most of the security errors we reported were confirmed as exploitable vulnerabilities by their maintainers, resulting in more than a dozen code fixes. The analysis reported false positives for only one application. We determined that the false warnings reported can be eliminated with improved object naming.

9 Acknowledgements

We are grateful to Michael Martin for his help with PQL and dynamic validation of some of the vulnerabilities we found and to John Whaley for his support with the bddbddb tool and the joeq framework. We thank
our paper shepherd R. Sekar, whose insightful comments helped improve this paper considerably. We thank the benchmark application maintainers for responding to our bug reports. We thank Amit Klein for providing detailed clarifications about Web application vulnerabilities and Ramesh Chandra, Chris Unkel, and Ted Kremenek and the anonymous paper reviewers for providing additional helpful comments. Finally, this material is based upon work supported by the National Science Foundation under Grant No. 0326227.

References

[1] C. Anley. Advanced SQL injection in SQL Server applications. http://www.nextgenss.com/papers/advanced_sql_injection.pdf, 2002.
[2] C. Anley. (more) advanced SQL injection. http://www.nextgenss.com/papers/more_advanced_sql_injection.pdf, 2002.
[3] B. Arkin, S. Stender, and G. McGraw. Software penetration testing. IEEE Security and Privacy, 3(1):84–87, 2005.
[4] K. Beaver. Achieving Sarbanes-Oxley compliance for Web applications through security testing. http://www.spidynamics.com/support/whitepapers/WI_SOXwhitepaper.pdf, 2003.
[5] B. Buege, R. Layman, and A. Taylor. Hacking Exposed: J2EE and Java: Developing Secure Applications with Java Technology. McGraw-Hill/Osborne, 2002.
[6] W. R. Bush, J. D. Pincus, and D. J. Sielaff. A static analyzer for finding dynamic programming errors. Software - Practice and Experience (SPE), 30:775–802, 2000.
[7] CGI Security. The cross-site scripting FAQ. http://www.cgisecurity.net/articles/xss-faq.shtml.
[8] B. Chess and G. McGraw. Static analysis for security. IEEE Security and Privacy, 2(6):76–79, 2004.
[9] Chinotec Technologies. Paros—a tool for Web application security assessment. http://www.parosproxy.org, 2004.
[10] Computer Security Institute. Computer crime and security survey. http://www.gocsi.com/press/20020407.jhtml?requestid=195148, 2002.
[11] S. Cook. A Web developers guide to cross-site scripting. http://www.giac.org/practical/GSEC/Steve_Cook_GSEC.pdf, 2003.
[12] C. Cowan, C. Pu, D. Maier, J. Walpole, P. Bakke, S. Beattie, A. Grier, P. Wagle, Q. Zhang, and H. Hinton. StackGuard: Automatic adaptive detection and prevention of buffer-overflow attacks. In Proceedings of the 7th USENIX Security Conference, pages 63–78, January 1998.
[13] J. D’Anjou, S. Fairbrother, D. Kehn, J. Kellerman, and P. McCarthy. Java Developer’s Guide to Eclipse. Addison-Wesley Professional, 2004.
[14] S. Friedl. SQL injection attacks by example. http://www.unixwiz.net/techtips/sql-injection.html, 2004.
[15] D. Geer and J. Harthorne. Penetration testing: A duet. http://www.acsac.org/2002/papers/geer.pdf, 2002.
[16] Gentoo Linux Security Advisory. SnipSnap: HTTP response splitting. http://www.gentoo.org/security/en/glsa/glsa-200409-23.xml, 2004.
[17] C. Gould, Z. Su, and P. Devanbu. Static checking of dynamically generated queries in database applications. In Proceedings of the 26th International Conference on Software Engineering, pages 645–654, 2004.
[18] J. Grossman. Cross-site tracing (XST): The new techniques and emerging threats to bypass current Web security measures using TRACE and XSS. http://www.cgisecurity.com/whitehat-mirror/WhitePaper_screen.pdf, 2003.
[19] J. Grossman. WASC activities and U.S. Web application security trends. http://www.whitehatsec.com/presentations/WASC_WASF_1.02.pdf, 2004.
[20] S. Hallem, B. Chelf, Y. Xie, and D. Engler. A system and language for building system-specific, static analyses. In Proceedings of the ACM SIGPLAN 2002 Conference on Programming language Design and Implementation, pages 69–82, 2002.
[21] M. Howard and D. LeBlanc. Writing Secure Code. Microsoft Press, 2001.
[22] D. Hu. Preventing cross-site scripting vulnerability. http://www.giac.org/practical/GSEC/Deyu_Hu_GSEC.pdf, 2004.
[23] Y.-W. Huang, F. Yu, C. Hang, C.-H. Tsai, D.-T. Lee, and S.-Y. Kuo. Securing Web application code by static analysis and runtime protection. In Proceedings of the 13th conference on World Wide Web, pages 40–52, 2004.
[24] G. Hulme. New software may improve application security. http://www.informationweek.com/story/IWK20010209S0003, 2001.
[25] Imperva, Inc. SuperVeda penetration test. http://www.imperva.com/download.asp?id=3.
[26] R. Johnson and D. Wagner. Finding user/kernel pointer bugs with type inference. In Proceedings of the 2004 Usenix Security Conference, pages 119–134, 2004.
[27] A. Klein. Hacking Web applications using cookie poisoning. http://www.cgisecurity.com/lib/CookiePoisoningByline.pdf, 2002.
[28] A. Klein. Divide and conquer: HTTP response splitting, Web cache poisoning attacks, and related topics. http://www.packetstormsecurity.org/papers/general/whitepaper_httpresponse.pdf, 2004.
[29] S. Kost. An introduction to SQL injection attacks for Oracle developers. http://www.net-security.org/dl/articles/IntegrigyIntrotoSQLInjectionAttacks.pdf, 2004.
[30] M. Krax. Mozilla foundation security advisory 2005-38. http://www.mozilla.org/security/announce/mfsa2005-38.html, 2005.
[31] D. Litchfield. Oracle multiple PL/SQL injection vulnerabilities. http://www.securityfocus.com/archive/1/385333/2004-12-20/2004-12-26/0, 2003.
[32] D. Litchfield. SQL Server Security. McGraw-Hill Osborne Media, 2003.
[33] V. B. Livshits and M. S. Lam. Tracking pointers with path and context sensitivity for bug detection in C programs. In Proceedings of the ACM SIGSOFT Symposium on the Foundations of Software Engineering, pages 317–326, Sept. 2003.
[34] V. B. Livshits and M. S. Lam. Detecting security vulnerabilities in Java applications with static analysis. Technical report. Stanford University. http://suif.stanford.edu/~livshits/papers/tr/webappsec_tr.pdf, 2005.
[35] M. Martin, V. B. Livshits, and M. S. Lam. Finding application errors using PQL: a program query language (to be published). In Proceedings of the ACM Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), Oct. 2005.
[36] J. Melbourne and D. Jorm. Penetration testing for Web applications. http://www.securityfocus.com/infocus/1704, 2003.
[37] J. S. Miller, S. Ragsdale, and J. Miller. The Common Language Infrastructure Annotated Standard. Addison-Wesley Professional, 2003.
[38] A. C. Myers. JFlow: practical mostly-static information flow control. In Proceedings of the 26th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 228–241, Jan. 1999.
[39] NetContinuum, Inc. The 21 primary classes of Web application threats. https://www.netcontinuum.com/securityCentral/TopThreatTypes/index.cfm, 2004.
[40] Open Web Application Security Project. A guide to building secure Web applications. http://voxel.dl.sourceforge.net/sourceforge/owasp/OWASPGuideV1.1.pdf, 2004.
[41] Open Web Application Security Project. The ten most critical Web application security vulnerabilities. http://umn.dl.sourceforge.net/sourceforge/owasp/OWASPTopTen2004.pdf, 2004.
[42] Open Web Application Security Project. WebScarab. http://www.owasp.org/software/webscarab.html, 2004.
[43] S. Sagiv, T. Reps, and R. Wilhelm. Parametric shape analysis via 3-valued logic. In Proceedings of the 26th ACM Symposium on Principles of Programming Languages, pages 105–118, Jan. 1999.
[44] J. Scambray and M. Shema. Web Applications (Hacking Exposed). Addison-Wesley Professional, 2002.
[45] U. Shankar, K. Talwar, J. S. Foster, and D. Wagner. Detecting format string vulnerabilities with type qualifiers. In Proceedings of the 2001 Usenix Security Conference, pages 201–220, Aug. 2001.
[46] K. Spett. Cross-site scripting: are your Web applications vulnerable. http://www.spidynamics.com/support/whitepapers/SPIcross-sitescripting.pdf, 2002.
[47] K. Spett. SQL injection: Are your Web applications vulnerable? http://downloads.securityfocus.com/library/SQLInjectionWhitePaper.pdf, 2002.
[48] B. Steensgaard. Points-to analysis in almost linear time. In Proceedings of the 23th ACM Symposium on Principles of Programming Languages, pages 32–41, Jan. 1996.
[49] M. Surf and A. Shulman. How safe is it out there? http://www.imperva.com/download.asp?id=23, 2004.
[50] J. D. Ullman. Principles of Database and Knowledge-Base Systems. Computer Science Press, Rockville, Md., volume II edition, 1989.
[51] D. Wagner, J. Foster, E. Brewer, and A. Aiken. A first step towards automated detection of buffer overrun vulnerabilities. In Proceedings of Network and Distributed Systems Security Symposium, pages 3–17, Feb. 2000.
[52] L. Wall, T. Christiansen, and R. Schwartz. Programming Perl. O’Reilly and Associates, Sebastopol, CA, 1996.
[53] G. Wassermann and Z. Su. An analysis framework for security in Web applications. In Proceedings of the Specification and Verification of Component-Based Systems Workshop, Oct. 2004.
[54] WebCohort, Inc. Only 10% of Web applications are secured against common hacking techniques. http://www.imperva.com/company/news/2004-feb-02.html, 2004.
[55] J. Whaley and M. S. Lam. Cloning-based context-sensitive pointer alias analysis using binary decision diagrams. In Proceedings of the ACM SIGPLAN 2004 conference on Programming Language Design and Implementation, pages 131–144, June 2004.
[56] J. Wilander and M. Kamkar. A comparison of publicly available tools for static intrusion prevention. In Proceedings of 7th Nordic Workshop on Secure IT Systems, Nov. 2002.
