Bypass Testing of Web Applications

Jeff Offutt, Ye Wu, Xiaochen Du and Hong Huang


Information and Software Engineering
George Mason University
Fairfax, VA 22030, USA
(+1) 703-993-1654 / 1651
fofut,wuye,xdu,[email protected]

Abstract

Web software applications are increasingly being deployed in sensitive situations. Web applications are used to transmit, accept and store data that is personal, company confidential and sensitive. Input validation testing (IVT) checks user inputs to ensure that they conform to the program’s requirements, which is particularly important for software that relies on user inputs, including Web applications. A common technique in Web applications is to perform input validation on the client with scripting languages such as JavaScript. An insidious problem with client-side input validation is that end users can bypass this validation. Bypassing validation can cause failures in the software, and can also break the security on Web applications, leading to unauthorized access to data, system failures, invalid purchases and entry of bogus data. We are developing a strategy called bypass testing to create client-side tests for Web applications that intentionally violate explicit and implicit checks on user inputs. This paper describes the strategy, defines specific rules and adequacy criteria for tests, describes a proof-of-concept automated tool, and presents initial empirical results from applying bypass testing.

1. Introduction

The World Wide Web gives software developers a new way to deploy sophisticated, interactive programs with complex GUIs and large numbers of back-end software components that are integrated in novel and interesting ways. Web applications are constructed from heterogeneous software components that interact with each other and with users in novel ways. Web software components are distributed across multiple computers and organizations, are often created and integrated dynamically, are written in diverse languages and run on diverse hardware platforms, and must satisfy very high requirements for reliability, availability and usability. These characteristics offer powerful new abilities and also present new problems to software developers.

Analyzing, evaluating, maintaining and testing these applications present many new challenges for software developers and researchers. Most Web applications are run by users through a Web browser and use HTML to create graphical user interfaces. The user interfaces use the Internet to connect to software components that run on separate Web servers. Web servers are computers or collections of computers that host software that provides resources in response to HTTP requests. Because this research is primarily concerned with software, this paper uses the term “server” to be synonymous with “software on the Web server.” Users enter data and make choices by manipulating HTML forms and pressing submit buttons. Browsers send the data and choices to the server (that is, the software on the Web server) using HTTP requests. An important point to note is that HTTP is a “stateless” protocol, that is, each request is independent from previous requests and, by default, the server does not know whether multiple requests come from the same or different users.

The type of HTTP request determines how the user’s data is packaged when sent to the server. Although HTTP defines a number of request types, this paper only considers the two most common types of HTTP requests, GET and POST. GET requests package the data as parameters on the URL that are visible in the URL window of most browsers (for example, http://www.buyit.com?name=george). POST requests package the data in the data packets that are sent to the server.

A common activity of Web applications is to validate the users’ data. This is necessary to ensure that the software receives data that will not cause the software to do bad things such as crash, corrupt the program state, corrupt the data store on the server, or allow access to unauthorized users.
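The difference between GET and POST packaging described above can be illustrated with a short Python sketch. It is an illustration only: it reuses the example URL and the name parameter from the text, builds the two requests with the standard library, and never sends them. The POST target path and the content type header are assumptions made for the example.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Form data a browser would collect; the field comes from the example URL.
params = {"name": "george"}
encoded = urlencode(params)  # "name=george"

# GET: the parameters travel in the query string of the URL.
get_request = Request("http://www.buyit.com?" + encoded, method="GET")

# POST: the same parameters travel in the request body instead.
# The target path "/" is an assumption made for this illustration.
post_request = Request(
    "http://www.buyit.com/",
    data=encoded.encode("ascii"),
    headers={"Content-Type": "application/x-www-form-urlencoded"},
    method="POST",
)

# The requests are only constructed here, never sent.
for request in (get_request, post_request):
    print(request.get_method(), request.full_url, request.data)
```

Either request can be assembled by any program on the client, without a browser or a form, which is the property that bypass testing relies on.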

Proceedings of the 15th International Symposium on Software Reliability Engineering (ISSRE’04)


1071-9458/04 $ 20.00 IEEE
This type of input validation is crucial for Web applications, which are heavily user interactive, often serve a very large user base, have very high quality requirements, and are always publicly accessible [14]. Because of the fundamental client-server nature of Web applications, input validation is done both on the client and the server.

HTML pages (whether static HTML files or dynamically created) can include scripting programs that can respond to user events and check various syntactic attributes of the inputs before sending the data to the server. User events that JavaScript can respond to are defined by the HTML document object model (DOM) and include mouse over events, form field focus events, form field changes, and button presses, among others. Client-side checking is used to check that required fields are filled in and that inputs conform to certain restrictions on characteristics such as length, characters used, and satisfaction of syntactic patterns (such as email addresses). Client-side checking can be done as soon as a user event is triggered or after the user clicks on a submit button but before the data is submitted. Doing input validation on the client avoids the need for a trip to the server and allows the checking to be defined within the input form.

Server side checking is done by programs on the server such as CGI/Perl, Java servlets, Java Server Pages, and Active Server Pages. Server side checking can perform all of the checks that client-side checking can, but not until after the user presses the submit button. Server side checking cannot respond to user events, but has access to the state of the file system and database on the server. High level languages (such as Java) can be used on the server to provide more robust and flexible ways to check inputs and respond to invalid user inputs than client-side untyped scripting languages (such as JavaScript).

1.1. Running Web application tests through HTML forms

HTML forms expect users to type their values and make their choices by using the keyboard and mouse. However, it turns out to be easy for users to bypass the HTML to send values directly to the server software. For example, if the GET request is expected, the users can simply type the parameters into the URL box in their browsers. If the POST request is expected, a simple program can be written on the client that creates and submits the request. There are two reasons for bypassing HTML forms. One is for convenience; if a Web application is used a lot it might be more convenient to skip the relatively slow FORM interface. Another reason is for automation; when running multiple tests on a Web application, the test execution can be automated by bypassing the forms.

This ability to bypass form entry allows another strategy to be used. If the Web application uses client-side input validation, then the users can bypass the validation. This technique is sometimes used by hackers.

This research addresses the problem of testing server software for robustness and security. The technique of bypass testing is to utilize the ability to bypass client-side checking to create tests, thereby supplying invalid inputs to the software.

An additional ability that is available when bypassing HTML forms is to override hidden form fields. HTML allows data to be placed into a page with the tag “<INPUT Type="Hidden" ...>”. These fields are not shown to the users in the browsers, but data in the fields are submitted to the server. Bypassing forms allows the additional ability to change or remove the contents of hidden form fields.

One of the most common ways to violate data security is through “SQL injection.” Many Web applications use client-supplied data in SQL queries. However, if the application does not strip potentially harmful characters, users can add SQL statements into their inputs. This is called SQL injection, and Anley [2] claims that despite being simple to protect against, many production systems connected to the Internet are vulnerable. SQL injection vulnerability occurs when an attacker inserts an SQL statement into a query by manipulating data inputs.

2. Types of Client-side Validation

Input validation can check both the syntax and the semantics of inputs. Inputs can be validated on the client by using the HTML input boxes to restrict the size or contents of inputs (syntactic restrictions only), and by writing programs such as JavaScript functions to evaluate the values before submission (syntactic and semantic restrictions).

2.1. Semantic input validation

In an initial attempt at categorization, we have identified three types of semantic data input validation. A number is provided for each type to refer to later in the paper.

1. Data type conversion (2.1.A). Most inputs to HTML form elements are plain strings that are converted to other types on the server. The client can check whether the string can be converted correctly. For example, if the input is an integer, the client can check to ensure that all characters are numeric digits.

2. Data format validation (2.1.B). There are many more restrictive constraints on inputs that can be checked, and this is one of the most common ways to validate input on the Web. This includes checking the format of money, phone numbers, personal identification numbers, email addresses, and URLs.

3. Inter-value constraint validation (2.1.C). There are often constraint relationships among input values. For example, when providing payment information, a check payment should include a bank routing number and a bank account, whereas a credit card payment should include a credit card number and an expiration date. The combination of a bank routing number and a credit card number should not be allowed.

2.2. Syntactic input validation

HTML can also be used to impose several types of syntactic restrictions, all of which can be avoided by bypass testing:

1. Built-in length restriction (2.2.A). Text boxes can include a “maxlength” attribute to restrict the length of text inputs. In the following text input box, only three characters will be accepted:

<INPUT Type=text Name=Age Maxlength=3>

2. Built-in input value restriction (2.2.B). HTML can use select boxes, check boxes, and radio boxes to restrict the user to a certain pre-defined set of inputs.

3. Built-in input transfer mode (2.2.C). HTML forms define the type of request (GET or POST). Because of the differences in these requests, this is effectively a way to restrict the user’s input. HTML links always generate a GET request.

4. Built-in data access (2.2.D). Web browsers manage two types of data, cookies and hidden form fields. Hidden form fields can be viewed if the users look at the source, but are normally not shown. Cookies are automatically managed by the browsers and server software and are sent to the server automatically. Cookies can also be viewed in most browsers (for example, in Mozilla by “Tools-Cookie Manager-Manage Stored Cookies”). A major difference is that cookies persist across multiple requests, whereas hidden form fields are transient data items that only appear in individual HTML pages.

5. Built-in input field selection (2.2.E). An HTML form has a pre-defined set of form fields that users can select values for. Other values are normally not allowed, and client-side scripting can also disable certain input fields by making them unavailable or hidden.

6. Built-in control flow restriction (2.2.F). HTML pages allow the user to transfer to a certain, fixed set of URLs. These are defined by Action attributes in FORM tags and by HTML links.

2.3. Generalizing to input validation

Some of the vulnerabilities (both on client and server) are due to the server not checking inputs from the client, but it would be a mistake to assume checking data is all that is necessary. By considering penetration techniques such as SQL injection, cross-site scripting, buffer overflow, embedded script attack, and shell escape vulnerabilities, Wheeler [18] gives a general solution to user input validation from a security perspective. Any input accepted from a user must be validated and any illegal input data should be filtered out. Here are some general rules that should be considered.

1. Filters (2.3.A): Set up filters in the Web application to prevent illegal characters from reaching the server’s data store. Table 1 lists some specific characters that can be problems for Web applications.

2. Numeric limits (2.3.B): Limit all numbers to the minimum (often zero) and maximum allowed values.

3. Email addresses (2.3.C): A full email address checker should be enforced. A full email address includes a username and valid domain name. A complete email check should also ensure that the email contains all expected information, including subject and recipient addresses.

4. URLs (2.3.D): URLs (and more generally, URIs) should be checked to ensure that they have a valid form and the destination exists.

5. Character patterns (2.3.E): When possible, legal character patterns need to be identified. They can often be expressed as regular expressions. Inputs that do not match the pattern should be rejected.

Illegal Character                      Symbol
Empty String
Commas                                 ,
Directory paths                        .. ../
Strings starting with forward slash    /
Strings starting with a period         .
Ampersands                             &
Control character                      NIL, newline
Characters with high bit set           decimal 254 and 255
XML tag characters                     <, >

Table 1. Characters that sometimes cause problems for Web applications.

2.4. Feasibility study: CyberChair

As an initial feasibility study, we applied preliminary versions of the bypass testing techniques to CyberChair, a Web-based paper submission and reviewing system [17] that is used by a number of conferences, including ISSRE. It has been in use since 1996, and was opened as free software for downloading in 2000. The CyberChair web site (www.cyberchair.org) listed 242 users in April 2004.
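As a concrete illustration of the general rules in Section 2.3, the following Python sketch applies a filter (2.3.A), a numeric limit (2.3.B), an email pattern (2.3.C) and a character pattern check (2.3.E) on the server side. It is a minimal sketch under stated assumptions, not a complete validator: the field names pages, email and title, the limits 1 to 15, and both regular expressions are hypothetical choices made for the example, and the filter covers only part of Table 1.

```python
import re

# Part of Table 1: commas, ampersands, XML tag characters, NIL, newline,
# and directory paths. This subset is an assumption made for illustration.
ILLEGAL = re.compile(r"[,&<>\x00\n]|\.\.")

# A deliberately simple email pattern: user name, "@", dotted domain name.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def violations(pages, email, title):
    """Return the names of the (hypothetical) fields that break a rule."""
    found = []
    # Rule 2.3.B: digits only, then within the allowed minimum and maximum.
    if not re.fullmatch(r"[0-9]+", pages) or not 1 <= int(pages) <= 15:
        found.append("pages")
    # Rule 2.3.C: the whole value must look like an email address.
    if not EMAIL.fullmatch(email):
        found.append("email")
    # Rules 2.3.A and 2.3.E: reject empty strings, Table 1 characters,
    # and strings that start with a forward slash or a period.
    if title == "" or ILLEGAL.search(title) or title[0] in "/.":
        found.append("title")
    return found


print(violations("12", "author@example.org", "Bypass Testing"))
print(violations("-3", "author@example", "../etc/passwd"))
```

Checks of this kind only protect the application when they run on the server, because anything enforced solely on the client can be skipped by a bypassing user.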

CyberChair has multiple phases to support a conference. Authors submit abstracts in the first phase, and then full papers in the second. We manually tested the submission page of the second phase. Tests were performed on the ISSRE 2004 conference server. We did not have access to the source code and did not download CyberChair. We started with a user id and access code from an abstract submission in the first phase. After logging in, CyberChair returns an HTML page with a form to submit papers. To implement bypass testing, we saved the page and then modified it.

In this early feasibility study, our test creation process was not formalized. We broke the inputs into three levels: the control flow level, parameter level, and value level. At the control flow level, we attempted to submit a paper without logging in. At the parameter level, we removed some parameters from the form and then submitted. At the value level, we tried various values for parameters, including values that are normally not expected by the software. This process revealed five types of faults, all of which are potential security holes.

1. Submission without authentication: After correctly logging in to CyberChair, a submission form is returned. We decided to attempt to use that form to submit without a valid login. We saved the page locally, and changed the Action attribute on the FORM tag from a relative URL to a complete URL. (A relative URL does not include a domain name and only works within a single browser session.) Then we copied the modified form to a second computer, and used it to submit a file. The submission was allowed, implying that the semantics of a login is to send the submission page, not to allow only authenticated users to submit. Whereas we used a valid login to find the submission page, it would not be difficult for someone to find or guess a valid URL for the submission, particularly since CyberChair is an open-source program.

2. Unsafe use of hidden field: The submission page uses a hidden field to track the user. We customized the submission form by changing the value of the hidden form field and were still able to submit the paper. This allows the possibility of overwriting another user’s submission.

3. Disclosing information: We also tried removing the hidden field and setting its value to empty. In these cases, the software failed and returned messages that indicated in which file and which line of code the program failed. This kind of information is confusing to valid users and potentially unsafe to show to malicious hackers.

4. No validation for parameter constraint: The software does not check if the selected file type and the file submitted really match. For example, it is possible to select the file type to be pdf, but submit an rtf file instead. This lack of constraint checking can corrupt the state on the server.

5. No data type or data value validation: CyberChair asks the user to submit the number of pages, which should be an integer value between 1 and some fairly small number such as 10 or 15. We tried submitting non-integer values, negative numbers, and extremely large numbers, none of which were detected by the software. Similar problems are also found in other fields.

Although this experience is anecdotal, and our process was fairly ad hoc, it does demonstrate that bypass testing can be effective on software that has reasonably wide use. The rest of this paper provides a first attempt to formalize these ideas.

3. Modeling HTML Input Units

Web applications include static HTML files and programs that dynamically generate HTML pages. HTML pages can include more than one form, and each form can include many input fields. For example, we identified 169 HTML hyperlinks and 20 forms on amazon.com’s home page. This makes automatic input validation difficult to manage by hand, thus we take a first step toward automation by constructing a formal model for HTML client-server inputs.

Each HTML page, whether a static file or dynamically generated, can have zero or more HTML links and forms that let users interact with the server. An input unit $IU = (S, D, T)$ is the basic element of interaction on the client side. The inputs are sent to a software component $S$, which is on some Web server, and includes a set of input elements $D$. $D$ is a set of ordered pairs, $(n, v)$, where $n$ is a name (parameter) and $v$ represents the set of values that can be assigned to $n$. The set of values may be unlimited, as in a text box, or finite, as with a selection input. It is sometimes convenient to think of these sets of values as defining a type. $T$ is the HTTP transfer mode (GET, POST, etc.).

There are two types of input units, form and link. A form input unit is an HTML form that specifies the server software component as the Action attribute within the Form tag, and the input data corresponds to all the input fields within the form. The transfer mode is specified within the Method attribute of the Form tag.

A link input unit is an HTML link in an <A> tag. A link input unit’s server target can either be a static HTML file or a program such as a servlet, and the target is specified as the HREF attribute of the <A> tag. By definition, link input units always generate GET requests and only have input elements when the URL is modified or extended with parameter values (URL rewriting). In the HTML link <A HREF="prog?val=1">, $S$ is prog and $D$ is $\{(val, 1)\}$.

As an example, consider the screen shot of STIS in Figure 1. The Small Text Information System (STIS) was built by a student at GMU for a class project, and extensively updated by another student to be used as a classroom demo. It helps users keep track of arbitrary textual information. The
main part of the screen in Figure 1 contains two form input units and 12 link input units, plus the menu bars on the top and the bottom have 5 more link input units apiece. Key portions of the HTML for the search form input units and the delete link input units are shown in the callout bubbles.

3.1. Composing input units

Some Web pages may have large numbers of inputs, which can be difficult to manage. For example, it is common to have identical forms for things like searching and logging in. It is easy to eliminate the redundancy associated with identical forms in static HTML pages, but harder for dynamically generated pages. Because the number of potentially unique dynamically generated pages is arbitrary, and which pages are generated depends on inputs and processing on the server, the problem of identifying all input units from a client without access to the program source is undecidable.

When possible, the following three composition rules are used to reduce the number of input units to consider. To simplify the discussion, the following definitions assume two input units, each of which contains only one parameter: $iu_1 = (S_1, D_1, T_1)$, $iu_2 = (S_2, D_2, T_2)$, and $D_1 = \{(n_1, v_1)\}$ and $D_2 = \{(n_2, v_2)\}$. All three composition rules require the two input units to have the same server software component.

1. Identical input units composition. Two input units $iu_1 = (S_1, D_1, T_1)$ and $iu_2 = (S_2, D_2, T_2)$ are identical iff $S_1 = S_2$, $D_1 = D_2$ and $T_1 = T_2$. The two units are merged into a new input unit $iu = (S_1, D_1, T_1)$. For example, it is common for a Web page to have the same search form in two different places on the page.

2. Optional input element composition. Two input units $iu_1 = (S_1, D_1, T_1)$ and $iu_2 = (S_2, D_2, T_2)$ have optional elements if $S_1 = S_2$, $T_1 = T_2$ and one input unit has an input element name that is not in the other. That is, there exists $(n_1, v_1) \in D_1$ such that there is no $v_2$ where $(n_1, v_2) \in D_2$, or conversely, there exists $(n_2, v_2) \in D_2$ such that there is no $v_1$ where $(n_2, v_1) \in D_1$. The two input units are merged, forming $iu = (S_1, D', T_1)$ where $D' = \{D_1 \cup D_2\}$. This happens when a dynamically generated page includes different input elements, for example, if an order entry form sometimes includes an input box to enter a discount coupon code.

3. Optional input value composition. Two input units $iu_1 = (S_1, D_1, T_1)$ and $iu_2 = (S_2, D_2, T_2)$ have optional input values if $S_1 = S_2$, $T_1 = T_2$ and there exists $(n_1, v_1) \in D_1$ and $(n_2, v_2) \in D_2$, such that $n_1 = n_2$ but $v_1 \neq v_2$. Then the two input units are merged, forming $iu = (S_1, D', T_1)$ where $D' = \{D_1 - (n_1, v_1)\} \cup \{D_2 - (n_2, v_2)\} \cup \{(n_1, v_1 \cup v_2)\}$. This happens when a dynamically generated page sometimes includes different input values. For example, in an online grade entry form at our university, undergraduate courses have fewer choices for grades than graduate courses do. Of course, this composition may lead to more invalid combinations of choices.

The two search forms at the top and the bottom of the screen in Figure 1 are identical and are thus composed. The two search forms with buttons “Search” and “All Records” use the same server software component, but have different input elements, thus can be merged under the optional input element composition rule. Finally, the three “delete” link input units all reference the same server software component, so can be merged under the optional input value composition rule.

4. Bypass Testing

Most input validation focuses on individual parameters. This works well for traditional software, where the patterns of interaction between users and software are fixed and cannot be altered by the users. An interesting complexity is that the use of dynamic Web pages means that the same URL can produce different forms at different times, depending on the parameters supplied, state on the server, characteristics of the client, and other environmental information. Additionally, users of Web applications can not only change the values of input parameters, but can also change the number of input parameters and the control flow. This makes it easier to violate constraints among different parameters and between software components. This section describes a systematic approach to identify constraints among input parameters. This problem is common to all Web testing strategies as well as GUI testing strategies. Then rules are given to generate bypass test cases to test the Web application to ensure these constraints are adequately evaluated. According to the classification of input validation types from Section 2, bypass testing will be conducted at three levels, as discussed in the following subsections. Automatic recognition of failure and invalid behavior has not been addressed by this research.

4.1. Value level bypass testing

This type of bypass testing tries to verify whether a Web application adequately evaluates invalid inputs. This testing is based on the restrictions described in Section 2. Given a single input variable, invalid inputs can be generated according to the 14 types of input validation that are specified in Section 2.

• Data type conversion violation (2.1.A). HTML inputs are initially strings, but they are often converted to other data types on the server. Data type conversion testing uses values of different types to evaluate the

<form action="update_search_params.jsp" method=POST>
<select name="infoCategory">
<option value="[*ALL*]">[All Records]</option>
<option value="Computer">Computer</option>
</select>
Search:<input type="text" name="search" size="30">
<input type="checkbox" name="sname" checked> Name
<input type="checkbox" name="content" checked>Con…
<input type="submit" value="Search"> </form>

<form action="update_search_params.jsp" method=POST>


<input type="submit" value="All records"> </form>

<tr> <td> 1 </td>


<td><a
href="delete_record.jsp?rec_name=Downloads&
rec_category=Computer">delete</a></td> ….
</tr>
<tr> <td> 2 </td>
<td><a href="delete_record.jsp?rec_name=JBuilder&
rec_category=Computer">delete</a></td> ….
</tr>
<tr> <td> 3 </td>
<td><a href="delete_record.jsp?rec_name=JTB&
rec_category=Computer">delete</a></td> ….
</tr>

Figure 1. STIS initial screen.
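The input unit model of Section 3 can be applied directly to markup like that in Figure 1. The following Python sketch is an illustration under stated assumptions, not the authors' tool: the class name InputUnitParser, the tuple layout (S, D, T), and the decision to record only the values declared in the markup (so a free-text field gets an empty set) are all choices made for the example. The sample page is a shortened version of the HTML shown in Figure 1.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qsl, urlsplit


class InputUnitParser(HTMLParser):
    """Collect input units (S, D, T) from the forms and links of one page."""

    def __init__(self):
        super().__init__()
        self.units = []      # finished input units as (S, D, T) tuples
        self._form = None    # [S, D, T] of the form being read, if any
        self._select = None  # name of the select box being read, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            # Form input unit: S is the Action, T the Method (GET if absent).
            method = (attrs.get("method") or "GET").upper()
            self._form = [attrs.get("action") or "", {}, method]
        elif tag == "a" and attrs.get("href"):
            # Link input unit: always GET; D comes from the query string.
            url = urlsplit(attrs["href"])
            data = {}
            for name, value in parse_qsl(url.query, keep_blank_values=True):
                data.setdefault(name, set()).add(value)
            self.units.append((url.path, data, "GET"))
        elif tag == "option" and self._form is not None and self._select:
            self._form[1][self._select].add(attrs.get("value") or "")
        elif tag in ("input", "select", "textarea") and self._form is not None:
            name = attrs.get("name")
            if not name:
                return  # unnamed fields are not submitted, so skip them
            values = self._form[1].setdefault(name, set())
            if tag == "select":
                self._select = name
            elif attrs.get("value") is not None:
                values.add(attrs["value"])

    def handle_endtag(self, tag):
        if tag == "select":
            self._select = None
        elif tag == "form" and self._form is not None:
            self.units.append(tuple(self._form))
            self._form = None


page = """
<form action="update_search_params.jsp" method=POST>
<select name="infoCategory">
<option value="[*ALL*]">[All Records]</option>
<option value="Computer">Computer</option>
</select>
Search:<input type="text" name="search" size="30">
<input type="checkbox" name="sname" checked> Name
<input type="submit" value="Search"> </form>
<a href="delete_record.jsp?rec_name=JTB&rec_category=Computer">delete</a>
"""

parser = InputUnitParser()
parser.feed(page)
for unit in parser.units:
    print(unit)
```

For this page the sketch yields one form input unit that targets update_search_params.jsp with POST and one link input unit that targets delete_record.jsp with GET. Units that share a server component could then be merged with the composition rules of Section 3.1.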

server-side processing, including general strings, integers, real numbers, and dates.

• Built-in length restriction violation (2.2.A). The HTML tag input can have an attribute maxLength, as described in Section 2. Invalid values are generated to violate these restrictions.

• Built-in value restriction violation (2.2.B). Pre-defined input restrictions from HTML select, check and radio boxes are violated by modifying the submission to submit values that are not in the pre-defined set.

• Special input value (2.1.B, 2.3.A - 2.3.E). When data is stored into a database or XML document, and under certain kinds of processing, some special characters, as defined in Table 1, can corrupt the data or cause the software to fail. This data is often validated with client-side checking, but sometimes with server-side checking. Thus, following Wheeler’s suggestions [18], values for text fields are generated that contain special characters.

4.2. Parameter level bypass testing

This type of bypass testing tries to address issues related to inter-value constraint (2.1.C), built-in data access (2.2.D), and built-in input field selection (2.2.E).

It is relatively easy to enumerate possible invalid inputs for an input parameter. However, the restrictive relationships among different parameters are hard to identify, hard to validate and are thus often ignored during testing. There are many kinds of relationships. One type is invalid pair, where two parameters cannot both have values at the same time. For example, it is not reasonable to have a checking account number and a credit card expiration date in the same transaction. Another type is required pair, where if one parameter has a value, the other must also have a value. For example, if we have a credit card number, we must also have an expiration date. Parameter level bypass testing tries to test Web applications by executing test cases that violate restrictive relationships among multiple parameters.

Because the HTML files are very often generated dynamically, these relationships cannot always be obtained statically and must be identified dynamically. They are sometimes described in English-language instructions, and sometimes simply assumed. Nevertheless, if we can identify and follow all possible ways to send parameters to a server program, we can ensure conformance to the restrictive relationships, and then find values to violate the restrictive relationships. Thus, we define an input pattern as a particular set of parameters that can be used at the same time.

In the example that is shown in Figure 1, four different buttons (two search buttons and two all record buttons) send requests to the same server software component update_search_params.jsp.
Our model composes the four submit buttons into one input unit $IU = \{S, D, T\}$ where $S$ = {update_search_params.jsp}, $D$ = {(infoCategory, “Computer” or “[All Records]”), (search, “”), (sname, “checked”), (content, “checked”), (submit, “Search” or “All record”)} and $T$ = {POST}. This input unit models all possible values that can be sent to update_search_params.jsp. But STIS is designed to send only two input patterns. The first one sends all the parameters to the server, and the second only sends (submit, “all record”).

The following algorithm is designed to derive all possible input patterns in a Web application. As with finding input units, this is generally an undecidable problem without access to the server program source. Thus, this algorithm creates an approximation that is limited by the data that is supplied to existing forms in Step 2. The algorithm does not specify how the data is generated; this is up to the discretion of the tester. Likely approaches are to use all selections in list boxes and radio buttons and to generate arbitrary valid strings. Elbaum, Karre and Rothermel [5] proposed a method to generate tests by saving and modifying data that normal users have submitted; this method could be used to support bypass testing. The input patterns that are created by the algorithm are used to generate parameter level bypass tests.

Algorithm: DFS to identify input patterns of Web applications
Input: The initial page of a Web application, $I$
Output: Identifiable input patterns

Step 1 : Create a stack $ST$ to retain all input units that need to be explored. Define an initial input unit $iu_s$ as the URL for $I$ with no parameters. Initialize $ST$ to $iu_s$. Create a set $IUS$ to retain all input units that have been identified. Initialize $IUS$ to empty.

Step 2 : While $ST$ is not empty, pop an input unit (defined in Section 3) from $ST$, generate data for the input unit and send it to the server. When a reply is returned, analyze the HTML content. For each input unit $iu$ in the returned HTML document:

    if $iu$ is a link input unit (A tag) and the URL has already been explored, do not push $iu$ onto the stack.

    if $iu \in IUS$ (it has already been found), do not push $iu$ onto the stack.

    if there exists an input unit $iu' \in IUS$ such that $iu$ and $iu'$ have optional input elements, update the value of $iu$. Do not push $iu$ onto the stack.

    Otherwise, a new input pattern has been identified; add $iu$ to $IUS$ as an optional input unit, and then push $iu$ onto $ST$.

After executing the algorithm, we have a collection of input units $IU = (S, D, T)$, where $D = \{P_1, P_2, \ldots, P_k\}$ and $P_i = \{(n_1^i, v_1^i), (n_2^i, v_2^i), \ldots, (n_a^i, v_a^i)\}$. Each $P_i$ is a valid input pattern for the input unit $IU$. Based on the input patterns, we generate three types of invalid input patterns to test the restrictive relationships among parameters.

The empty input pattern submits no data to the server component. Formally, $IU_1 = (S, \emptyset, T)$. The empty input pattern will violate all required pair restrictive relationships. For example, in Figure 1, a normal delete request is delete_record.jsp?reco_name=JBT&rec_category=Computer. The corresponding request of the empty input pattern will be delete_record.jsp.

The universal input pattern submits valid values for all parameters that the server component knows about. Formally, $IU_2 = (S, P_1 \cup P_2 \cup \ldots \cup P_k, T)$. The universal input pattern will violate all invalid pair restrictive relationships.

The differential input pattern submits valid values for all parameters in one input pattern, plus a value for one parameter that is not in that input pattern (an invalid input). For each pair of input patterns $P_i$ and $P_j$, generate an invalid input pattern in the following way. $x$ is a parameter from $P_j - P_i$, chosen arbitrarily. $IU_3 = (S, P', T)$, where $P' = P_i \cup \{x\}$. The intent of the differential input pattern is to make subtle changes that are not likely to be identified by checks other than invalid input checking.

Parameter level bypass testing focuses on relationships among different parameters, therefore, all values of input parameters are selected from a set of valid values.

4.3. Control flow level bypass testing

The previous two types of bypass testing assume users follow the control flow that is defined by the software. However, users of Web applications can alter the control flow (built-in control flow restriction violation 2.2.F) by pressing the back button, pressing the refresh button, or by directly entering a URL into a browser. This ability adds uncertainty and threatens the reliability of Web applications.

Control flow level bypass testing tries to verify Web applications by executing test cases that break the normal execution sequence. As a first step, the “normal” control flow

must be identified. The algorithm for finding input patterns in the previous section provides the needed information. The order for traversing the input units can be obtained from the partial ordering implied by the DFS graph. The input units that were identified can be used to define all normal control flows from that unit. So we expand the algorithm to derive all normal control flows for all input units. In the algorithm, an input unit $iu$ is popped from the stack, data is supplied, and the unit is then sent to the server. All the input units that are returned from that submission are considered to be candidates for the next step in the control flow. Given that, control flow bypass testing includes two types of control flow alterations:

1. Backward and forward control flow alteration. Given a normal control flow $iu_1, iu_2, \ldots, iu_k$, each pair of input units $(iu_i, iu_{i+1})$ forms a transition. The user pressing the back button is modeled by changing each transition $(iu_i, iu_{i+1})$ to $(iu_i, iu_{i-1})$. The user pressing the forward button is modeled by changing each transition $(iu_i, iu_{i+1})$ to $(iu_i, iu_{i+2})$.

2. Arbitrary control flow alteration. Given a normal control flow $iu_1, iu_2, \ldots, iu_k$, for each input unit $iu_i$, $1 \le i \le k$, change the control $iu_i$ to some arbitrary $iu_t$, such that $t \ne i$, $t \ne i + 1$, and $t \ne i - 1$.

4.4. Summary of bypass testing

The three levels of testing in this section, value level, parameter level, and control flow level, can be used individually or combined together. Parameter level and control flow level bypass testing focus on interactions among different parameters and different server components, and thus can be run independently of value level bypass testing.

5. Empirical Validation

As an initial validation, we applied bypass testing to the STIS Web application from Section 3 (Figure 1). STIS stores all information in a database (currently MySQL) and is comprised of 17 Java Server Pages and 5 Java bean classes. Eight of the JSPs process parameterized requests: login.jsp, browse.jsp, record_edit.jsp, record_delete.jsp, record_insert.jsp, categories.jsp, category_edit.jsp and register_save.jsp. We extensively tested these eight JSPs with bypass testing.

When a Web application receives invalid inputs, there are three possible types of software responses. (1) The invalid inputs are recognized and adequately processed by the software. (2) The invalid inputs are not recognized and cause abnormal software behavior, but the abnormal behavior is caught and automatically processed by the software's error handling mechanisms. (3) The invalid inputs are not recognized and the abnormal software behavior is exposed directly to the users. Abnormal software behavior includes responses like run time exceptions and revealing confidential information to unauthorized clients. A type 1 response represents proper software behavior, while type 2 and 3 responses represent inadequate software behavior and are considered to be failures.

Some input values had to be created by hand for bypass testing, including user names and passwords (STIS has two levels of access) and some very long invalid input strings. Other inputs were either automatically extracted from the HTML files or randomly generated. For comparison, we generated four levels of tests: (I) for just the value level, (II) the parameter level but not control, (III) the control level but not parameter, and (IV) both the parameter and control level.

Table 2 summarizes the results. For each group of tests, the number of tests (T) and the number of tests that caused a failure (F) are shown. There were a total of 158 tests, 66 of which caused failures. Of these 158 tests, none of the parameter level or control level tests could be executed without bypass testing, and only 55 of the value level tests could be executed without bypass testing. These 55 tests only caused 9 failures. This is strong evidence that bypass testing can find software problems related to invalid input data. Of course, no statistical analysis is possible with this early data. This was a case study, without a control comparison or true hypotheses that could be statistically tested.

6. Related Work

The bypass testing techniques are motivated by a combination of input validation and the category-partition method [15], a multi-step method to derive test frames and tests from specifications. The rest of this section discusses the most closely related test ideas, input validation testing and testing of graphical user interfaces. Similar techniques have been used for compiler checking, but these are not directly related to bypass testing.

6.1. Input validation testing

Input validation analysis and testing involves statically analyzing the input command syntax as defined in interface and requirement specifications and then generating input data from the specification. Hayes and Offutt [6] proposed techniques for input validation analysis and testing for systems that take inputs that can be represented in grammars. Both IVT and bypass testing attempt to violate input specifications, so bypass testing could be viewed as a special kind of IVT that addresses concerns of Web applications.
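To make concrete how bypass testing violates input specifications at the parameter and control flow levels of Section 4, the following is a minimal Python sketch, not the authors' tool. It assumes each valid input pattern is represented as a dictionary from parameter name to value and a normal control flow as a list of input unit names. All function names are hypothetical, the two example patterns are the STIS patterns described in Section 4.2, and "chosen arbitrarily" is interpreted here as taking the alphabetically first missing parameter.

```python
from itertools import permutations


def empty_pattern():
    """Empty input pattern: submit no parameters at all."""
    return {}


def universal_pattern(patterns):
    """Universal input pattern: the union of all parameters in all valid patterns."""
    merged = {}
    for pattern in patterns:
        for name, value in pattern.items():
            merged.setdefault(name, value)  # keep the first valid value seen
    return merged


def differential_patterns(patterns):
    """For each ordered pair (Pi, Pj), add one parameter x from Pj - Pi to Pi."""
    result = []
    for pi, pj in permutations(patterns, 2):
        missing = sorted(set(pj) - set(pi))
        if missing:
            x = missing[0]  # assumption: "arbitrary" means alphabetically first
            altered = dict(pi)
            altered[x] = pj[x]
            result.append(altered)
    return result


def back_forward_alterations(flow):
    """Replace each transition (iu_i, iu_i+1) by (iu_i, iu_i-1) and (iu_i, iu_i+2)."""
    altered = []
    for i in range(len(flow) - 1):
        if i - 1 >= 0:
            altered.append((flow[i], flow[i - 1]))  # back button
        if i + 2 < len(flow):
            altered.append((flow[i], flow[i + 2]))  # forward button
    return altered


def arbitrary_alterations(flow):
    """Jump from iu_i to any iu_t with t != i, t != i + 1 and t != i - 1."""
    return [(src, dst)
            for i, src in enumerate(flow)
            for t, dst in enumerate(flow)
            if t not in (i - 1, i, i + 1)]


# The two input patterns STIS is designed to send (Section 4.2).
P1 = {"infoCategory": "Computer", "search": "", "sname": "checked",
      "content": "checked", "submit": "Search"}
P2 = {"submit": "All record"}

# Hypothetical normal control flow, used only for illustration.
FLOW = ["login", "browse", "record_edit", "record_delete"]
```

Under these assumptions, `differential_patterns([P1, P2])` yields a single pattern that adds `content` to `P2`, because `P2` contains no parameter that `P1` lacks.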

Table 2. Failures found for each dynamic component.

Component                      I         II        III        IV       Total
                              T    F    T    F    T    F    T    F    T    F
login                        15    0    2    2     n/a       n/a      17    2
browse                        7    4    1    0    1    1    1    1    10    6
record edit                  17    9    5    2    1    1    5    5    28   17
record delete                 5    0    2    0    1    1    2    2    10    3
record insert                13    9    3    1    1    1    3    3    20   14
categories                   12    2    2    0    1    0    2    0    17    2
category edit                13    2    2    0    1    0    2    0    18    2
register save                25   11    6    3    1    0    6    6    38   19
Total (#tests & #failures)  107   37   23    8    7    4   21   17   158   66

Note:
I: Value Level, No Parameter or Control
II: Parameter Level, No Control Level
III: Control Level, No Parameter Level
IV: Parameter Level and Control Level
T: number of tests
F: number of failures
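The totals row of Table 2, combined with the counts reported in Section 5 (55 value level tests executable without bypass testing, 9 of which failed), allows a simple descriptive comparison. The short Python sketch below only reworks those published numbers; the variable and function names are ours, and the ratios carry no statistical weight, in keeping with the case-study nature of the data.

```python
# Totals taken from Table 2 and the text of Section 5.
TOTAL_TESTS, TOTAL_FAILURES = 158, 66
NO_BYPASS_TESTS, NO_BYPASS_FAILURES = 55, 9

# Tests that could only be executed by bypassing client-side checks.
BYPASS_ONLY_TESTS = TOTAL_TESTS - NO_BYPASS_TESTS
BYPASS_ONLY_FAILURES = TOTAL_FAILURES - NO_BYPASS_FAILURES

# (tests, failures) per test level, from the totals row of Table 2.
LEVELS = {"I": (107, 37), "II": (23, 8), "III": (7, 4), "IV": (21, 17)}


def failure_rate(failures, tests):
    """Fraction of tests that caused a failure."""
    return failures / tests


if __name__ == "__main__":
    print("without bypass:", round(failure_rate(NO_BYPASS_FAILURES, NO_BYPASS_TESTS), 3))
    print("bypass only:   ", round(failure_rate(BYPASS_ONLY_FAILURES, BYPASS_ONLY_TESTS), 3))
    for level, (tests, failures) in LEVELS.items():
        print(f"level {level}:", round(failure_rate(failures, tests), 3))
```

The subtraction shows that 103 of the 158 tests required bypassing the client-side checks, and that those tests account for 57 of the 66 failures.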

6.2. GUI testing

HTML forms can be considered to offer a graphical user interface to run software that is deployed across the Web. Memon has developed techniques to test software through its GUI by creating inputs that match the input specifications of the software [12, 13]. This approach focuses on the layout of graphical elements and the user’s interaction when supplying form data. Bypass testing relies on following the syntax of the GUI forms, but specifically finds ways to violate constraints imposed by the syntax. The two approaches are complementary; specifically, GUI testing could be used to develop values for bypass testing.

6.3. Web application testing

Most research in testing Web applications has focused on client-side validation and static server-side validation of links. An extensive listing of existing Web test support tools is on a Web site maintained by Hower [7]. The list includes link checking tools, HTML validators, capture/playback tools, security test tools, and load and performance stress tools. These are all static validation and measurement tools, none of which support functional testing or black box testing.

The Web Modeling Language (WebML) [4] allows Web sites to be conceptually described. The focus of WebML is primarily on the user’s view and the data modeling. Our model derived from the software is complementary to the solutions proposed by WebML.

More recent research has looked into testing software from a static view, but few researchers have addressed the problem of dynamic integration. Kung et al. [9, 11] have developed a model to represent Web sites as a graph, and provide preliminary definitions for developing tests based on the graph in terms of Web page traversals. Their model includes static link transitions and focuses on the client side with only limited use of the server software. They define intra-object testing, where test paths are selected for the variables that have def-use chains within the object, inter-object testing, where test paths are selected for variables that have def-use chains across objects, and inter-client testing, where tests are derived from a reachability graph that is related to the data interactions among clients.

Ricca and Tonella [16] proposed an analysis model and corresponding testing strategies for static Web page analysis. As Web technologies have developed, more and more Web applications are being built on dynamic content, and therefore strategies are needed to model these dynamic behaviors.

Benedikt, Freire and Godefroid [3] presented VeriWeb, a navigation testing tool for Web applications. VeriWeb explores sequences of links in Web applications by nondeterministically exploring “action sequences”, starting from a given URL. Excessively long sequences of links are limited by pruning paths in a derivative form of prime path coverage. VeriWeb creates data for form fields by choosing from a set of name-value pairs that are initialized by the tester. VeriWeb’s testing is based on graphs where nodes are Web pages and edges are explicit HTML links, and the size of the graphs is controlled by a pruning process. This is similar to our algorithm, but does not handle dynamically generated HTML pages.

Elbaum, Karre and Rothermel [5] proposed a method to use what they called “user session data” to generate test cases for Web applications. Their use of the term user session data was nonstandard for Web application developers. Instead of looking at the data kept in J2EE servlet session, their definition of user session data was input data collected and remembered from previous user sessions. The user data was captured from HTML forms and included name-value pairs. Experimental results from comparing their method with existing methods show that user session data can help produce effective test suites with very little expense.
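Captured name-value pairs of this kind are one possible source for the data that Step 2 of the DFS algorithm in Section 4.2 leaves to the tester. The Python sketch below shows how such a data generator could plug into a simplified version of that algorithm. It is an illustration under several assumptions, not the authors' implementation: an input unit is reduced to a (URL, parameter names) pair, a unit with no parameters is treated as a link, the handling of optional input elements is omitted, and the server is replaced by a hypothetical in-memory table (`SITE`, `fake_fetch`) so that the example runs without a network.

```python
def discover_input_units(start_url, fetch, make_data):
    """Depth-first exploration of input units, loosely following Section 4.2.

    fetch(url, data) must return the input units found in the reply, each as
    a (url, frozenset_of_parameter_names) pair. make_data(names) supplies the
    values to submit for the given parameter names.
    """
    stack = [(start_url, frozenset())]   # ST, initialized to the start unit
    found = set()                        # IUS, initially empty
    explored_links = {start_url}

    while stack:
        url, names = stack.pop()
        reply_units = fetch(url, make_data(names))
        for unit in reply_units:
            unit_url, unit_names = unit
            is_link = not unit_names
            if is_link and unit_url in explored_links:
                continue                 # link already explored: do not push
            if unit in found:
                continue                 # already identified: do not push
            if is_link:
                explored_links.add(unit_url)
            found.add(unit)              # new input unit: record and explore
            stack.append(unit)
    return found


# Hypothetical site map standing in for real server replies.
SITE = {
    "index.jsp": [("login.jsp", frozenset({"user", "password"}))],
    "login.jsp": [("browse.jsp", frozenset()), ("index.jsp", frozenset())],
    "browse.jsp": [("record_edit.jsp", frozenset({"name", "category"})),
                   ("index.jsp", frozenset())],
    "record_edit.jsp": [("browse.jsp", frozenset())],
}


def fake_fetch(url, data):
    """Return the input units 'found' in the reply; the submitted data is ignored."""
    return SITE.get(url, [])


def replay_data(names):
    """Placeholder generator; captured user data could be substituted here."""
    return {name: "x" for name in names}


UNITS = discover_input_units("index.jsp", fake_fetch, replay_data)
```

Each unit is added to the identified set before it is pushed, so every unit is explored at most once and the loop terminates on any finite site.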

Lee and Offutt [10] describe a system that generates test cases using a form of mutation analysis. It focuses on validating the reliability of data interactions among Web-based software system components. Specifically, it considers XML based component interactions.

Jia and Liu [8] propose an approach for formally describing tests for Web applications using XML. A prototype tool, WebTest, based on this approach was also developed. Their XML approach could be combined with the test criteria proposed in this paper to express the tests in XML.

Andrews et al. use hierarchical FSMs to model potentially large Web applications. Test sequences are generated based on FSMs and use input constraints to reduce the state space explosion [1]. Finally, our previous work on modeling of Web applications has led to the development of atomic sections, which can be used to model dynamic aspects of Web applications [19]. This approach is at the detailed analysis level and relies on access to the code, unlike bypass testing.

7. Conclusions

This paper has presented four results. First, the concept of bypass testing was introduced to submit values to Web applications that are not validated by client-side checking. Second, a detailed model for how to introduce inputs to server-side software components was developed. Third, this model supports more general input validation testing, and rules were defined for bypass and input validation. Finally, empirical results from an open-source conference management system and our own laboratory-built Web application were shown.

Bypass testing is a unique and novel way to create test cases that is available only because of the unusual mix of client-server, HTML GUI, and JavaScript technologies that are used in Web applications. It is also more complicated than it appears on the surface. Although the concept is relatively simple, to submit inputs that violate client-side constraints, the distributed and heterogeneous nature of Web applications brings in many complexities. Not surprisingly, the most complicated part is handling inputs to dynamically generated HTML forms. The algorithm presented in Section 4.2 is a first attempt to approximate the kinds of input forms that can be generated dynamically.

This research does not directly address the problem of automatically determining if the test results are correct, commonly called the oracle problem. However, many of the failures that bypass testing is trying to cause are quite obvious, including unauthenticated access, unsafe disclosure of information, accepting invalid data, and unhandled exceptions. The oracle problem is easy to solve by hand or automatically in these cases. For other failures, for example, subtle corruption of a server-side database, the oracle problem is harder to solve. This is left as an area for future research.

The existence of bypass testing may motivate Web application developers to check data on the server, obviating much of the need for bypass testing. This may already be a trend in the industry. Five years ago, many books on Web software advocated checking inputs with JavaScript as a mechanism to reduce network traffic; modern books and instructors usually advocate doing input validation on the server. Nevertheless, major e-commerce and e-service sites still use client-side checking and hidden form fields. We found client-side checking on amazon.com and netflix.com, and the use of hidden form fields to store sensitive information on fastlane.nsf.com. The long history of buffer-overflow problems leads us to be somewhat pessimistic that developers will develop software well enough to make bypass testing completely obsolete.

A major advantage of bypass testing is that it does not require access to the source of the back-end software. This greatly simplifies the generation of tests and automated tools, and we expect bypass tests can be generated automatically. Our current plan is to build tools that parse HTML, discover and analyze the form field elements, parse the client-side checking encoded in the JavaScript, and automatically generate bypass tests to evaluate the server-side software.

References

[1] A. Andrews, J. Offutt, and R. Alexander. Testing Web applications. Software and Systems Modeling, 2004. Accepted per minor revision.
[2] C. Anley. Advanced SQL injection in SQL server applications. online, 2004. http://www.nextgenss.com/papers/advanced_sql_injection.pdf, last access February 2004.
[3] M. Benedikt, J. Freire, and P. Godefroid. VeriWeb: Automatically testing dynamic Web sites. In Proceedings of 11th International World Wide Web Conference (WWW’2002), Honolulu, HI, May 2002.
[4] S. Ceri, P. Fraternali, and A. Bongio. Web modeling language (WebML): A modeling language for designing Web sites. In Ninth World Wide Web Conference, Amsterdam, Netherlands, May 2000.
[5] S. Elbaum, S. Karre, and G. Rothermel. Improving Web application testing with user session data. In Proceedings of the 25th International Conference on Software Engineering, pages 49–59, Portland, Oregon, May 2003. IEEE Computer Society Press.
[6] J. H. Hayes and J. Offutt. Increased software reliability through input validation analysis and testing. In Proceedings of the 10th International Symposium on Software Reliability Engineering, pages 199–209, Boca Raton, FL, November 1999. IEEE Computer Society Press.
[7] R. Hower. Web site test tools and site management tools, 2002. www.softwareqatest.com/qatweb1.html.

[8] X. Jia and H. Liu. Rigorous and automatic testing of Web applications. In 6th IASTED International Conference on Software Engineering and Applications (SEA 2002), pages 280–285, Cambridge, MA, November 2002.
[9] D. Kung, C. H. Liu, and P. Hsia. An object-oriented Web test model for testing Web applications. In Proc. of IEEE 24th Annual International Computer Software and Applications Conference (COMPSAC2000), pages 537–542, Taipei, Taiwan, October 2000.
[10] S. C. Lee and J. Offutt. Generating test cases for XML-based Web component interactions using mutation analysis. In Proceedings of the 12th International Symposium on Software Reliability Engineering, pages 200–209, Hong Kong China, November 2001. IEEE Computer Society Press.
[11] C. H. Liu, D. Kung, P. Hsia, and C. T. Hsu. Structural testing of Web applications. In Proceedings of the 11th International Symposium on Software Reliability Engineering, pages 84–96, San Jose CA, October 2000. IEEE Computer Society Press.
[12] A. M. Memon. GUI testing: Pitfalls and process. IEEE Computer, 35(8):90–91, Aug. 2002.
[13] A. M. Memon, M. L. Soffa, and M. E. Pollack. Hierarchical GUI test case generation using automated planning. IEEE Transactions on Software Engineering, 27(2):144–155, February 2001.
[14] J. Offutt. Quality attributes of Web software applications. IEEE Software: Special Issue on Software Engineering of Internet Software, 19(2):25–32, March/April 2002.
[15] T. J. Ostrand and M. J. Balcer. The category-partition method for specifying and generating functional tests. Communications of the ACM, 31(6):676–686, June 1988.
[16] F. Ricca and P. Tonella. Analysis and testing of web applications. In 23rd International Conference on Software Engineering (ICSE ‘01), pages 25–34, Toronto, CA, May 2001.
[17] R. van de Stadt. Cyberchair: A free web-based paper submission and reviewing system. online, 2004. http://www.cyberchair.org/, last access April 2004.
[18] D. A. Wheeler. Secure Programming for Linux and Unix HOWTO. Published online, March 2003. http://www.dwheeler.com/secure-programs/, last access Feb 2004.
[19] Y. Wu, J. Offutt, and X. Du. Modeling and testing of dynamic aspects of Web applications. Submitted for publication, 2004. Technical Report ISE-TR-04-01, www.ise.gmu.edu/techreps/.
