Advances in Computers, Volume 101
FIRST EDITION
Atif Memon
College Park, MD, USA
Table of Contents
Cover image
Title page
Copyright
Preface
Chapter One: Security Testing: A Survey
Abstract
1 Introduction
2 Software Testing
3 Security Engineering
4 Security Testing
5 Security Testing Techniques
6 Application of Security Testing Techniques
7 Summary
Acknowledgments
Chapter Four: Advances in Web Application Testing, 2010–2014
Abstract
1 Introduction
2 Web Applications
3 Challenges to Web Application Testing
4 Web Application Testing, 2010–2014
5 Conclusion
Chapter Five: Approaches and Tools for Automated End-to-End Web Testing
Abstract
1 Introduction
2 Capture-Replay Web Testing
3 Programmable Web Testing
4 Test Case Evolution
5 Analysis of the Approaches
6 Overcoming the Limitations of the Existing Approaches
7 Conclusions
Author Index
Subject Index
Contents of Volumes in this Series
Copyright
Preface
Prof. Atif M. Memon Ph.D., College Park, MD, USA
This volume of Advances in Computers is the 101st in this series. This series, which has
been continuously published since 1960, presents in each volume four to seven chapters
describing new developments in software, hardware, or uses of computers.
This 101st volume is the second in a miniseries of volumes based on the theme
Advances in Software Testing. The need for such a thematic miniseries came up when I
was teaching my graduate class Fundamentals of Software Testing, in which students
were asked to study and report on recent (years 2010–15) advances in various topics
surrounding software testing. They failed to find up-to-date survey papers on almost all
topics. In this miniseries, I have invited leaders in their respective fields of software
testing to write about recent advances. In the first volume in the miniseries (Volume 99),
we focused on combinatorial testing, constraint-based testing, automated fault
localization, automatic black-box testing, and testing access control.
Volume 101 focuses on five important topics. In Chapter 1, entitled Security Testing: A
Survey, Felderer et al. provide an overview of recent security testing techniques. They
first summarize the required background of testing and security engineering. Then, they
discuss the basics and recent developments of security testing techniques applied during
secure software development, ie, model-based security testing, code-based testing and
static analysis, penetration testing and dynamic analysis, as well as security regression
testing. They illustrate security testing techniques by adopting them for an example three-tiered web-based business application.
In Chapter 2, entitled Recent Advances in Model-Based Testing, Utting et al. provide
an overview of the field of model-based testing (MBT), particularly, the recent advances
in the last decade. They give a summary of the MBT process, the modeling languages
currently used by various communities who practice MBT, the technologies used to
generate tests from models, and best practices, such as traceability between models and
tests. They also briefly describe several findings from a recent survey of MBT users in
industry, outline the increasingly popular use of MBT for security testing, and discuss
future challenges for MBT.
In Chapter 3, On Testing Embedded Software, Banerjee et al. describe the unique
challenges associated with testing embedded software, which is specialized software
intended to run on embedded devices. As embedded devices have expanded their reach
into major aspects of human lives, from small handheld devices (such as smartphones) to
advanced automotive systems (such as antilock braking systems), the complexity of
embedded software has also grown, creating new challenges for testing. In particular,
embedded software is required to satisfy several nonfunctional constraints, in addition to
functionality-related constraints. Such nonfunctional constraints may include (but are not
limited to) timing/energy consumption-related constraints or reliability requirements.
Additionally, embedded systems are often required to operate in interaction with the
physical environment, obtaining their inputs from environmental factors (such as
temperature or air pressure). The need to interact with a dynamic, often nondeterministic
physical environment, further increases the challenges associated with testing embedded
software. The authors discuss advances in software testing methodologies in the context of
embedded software. They introduce key challenges in testing nonfunctional properties of
software by means of realistic examples. They also present an easy-to-follow
classification of existing research work on this topic.
The importance of test automation in web engineering comes from the widespread use
of web applications and the associated demand for code quality. Test automation is
considered crucial for delivering the quality levels expected by users, since it can save a
lot of time in testing and it helps developers to release web applications with fewer
defects. The main advantage of test automation comes from fast, unattended execution of
a set of tests after some changes have been made to a web application. Moreover, modern
web applications adopt a multitier architecture where the implementation is scattered
across different layers and runs on different machines. For this reason, end-to-end testing
techniques are required to test the overall behavior of web applications. In recent years,
several approaches have been proposed for automated end-to-end web testing and the
choice among them depends on a number of factors, including the tools used for web
testing and the costs associated with their adoption. In Chapter 4, Advances in Web
Application Testing, 2010–14, Sampath and Sprenkle provide background on web
applications and the challenges in testing these distributed, dynamic applications made up
of heterogeneous components. They then focus on the recent advances in web application
testing that were published between 2010 and 2014, including work on test-case
generation, oracles, testing evaluation, and regression testing. Through this targeted
survey, they identify trends in web application testing and open problems that still need to
be addressed. In Chapter 5, entitled Approaches and Tools for Automated End-to-End
Web Testing, Leotta et al. provide a comprehensive overview of automated end-to-end
web testing approaches and summarize the findings of a long-term research project aimed
at empirically investigating their strengths and weaknesses.
I hope that you find these articles of interest. If you have any suggestions of topics for
future chapters, or if you wish to be considered as an author for a chapter, I can be reached
at [email protected].
CHAPTER ONE
Security Testing
A Survey
Michael Felderer*; Matthias Büchler; Martin Johns; Achim D. Brucker; Ruth Breu*; Alexander Pretschner
University of Innsbruck, Innsbruck, Austria
Abstract
Identifying vulnerabilities and ensuring security functionality by security testing is a widely applied measure to
evaluate and improve the security of software. Due to the openness of modern software-based systems, applying
appropriate security testing techniques is of growing importance and essential to perform effective and efficient
security testing. Therefore, an overview of current security testing techniques is of high value both for researchers to
evaluate and refine the techniques and for practitioners to apply and disseminate them. This chapter fulfills this
need and provides an overview of recent security testing techniques. For this purpose, it first summarizes the
required background of testing and security engineering. Then, basics and recent developments of security testing
techniques applied during the secure software development life cycle, ie, model-based security testing, code-based
testing and static analysis, penetration testing and dynamic analysis, as well as security regression testing are
discussed. Finally, the security testing techniques are illustrated by adopting them for an example three-tiered web-based business application.
Keywords
Security testing; Security testing techniques; Model-based security testing; White-box security testing;
Black-box security testing; Penetration testing; Security regression testing; Security engineering; Software
testing; Survey
1 Introduction
Modern IT systems based on concepts like cloud computing, location-based services, or
social networking are permanently connected to other systems and handle sensitive data.
These interconnected systems are subject to security attacks that may result in security
incidents with high severity affecting the technical infrastructure or its environment.
Exploited security vulnerabilities can cause drastic costs, eg, due to downtimes or the
modification of data. A high proportion of all software security incidents is caused by
attackers who exploit known vulnerabilities [1]. An important, effective, and widely
applied measure to improve the security of software is the use of security testing techniques, which
identify vulnerabilities and ensure security functionality.
Software testing is concerned with evaluation of software products and related artifacts
to determine that they satisfy specified requirements, to demonstrate that they are fit for
purpose and to detect defects. Security testing verifies and validates software system
requirements related to security properties like confidentiality, integrity, availability,
authentication, authorization, and nonrepudiation. Sometimes security properties come as
classical functional requirements, eg, "user accounts are disabled after three unsuccessful
login attempts," which approximates one part of an authorization property and is aligned
with the software quality standard ISO/IEC 9126 [2], which defines security as a functional
quality characteristic. However, it seems desirable that security testing directly targets the
above security properties, as opposed to taking the detour of functional tests of security
mechanisms. This view is supported by the ISO/IEC 25010 [3] standard that revises
ISO/IEC 9126 and introduces Security as a new quality characteristic which is not
included in the characteristic functionality any more.
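To make this concrete, a functional test of such a security mechanism could look like the following minimal sketch in Python; the Account class and its behavior are invented stand-ins for the system under test and are not part of any cited standard or tool.

```python
# Minimal sketch of a functional security test for the requirement
# "user accounts are disabled after three unsuccessful login attempts".
# The Account class is a hypothetical stand-in for the system under test.

class Account:
    def __init__(self, password):
        self._password = password
        self.failed_attempts = 0
        self.disabled = False

    def login(self, password):
        if self.disabled:
            return False
        if password == self._password:
            self.failed_attempts = 0
            return True
        self.failed_attempts += 1
        if self.failed_attempts >= 3:
            self.disabled = True
        return False

def test_account_disabled_after_three_failed_logins():
    account = Account(password="secret")
    for _ in range(3):
        assert not account.login("wrong-password")
    # After three failures the account must be disabled,
    # so even the correct password is rejected.
    assert account.disabled
    assert not account.login("secret")

test_account_disabled_after_three_failed_logins()
print("lockout requirement satisfied by the stub implementation")
```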
Web application security vulnerabilities such as Cross-Site Scripting or SQL Injection,
which can adequately be addressed by security testing techniques, are acknowledged
problems [4] with thousands of vulnerabilities reported each year [5]. Furthermore,
surveys such as the one published by the National Institute of Standards and Technology [6] show the high
cost of insecure software due to inadequate testing even on an economic level. Therefore,
support for security testing, which is still often considered as a black art, is essential to
increase its effectiveness and efficiency in practice. This chapter intends to contribute to
the growing need for information on security testing techniques by providing an overview
of current security testing techniques. This is of high value both for researchers to evaluate
and refine existing techniques and for practitioners to apply and disseminate them. In this
chapter, security testing techniques (and also the discussion thereof) are classified
according to their test basis within the secure software development life cycle into four
different types: (1) model-based security testing is grounded on requirements and design
models created during the analysis and design phase, (2) code-based testing and static
analysis on source and byte code created during development, (3) penetration testing and
dynamic analysis on running systems, either in a test or production environment, as well
as (4) security regression testing performed during maintenance.
This chapter provides a comprehensive survey on security testing and is structured as
follows. Section 2 provides an overview of the underlying concepts on software testing.
Section 3 discusses the basic concepts of security engineering and the secure software
development life cycle. Section 4 provides an overview of security testing and its
integration in the secure software development life cycle. Section 5 discusses the security
testing techniques model-based security testing, code-based testing and static analysis,
penetration testing, and dynamic analysis as well as security regression testing in detail.
Section 6 discusses the application of security testing techniques to a three-tiered business
application. Finally, Section 7 summarizes this chapter.
2 Software Testing
According to the classic definition in software engineering [7], software testing consists of
the dynamic verification that a program provides expected behaviors on a finite set of test
cases, a so-called test suite, suitably selected from the usually infinite execution domain.
This dynamic notion of testing, so-called dynamic testing, evaluates software by observing
its execution [8]. The executed system is called system under test (SUT). More general
notions of testing [9] consist of all life cycle activities, both static and dynamic, concerned
with evaluation of software products and related artifacts to determine that they satisfy
specified requirements, to demonstrate that they are fit for purpose and to detect defects.
This definition also takes static testing into account, which checks software development
artifacts (eg, requirements, design, or code) without executing these artifacts. The most
prominent static testing approaches are (manual) reviews and (automated) static analysis,
which are often combined with dynamic testing, especially in the context of security. For
security testing, the general notion of testing comprising static and dynamic testing is
therefore frequently applied [10–12], and thus also in this chapter testing comprises static
and dynamic testing.
After running a test case, the observed and intended behaviors of a SUT are compared
with each other, which then results in a verdict. Verdicts can be pass (behaviors
conform), fail (behaviors do not conform), or inconclusive (it is not known whether
behaviors conform) [13]. A test oracle is a mechanism for determining the verdict. The
observed behavior may be checked against user or customer needs (commonly referred to
as testing for validation) or against a specification (testing for verification). A failure is an
undesired behavior. Failures are typically observed (by resulting in verdict fail) during the
execution of the system being tested. A fault is the cause of the failure. It is a static defect
in the software, usually caused by human error in the specification, design, or coding
process. During testing, it is the execution of faults in the software that causes failures.
Differing from active execution of test cases, passive testing only monitors running
systems without interaction.
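The following minimal sketch illustrates these notions of test execution, oracle, and verdict; the SUT, the test inputs, and the oracle are invented for illustration and do not stem from the cited literature.

```python
# Minimal sketch of dynamic test execution with a test oracle producing
# the verdicts pass, fail, and inconclusive. All names are illustrative.

def sut_absolute_value(x):           # system under test (SUT)
    return x if x >= 0 else -x

def oracle(test_input, observed):
    """Compare the observed behavior against the intended behavior."""
    try:
        expected = abs(test_input)    # specification-based expectation
    except TypeError:
        return "inconclusive"         # intended behavior unknown for this input
    return "pass" if observed == expected else "fail"

test_suite = [5, -3, 0]               # finite set of test cases
for test_case in test_suite:
    observed = sut_absolute_value(test_case)
    print(test_case, "->", oracle(test_case, observed))
```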
Testing can be classified utilizing the three dimensions objective, scope, and
accessibility [14, 15] shown in Fig. 1.
Test objectives are the reason or purpose for designing and executing a test. The reason is
either to check the functional behavior of the system or its nonfunctional properties.
Functional testing is concerned with assessing the functional behavior of an SUT, whereas
nonfunctional testing aims at assessing nonfunctional requirements with regard to quality
characteristics like security, safety, reliability or performance.
The test scope describes the granularity of the SUT and can be classified into
component, integration, and system testing. It also determines the test basis, ie, the
artifacts to derive test cases. Component testing (also referred to as unit testing) checks the
smallest testable component (eg, a class in an object-oriented implementation or a single
electronic control unit) in isolation. Integration testing combines components with each
other and tests those as a subsystem, that is, not yet a complete system. System testing
checks the complete system, including all subsystems. A specific type of system testing is
acceptance testing where it is checked whether a solution works for the user of a system.
Regression testing is a selective retesting to verify that modifications have not caused side
effects and that the SUT still complies with the specified requirements [16].
In terms of accessibility of test design artifacts, we can classify testing methods into
white- and black-box testing. In white-box testing, test cases are derived based on
information about how the software has been designed or coded [7]. In black-box testing,
test cases rely only on the input/output behavior of the software. This classification is
especially relevant for security testing, as black-box testing, where no or only basic
information about the system under test is provided, makes it possible to mimic external attacks
from hackers. In classical software testing, a related classification of test design
techniques [17] distinguishes between structure-based testing techniques (ie, deriving test
cases from internal descriptions like implementation code), specification-based testing
techniques (ie, deriving test cases from external descriptions of software like
specifications), and experience-based testing techniques (ie, deriving test cases based on
knowledge, skills, and background of testers).
The process of testing comprises the core activities test planning, design,
implementation, execution, and evaluation [9]. According to Refs. [18] and [9], test
planning is the activity of establishing or updating a test plan. A test plan includes the test
objectives, test scope, and test methods as well as the resources, and schedule of intended
test activities. It identifies, amongst others, features to be tested and exit criteria defining
conditions for when to stop testing. Coverage criteria aligned with the tested feature types
and the applied test design techniques are typical exit criteria. Once the test plan has been
established, test control begins. It is an ongoing activity in which the actual progress is
compared against the plan which often results in concrete measures. During the test design
phase the general testing objectives defined in the test plan are transformed into tangible
test conditions and abstract test cases. For test derivation, specific test design techniques
can be applied, which can according to ISO/IEC/IEEE 29119 [17] be classified into
specification-based, structure-based, and experience-based techniques. Test
implementation comprises tasks to make the abstract test cases executable. This includes
tasks like preparing test harnesses and test data, providing logging support or writing test
scripts which are necessary to enable the automated execution of test cases. In the test
execution phase, the test cases are then executed and all relevant details of the execution
are logged and monitored. In manual test execution, testing is guided by a human, and in
automated testing by a specialized application. Finally, in the test evaluation phase the exit
criteria are evaluated and the logged test results are summarized in a test report.
In model-based testing (MBT), manually selected algorithms automatically and
systematically generate test cases from a set of models of the system under test or its
environment [19]. Whereas test automation replaces manual test execution with automated
test scripts, MBT replaces manual test designs with automated test designs and test
generation.
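As a minimal illustration of this idea (independent of any particular MBT tool), the sketch below derives abstract test cases from a small state-machine model of the SUT by covering every transition once; the login-dialog model is invented for illustration.

```python
from collections import deque

# Toy model of a login dialog: state -> {action: next_state}
model = {
    "logged_out": {"login_ok": "logged_in", "login_fail": "logged_out"},
    "logged_in":  {"logout": "logged_out", "view_data": "logged_in"},
}

def shortest_path(model, start, goal):
    """Breadth-first search for an action sequence leading from start to goal."""
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        state, path = queue.popleft()
        if state == goal:
            return path
        for action, nxt in model[state].items():
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, path + [action]))
    return None

def generate_tests(model, initial="logged_out"):
    """Generate one abstract test case per transition (all-transitions coverage)."""
    tests = []
    for state, transitions in model.items():
        for action in transitions:
            tests.append(shortest_path(model, initial, state) + [action])
    return tests

for test_case in generate_tests(model):
    print(" -> ".join(test_case))
```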
3 Security Engineering
In this section, we cover basic concepts of security engineering as well as an overview of
the secure software development life cycle.
On the other hand, an error is always produced by a fault. A fault is not necessarily related
to security properties but is the cause of errors and failures in general.
A vulnerability is a special type of fault. If the fault is related to security properties, it is
called a vulnerability. A vulnerability is always related to one or more assets and their
corresponding security properties. An exploitation of a vulnerability attacks an asset by
violating the associated security property. Since vulnerabilities are always associated with
the protection of an asset, the security relevant fault is usually correlated with a
mechanism that protects the asset. A vulnerability either means that (1) the responsible
security mechanism is completely missing, or (2) the security mechanism is in place but is
implemented in a faulty way.
An exploit is a concrete malicious input that makes use of the vulnerability in the
system under test (SUT) and violates the property of an asset. Vulnerabilities can often be
exploited in different ways. One concrete exploit selects a specific asset and a specific
property, and makes use of the vulnerability to violate the property for the selected asset.
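The following sketch makes the distinction between a vulnerability and an exploit concrete; it uses Python's built-in sqlite3 module, and the user table and inputs are invented for illustration. The missing security mechanism (parameter binding) constitutes the vulnerability, and the malicious input is one concrete exploit violating the confidentiality of the asset.

```python
import sqlite3

# Toy asset: a user table whose confidentiality should be protected.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 's3cr3t')")

def lookup_vulnerable(name):
    # Vulnerability: the protecting mechanism (parameter binding /
    # input validation) is missing, so user input becomes SQL code.
    query = "SELECT secret FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()

def lookup_fixed(name):
    # Security mechanism in place: parameter binding keeps input as data.
    return conn.execute(
        "SELECT secret FROM users WHERE name = ?", (name,)).fetchall()

# Concrete exploit: a malicious input that makes use of the vulnerability
# and violates confidentiality by returning every secret instead of none.
exploit = "' OR '1'='1"
print(lookup_vulnerable(exploit))   # leaks all secrets
print(lookup_fixed(exploit))        # returns no rows
```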
A threat is the potential cause of an unwanted incident that harms or reduces the value
of an asset. For instance, a threat may be a hacker, power outages, or malicious insiders.
An attack is defined by the steps a malicious or inadvertently incorrectly behaving entity
performs to the end of turning a threat into an actual corruption of an asset's properties.
This is usually done by exploiting a vulnerability.
Security aspects can be considered on the network, operating system, and application
level. Each level has its own security threats and corresponding security requirements to
deal with them. Typical threats on the network level are distributed denial-of-service or
network intrusion. On the operating system level, all types of malware cause threats.
Finally, on the application level, typical threats are related to access control or are
application type specific like Cross-Site Scripting in case of web applications. All levels
of security can be subject to tests.
Security testing simulates attacks and employs other kinds of penetration testing
attempting to compromise the security of a system by playing the role of a hacker trying to
attack the system and exploit its vulnerabilities [21]. Security testing requires specific
expertise, which makes it difficult to perform and hard to automate [22]. By identifying risks in the
system and creating tests driven by those risks, security vulnerability testing can focus on
parts of a system implementation in which an attack is likely to succeed.
Risks are often used as a guiding factor to define security test processes. For instance,
Potter and McGraw [22] consider the process steps creating security misuse cases, listing
normative security requirements, performing architectural risk analysis, building risk-based security test plans, wielding static analysis tools, performing security tests,
performing penetration testing in the final environment, and cleaning up after security
breaches. Also the Open Source Security Testing Methodology Manual (OSSTMM) [23]
and the OWASP Testing Guide [10] take risks into account for their proposed security
testing activities.
4 Security Testing
In this section, we cover basic concepts of security testing and the integration of security
testing in the secure software development life cycle.
FIGURE 2 Most faults in security mechanisms are related to missing or incorrect functionality,
most vulnerabilities are related to unintended side-effect behavior (adapted from Thompson [30]).
security testing must be holistic covering the whole secure software development life
cycle [12]. In concrete terms, Fig. 3 shows a recommended distribution of static and
dynamic testing efforts among the phases of the secure software development life cycle
according to Ref. [10]. It shows that security testing should be balanced over all phases,
with a focus on the early phases, ie, analysis, design, and implementation.
FIGURE 3 Proportion of test effort in secure software development life cycle according to Ref.
[10].
To provide support for the integration of security testing into all phases of the secure
software development process, major security development processes (see Section 3.2),
consider the integration of testing. In the Security Development Lifecycle (SDL) [26]
from Microsoft, the practices with strong interference with testing efforts are the following:
SDL Practice #2 (Requirements): Establish Security and Privacy Requirements
SDL Practice #4 (Requirements): Perform Security and Privacy Risk Assessments
SDL Practice #5 (Design): Establish Design Requirements
SDL Practice #7 (Design): Use Threat Modeling
SDL Practice #10 (Implementation): Perform Static Analysis
SDL Practice #11 (Verification): Perform Dynamic Analysis
SDL Practice #12 (Verification): Perform Fuzz Testing
SDL Practice #13: Conduct Attack Surface Review
SDL Practice #15: Conduct Final Security Review
In OpenSAMM [27] from OWASP, the verification activity includes the security
practices design review, code review, as well as (dynamic) security testing.
The OWASP Testing Guide [10] and the OWASP Code Review Guide [31] provide a
detailed overview of the variety of testing activities of web application security. While the
Testing Guide has a focus on black-box testing, the Code Review Guide is a white-box
approach focusing on manual code review. Overall, the Testing Guide distinguishes 91
different testing activities split into 11 subcategories (ie, information gathering,
configuration and deployment management testing, identity management testing,
authentication testing, authorization testing, session management testing, data validation
testing, error handling, cryptography, business logic testing, as well as client side testing).
Applying security testing techniques to web applications is covered in Section 6.
The OWASP testing framework workflow, which is also contained in the OWASP
Testing Guide, contains checks and reviews of respective artifacts in all secure software
development phases, creation of UML and threat models in the analysis and design
phases, unit and system testing during development and deployment, penetration testing
during deployment, as well as regression testing during maintenance. Proper security
testing requires a mix of techniques as there is no single testing technique that can be
performed to effectively cover all security testing and their application within testing
activities at unit, integration, and system level. Nevertheless, many companies adopt only
one security testing approach, for instance penetration testing [10].
Fig. 4 abstracts from concrete security testing techniques mentioned before, and
classifies them according to their test basis within the secure software development life
cycle.
FIGURE 4 Security testing techniques in the secure software development life cycle.
security properties, (2) testing security mechanisms, and (3) testing for vulnerabilities.
Classification of model-based (security) testing. Several publications propose
taxonomies and classifications of existing MBT [32, 33] and
MBST approaches [39, 40]. We will focus on the classification proposed by
Schieferdecker et al. [40] considering different perspectives used in securing a system.
The authors claim that MBST needs to be based on different types of models and
distinguish three types of input models for security test generation, ie, architectural and
functional models, threat, fault and risk models, as well as weakness and vulnerability
models. Architectural and functional models of the SUT are concerned with security
requirements and their implementation. They focus on the expected system behavior.
Threat, fault and risk models focus on what can go wrong, and concentrate on causes and
consequences of system failures, weaknesses or vulnerabilities. Weakness and
vulnerability models describe weaknesses or vulnerabilities by themselves.
In the following, we describe selected exemplary approaches that make use of different
models according to the classification of Schieferdecker et al.
a mapping from these browser actions to executable API calls to make them operational in
a browser. Finally, a test execution engine executes the operationalized test cases on the
SUT to verify whether the implementation of the model suffers from the same vulnerability as
reported by the model checker at the abstract level.
test patterns. The involved security test patterns are formalized by using a minimal test
design strategies language framework which is represented as a UML profile. Such a
(semi-)formal security test pattern is then used as the input for a test generator
accompanied by the test design model out of which the test cases are generated. The
approach is based on the CORAS method [24] for risk analysis activities. Finally, a tool
prototype is presented which shows how to combine the CORAS-based risk analysis with
pattern-based test generation.
Botella et al. [47] describe an approach to security testing called Risk-Based
Vulnerability Testing, which is guided by risk assessment and coverage to perform and
automate vulnerability testing for web applications. Risk-Based Vulnerability testing
adapts model-based testing techniques using a pattern-based approach for the generation
of test cases according to previously identified risks. For risk identification and analysis,
the CORAS method [24] is utilized. The integration of information from risk analysis
activities with the model-based test generation approach is realized by a test purpose
language. It is used to formalize security test patterns in order to make them usable for test
generators. Risk-Based Vulnerability Testing is applied to security testing of a web
application.
Zech et al. [48, 49] propose a new method for generating negative security tests for
nonfunctional security testing of web applications by logic programming and knowledge
engineering. Based on a declarative model of the system under test, a risk analysis is
performed and used for derivation of test cases.
Manual code review is the process by which an expert reads program code line-by-line to identify vulnerabilities. This requires expertise in three areas: the application
architecture, the implementation techniques (programming languages, frameworks used to
build the application), as well as security. Thus, a good manual code review should start
with a threat model or at least an interview with the developers to get a good
understanding of the application architecture, its attack surface, as well as the
implementation techniques. After this, the actual code review can start in which code is,
guided by the identified attack surface, manually analyzed for security vulnerabilities.
Finally, the results of the analysis are reported back to development to fix the identified
vulnerabilities as well as to educate architects and developers to prevent similar issues in
the future. Overall, manual code reviews are a tedious process that requires skill,
experience, persistence, and patience.
Similarly, if an SAST tool does not report security issues, this can have two reasons:
The source code is secure (true negative)
The source code has a security vulnerability but, due to limitations of the tool, the tool does
not report a problem (false negative).
There are SAST tools available for most of the widely used programming languages, eg,
FindBugs [54], which analyzes Java byte code and, thus, can analyze various
languages running on the Java Virtual Machine. There are also specialized techniques for
Java programs (eg, [55]) or C/C++ (eg, [56]) as well as approaches that work on multiple
languages (eg, [57]). For a survey on static analysis methods, we refer the reader
elsewhere [51, 58]. Moreover, we discuss further static analysis techniques in the context
of a small case study in Section 6.
Besides the fact that SAST tools can be applied very early in the software development
life cycle as well as the fact that source code analysis can provide detailed fix
recommendations, SAST has one additional advantage over most dynamic security
testing techniques: SAST tools can analyze all control flows of a program. Therefore,
SAST tools achieve, compared to dynamic test approaches, a significantly higher coverage
of the program under test and, thus, produce a significantly lower false negative rate. Thus,
SAST is a very effective method [59] for detecting programming-related vulnerabilities
early in the software development life cycle.
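To illustrate the principle on a very small scale, the following toy checker (not an existing SAST tool) uses Python's ast module to flag calls named execute whose query argument is built by string concatenation or formatting, a common SQL injection pattern; the analyzed source and all names are invented.

```python
import ast

# Toy static application security testing (SAST) check: flag calls to any
# function named "execute" whose first argument is built by string
# concatenation or an f-string (a common SQL injection pattern).
SOURCE = """
def lookup(conn, name):
    conn.execute("SELECT * FROM users WHERE name = '" + name + "'")
    conn.execute("SELECT * FROM users WHERE name = ?", (name,))
"""

class SqlSinkChecker(ast.NodeVisitor):
    def visit_Call(self, node):
        callee = node.func
        name = callee.attr if isinstance(callee, ast.Attribute) else getattr(callee, "id", "")
        if name == "execute" and node.args:
            first = node.args[0]
            # Concatenation (BinOp) or f-strings (JoinedStr) in the query
            # argument are reported as findings; constants are accepted.
            if isinstance(first, (ast.BinOp, ast.JoinedStr)):
                print(f"line {node.lineno}: possible SQL injection sink")
        self.generic_visit(node)

SqlSinkChecker().visit(ast.parse(SOURCE))
```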
1. Planning: No actual testing occurs in this phase. Instead, important side conditions and
boundaries for the test are defined and documented. For instance, the relevant components
of the application that are subject to the test are determined, as are the nature and scope of the
tests to be conducted and their level of invasiveness.
2. Discovery: This phase consists of two steps. First, all accessible external interfaces of the
system under test are systematically discovered and enumerated. This set of interfaces
constitutes the system's initial attack surface (a minimal discovery sketch follows this list). The second part of the discovery phase is
vulnerability analysis, in which the applicable vulnerability classes that match the
interface are identified (eg, Cross-Site Scripting for HTTP services or SQL Injection for
applications with a database back end). In a commercial penetration test, this phase also
includes checking whether any of the found components is susceptible to publicly documented
vulnerabilities contained in precompiled vulnerability databases.
3. Attack: Finally, the identified interfaces are tested through a series of attack attempts. In
these attacks, the testers actively attempt to compromise the system by sending attack
payloads. In case of success, the found security vulnerabilities are exploited in order to
gain further information about the system, widen the access privileges of the tester and
find further system components, which might expose additional interfaces. This expanded
attack surface is fed back into the discovery phase, for further processing.
4. Reporting: The reporting phase occurs simultaneously with the other three phases of the
penetration test and documents all findings along with their estimated severity.
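The sketch below illustrates a tiny piece of the discovery phase, enumerating reachable TCP services to establish an initial attack surface; the target host and port list are placeholders, and such probing must only be run against systems one is authorized to test.

```python
import socket

# Minimal sketch of the discovery phase of a penetration test: enumerate
# externally reachable TCP services to establish the initial attack surface.
TARGET = "127.0.0.1"
CANDIDATE_PORTS = [22, 80, 443, 3306, 8080]

def discover_open_ports(host, ports, timeout=0.5):
    open_ports = []
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            if sock.connect_ex((host, port)) == 0:   # 0 means the connection succeeded
                open_ports.append(port)
    return open_ports

attack_surface = discover_open_ports(TARGET, CANDIDATE_PORTS)
print("open interfaces:", attack_surface)
# Each open port is then matched against applicable vulnerability classes,
# eg, HTTP services become candidates for Cross-Site Scripting tests.
```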
5.3.4 Fuzzing
Fuzzing or fuzz testing is a dynamic testing technique that is based on the idea of feeding
random data to a program until it crashes. It was pioneered in the late 1980s by Barton
Miller at the University of Wisconsin [65]. Since then, fuzz testing has been proven to be
an effective technique for finding vulnerabilities in software. While the first fuzz testing
approaches were purely based on randomly generated test data (random fuzzing),
advances in symbolic computation, model-based testing, as well as dynamic test case
generation have led to more advanced fuzzing techniques such as mutation-based
fuzzing, generation-based fuzzing, or gray-box fuzzing.
Random fuzzing is the simplest and oldest fuzz testing technique: a stream of random
input data is, in a black-box scenario, sent to the program under test. The input data can,
eg, be sent as command line options, events, or protocol packets. This type of fuzzing is,
in particular, useful for testing how a program reacts to large or invalid input data. While
random fuzzing can already find severe vulnerabilities, modern fuzzers have a detailed
understanding of the input format that is expected by the program under test.
Mutation-based fuzzing is one type of fuzzing in which the fuzzer has some knowledge
about the input format of the program under test: based on existing data samples, a
mutation-based fuzzing tool generates new variants (mutants), based on heuristics, that
it uses for fuzzing. There is a wide range of mutation-based fuzzing approaches available
for different domains. We refer the interested reader elsewhere for details [66, 67].
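A minimal mutation-based fuzzer could look like the following sketch; the toy parser standing in for the program under test, the seed input, and the crash criterion (uncaught exceptions) are all invented for illustration, and real fuzzers add far more elaborate heuristics and instrumentation.

```python
import random

# Minimal mutation-based fuzzing sketch: mutate an existing data sample
# (seed) with random byte flips and feed the mutants to the program under
# test, reporting inputs that trigger crash-like behavior.

def program_under_test(data: bytes):
    # Toy parser standing in for the real SUT; it crashes on some inputs.
    text = data.decode("ascii")              # may raise UnicodeDecodeError
    key, value = text.split("=", 1)          # may raise ValueError
    return {key: int(value)}                 # may raise ValueError

def mutate(seed: bytes, max_mutations=4):
    data = bytearray(seed)
    for _ in range(random.randint(1, max_mutations)):
        pos = random.randrange(len(data))
        data[pos] = random.randrange(256)    # heuristic: random byte flip
    return bytes(data)

seed = b"timeout=30"
crashes = []
for _ in range(1000):
    mutant = mutate(seed)
    try:
        program_under_test(mutant)
    except Exception as exc:                 # crash-like behavior observed
        crashes.append((mutant, type(exc).__name__))

print(f"{len(crashes)} crashing inputs found, eg: {crashes[:3]}")
```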
Generation-based fuzzing uses a model (of the input data or the vulnerabilities) for
generating test data from this model or specification. Compared to pure random-based
fuzzing, generation-based fuzzing usually achieves a higher coverage of the program
under test, in particular if the expected input format is rather complex. Again, for details
we refer the interested reader elsewhere [68, 69].
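In contrast, a minimal generation-based fuzzer derives inputs from a model of the expected format; the tiny key-value input model below is invented purely to illustrate the idea.

```python
import random

# Minimal generation-based fuzzing sketch: test data is generated from a
# model of the expected input format (here, a toy description of key-value
# configuration lines) instead of mutating existing samples.
KEYS = ["timeout", "retries", "mode"]
VALUES = ["0", "1", "-1", "4294967296", "", "A" * 10000, "none"]

def generate_input():
    key = random.choice(KEYS + ["", "\x00key"])   # deliberately include malformed keys
    value = random.choice(VALUES)                 # boundary and oversized values
    return f"{key}={value}".encode("utf-8")

for _ in range(5):
    print(generate_input()[:40])
```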
Advanced fuzzing techniques combine several of the previously mentioned approaches,
eg, use a combination of mutation-based and generation-based techniques as well as
observe the program under test and use these observations for constructing new test data.
This turns fuzzing into a gray-box testing technique that also utilizes symbolic
computation that is usually understood as a technique used for static program analysis.
Probably the first and also most successful application of gray-box fuzzing is SAGE from
Microsoft [70, 71], which combines symbolic execution (a static source code analysis
technique) and dynamic testing. This combination is today known as concolic testing and
inspired several advanced security testing approaches, eg, [72, 73], as well as functional test
approaches.
usual strategy is to focus on the identification of modified parts of the SUT and to select
test cases relevant to them. For instance, the retest-all technique is one naive type of
regression test selection by reexecuting all tests from the previous version on the new
version of the system. It is often used in industry due to its simple and quick
implementation. However, its capacity in terms of fault detection is limited [79].
Therefore, a considerable amount of work is related to the development of effective and
scalable selective techniques.
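The basic idea of a selective technique can be sketched as follows; the coverage mapping and the set of modified functions are invented for illustration and would in practice be obtained from coverage instrumentation and a diff of the two versions.

```python
# Minimal regression test selection sketch: reexecute only those test
# cases whose recorded coverage touches functions modified since the
# previous version. Coverage data and the change set are illustrative.

coverage = {                              # test case -> functions it exercises
    "test_login":          {"authenticate", "hash_password"},
    "test_logout":         {"terminate_session"},
    "test_reset_password": {"hash_password", "send_mail"},
}
modified_functions = {"hash_password"}    # eg, derived from a diff of the new version

def select_regression_tests(coverage, modified):
    return sorted(test for test, funcs in coverage.items() if funcs & modified)

print(select_regression_tests(coverage, modified_functions))
# -> ['test_login', 'test_reset_password']; the retest-all strategy would
#    rerun all three tests regardless of the change.
```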
In the following, we discuss available security testing approaches according to the
categories minimization, prioritization, and selection. The selected approaches are based
on a systematic classification of security regression testing approaches by Felderer and
Fourneret [76].
based on a cost function measuring the difference between the original execution and the
mutable replay.
potential cause(s) of failure are found by comparing control flow behavior of the passing
and failing inputs and identifying code fragments where the control flows diverge.
FIGURE 6 Number of entries in the common vulnerabilities and exposures (CVE) index by
category.
For instance, we exclude techniques for ensuring the security of the underlying
infrastructure such as the network configuration, as, eg, discussed in Refs. [96, 97] as well
as model-based testing techniques (as discussed in Section 5.1) that are in particular useful
for finding logical security flaws. While a holistic security testing strategy makes use of
all available security testing strategies, we recommend concentrating efforts first on
techniques for the most common vulnerabilities. Furthermore, we also do not explicitly
discuss retesting after changes of the system under test, which is addressed by suitable
(security) regression testing approaches (as discussed in Section 5.4).
Performance and resource utilization: different tools and methods require different
computing power and different manual efforts.
Costs for licenses, maintenance, and support: to use security testing tools efficiently in a
large enterprise, they need to be integrated into, eg, bug tracking or reporting solutions;
often they provide their own server applications for this. Thus, buying a security
testing tool is usually not a one-time effort; it requires regular maintenance and
support.
Quality of results: different tools that implement the same security testing technique
provide a different quality level (eg, in terms of fix recommendations or false positive
rates).
Supported technologies: security testing tools usually only support a limited number of
technologies (eg, programming languages, interfaces, or build systems). If these tools
support multiple technologies, they do not necessarily support all of them with the same
quality. For example, a source analysis tool that supports Java and C might work well
for Java but not as well for C.
In the following, we focus on the first two aspects: the attack surface and the
application type. These two aspects are, from a security perspective, the first ones to
consider for selecting the best combinations of security testing approaches for a specific
application type (product). In a subsequent step, the other factors need to be considered for
selecting a specific tool that fits the needs of the actual development as well as the
resource and time constraints.
Adobe Flash applets. It does so by decompiling the source code and statically detecting
suspicious situations. Then it constructs an attack payload and executes the exploit via
dynamic testing.
Recently, Bau et al. [4] and Doupé et al. [61] prepared comprehensive overviews of
commercial and academic black-box vulnerability scanners and their underlying
approaches.
7 Summary
In this chapter, we provided an overview of recent security testing techniques and their
practical application in the context of a three-tiered business application. For this purpose, we
first summarized the required background on software testing and security engineering.
Testing consists of static and dynamic life cycle activities concerned with evaluation of
software products and related artifacts. It can be performed on the component, integration,
and system level. With regard to accessibility of test design artifacts, white-box testing (ie,
deriving test cases based on design and code information) as well as black-box testing (ie,
relying only on input/output behavior of software) can be distinguished. Security testing
validates software system requirements related to security properties of assets that include
confidentiality, integrity, availability, authentication, authorization, and nonrepudiation.
Security requirements can be positive and functional, explicitly defining the expected
security functionality of a security mechanism, or negative and nonfunctional, specifying
what the application should not do. Due to the negative nature of many security
requirements and the resulting broad range of subordinate requirements, it is essential to
take testing into account in all phases of the secure software development life cycle (ie,
analysis, design, development, deployment as well as maintenance) and to combine
different security testing techniques.
For a detailed discussion of security testing techniques in this chapter, we therefore
classified them according to their test basis within the secure software development life
cycle into four different types: (1) model-based security testing is grounded on
requirements and design models created during the analysis and design phase, (2) code-based testing and static analysis on source and byte code created during development, (3)
penetration testing and dynamic analysis on running systems, either in a test or production
environment, as well as (4) security regression testing performed during maintenance.
With regard to model-based security testing, we considered testing based on architectural
and functional models, threat, fault and risk models, as well as weakness and vulnerability
models. Concerning code-based testing and static analysis, we took manual code reviews
as well as static application security testing into account. With regard to penetration
testing and dynamic analysis, we considered penetration testing itself, vulnerability
scanning, dynamic taint analysis, as well as fuzzing. Concerning security regression
testing, we discussed approaches to test suite minimization, test case prioritization, and
test case selection. To show how the discussed security testing techniques could be
practically applied, we discussed their usage for a three-tiered business application based on
a web client, an application server, as well as a database back end.
Overall, this chapter provided a broad overview of recent security testing techniques. It
fulfills the growing need for information on security testing techniques to enable their
effective and efficient application. Along these lines, this chapter is of value both for
researchers to evaluate and refine existing security testing techniques as well as for
practitioners to apply and disseminate them.
Acknowledgments
The work was supported in part by the research projects QE LaB – Living Models for
Open Systems (FFG 822740) and MOBSTECO (FWF P 26194-N15).
References
[1] Schieferdecker I., Grossmann J., Schneider M. Model-based security testing. In:
Proceedings 7th Workshop on Model-Based Testing. 2012.
[2] ISO/IEC. ISO/IEC 9126-1:2001 software engineering – product quality – Part 1: quality model. 2001.
[3] ISO/IEC. ISO/IEC 25010:2011 systems and software engineering – systems and software quality requirements and evaluation (SQuaRE) – system and software quality models. 2011.
[4] Bau J., Bursztein E., Gupta D., Mitchell J. State of the art: automated black-box
web application vulnerability testing. In: 2010 IEEE Symposium on Security and
Privacy (SP). IEEE; 2010:332–345.
[5] MITRE, Common vulnerabilities and exposures, http://cve.mitre.org.
[6] NIST. The economic impacts of inadequate infrastructure for software testing.
2002 (available at www.nist.gov/director/planning/upload/report02-3.pdf
[accessed April 7, 2015]).
[7] Bourque P., Dupuis R., eds. Guide to the Software Engineering Body of
Knowledge Version 3.0 SWEBOK. IEEE; 2014.
http://www.computer.org/web/swebok.
[8] Ammann P., Offutt J. Introduction to Software Testing. Cambridge, UK:
Cambridge University Press; 2008.
[9] ISTQB. Standard glossary of terms used in software testing. ISTQB; 2012
Version 2.2, Tech. Rep.
[10] OWASP Foundation, OWASP Testing Guide v4,
https://www.owasp.org/index.php/OWASP_Testing_Project (accessed March 11,
2015).
[11] Tian-yang G., Yin-sheng S., You-yuan F. Research on software security testing.
World Acad. Sci. Eng. Technol. 2010;69:647–651.
[12] Bachmann R., Brucker A.D. Developing secure software: a holistic approach to
security testing. Datenschutz und Datensicherheit (DuD). 2014;38(4):257–261.
[13] ISO/IEC. Information technology – open systems interconnection – conformance
testing methodology and framework. 1994 (international ISO/IEC multi-part
standard No. 9646).
[14] Utting M., Legeard B. Practical Model-Based Testing: A Tools Approach. San
Francisco, CA: Morgan Kaufmann Publishers Inc.; 2007. ISBN 0123725011.
[15] Zander J., Schieferdecker I., Mosterman P.J. Model-Based Testing for Embedded
Systems. CRC Press; 2012;vol. 13.
[16] IEEE. IEEE standard glossary of software engineering terminology. Washington,
Reliab. 2011;21(1):5571.
[35] Pretschner A. Defect-based testing. In: Dependable Software Systems Engineering. IOS Press; 2015. http://www.iospress.nl/book/dependable-software-systemsengineering/.
[36] Zhu H., Hall P.A.V., May J.H.R. Software unit test coverage and adequacy. ACM Comput. Surv. 1997;29(4):366–427.
[37] Morell L.J. A theory of fault-based testing. IEEE Trans. Softw. Eng. 1990;16(8):844–857.
[38] Pretschner A., Holling D., Eschbach R., Gemmar M. A generic fault model for quality assurance. In: Moreira A., Schätz B., Gray J., Vallecillo A., Clarke P., eds. Model-Driven Engineering Languages and Systems. Lecture Notes in Computer Science, vol. 8107. Berlin: Springer; 2013:87–103.
[39] Felderer M., Agreiter B., Zech P., Breu R. A classification for model-based security testing. In: The Third International Conference on Advances in System Testing and Validation Lifecycle (VALID 2011). 2011:109–114.
[40] Schieferdecker I., Grossmann J., Schneider M. Model-based security testing. In: Proceedings 7th Workshop on Model-Based Testing. 2012.
[41] Büchler M., Oudinet J., Pretschner A. Semi-automatic security testing of web applications from a secure model. In: 2012 IEEE Sixth International Conference on Software Security and Reliability (SERE). IEEE; 2012:253–262.
[42] Mouelhi T., Fleurey F., Baudry B., Traon Y. A model-based framework for security policy specification, deployment and testing. In: Proceedings of the 11th International Conference on Model Driven Engineering Languages and Systems, MoDELS '08, Toulouse, France. Berlin: Springer; 2008:537–552.
[43] Gerrard P., Thompson N. Risk-Based e-Business Testing. Artech House
Publishers; 2002.
[44] Felderer M., Schieferdecker I. A taxonomy of risk-based testing. Int. J. Softw. Tools Technol. Transf. 2014;16(5):559–568.
[45] Wendland M.-F., Kranz M., Schieferdecker I. A systematic approach to risk-based testing using risk-annotated requirements models. In: The Seventh International Conference on Software Engineering Advances, ICSEA 2012. 2012:636–642.
[46] Grossmann J., Schneider M., Viehmann J., Wendland M.-F. Combining risk analysis and security testing. In: Leveraging Applications of Formal Methods, Verification and Validation. Specialized Techniques and Applications. Springer; 2014:322–336.
[47] Botella J., Legeard B., Peureux F., Vernotte A. Risk-based vulnerability testing using security test patterns. In: Leveraging Applications of Formal Methods, Verification and Validation. Specialized Techniques and Applications. Springer; 2014:337–352.
1998;15(1):40–44.
[75] Felderer M., Katt B., Kalb P., Jürjens J., Ochoa M., Paci F., Tran L.M.S., Tun T.T., Yskout K., Scandariato R., Piessens F., Vanoverberghe D., Fourneret E., Gander M., Solhaug B., Breu R. Evolution of security engineering artifacts: a state of the art survey. Int. J. Secur. Softw. Eng. 2014;5(4):48–97.
[76] Felderer M., Fourneret E. A systematic classification of security regression testing approaches. Int. J. Softw. Tools Technol. Transf. 2015:1–15.
[77] Leung H.K.N., White L. Insights into regression testing (software testing). In: Proceedings Conference on Software Maintenance 1989. IEEE; 1989:60–69.
[78] Yoo S., Harman M. Regression testing minimisation, selection and prioritisation: a survey. Softw. Test. Verif. Reliab. 2010;1(1):121–141.
[79] Fourneret E., Cantenot J., Bouquet F., Legeard B., Botella J. SeTGaM: generalized technique for regression testing based on UML/OCL models. In: 2014 Eighth International Conference on Software Security and Reliability (SERE). 2014:147–156.
[80] Tóth G., Kőszegi G., Hornák Z. Case study: automated security testing on the trusted computing platform. In: Proceedings of the 1st European Workshop on System Security, EUROSEC '08, Glasgow, Scotland. ACM; 2008:35–39.
[81] He T., Jing X., Kunmei L., Ying Z. Research on strong-association rule based web application vulnerability detection. In: 2nd IEEE International Conference on Computer Science and Information Technology, 2009, ICCSIT 2009. 2009:237–241.
[82] Garvin B.J., Cohen M.B., Dwyer M.B. Using feature locality: can we leverage history to avoid failures during reconfiguration? In: Proceedings of the 8th Workshop on Assurances for Self-adaptive Systems, ASAS '11, Szeged, Hungary. ACM; 2011:24–33.
[83] Huang Y.-C., Peng K.-L., Huang C.-Y. A history-based cost-cognizant test case prioritization technique in regression testing. J. Syst. Softw. 2012;85(3):626–637. http://www.sciencedirect.com/science/article/pii/S0164121211002780 (novel approaches in the design and implementation of systems/software architecture).
[84] Yu Y.T., Lau M.F. Fault-based test suite prioritization for specification-based testing. Inf. Softw. Technol. 2012;54(2):179–202. http://www.sciencedirect.com/science/article/pii/S0950584911001947.
[85] Viennot N., Nair S., Nieh J. Transparent mutable replay for multicore debugging and patch validation. In: Proceedings of the Eighteenth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS '13, Houston, Texas, USA. ACM; 2013:127–138.
[86] Felderer M., Agreiter B., Breu R. Evolution of security requirements tests for service-centric systems. In: Engineering Secure Software and Systems: Third International Symposium, ESSoS 2011. Springer; 2011:181–194.
[87] Kassab M., Ormandjieva O., Daneva M. Relational-model based change management for non-functional requirements: approach and experiment. In: 2011 Fifth International Conference on Research Challenges in Information Science (RCIS). 2011:1–9.
[88] Anisetti M., Ardagna C.A., Damiani E. A low-cost security certification scheme for evolving services. In: 2012 IEEE 19th International Conference on Web Services (ICWS). 2012:122–129.
[89] Huang C., Sun J., Wang X., Si Y. Selective regression test for access control system employing RBAC. In: Park J.H., Chen H.-H., Atiquzzaman M., Lee C., Kim T.-h., Yeo S.-S., eds. Advances in Information Security and Assurance. Lecture Notes in Computer Science, vol. 5576. Berlin: Springer; 2009:70–79.
[90] Hwang J., Xie T., El Kateb D., Mouelhi T., Le Traon Y. Selection of regression system tests for security policy evolution. In: Proceedings of the 27th IEEE/ACM International Conference on Automated Software Engineering. ACM; 2012:266–269.
[91] Vetterling M., Wimmel G., Wisspeintner A. Secure systems development based on the common criteria: the PalME project. In: Proceedings of the 10th ACM SIGSOFT Symposium on Foundations of Software Engineering, SIGSOFT '02/FSE-10, Charleston, South Carolina, USA. ACM; 2002:129–138.
[92] Bruno M., Canfora G., Penta M., Esposito G., Mazza V. Using test cases as contract to ensure service compliance across releases. In: Benatallah B., Casati F., Traverso P., eds. Service-Oriented Computing – ICSOC 2005. Lecture Notes in Computer Science, vol. 3826. Berlin: Springer; 2005:87–100.
[93] Kongsli V. Towards agile security in web applications. In: Companion to the 21st ACM SIGPLAN Symposium on Object-Oriented Programming Systems, Languages, and Applications, OOPSLA '06, Portland, Oregon, USA. New York, NY: ACM; 2006:805–808.
[94] Qi D., Roychoudhury A., Liang Z., Vaswani K. DARWIN: an approach to debugging evolving programs. ACM Trans. Softw. Eng. Methodol. 2012;21(3):19:1–19:29.
[95] ISO/IEC. ISO/IEC 15408-1:2009 information technology – security techniques – evaluation criteria for IT security – part 1: introduction and general model. 2009.
[96] Bjørner N., Jayaraman K. Checking cloud contracts in Microsoft Azure. In: Proceedings of the 11th International Conference Distributed Computing and Internet Technology ICDCIT 2015, Bhubaneswar, India, February 5-8, 2015. 2015:21–32.
[97] Brucker A.D., Brügger L., Wolff B. Formal firewall conformance testing: an application of test and proof techniques. Softw. Test. Verif. Reliab. 2015;25(1):34–71.
[98] Huang Y.-W., Yu F., Hang C., Tsai C.-H., Lee D.-T., Kuo S.-Y. Securing web application code by static analysis and runtime protection. In: International Conference on the World Wide Web (WWW), WWW '04, New York, NY, USA. New York, NY: ACM; 2004:40–52.
[99] Foster J.S., Fhndrich M., Aiken A. A theory of type qualifiers. SIGPLAN Not.
1999;34(5).
[100] Foster J.S., Terauchi T., Aiken A. Flow-sensitive type qualifiers. SIGPLAN Not.
2002;37(5).
[101] Jovanovic N., Kruegel C., Kirda E. Pixy: a static analysis tool for detecting web application vulnerabilities (short paper). In: IEEE Symposium on Security and Privacy, SP '06. Washington, DC: IEEE Computer Society; 2006:258–263.
[102] Jovanovic N., Kruegel C., Kirda E. Precise alias analysis for static detection of web application vulnerabilities. In: Workshop on Programming Languages and Analysis for Security, PLAS '06, Ottawa, Ontario, Canada. New York, NY: ACM; 2006:27–36.
[103] Xie Y., Aiken A. Static detection of security vulnerabilities in scripting languages. In: USENIX Security Symposium. 2006;vol. 15:179–192.
[104] Dahse J., Holz T. Static detection of second-order vulnerabilities in web
applications. In: Proceedings of the 23rd USENIX Security Symposium. 2014.
[105] Dahse J., Holz T. Simulation of built-in PHP features for precise static code
analysis. In: ISOC-NDSS. 2014.
[106] Wassermann G., Su Z. Static detection of cross-site scripting vulnerabilities. In: ICSE '08, Leipzig, Germany. New York, NY: ACM; 2008:171–180.
[107] Minamide Y. Static approximation of dynamically generated web pages. In: International Conference on the World Wide Web (WWW). 2005.
[108] Saxena P., Akhawe D., Hanna S., Mao F., McCamant S., Song D. A symbolic execution framework for JavaScript. In: IEEE Symposium on Security and Privacy, SP '10. Washington, DC: IEEE Computer Society; 2010:513–528.
[109] Jin X., Hu X., Ying K., Du W., Yin H., Peri G.N. Code injection attacks on
HTML5-based mobile apps: characterization, detection and mitigation. In: 21st
ACM Conference on Computer and Communications Security (CCS). 2014.
[110] Fu X., Lu X., Peltsverger B., Chen S., Qian K., Tao L. A static analysis
framework for detecting SQL injection vulnerabilities. In: 31st Annual
[124] Lekies S., Stock B., Johns M. 25 million flows later: large-scale detection of
DOM-based XSS. In: ACM Conference on Computer and Communications
Security (CCS). 2013.
[125] Saxena P., Hanna S., Poosankam P., Song D. FLAX: systematic discovery of
client-side validation vulnerabilities in rich web applications. In: ISOC-NDSS.
The Internet Society; 2010.
[126] McGraw G. Software Security: Building Security In. Addison-Wesley
Professional; 2006.0321356705.
[127] Tripp O., Pistoia M., Fink S.J., Sridharan M., Weisman O. TAJ: effective taint
analysis of web applications. SIGPLAN Not. 0362-13402009;44:8797.
[128] Monate B., Signoles J. Slicing for security of code. In: TRUST. 2008:133142.
[129] WALA, T. J. Watson Libraries for Analysis, http://wala.sf.net.
[130] Hubert L., Barré N., Besson F., Demange D., Jensen T.P., Monfort V., Pichardie D., Turpin T. Sawja: static analysis workbench for Java. In: FoVeOOS. 2010:92-106.
[131] Haller I., Slowinska A., Neugschwandtner M., Bos H. Dowsing for overflows: a guided fuzzer to find buffer boundary violations. In: Proceedings of the 22nd USENIX Conference on Security, SEC'13, Washington, DC. Berkeley, CA: USENIX Association; 2013:49-64. ISBN 978-1-931971-03-4.
[132] Mazzone S.B., Pagnozzi M., Fattori A., Reina A., Lanzi A., Bruschi D. Improving Mac OS X security through gray box fuzzing technique. In: Proceedings of the Seventh European Workshop on System Security, EuroSec '14, Amsterdam, The Netherlands. New York, NY: ACM; 2014:2:1-2:6. ISBN 978-1-4503-2715-2.
[133] Woo M., Cha S.K., Gottlieb S., Brumley D. Scheduling black-box mutational fuzzing. In: Proceedings of the 2013 ACM SIGSAC Conference on Computer & Communications Security, CCS '13, Berlin, Germany. New York, NY: ACM; 2013:511-522. ISBN 978-1-4503-2477-9.
[134] Lanzi A., Martignoni L., Monga M., Paleari R. A smart fuzzer for x86 executables. In: Third International Workshop on Software Engineering for Secure Systems, SESS '07: ICSE Workshops 2007. 2007:7.
[135] Buehrer G., Weide B.W., Sivilotti P.A.G. Using parse tree validation to prevent SQL injection attacks. In: Proceedings of the 5th International Workshop on Software Engineering and Middleware, SEM '05, Lisbon, Portugal. New York, NY: ACM; 2005:106-113. ISBN 1-59593-205-4.
[136] Appelt D., Nguyen C.D., Briand L.C., Alshahwan N. Automated testing for SQL injection vulnerabilities: an input mutation approach. In: Proceedings of the 2014 International Symposium on Software Testing and Analysis, ISSTA 2014, San Jose, CA, USA. New York, NY: ACM; 2014:259-269. ISBN 978-1-4503-2645-2.
[137] Wang J., Zhang P., Zhang L., Zhu H., Ye X. A model-based fuzzing approach for DBMS. In: 2013 8th International Conference on Communications and Networking in China (CHINACOM). Los Alamitos, CA: IEEE Computer Society; 2013:426-431.
[138] Garcia R. Case study: experiences on SQL language fuzz testing. In: Proceedings of the Second International Workshop on Testing Database Systems, DBTest '09, Providence, Rhode Island. New York, NY: ACM; 2009:3:1-3:6. ISBN 978-1-60558-706-6.
[139] Brubaker C., Jana S., Ray B., Khurshid S., Shmatikov V. Using frankencerts for automated adversarial testing of certificate validation in SSL/TLS implementations. In: Proceedings of the 2014 IEEE Symposium on Security and Privacy, SP '14. Washington, DC: IEEE Computer Society; 2014:114-129. ISBN 978-1-4799-4686-0.
[140] Banks G., Cova M., Felmetsger V., Almeroth K.C., Kemmerer R.A., Vigna G. SNOOZE: toward a stateful netwOrk prOtocol fuzZEr. In: Proceedings of the 9th International Conference on Information Security, ISC 2006, Samos Island, Greece, August 30-September 2, 2006. 2006:343-358.
[141] Tsankov P., Dashti M.T., Basin D.A. SECFUZZ: fuzz-testing security protocols. In: 7th International Workshop on Automation of Software Test, AST 2012, Zurich, Switzerland, June 2-3, 2012. 2012:1-7.
[142] Bertolino A., Traon Y.L., Lonetti F., Marchetti E., Mouelhi T. Coverage-based test cases selection for XACML policies. In: 2014 IEEE Seventh International Conference on Software Testing, Verification and Validation, Workshops Proceedings, March 31-April 4, 2014, Cleveland, Ohio, USA. 2014:12-21.
[143] Brucker A.D., Brügger L., Kearney P., Wolff B. An approach to modular and testable security models of real-world health-care applications. In: ACM Symposium on Access Control Models and Technologies (SACMAT), Innsbruck, Austria. New York, NY: ACM Press; 2011:133-142. ISBN 978-1-4503-0688-1.
[144] Martin E. Testing and analysis of access control policies. In: 29th International Conference on Software Engineering - Companion, ICSE 2007. 2007:75-76.
[145] Rogers R., Rogers R. Nessus Network Auditing. second ed. Burlington, MA: Syngress Publishing; 2008. ISBN 9780080558653, 9781597492089.
Michael Felderer is a senior researcher and project manager within the Quality
Engineering research group at the Institute of Computer Science at the University of
Innsbruck, Austria. He holds a Ph.D. and a habilitation in computer science. His research
interests include software and security testing, empirical software and security engineering, model engineering, risk management, software processes, and industry-academia collaboration. Michael Felderer has coauthored more than 70 journal, conference, and workshop papers. He works in close cooperation with industry and also transfers his research results into practice as a consultant and speaker at industrial conferences.
Martin Johns is a research expert in the Product Security Research unit within SAP SE,
where he leads the Web application security team. Furthermore, he serves on the board of
the German OWASP chapter. Before joining SAP, Martin studied Mathematics and
Computer Science at the Universities of Hamburg, Santa Cruz (CA), and Passau. During
the 1990s and the early years of the new millennium he earned his living as a software
engineer in German companies (including Infoseek Germany, and TC Trustcenter). He
holds a Diploma in Computer Science from University of Hamburg and a Doctorate from
the University of Passau.
Achim D. Brucker is a research expert (architect), security testing strategist, and project
lead in the Security Enablement Team of SAP SE. He received his master's degree in computer science from the University of Freiburg, Germany, and his Ph.D. from ETH Zurich,
Switzerland. He is responsible for the Security Testing Strategy at SAP. His research
interests include information security, software engineering, security engineering, and
formal methods. In particular, he is interested in tools and methods for modeling, building,
and validating secure and reliable systems. He also participates in the OCL standardization
process of the OMG.
Ruth Breu is head of the Institute of Computer Science at the University of Innsbruck,
leading the research group Quality Engineering and the competence center QE LaB. She
has longstanding experience in the areas of security engineering, requirements
engineering, enterprise architecture management and model engineering, both with
academic and industrial background. Ruth is coauthor of three monographs and more than
150 scientific publications and serves the scientific community in a variety of functions
(eg, Board Member of FWF, the Austrian Science Fund, and Member of the NIS Platform of
the European Commission).
CHAPTER TWO
Abstract
This chapter gives an overview of the field of model-based testing (MBT), particularly the recent advances in the
last decade. It gives a summary of the MBT process, the modeling languages that are currently used by the various
communities who practice MBT, the technologies used to generate tests from models, and discusses best practices,
such as traceability between models and tests. It also briefly describes several findings from a recent survey of
MBT users in industry, outlines the increasingly popular use of MBT for security testing, and discusses future
challenges for MBT.
Keywords
Model-based testing; Modeling approaches; Test generation technology; Security testing
1 Introduction
Broadly speaking, model-based testing (MBT) is about designing tests from some kind of
model of the system being tested and its environment. In this sense, all test design is based
on some mental model, so could perhaps be called model-based testing. But it is common,
and more useful, to use the term model-based testing to refer to:
more formal models (expressed in some machine-readable, well-defined, notation);
more formal test generation (we are interested in test generation algorithms that are
automatic, or are capable of being automated);
and more automated execution (the generated tests must be sufficiently precise that they
are capable of being executed automatically).
Testing is an important, but painful and costly, part of the software development
lifecycle. So the promise, or hope, of MBT is that if we can only obtain a model from
somewhere (preferably at zero cost), then all those tests will be able to be generated
automatically, and executed automatically, in order to find all the faults in the system, at
greatly reduced cost and effort.
That is obviously a silver bullet, a dream that cannot be true. The truth about MBT lies
somewhere between that dream, and the other extreme: a pessimistic dismissal that it
could be of no help whatsoever. This chapter aims to shine some light on the current
reality of MBT, the range of practices, the use of MBT in industry, some of the recent
MBT research and tool advances that have happened in the last decade, and new
application areas where MBT is being applied.
We first set the scene with an overview of MBT: the process, the people, the range of
MBT practices, and a brief history. Then in Section 3 we discuss current usage of MBT,
particularly in industry, in Section 4 we discuss recent advances in the languages used for
the test models, in Section 5 we review recent advances in the test generation
technologies, in Section 6 we discuss the use of MBT for security testing, which is a
recent growth area in the use of MBT, and finally we conclude and discuss future
challenges for MBT.
2 MBT Overview
MBT refers to the process and techniques for the automatic derivation of test cases from
models, the generation of executable scripts, and the manual or automated execution of the
resulting test cases or test scripts.
Therefore, the key tenets of MBT are the modeling principles for test generation, the
reusability of requirements models, the test selection criteria, the test generation strategies
and techniques, and the transformation of abstract tests into concrete executable tests.
The essence of MBT is to bridge the domain and product knowledge gap between the
business analysts and test engineers. Models are expected to be true representations of
business requirements and to associate those requirements with the different states that the
product will take as it receives various inputs. Ideally, the models will cover all of the
business requirements and will be sufficiently complete to ensure near 100%
functional coverage.
Table 1
Terminology Glossary of Model-Based Testing Terms Following the ISTQB Software Testing Glossary of Terms v3.0

MBT model: Any model used in model-based testing.
Model coverage: The degree, expressed as a percentage, to which model elements are planned to be or have been exercised by a test suite.
Offline MBT: Model-based testing approach whereby test cases are generated into a repository for future execution.
Online MBT: Model-based testing approach whereby test cases are generated and executed simultaneously.
Model-based testing: Testing based on or involving models.
Test adaption layer: The layer in a test automation architecture that provides the necessary code to adapt test scripts on an abstract level to the various components, configuration or interfaces of the SUT.
Test model: A model describing testware that is used for testing a component or a system under test.
Test selection criteria: The criteria used to guide the generation of test cases or to select test cases in order to limit the size of a test.
1. Designing models for test generation. The models, generally called MBT models,
represent the expected behavior and some process workflow of the system under test
(SUT), in the context of its environment, at a given abstraction level. The purpose of
modeling for test generation is to make explicit the control and observation points of the
system, the expected dynamic behavior or workflows to be tested, the equivalence classes
of system states, and the logical test data. The model elements and the requirements can be
linked in order to ensure bidirectional traceability between the three main artifacts: the
requirements, the MBT model, and the generated test cases. MBT models must be precise
and complete enough to allow the automated derivation of tests from these models.
2. Selecting some test selection criteria. There are usually an infinite number of possible
tests that can be generated from an MBT model, so the test analyst has to apply some test
selection criteria to select the pertinent tests, or to ensure satisfactory coverage of the
system behaviors. One common kind of test generation criteria is based on structural
model coverage, using well-known test design strategies [1], for instance equivalence
partitioning, process cycle coverage or pairwise testing. Another useful kind of test
strategy ensures that the generated test cases cover all the requirements and business
processes, possibly with more tests generated for requirements and processes that have a
higher level of risk. In this way, MBT can be used to implement a risk- and requirements-based testing approach. For example, for a noncritical application, the test analyst may
choose to generate just one test for each of the nominal behaviors in the model and each of
the main error cases; but for the more critical requirements, the test analyst could apply
more demanding coverage criteria such as process cycle testing, to ensure that the
business processes associated with that part of the MBT models are more thoroughly
tested.
3. Generating the tests. This is a fully automated process that generates the required
number of test cases from the MBT models on the basis of the test selection criteria
configured by the test analyst. Each generated test case is typically a sequence of SUT
actions, with input parameters and expected output values for each action. These
generated test sequences are similar to the test sequences that would be designed manually
using an action-word approach [2]. They are easily understood by humans and are
complete enough to be directly executed on the SUT by a manual tester. The purpose of
automated test generation is to generate fully complete and executable tests: MBT models
should make it possible to compute the input parameters and the expected results. Data
tables may be used to link abstract values from the model with concrete test values. To
make the generated tests executable, a further phase automatically translates each abstract
test case into a concrete (executable) script, using a user-defined mapping from abstract
data values to concrete SUT values, and a mapping from abstract operations to GUI
actions or API calls of the SUT. For example, if the test execution is via the GUI of the
SUT, then the action words could be linked to the graphical object map using a test robot.
If the test execution of the SUT is API based, then the action words need to be
implemented on this API. This could be a direct mapping or a more complex automation
layer. The expected results for each abstract test case are translated into oracle code that
will check the SUT outputs and decide on a test pass/fail verdict. The tests generated from
MBT models may be structured into multiple test suites, and published into standard test
management tools. Maintenance of the test repository is done by updating the MBT
models, then automatically regenerating and republishing the test suites into the test
management tool.
4. Executing the tests manually or automatically. The generated tests can be executed
either manually or in an automated test execution environment. Either way, the result is
that the tests are executed on the SUT, and that tests either pass or fail. The failed tests
indicate a disparity between the actual SUT results and the expected ones, as designed in the
MBT models, which then need to be investigated to decide whether the failure is caused
by a bug in the SUT, or by an error in the model and/or the requirements. Experience
shows that MBT is good at finding SUT errors, but is also highly effective at exposing
requirements errors [3, 4], even before executing a single test (thanks to the modeling
phase). In the case of automated test execution, test cases can be executed either offline (the most common case) or online. With offline execution, the test cases are first generated,
and then in a second step, they are executed on the system under test. With online
execution, the test execution results influence the path taken by the test generator through
the model, so test case generation and execution are combined into one step.
This process is highly incremental and helps to manage the test case life cycle when the requirements change. MBT generators are able to manage the evolution of the test repository with respect to requirement changes that have been propagated to the test generation model.
In the MBT process, the test repository documentation is fully managed by the
automated generation (from MBT models): documentation of the test design steps,
requirements traceability links, test scripts and associated documentation are automatically
provided for each test case. Therefore, the maintenance of the test repository is done only
through the maintenance of MBT models and then regeneration from these models.
A key element of the added value of MBT is the automation of bidirectional traceability
between requirements and test cases. Bidirectional traceability is the ability to determine
links between two parts of the software development process. The starting point of the
MBT process is the various functional descriptions of the tested application, such as use
cases, functional requirements, and descriptions of business processes. To be effective,
requirements traceability implies that the requirements repository is structured enough so
that each individual requirement can be uniquely identified. It is desirable to link these
requirements to the generated tests, and to link each generated test to the requirements that
it tests.
A best practice in MBT, provided by most of the tools on the market, is to link the
model elements to the related test requirements. These links in the MBT models enable the
automatic generation and maintenance of a traceability matrix between requirements and
test cases.
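As a small illustration of how such links can be exploited, the following sketch (hypothetical data structures, not any vendor's storage format) attaches requirement tags to model elements and derives a requirements-to-tests traceability matrix from the generated tests.

# Hypothetical sketch: deriving a requirements-to-tests traceability matrix from
# requirement tags attached to MBT model elements (names invented for the example).

MODEL_ELEMENT_REQS = {          # requirement tags on model elements
    "login_ok":  ["REQ-AUTH-1"],
    "login_bad": ["REQ-AUTH-2"],
    "logout":    ["REQ-AUTH-3"],
}

GENERATED_TESTS = {             # each generated test lists the model elements it exercises
    "TC_1": ["login_ok", "logout"],
    "TC_2": ["login_bad"],
}

def traceability_matrix(tests, element_reqs):
    """Map each requirement to the set of generated tests covering it."""
    matrix = {}
    for test_id, elements in tests.items():
        for element in elements:
            for req in element_reqs.get(element, []):
                matrix.setdefault(req, set()).add(test_id)
    return matrix

if __name__ == "__main__":
    for req, tcs in sorted(traceability_matrix(GENERATED_TESTS, MODEL_ELEMENT_REQS).items()):
        print(req, "->", ", ".join(sorted(tcs)))   # eg, REQ-AUTH-1 -> TC_1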
1. Business analysts (or subject matter experts) are the reference persons for the SUT
requirements, business processes and business needs. They refine the specification and
clarify the testing needs based on their collaboration with the test analysts. In agile
environments, they contribute to the definition and discussion of user stories and attend sprint
meetings to make sure that the evolving user stories are properly developed in the models.
Their domain knowledge and experience allow them to easily understand dependencies
between different modules and their impact on the MBT models and to provide useful
input to test analysts during reviews of MBT models.
2. Test analysts design the MBT models, based on interaction with customers and business
analysts or subject matter experts. They use the test generation tool to automatically
generate tests that satisfy the test objectives and produce a repository of tests. Test analysts
are also in charge of reviewing the manual test cases generated through models and
validating the correctness and coverage of the tests.
3. Test engineers (or testers) are in charge of manual execution of tests, relying on the
available information in the test repository, which is generated by the test analysts based
on MBT models.
4. Test automation engineers are in charge of the automated execution of tests, by linking
the generated tests to the system under test. The input for the test automation engineers is
the specification of the adaptation layer and action words, defined in the test generation
model and to be implemented. This is delivered by the test analysts.
Test analysts are in charge of the test repository quality, which concerns the
requirements coverage and the detection of defects. On the one hand, they interact with
the subject matter experts, which makes the quality of their interaction crucial. On the
other hand, the test analysts interact with the testers in order to facilitate manual test
execution or with the test automation engineers to facilitate automated test execution
(implementation of keywords). This interaction process is highly iterative.
Input-only models have the disadvantage that the generated tests will not be able to act as an oracle and are incapable of verifying the correctness of the SUT functional behavior. Input models can be seen as models of the environment. Pure business process models that represent some user workflows (but not the expected behavior) are a prominent example; domain modeling with combinatorial algorithms, such as pairwise, is another. Tests generated from input-only models are incomplete and must be manually completed before execution.
Input-output models of the SUT not only model the allowable inputs that can be sent to the SUT, but must also capture some of the intended behavior of the SUT. That is, the model must be able to predict in advance the expected outputs of the SUT for each input, or at least be able to check whether an output produced by the SUT is allowed by the model or not. Input-output models make it possible to automatically generate complete tests, including input parameters and expected results for each step.
Table 3
Deterministic/Nondeterministic Model Characteristics
Nondeterminism can occur in the model and/or the SUT. If the SUT exhibits hazards in
the time or value domains, this can often be handled when the verdict is built (which
might be possible only after all input has been applied). If the SUT exhibits genuine
nondeterminism, as a consequence of concurrency, for instance, then it is possible that test
stimuli as provided by the model depend on prior reactions of the SUT. In these cases, the
nondeterminism must be catered for in the model, and also in the test cases (they are not
sequences anymore, but rather trees or graphs).
Table 4
Modeling Paradigm Characteristics
There are two main subcharacteristics to consider, which define two different families
of test selection criteria (Table 6):
Table 6
Test Selection Criteria Characteristics
selection criteria with respect to model changes. The second kind of test selection criteria
is more precise because each scenario can be explicitly defined, but, because the definition
is done scenario by scenario, scenario-based selection criteria are more fragile with respect
to model evolution.
These two kinds of test selection criteria are further refined by more precise attributes,
particularly coverage-based selection criteria, which refer to a large set of coverage
criteria such as transition-based coverage, data-flow coverage, decision coverage and data
coverage. For the scenario-based selection criteria, the attribute is the language to express
scenarios. An example of such a language is UML sequence diagrams.
In the case of automated generation of manual test cases, this means that the models for
test generation should include, in one way or another, the documentation of abstract
operations and attributes. Then, the test generator has to manage the propagation and
adaptation of this information in the generated tests.
Table 8
Test Execution - Offline/Online Characteristics
With online MBT, the test generation algorithms can react to the actual outputs of the
SUT. This is sometimes necessary if the SUT is nondeterministic, so that the test generator
can see which path the SUT has taken, and follow the same path in the model. In that case,
MBT tools interact directly with the SUT and test it dynamically.
Offline MBT means that test cases are generated strictly before they are run. The
advantages of offline testing are directly connected to the generation of a test repository.
The generated tests can be managed and executed using existing test management tools,
which means that fewer changes to the test process are required. One can generate a set of
tests once, then execute it many times on the SUT (eg, regression testing). Also, the test
generation and test execution can be performed on different machines or in different
environments, as well as at different times. Moreover, if the test generation process is
slower than test execution, which is often the case, then there are obvious advantages to
doing the test generation phase just once.
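The contrast can be sketched in a few lines of Python; the generator, adapter, and SUT below are stubs invented purely for illustration. Offline MBT builds the whole test repository first and executes it later, while online MBT interleaves generation and execution so that the next step can depend on the output actually observed on the SUT.

# Hypothetical sketch contrasting offline and online MBT execution.
import random

def sut_execute(action):
    """Stub SUT whose outcome is nondeterministic."""
    return random.choice(["accepted", "rejected"])

def offline_mbt(actions):
    # 1) Generate a complete test repository from the model ...
    repository = [[action] for action in actions]
    # 2) ... and execute it later, possibly many times (eg, regression runs).
    return [(test, [sut_execute(a) for a in test]) for test in repository]

def online_mbt(actions, steps=4):
    # Generation and execution are interleaved: the next step is chosen according
    # to the output actually observed on the SUT.
    trace, i = [], 0
    for _ in range(steps):
        action = actions[i % len(actions)]
        outcome = sut_execute(action)
        trace.append((action, outcome))
        i = i + 1 if outcome == "accepted" else 0   # follow the path the SUT took
    return trace

if __name__ == "__main__":
    print("offline:", offline_mbt(["login", "logout"]))
    print("online: ", online_mbt(["login", "logout"]))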
partitioning or boundary value analysis [4] and also combinatorial test generation
techniques such as pairwise techniques [8]. Moreover, the principles of automated
bidirectional traceability between requirements and tests were set up at this time.
The early 2000s saw the emergence of MBT as a regular testing practice in the software
industry. This implied integrating MBT with key industry standards such as UML and
BPMN for the modeling phase, and with the industrial test management and test execution
environments to set up a continuous and systematic test engineering process. At the same
time, the methodology of MBT was being studied in order to clarify the best practices to
go from requirements to models for test generation, for managing the test generation
process and to help quality assurance (QA) teams to adopt MBT. This decade was also the period of full-scale pilot projects and analyses of empirical evidence to confirm the efficiency of MBT practice on testing projects (eg, [3, 4, 9]).
MBT is now in the adoption phase in the software testing industry. The next section
provides some evidence about the penetration of MBT in industry and the application
domains.
early for a definitive answer on this point, but another question of the 2014 MBT User
Survey provides us with some information on the level of satisfaction obtained by the
respondents.
The results show that, for the majority of respondents, MBT generally fulfils their
expectations, and they therefore get the value they are looking for.
This positive vision is consistent with the answers to the next question on how effective MBT has been in their situation (see Fig. 7). A majority of respondents viewed MBT as a
useful technology: 64% found MBT moderately or even extremely effective, whereas only
13% rated the method as ineffective. More than 70% of the respondents stated that it is
very likely or even extremely likely that they will continue with the method.
Finally, we may have a look at the current distribution of MBT across different areas of
industry, as shown in Fig. 8. Nearly 40% of the respondents come from the embedded
domain. Enterprise IT accounts for another 30%, web applications for roughly 20%. Other
application domains for the SUT are software infrastructure, communications, and
gaming. The main lesson learned is that MBT is distributed over the main areas of
software applications, with an overrepresentation in the embedded domain.
FIGURE 8 What is the general application domain of the system under test?
sponsored by the ETSI (European Telecom Standard Institute). The conference started its
first edition in 2010. In 2013, it was renamed UCAAT (User Conference on Advanced Automated Testing) to broaden the subject area, but it still consists for the most part of experience reports on MBT applications. This is a professional conference with more than 80% of the participants coming from industry, and between 160 and 220 attendees each year, depending mainly on the location (due to its attraction for local participants).
Table 10 provides the list of application areas covered during each year of the
conference. This confirms the large scope of applications of model-based techniques and
tools. The presentations are available on each conference web site (see URL).
Table 10
MBT Application Areas
These conferences are also good showcases of the diversity of MBT deployment. They
show the variety of MBT approaches, in the way that MBT test models are obtained (for
instance by reusing existing models or developing specific test models), the nature of such
models (variety of languages and notations), and also the way generated tests are executed
(offline test execution or online test execution).
Table 11
Example of Language
In the next section, we describe a higher level notation for MBT models, which is
further away from the application implementation level.
dependencies between events, so that longer test sequences can be generated to exercise
those dependencies. This generates fewer test sequences, and longer test sequences, but
matches or improves the error detection rates of pure black-box GUI ripping. On the other
hand, the EXSYST test generator for interactive Java programs [48] uses completely
different technology (genetic algorithms to search through the massive search space of all
event sequences) to generate GUI tests that have high code coverage of the application. It
does this by monitoring the branch coverage of the underlying application as the GUI tests
are executed, and using that coverage level as the fitness function for the genetic search.
Although these two tools use completely different technologies, they both use the
technique of observing the execution of the application code (white-box), and connecting
those observations to the black-box model of the GUI input events in order to generate
smarter and longer input event sequences that can find more GUI faults.
The second approach to improving the depth of the generated test sequences is to base
test generation on common user-interface patterns [49], such as the click-login; enter-username; enter-password; click-submit sequence, followed by the two expected outcomes of either an invalid login message, or correct login as shown by transfer to
another page. Such use of patterns goes some way towards incorporating more of the
application semantics into the MBT model, which enables richer tests to be generated. A
complementary approach to improving the semantic richness of the GUI model is to infer
specifications and invariants of the GUI from the traces of GUI executions. This is a
challenging task given the infinite space of possible traces, but for example, AutoInSpec
[50] does this in a goal-directed way, guided by coverage criteria and by a test suite repair
algorithm, and thus finds more invariants than nongoal-directed approaches.
An example of an approach that uses MBT models with even deeper semantic
knowledge of the application is the model-driven testing of web applications by Bolis et
al. [51]. This uses models written as sequential nets of abstract state machines (ASM).
The models may be developed specifically for testing purposes, or may be adapted from
the high-level development models of the application. Either way, the development of
such models requires application knowledge and modeling expertise, but the benefit of
having the richer models is that they can generate more comprehensive tests that include
stronger oracles.
Finally, all these different approaches to automated or semiautomated GUI testing raise
the interesting question: which approach is best? Lelli et al. [52] have taken steps towards
answering this question by developing a fault model for GUI faults, validating this against
faults found in real GUIs, and developing a suite of GUI mutants that can be used to
evaluate the effectiveness of GUI testing tools.
is a tool that can determine whether there is a solution satisfying a propositional formula
represented by a Boolean expression (ie, written only with Boolean variables, parentheses
and operators of disjunction, conjunction, and negation). The central algorithm around
which the SAT provers are built is called the Davis-Putnam-Logemann-Loveland algorithm [56], DPLL for short. It is a complete solving algorithm based on propositional
logic formulae in Conjunctive Normal Form and using a backtracking process.
An SMT problem is expressed as a logical first-order formula combining different
theories such as equality, linear arithmetic, and bit-vectors. In this way, an SMT problem
is a generalization of a SAT problem where the Boolean variables are replaced by
predicates using different theories. An SMT prover is thus based on a SAT prover, which is used to solve first-order logical formulas over a particular theory by using a dedicated decision procedure. As a consequence, an SMT prover attempts to determine if
an SMT problem has a solution, and, if a solution exists, returns that solution in the form
of a valuation of each of the variables.
Nowadays, the most popular SMT solvers are Z3 [57], CVC3 [58], CVC4 [59], Yices
[60] and MathSAT5 [61]. Z3 is an SMT prover developed by Microsoft Research and is
part of professional tools such as Visual Studio via the white-box code analyzer Pex [62]. CVC3 (Cooperating Validity Checker 3) and its successor CVC4 are academic provers developed jointly by the Universities of New York and Iowa. Yices is developed by SRI International's Computer Science Laboratory, and finally, MathSAT5 is the latest version
of the MathSat prover, which has been jointly developed by the University of Trento and
the FBK-IRST.
For test generation purposes, all the elements of the model such as conditions,
assignments and structures (eg, if-then-else) are translated into first-order predicates and
conjoined. The resulting formula, called an SMT instance, describes all the instructions
that should be executed along a test case execution path, including the global sequence of
SUT operation calls and the local choices within each of those operations. The satisfying
instances of this formula correspond to valid input values that enable a desired test case
sequence to be activated. This approach can be used to compute input data to activate a
particular path in a targeted operation, as well as to provide sequences of operation calls.
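As a minimal illustration of this encoding, the sketch below uses the Z3 Python bindings to conjoin the guards and effects of a hypothetical two-step test sequence into one SMT instance; the operations, guards, and bounds are invented for the example, and a satisfying model directly provides concrete test data.

# Minimal sketch: encoding a test-case path condition as an SMT instance with Z3.
# The operations, guards, and bounds are invented for the illustration.
from z3 import Ints, Solver, And, sat

amount, balance = Ints("amount balance")

# Path condition of the abstract test "deposit(amount); withdraw(amount)":
# step 1 requires a strictly positive deposit below a limit,
# step 2 requires the withdrawal not to exceed the resulting balance.
path_condition = And(
    amount > 0, amount <= 1000,   # guard of deposit()
    balance == 0 + amount,        # effect of deposit() on the state
    amount <= balance,            # guard of withdraw()
)

solver = Solver()
solver.add(path_condition)
if solver.check() == sat:
    model = solver.model()
    print("Concrete test data:", {d.name(): model[d] for d in model.decls()})
else:
    print("Path is infeasible: discard this test target.")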
For example, Cantenot et al. [63] describe a test generation framework, based on SMT solving, from a UML/OCL model. This framework is able to translate the UML/OCL model into SMT instances, and various simulation strategies are applied on these SMT instances to compute test sequences. Their paper proposes and compares five different strategies to build the first-order formulas used by an SMT instance. Other researchers propose optimizations to improve the test generation performance, including optimizations that exploit the features of the SAT/SMT prover, as well as the form of the first-order formulas
[64]. These experiments promise to make SAT/SMT test generation techniques as efficient
as other existing methods, especially when dealing with Boolean expressions.
However, as underlined by Cristiá and Frydman's experimental feedback about test case generation from Z specifications using the two SMT provers Yices and CVC3 [65], the main weakness regarding the use of SMT provers for test generation purposes is currently the lack of native decision procedures for the theory of sets. This implies that the representation of sets must be mapped onto some other mathematical structure (such as uninterpreted functions, arrays, lists, etc.), which can result in severely degraded performance. This observation has recently been investigated and confirmed by Cristiá, Rossi and Frydman [66], where experiments have shown that CLP solvers with set
constraints can be more effective and efficient than SMT provers for these kinds of
problems. The next section introduces such constraint-based approaches.
To illustrate this kind of approach, we can cite [71], in which the authors present an
original approach and experimental results about using constraint solving to compute
functional test cases for controllers for robotic painters. The testing was done within a
continuous integration process, so the test generation and execution needed to be fast. The
lessons learnt from this industrial experimentation showed that the testing strategy, implemented using the finite domain constraint solving library CLP(FD) [72] of SICStus Prolog, is faster and more effective than the test methodologies currently used in the company, even if this strategy does not ensure complete coverage (not every possible
transition) of the behaviors formalized in the model.
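The underlying idea can be sketched without a Prolog system: the toy search below enumerates a small finite domain and returns the first assignment satisfying the constraints, which is roughly what a CLP(FD) labelling step does once propagation has pruned the domains; the variables and constraints are invented for the example.

# Hand-rolled finite-domain search illustrating CLP(FD)-style test data computation.
# The variables and constraints are invented; a real solver would also propagate
# constraints to prune the domains before enumerating (labelling).
from itertools import product

DOMAIN = range(0, 20)   # finite domain shared by both variables

def constraints(speed, pressure):
    # Guards that must hold to drive the controller into the targeted transition.
    return speed + pressure >= 15 and speed < 10 and pressure % 2 == 0

def first_solution():
    for speed, pressure in product(DOMAIN, DOMAIN):
        if constraints(speed, pressure):
            return {"speed": speed, "pressure": pressure}
    return None   # the test objective is unreachable within this domain

if __name__ == "__main__":
    print(first_solution())   # eg, {'speed': 0, 'pressure': 16}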
Other work [66] used the {log} solver [73, 74] to generate test cases from specifications written in the Z notation, which is based on first-order logic over a set theory. Indeed, to handle the Z notation, the {log} solver is able to manipulate and perform constraint solving on sets using native set structures and primitive operations. As such, it can find solutions of first-order logic formulas involving set-theoretic operators, translated from Z specifications to {log}'s predicates. The feedback from these experiments showed that such a
CLP solver over sets is able to tackle two common problems within MBT:
the elimination of unsatisfiable test objectives, built by partitioning the state space and
collecting (dead) path conditions;
improvement of the computational effectiveness of the reachability problem of finding
states verifying the satisfiable test objectives.
However, regarding scalability issues, constraint solvers are not able to reason about a program's environment and may be less scalable with large-scale specifications that include a high number of variables linked by complex constraints. Moreover, when a CSP has several solutions, a CLP solver is not able to prioritise them, so it just returns the first solution found. Hence, the desirability of the solution may be increased by the use of an objective function. Innovative solutions to this CLP challenge have been proposed by the search-based testing research and practitioner community. We discuss its
challenges in a model-based framework in the next section.
These basic principles were adapted early on to perform model-based test generation
[89]. Basically, in this context, the automaton plays the role of the MBT model whereas the
temporal logic property defines a particular criterion to be covered. More precisely, the
temporal logic property is expressed as the negation of a given test objective: in this way,
when the model checker finds a state violating the property (ie, satisfying the test
objective), it will return the related counterexample, which thus constitutes a test case
covering the given test objective [90].
On the basis of this simple but efficient interpretation, the next challenge is forcing the
model checker to find all the possible counterexamples in order to achieve a given
coverage criterion of the automaton. This can be done by generating a separate temporal
logic property (test objective) for each state that satisfies the coverage criterion, and then
running the model checking on each one of those test objectives. Hence, at the end of the
whole process, each computed counterexample defines a specific test case, including
expected output to assign the test verdict.
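The following toy sketch (a hand-written transition system, not a real model checker) illustrates the principle: the test objective "reach the state granted" is negated into the property "granted is never reached", and the trace that refutes this property is returned as the abstract test case covering the objective.

# Toy illustration of "test generation as model checking": negate the test objective
# and return the counterexample trace as an abstract test case. The model is invented.
from collections import deque

TRANSITIONS = {
    "idle":    [("request", "pending")],
    "pending": [("approve", "granted"), ("reject", "idle")],
    "granted": [("release", "idle")],
}

def counterexample(initial, objective_state):
    """Check the property 'objective_state is never reached'; a violation is a test."""
    queue, seen = deque([(initial, [])]), {initial}
    while queue:
        state, trace = queue.popleft()
        if state == objective_state:
            return trace                        # counterexample = abstract test case
        for action, nxt in TRANSITIONS.get(state, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, trace + [action]))
    return None                                 # property holds: objective unreachable

if __name__ == "__main__":
    # One negated test objective per coverage target; here just the state "granted".
    print(counterexample("idle", "granted"))    # ['request', 'approve']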
Obviously, test generation techniques based on model checking have benefited from the
increasing performance of model checkers and from the improved expressiveness of the
automata and properties to be verified. For example, some model checkers are able to
manipulate timed automata [91], like UPPAAL [92], or use a dedicated language to avoid
describing a model directly, like Maude [93].
To illustrate such MBT approaches based on model checking, we can for example
mention the tool-supported methodology proposed in [94]. It consists of translating an
initial MBT model, expressed as a Business Process Model, into an algebraic Petri net in
order to generate test cases using model checking techniques on an equivalent decision
diagram. The test generation is driven by dedicated test intentions, which generate test
cases including their oracles according to the transition system. More recently, another
toolbox [95], based on UPPAAL, has been developed for generating test cases by applying
model checking techniques on networks of timed automata. Finally, it should be noted
that, based on the UPPAAL model checker, an online testing tool, called UPPAAL for
Testing Real-time systems ONline (UPPAAL TRON) [96], supports the generation of test
cases from models, and their online execution on the SUT.
Recently, the authors of [97] combined formal specification and MBT approaches and evaluated their approach on the EnergyBus standard. They used the TGV tool [98] to check the formal specification for inconsistencies. They further constructed an MBT platform for conformance testing of EnergyBus implementations. This combination of tools and approaches has been applied for the first time as a mandatory step in the introduction of a new industrial standard, namely EnergyBus.
Contrary to the TGV tool, which uses dedicated algorithms for test case generation, the authors of [99] expressed the system and its properties as Boolean equations, and then used an equation library to check the equations on the fly. The model checker, applied to a model with faults, produces counterexamples, which are seen as negative abstract test cases.
These technologies, using model-checking techniques to derive test cases, appear to be
efficient and well-used solutions to automating MBT approaches. However, since model
checkers are not natively devoted to test case generation, these techniques suffer from a
lack of improvements with regard to test suite quality and performance [100], such as the
native implementation of test coverage strategies. Moreover, they also suffer from the
major weakness of model-checking techniques: state space explosion. Indeed, state space
explosion remains a critical problem, even if some tools are now capable of handling the
state spaces associated with realistic problems. Nevertheless, the performance of model
checkers regularly increases due to innovative approaches, which makes MBT approaches more and more scalable. Among these innovative approaches, we can mention
symbolic model checking [101], which allows the representation of significantly larger
state spaces by using ordered binary decision diagrams to represent sets of states and
function relations on these states. Finally, bounded model checking [102] is also a very
relevant approach for test generation [103]. It aims to accelerate the generation of
counterexamples by translating the model-checking problem into a SAT problem, but to
achieve this, it does not perform an exhaustive verification.
model elements, whereas dynamic selection criteria relate to the dynamic aspects of the
system, for instance using the expert's experience.
In MBFST, most of the techniques based on static test selection criteria focus on
access-control policy testing. For instance Le Traon et al. defined new structural test
selection criteria for access control policies [108] and test generation based on access
control models [109]. Further, they have presented an approach for test generation using a
combinatorial testing technique by combining roles, permissions and contexts [110].
Recently, they worked on a tool-supported process for building access-control MBT
models from contracts and access-rules [111].
Other work based on static selection criteria focuses in general on privacy properties.
Anisetti et al. express the privacy properties of a service (P-ASSERT) and generate test cases based on service models; these are further used in their certification scheme for digital privacy of services [112].
Experience in industry showed that static test selection criteria cannot cover high-level
security properties, so to ensure the security of a critical application it is necessary to use
dynamic criteria in order to produce tests that cover such properties. In general, the test
scenarios are expressed in a dedicated language that can be either textual or graphical,
describing the sequences of steps (usually operation calls) that can be performed, along
with possible intermediate states reached during the unfolding of the scenario. We now
briefly describe several approaches that use dynamic test selection criteria.
Mallouli et al. provided a formal approach to integrate timed security rules, expressed
in the Nomad language, into a TEFSM functional specification of a system. Then they use
the TestGen-IF tool to generate test cases, which are later executed on the system using
tclwebtest scripts [113].
Another MBFST approach is the one proposed by Julliand et al. [114]. They generate
test cases based on B-models and use dynamic test selection criteria (also called test
purposes) for producing test objectives, represented as regular expressions.
Legeard et al., for security testing of cryptographic components, use an approach based
on dynamic test selection criteria: Test Purpose (TP) [115]. The TP language allows one to
express high-level scenarios based on security expert experience and it has been
successfully deployed and used at the French army computer science department.
Moreover, Cabrera et al. have created a textual language for expressing high-level user
scenarios that takes into account the temporal aspect of the security properties that are
written in Temporal OCL (TOCL) [116]. TOCL has been successfully applied for
Common Criteria evaluation. These languages, TP and TOCL, are both integrated within
the Smartesting CertifyIt Tool and allow users to guide the generation of functional
security tests. We discuss these approaches in detail in the last section.
[117]. In the scope of MBFST, Pellegrino et al. use a model checker for ASLan and ASLan++ to generate abstract test cases as counterexamples for the security properties. To create concrete test cases executable on the system under test, they use a test adapter
[118].
Jürjens presented an approach to generate traces by injecting faults into UML models annotated with UMLsec stereotypes, which are used for the verification of properties. Furthermore, Fourneret et al. applied the UMLsec verification technique for security properties in the smart card industry domain, and then, based on their transformation into a Test Purpose Language, they
generate tests covering the security properties [119].
Another approach proposed by Aichernig et al. uses a model checking technique on an
Input/Output Label Transition System (IOLTS) model to generate test cases. They further
inject faults using mutation operators, and generate traces used as test objectives for the
TGV tool [120]. Thus, we can also classify this as a robustness technique.
The DIAMONDS project introduces novel techniques in the research field of model-based security testing, particularly in fuzzing [105]. Schieferdecker et al. designed a mutation-based fuzzing approach that uses fuzzing operators on scenario models specified by sequence diagrams. The fuzzing operators perform a mutation of the diagrams, resulting in an invalid sequence. Contrary to the previous work, where tests are executed after their generation, referred to as offline testing (see Section 2), Schieferdecker et al. use online behavioral fuzzing, which generates tests at run time [125].
Another work by Johansson et al., close to the DIAMONDS project, developed T-Fuzz,
a generation-based fuzzing framework for protocol implementation testing based on the
TTCN-3 language [126]. This approach relies on protocol models used for conformance
testing, by reusing the already existing test environment. In addition, they present its
successful application to the validation of the Non-Access Stratum (NAS) protocol, used
in telecommunication networks for carrying signaling messages.
reveal more complex weaknesses than single-purpose test cases that cover functional
requirements.
More precisely, the tool helps to uncover the potentially dangerous behaviors resulting
from the interactions of the application with the security component. Then, in addition to
functional testing that targets the coverage of the systems behavior, it supports testing of
the security requirements for the system, by combining two dynamic test selection criteria
(based on TOCL [116] and Test Purposes (TP) [115]) for generation of test targets and
tests that cover the security requirements. The TP language is based on regular
expressions, and by combining keywords it allows the test engineers to conceive scenarios
in terms of states to be reached and operations to be called [115]. The TOCL language is
based on temporal logic and it allows the expression of temporal properties that are
composed of two artifacts: a temporal pattern and a scope. The scopes are defined from
events and delimit the impact of the pattern. To define the sequences appropriate for
execution, the patterns are applied on a scope and they are defined from event and state
properties, expressed by OCL constraints [116].
These TOCL and TP approaches are complementary, since they cover different types of security requirements. On the one hand, TP covers security requirements that need to be expressed as specific application scenarios, but it is not able to express the temporal aspects of precedence or succession of events. On the other hand, with TOCL it is possible
to capture these temporal aspects in the time axis. However, in both cases, the generated
security tests exercise as much as possible the unusual interactions with the security
component.
The Smartesting CertifyIt tool further monitors the test coverage of the security
requirements expressed in TOCL, and generates new test cases if it is necessary to
increase the coverage [116]. Finally, the tool generates a coverage report that ensures the traceability between the specification, the security requirements and the generated tests, so that the coverage report can be used in a product certification. Fig. 9 depicts the TOCL
plugin integrated within the CertifyIt tool and shows its features for coverage monitoring
and report generation.
To illustrate the approach based on TOCL and the tool, we use the specification of
PKCS#11. PKCS#11 defines the API Cryptoki that offers an interface for managing the
security and interoperability of security components. The specification defines various
security requirements for which we were able to generate test cases, for example: "A user cannot verify a signed message using the C_Verify operation without logging in to Cryptoki (using the operation C_Login)." From a testing perspective, this requirement is interpreted
as the user must call a C_Login operation before calling C_VerifyInit, which initiates the
verification function. The TOCL language allows this requirement to be expressed by two
properties: one that defines the nominal case and a second complementary property that
defines the flawed case.
The first property states that whenever a verification function is performed with success (model behavior @CKR:OK, CKR being a tag representing a function return value), it must be preceded by a login operation, also performed with success. We can distinguish the temporal pattern (before the first occurrence of a successful call of the C_VerifyInit function) and the scope (eventually a successful call of the login function follows the previous event, for instance C_VerifyInit).
eventually isCalled(C_Login, @CKR:OK) before isCalled(C_VerifyInit, @CKR:OK)
The second property expresses that when a user is logged out, the user must go through
the login state before performing any message verification function.
eventually isCalled(C_Login,@CKR:OK)
between isCalled(C_Logout, @CKR:OK) and isCalled(C_VerifyInit, @CKR:OK)
Each TOCL property is translated into an automaton, which allows the coverage of the
property to be measured, and also supports the generation of additional tests to augment
the coverage of the TOCL property.
Measuring the coverage of a property is based on measuring the coverage of the
automaton transitions by each already existing test. This step is illustrated in Fig. 9. The
automaton also has an error state, represented by the state containing a cross. If this state is reached by any test, it means that the property is violated, which needs further investigation to determine whether the security property is written too restrictively or the MBT model contains errors. In the latter case, our experience found that the TOCL properties help in debugging the MBT model. Indeed, MBT models, just like code, may contain errors, and their correctness is a concern often tackled by researchers and practitioners.
Once the coverage is evaluated, if any transitions of the automaton are not covered,
CertifyIt can produce test targets based on the uncovered automaton transitions and then
generate additional abstract test cases to augment the property coverage.
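A rough sketch of this mechanism is given below (the automaton, traces, and coverage bookkeeping are simplified inventions, not the CertifyIt internals): the property "eventually C_Login succeeds before C_VerifyInit succeeds" becomes a small monitoring automaton with an error state; running existing tests through it measures which transitions are covered and flags any test that reaches the error state.

# Hypothetical sketch: a TOCL-like property compiled into a monitoring automaton,
# used to measure property coverage of existing tests (not the CertifyIt internals).

# Monitor states: "init" (no successful login yet), "logged", and "error" (violation).
def step(state, event):
    if state == "init":
        if event == ("C_Login", "OK"):
            return "logged", ("init", "logged")
        if event == ("C_VerifyInit", "OK"):
            return "error", ("init", "error")    # verification before any login
        return "init", ("init", "init")
    if state == "logged":
        return "logged", ("logged", "logged")
    return state, None                           # stay in the error state

def run(tests):
    covered, violations = set(), []
    for name, trace in tests.items():
        state = "init"
        for event in trace:
            state, transition = step(state, event)
            if transition:
                covered.add(transition)
        if state == "error":
            violations.append(name)
    return covered, violations

if __name__ == "__main__":
    tests = {
        "TC_nominal": [("C_Login", "OK"), ("C_VerifyInit", "OK")],
        "TC_flawed":  [("C_VerifyInit", "OK")],
    }
    covered, violations = run(tests)
    print("covered transitions:", sorted(covered))
    print("tests reaching the error state:", violations)
    # Transitions left uncovered would become new test targets for the generator.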
vulnerability types like code injection, source disclosure, file enumeration, remote file
inclusion (RFI), cross-site request forgery (CSRF), among others.
Bozic and Wotawa [129] present an MBT approach relying on attack patterns to detect
web application vulnerabilities such as SQL injections and XSS. An attack pattern is a
specification of a malicious attack. Represented by a UML state machine, it specifies the
goal, conditions, individual actions and postconditions of the represented attack. Test
cases are computed and executed by branching through the states of the state machine and
executing the corresponding methods of the SUT. This approach has been implemented as
a toolchain using several existing tools, such as Yakindu for the state machine modeling,
Eclipse to encapsulate the entire system, and WebScarab for the interpretation of
communication between the Web application and clients, and for manual submission of
attacks. Experiments have been conducted on three vulnerable applications (DVWA,
Mutillidae, and BodgeIt) and one real-life application (WordPress Anchor). SQLI and XSS vulnerabilities were found on Mutillidae and DVWA, at various security levels. No vulnerability was found on WordPress Anchor because an administrator needs to approve each post submitted by users; detecting this would require a more detailed model of the attack.
Wei et al. [130] focus on penetration test case inputs and propose a model-based
penetration test method for SQL injections. First, they provide attack models using the
Security Goal Model notation, which is a modeling method used to describe
vulnerabilities, security properties, attacks, and so on. Models are generic and describe
goals in a top-down fashion. A typical goal is, for instance, "steal system information", and is modeled as two subparts: error-message utilization and blind injection. Hence, each top-down path in a model represents an attack process that realizes a certain attack goal. Each successful top-down attack process represents the attack scheme, defined as a triple <
OBJ,INP,OUT >, OBJ being the attack goal, INP being the attack input, and OUT being
the vulnerable response of the Web application. To perform an actual attack, one must
instantiate the test case model according to the fingerprint of the web application and use
certain coverage criteria to generate executable test cases. The authors created an
automated web application SQL injection vulnerability penetration test tool called NKSI
scan: it applies the widely used crawling-attack-analysis method to detect the SQL
injection vulnerability in subject applications. They compared their technique with popular
scanners IBM AppScan and Acunetix. Results show that NKSI was able to discover more
flaws than those two scanners.
Xu et al. [131] present an approach to automate the generation of executable security
tests from Threat Model-Implementation Description (TMID) specifications, which
consist of threat models represented as Predicate/Transition (PrT) nets and a Model-Implementation Mapping (MIM) description. A threat model describes how a malicious individual may trigger the system under test to violate a security goal. A MIM description
maps the individual elements of a threat model to their implementation constructs.
Abstract test cases (ie, complete attack paths) are computed in two steps. First a
reachability graph is generated from the threat net. It represents all states and state
transitions reachable from the initial marking. Then the reachability graph is transformed
to a transition tree containing complete attack paths by repeatedly expanding the leaf
nodes that are involved in attack paths but do not result from firings of attack transitions.
Concrete test cases are derived by automatically composing the attack paths and the MIM
description. The approach has been implemented in ISTA, a framework for automated test
code generation from Predicate/Transition nets, and experiments have been conducted on
two real-world systems. The experiments show good results, with most vulnerabilities being found
(90%), whether they are web-related vulnerabilities (XSS, SQLi, CSRF, etc.) or protocol-based vulnerabilities (FTP).
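As a rough illustration of the second step, the following Java sketch expands a hypothetical, acyclic reachability graph into complete attack paths, stopping the expansion as soon as an attack transition fires; it is not the ISTA implementation, and cycle handling is deliberately omitted.

// Minimal sketch, not the ISTA implementation: edges are labeled with a
// transition name, and we collect every path from the initial marking that
// ends with an "attack" transition. The graph below is assumed acyclic.
import java.util.*;

public class AttackPathSketch {

    record Edge(String transition, String target, boolean isAttack) {}

    static final Map<String, List<Edge>> GRAPH = Map.of(
        "m0", List.of(new Edge("login", "m1", false)),
        "m1", List.of(new Edge("submitForm", "m2", false),
                      new Edge("injectScript", "m3", true)),   // attack transition
        "m2", List.of(new Edge("injectScript", "m3", true)));

    static void collect(String marking, Deque<String> path, List<List<String>> attackPaths) {
        for (Edge e : GRAPH.getOrDefault(marking, List.of())) {
            path.addLast(e.transition);
            if (e.isAttack) {
                attackPaths.add(new ArrayList<>(path));  // complete attack path found
            } else {
                collect(e.target, path, attackPaths);    // keep expanding non-attack leaves
            }
            path.removeLast();
        }
    }

    public static void main(String[] args) {
        List<List<String>> attackPaths = new ArrayList<>();
        collect("m0", new ArrayDeque<>(), attackPaths);
        attackPaths.forEach(System.out::println);  // e.g. [login, injectScript]
    }
}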
Salva et al. [132] present a Model-Based Data Testing approach for Android
applications that automatically generates test cases from intent-based vulnerabilities, using
vulnerability patterns. It specifically targets the Android Intent Messaging mechanism,
whose objective is to allow sharing of actions and data between components using content
providers, in order to perform operations. The concern is that attackers may exploit this
mechanism to pass on payloads from component to component, infecting the whole
system and making their attack more severe. This approach therefore searches for data
vulnerabilities inside components. The automated generation of test cases relies on three
artifacts: vulnerability patterns, class diagrams, and specifications. Vulnerability patterns
are specialized Input/Output Symbolic Transition Systems, which allow formal
expression of intent-based vulnerabilities. A pattern formally exhibits intent-based
vulnerabilities and helps to define test verdicts. Class diagrams are partially generated
from the decompiled Android application under test, and represent Android components
with their types and their relationships. They typically provide the Activities (these are
Android components that display screens to let users interact with programs) or Services
composed with content providers. Specifications are generated from the Android manifest.
They express the behavior of components after the receipt of intents combined with
content-provider requests. Test case generation is performed by composing the three
artifacts. This method has been implemented in a tool called APSET, and has been applied
to several real-life applications. Results support the effectiveness of the tool, which found
vulnerabilities in popular Android applications such as YouTube and Maps.
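To give a flavor of what an intent-based data vulnerability test might look like, the following hedged sketch uses an Android instrumented test to send an intent carrying an injection payload to a component; the package, component, and extra names are hypothetical, and this is not APSET itself.

// Hedged sketch of an intent-based data vulnerability test; the component
// and extra names are hypothetical, and this is not the APSET tool.
import android.content.ComponentName;
import android.content.Context;
import android.content.Intent;
import androidx.test.platform.app.InstrumentationRegistry;
import org.junit.Test;

public class IntentVulnerabilityTest {

    @Test
    public void maliciousIntentShouldNotReachContentProvider() {
        Context ctx = InstrumentationRegistry.getInstrumentation().getTargetContext();

        // Build an intent carrying an injection payload in an extra that the
        // target activity is suspected to forward to a content-provider query.
        Intent intent = new Intent();
        intent.setComponent(new ComponentName(
                "com.example.app", "com.example.app.NoteDetailActivity")); // hypothetical component
        intent.putExtra("note_id", "1' OR '1'='1");                        // payload passed on
        intent.addFlags(Intent.FLAG_ACTIVITY_NEW_TASK);

        // The verdict (no crash, no leaked records) would be assigned by
        // observing the component's behavior against the vulnerability pattern.
        ctx.startActivity(intent);
    }
}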
A pattern-driven and MBT approach has been developed by Vernotte et al. [133] for
various vulnerability types, technical and logical. The approach relies on attack patterns
and a behavioral model of the SUT. The test generator uses attack patterns as guides, and
follows each step into the model. If each step has been fulfilled, an abstract test case is
computed. A more thorough presentation of this approach may be found in Section 6.2.4.
A related mutation-based approach starts from a secure AVANTSSAR Specification
Language (ASLan++) model, where all traces fulfill the specified security properties. A
library of fault injection operators has been developed. The goal is to apply a fault
injection operator to the model, and use a model checker to report any violated security
goal. If a security goal has indeed been violated, the reported trace then constitutes an
Abstract Attack Trace (AAT). The attack traces are translated into concrete test cases by
using a two-step mapping: the first step is to translate an AAT into WAAL (Web
Application Abstract Language) actions, the second step is to translate WAAL actions into
executable code. An attack may be conducted in a fully automated fashion, at the browser
level. In some specific cases (disabled input elements, etc.), a test expert may be required
to craft HTTP level requests in order to recover from the error. This approach is highly
amenable to full automation.
Rocchetto et al. [135] present a formal model-based technique for automatic detection
of CSRF during the design phase. It is based on the ASLan++ language to define the
several entities involved (client, server) and their interactions. The client is used as an
oracle by the attacker, and the model is centered around the web server and extends the
work of Dolev-Yao (usually used for security protocol analysis). To generate tests, the
model is submitted to the AVANTSSAR platform, which, when a CSRF is found, returns
an abstract attack trace reporting the list of steps an attacker has to follow in order to
exploit the vulnerability. This technique takes into account that the web server may have
some CSRF protection in place, and will try to bypass it. It will typically look for CSRF
token-related flaws, for instance if the tokens are unique for each client, and for each
client/server interaction. If no attack trace is produced, the specification is considered safe
regarding CSRF. The authors assume that attackers can listen to the network and build
their attack upon the transactions between a client and the server.
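A small dynamic check in the spirit of the token-related flaws mentioned above can be sketched as follows, using the Java 11 HttpClient; the URL and the token field name are assumptions, and the technique described by Rocchetto et al. works at the level of the ASLan++ model rather than over live HTTP.

// Minimal dynamic check inspired by the token-related flaws discussed above:
// fetch the same form in two independent sessions and verify that the
// anti-CSRF tokens differ. The URL and token field name are assumptions.
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CsrfTokenCheck {

    private static final Pattern TOKEN =
            Pattern.compile("name=\"csrf_token\"\\s+value=\"([^\"]+)\"");

    static String fetchToken(String url) throws Exception {
        // A fresh client per call means no cookies are shared between sessions.
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> resp = client.send(
                HttpRequest.newBuilder(URI.create(url)).GET().build(),
                HttpResponse.BodyHandlers.ofString());
        Matcher m = TOKEN.matcher(resp.body());
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) throws Exception {
        String url = "http://localhost:8080/transfer";   // hypothetical form under test
        String tokenSessionA = fetchToken(url);
        String tokenSessionB = fetchToken(url);
        if (tokenSessionA != null && tokenSessionA.equals(tokenSessionB)) {
            System.out.println("Potential CSRF weakness: token reused across sessions");
        } else {
            System.out.println("Tokens differ (or none found): " + tokenSessionA + " / " + tokenSessionB);
        }
    }
}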
Felmetsger et al. [136] present advances toward the automated detection of application
logic vulnerabilities, combining dynamic execution and model checking in a novel way.
Dynamic execution allows for the inference of specifications that capture a web
application's logic, by collecting likely invariants. A likely invariant is derived by
analyzing the dynamic execution traces of the web application during normal operation,
and captures constraints on the values of variables at different program points, as well as
relationships between variables. The intuition is that the observed, normal behavior allows
one to model properties that are likely intended by the programmer. Model checking is
used with symbolic inputs to analyze the inferred specifications with respect to the web
application's code, and to identify invariants that are part of a true program specification. A
vulnerability is therefore any violation of such an invariant. This technique has been
implemented in a tool called Waler (Web Application Logic Errors AnalyzeR), which
targets servlet-based web applications written in Java. Up to now, Waler detects a
restricted set of logic flaws and is currently limited to servlet-based web applications, but
was still able to find previously undetected vulnerabilities in real-life applications while
producing a low number of false positives.
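The following toy Java sketch illustrates the idea of a likely invariant and its violation; the program point, variable, and values are invented for the example, and the sketch is not Waler.

// Toy illustration of a "likely invariant", in the spirit of the approach
// described above (not Waler itself): values observed at a program point
// during normal runs suggest a constraint, whose violation is suspicious.
import java.util.List;

public class LikelyInvariantSketch {

    record Observation(String programPoint, String user, boolean isAdmin) {}

    public static void main(String[] args) {
        // Dynamic traces collected during normal operation.
        List<Observation> normalRuns = List.of(
                new Observation("deleteUser:entry", "alice", true),
                new Observation("deleteUser:entry", "bob", true),
                new Observation("deleteUser:entry", "carol", true));

        // Inferred likely invariant: isAdmin == true at deleteUser:entry.
        boolean likelyInvariant = normalRuns.stream().allMatch(Observation::isAdmin);
        System.out.println("Likely invariant holds on traces: " + likelyInvariant);

        // A later execution reaching the same point with isAdmin == false
        // violates the invariant and points to a possible logic flaw.
        Observation suspicious = new Observation("deleteUser:entry", "mallory", false);
        if (likelyInvariant && !suspicious.isAdmin()) {
            System.out.println("Invariant violated by " + suspicious.user()
                    + ": possible authorization-logic vulnerability");
        }
    }
}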
Fuzzing techniques mutate nominal values to trigger flawed code in applications. They are
usually very cheap to deploy and do not suffer from false positives, but they lack an expected-result model and therefore rely on crashes and failures to assign a verdict. Two main fuzzing
techniques exist: mutation based and generation based. Mutation fuzzing consists of
altering a sample file or data following specific heuristics, while generation-based fuzzers
take the input specification and generate test cases from it. Fuzzing may be used for
crafting malicious input data [138], or crafting erroneous communication messages [139].
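A minimal mutation-based fuzzer can be sketched in a few lines of Java; the parseMessage target below is a hypothetical stand-in for the system under test, and the verdict relies only on uncaught exceptions, mirroring the crash-based verdicts discussed above.

// A minimal mutation-based fuzzer sketch: it randomly perturbs a valid
// sample input and treats unexpected exceptions (the analogue of a crash)
// as the only verdict, since no expected-result model is available.
import java.nio.charset.StandardCharsets;
import java.util.Random;

public class MutationFuzzSketch {

    static void parseMessage(byte[] data) {
        // Hypothetical SUT entry point; a real harness would call the
        // application or protocol parser under test here.
        String s = new String(data, StandardCharsets.UTF_8);
        if (!s.startsWith("MSG|")) throw new IllegalArgumentException("bad header");
    }

    public static void main(String[] args) {
        byte[] sample = "MSG|user=alice|amount=100".getBytes(StandardCharsets.UTF_8);
        Random rnd = new Random(42);

        for (int i = 0; i < 1000; i++) {
            byte[] mutated = sample.clone();
            // Flip a handful of random bits (a simple mutation heuristic).
            for (int k = 0; k < 3; k++) {
                mutated[rnd.nextInt(mutated.length)] ^= (byte) (1 << rnd.nextInt(8));
            }
            try {
                parseMessage(mutated);
            } catch (IllegalArgumentException expected) {
                // Graceful rejection of the malformed input: not a finding.
            } catch (RuntimeException crash) {
                System.out.println("Potential defect on input #" + i + ": " + crash);
            }
        }
    }
}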
The approach presented by Duchene [138] consists of modeling the attacker's behavior,
and driving this model by a genetic algorithm that evolves SUT input sequences. It
requires a state-aware model of the SUT, either derived from an ASLan++ description or
inferred from traces of valid/expected SUT execution. This model is then annotated using
input taint data-flow analysis, to spot possible reflections. Concrete SUT inputs are
generated with respect to an Attack Input Grammar which produces fuzzed values for
reflected SUT input parameters. The fitness function depends on the obtained SUT output
following the injection of a concrete SUT input. It computes the veracity of an input by
looking for correlations, using the string distance between a given input parameter value
and a substring of the output. Two genetic operators are used: mutation and cross-over. It
is an efficient technique for detecting XSS, as it goes beyond the classical XSS evasion
filters that may not be exhaustive. Such a technique also tackles multistep XSS discovery
by using a more complex string matching algorithm to generate an annotated FSM, in
order to inspect the SUT to find the possibilities of XSS at certain places.
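The following Java sketch shows one possible reflection-based fitness function in the spirit of this idea, scoring an input by how much of the injected payload reappears verbatim in the SUT output; it is a simplification under stated assumptions, not Duchene's implementation.

// Sketch of a reflection-based fitness function: the more of an injected
// value that reappears un-encoded in the SUT output, the fitter the input.
public class ReflectionFitness {

    // Length of the longest common substring (classic dynamic programming).
    static int lcsLength(String a, String b) {
        int[][] dp = new int[a.length() + 1][b.length() + 1];
        int best = 0;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                if (a.charAt(i - 1) == b.charAt(j - 1)) {
                    dp[i][j] = dp[i - 1][j - 1] + 1;
                    best = Math.max(best, dp[i][j]);
                }
            }
        }
        return best;
    }

    // Fitness in [0,1]: fraction of the payload reflected verbatim.
    static double fitness(String payload, String sutOutput) {
        return payload.isEmpty() ? 0.0 : (double) lcsLength(payload, sutOutput) / payload.length();
    }

    public static void main(String[] args) {
        String payload = "<script>alert(1)</script>";
        String escapedPage = "Hello &lt;script&gt;alert(1)&lt;/script&gt;";
        String reflectedPage = "Hello <script>alert(1)</script>";
        System.out.printf("escaped output   -> fitness %.2f%n", fitness(payload, escapedPage));
        System.out.printf("reflected output -> fitness %.2f%n", fitness(payload, reflectedPage));
    }
}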
A model-based behavioral fuzzing approach has been designed by Wang et al. [139] to
discover vulnerabilities of Database Management Systems (DBMS). A DBMS defines a
format rule that specifies packet format and a behavior rule that specifies its semantics and
functionality. This approach is based on two main artifacts. The first artifact is a
behavioral model, which includes fuzzing patterns and behavioral sequences. This is
obtained from a behavior analysis of DBMS (protocol format analysis, attack surface
analysis, etc.). A fuzzing pattern expresses the data structure of packets, the needs of
security testing, and the design strategy for vulnerability discovery. A behavioral sequence
defines the message transfer order between client and DBMS. The second artifact is a
DBMS Fuzzer composed of a test instance (a detailed test script based on fuzzing
patterns), and a finite state machine model EXT-NSFSM used for semivalid test case
generation based on behavioral sequences and test instances. The authors describe a
general framework for behavioral fuzzing that has been implemented and used in several
experiments. It allows for the generation of thousands of fuzzing instances, and despite a
few errors of analysis and script, the tool was able to discover buffer overflow
vulnerabilities, 10 of which had not been publicly disclosed before.
The Pattern-driven and Model-based Vulnerability Testing (PMVT) approach combines model-based testing with
fuzzing techniques, and drives the test generation by security test patterns resulting from
risk assessment. This approach aims to improve the accuracy and precision of
vulnerability testing. It is supported by tools that automate the detection of vulnerabilities,
particularly in web applications.
The process, shown in Fig. 10, is composed of the four following activities:
1. The Modeling activity. As for every MBT approach, the modeling activity consists of
designing an MBT model that can be used to automatically generate abstract test cases.
The PMVT approach, based on the CertifyIt technology, requires a model designed using
the UML4MBT notation: UML class diagrams specify the static structure, while state
diagrams describe the dynamic behavior of the application (notably the navigation
between pages).
To ease and accelerate this modeling activity, a Domain Specific Modeling Language
(DSML) has been developed, called DASTML, which allows the global structure of a web
application to be modeled. It is composed of three entities: Page, Action and Data with
various link possibilities between the three. Only relevant information to vulnerability test
case generation is represented, such as the available pages (or screens in the case of single-URL
applications), the available actions on each page, and the user inputs of each action
(potentially used to inject an attack vector). An algorithm performs the automatic
instantiation of the UML4MBT notation based on a given DASTML model.
2. The Test Purpose design activity. This activity consists of formalizing a test procedure
from each vulnerability test pattern (vTP) that the generated test cases have to cover. vTPs
provide a starting point for security test case derivation by giving information on how to
compute appropriate vulnerability test cases depending on the kind of vulnerability. These
patterns are typically gathered from public databases such as CVE and OWASP, and from
research projects such as the ITEA2 DIAMONDS project.
Because vTPs are informal specifications, they need to be translated into a machine-readable language to allow the automatic computation of test cases by the generation
engine. Hence each procedure targeting a dedicated vulnerability is given by a test
purpose, which is a high-level expression that formalizes a testing objective to drive the
automated test generation on the test model. Basically, such a test purpose can be seen as a
partial algorithm that defines a sequence of significant steps that has to be executed by the
test case scenario. Each step takes the form of a set of operations or behaviors to be
covered, or specific state to be reached on the test model, in order to assess the robustness
of the application under test with respect to the vulnerability that is being tested. The test
purpose language supports complex pattern modeling by making use of OCL constraints
to define specific states to reach and data to collect, and foreach statements to iterate
over enumeration literals (abstract data) and thus unfold a given test purpose into
numerous abstract test cases.
3. The Test Generation activity. The test generation process automatically produces
abstract vulnerability test cases, including the expected results. It consists of instantiating
the vulnerability test purposes on the test model of the application under test: the test
model and the test purposes are both translated into elements and data directly computable
by the test generator CertifyIt. Test case generation is performed by instantiating the
selected test purposes on the behavioral UML4MBT test model specifying the web
application under test.
Notably, test purposes are transformed into test targets, which are defined by a sequence of
intermediate objectives used by the test generation engine. The test targets are then
executed on the test model to generate the abstract test cases. In this way, each test
purpose produces one or more abstract test cases that verify the test purpose specification,
while satisfying the constraints of the behavioral test model.
4. The Adaptation, Test Execution and Observation activity. The abstract test cases are
finally exported into the test execution environment. This consists of automatically
creating a JUnit test suite, in which each abstract test case is exported as a JUnit test case
skeleton that embeds the test sequence and the observation procedures in order to
automate the verdict assignment.
However, during the modeling activity all data used by the application is modeled at an
abstract level. As a consequence, test cases are abstract and cannot be executed directly as
they are. To bridge the gap, test engineers must link the abstract data to concrete data in
order to provide executable test scripts. It should be emphasized that all abstract
operations (login, register, goto_page, and so on) are automatically concretized
using basic HTMLUnit primitives. As this sequence of primitives is rather generic, test
engineers may have to tweak the generated code if the web application under test requires
it.
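As a hedged illustration, an exported JUnit test skeleton with HtmlUnit-based concretization could resemble the following sketch (classic HtmlUnit 2.x API); the URL, form and field names, and the attack vector are purely illustrative and do not come from the PMVT toolchain.

// Illustrative sketch of an exported JUnit test case after concretization
// with HtmlUnit primitives; all identifiers and the URL are hypothetical.
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlForm;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import com.gargoylesoftware.htmlunit.html.HtmlSubmitInput;
import com.gargoylesoftware.htmlunit.html.HtmlTextInput;
import org.junit.Assert;
import org.junit.Test;

public class XssSearchPageTest {

    private static final String ATTACK_VECTOR = "<script>alert('xss')</script>";

    @Test
    public void reflectedXssOnSearchField() throws Exception {
        try (WebClient client = new WebClient()) {
            client.getOptions().setThrowExceptionOnScriptError(false);

            // Abstract steps goto_page(search) and inject(search_field)
            // concretized as navigation and form filling.
            HtmlPage page = client.getPage("http://localhost:8080/app/search");
            HtmlForm form = page.getFormByName("searchForm");
            HtmlTextInput field = form.getInputByName("q");
            field.type(ATTACK_VECTOR);
            HtmlSubmitInput submit = form.getInputByName("go");
            HtmlPage result = submit.click();

            // Observation procedure: the verdict fails if the vector is
            // reflected verbatim (ie, without output encoding).
            Assert.assertFalse("attack vector reflected without sanitization",
                    result.getWebResponse().getContentAsString().contains(ATTACK_VECTOR));
        }
    }
}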
In summary, the key aspects of this approach are:
1. The formalization of vulnerability test patterns using generic test purposes to drive the
test generation engine;
2. The use of a DSML to ease and accelerate the functional modeling of the Web
application under test;
3. The full automation of the testing process, including test generation, test execution and
verdict assignment.
This PMVT approach has been found to be suitable for a range of vulnerability types,
including technical ones (XSS, SQL injections, CSRF) as well as logical ones
(authentication bypass, privilege escalation).
For web applications, and more generally for model-based GUI testing (see Section 4.3), GUI-ripping techniques can automate the construction of the model. This has to be extended
to other types of testing by analyzing existing artifacts to provide the right information.
One limit of this approach is of course the oracle problem: extracting information from
the system under test can help to determine the sequence of test actions, but the expected
results need to be derived from an external source, not from the buggy application.
Provide DSMLs to simplify the creation and maintenance of the MBT model. Domain-Specific Modeling Languages adapt the model creation process to the semantics of the
application domain. Domain-specific modeling allows using existing domain
terminology, with known semantics, and familiar notation in a specific application
domain. For example, a DSML may be used for ERP MBT, by adapting the modeling
language to the targeted business domain, or in the context of aircraft traffic monitoring
systems, by specifying MBT models using specific domain concepts such as plane,
navigation path, and monitoring zone. The use of DSMLs in MBT may allow faster
model development and wider model accessibility, as compared to the use of general-purpose modeling languages.
Better reuse of existing requirements artifacts. At system or acceptance testing levels,
testing objectives are strongly related to system requirements (particularly functional
requirements). Requirements engineering leads to a large variety of artifacts, which are
often very informal, such as user stories or use case descriptions [142], but also more
structured, like business process models or interface requirements specifications.
Automating the reuse of such informal artifacts may facilitate and accelerate MBT
modeling activities. For example, deriving test cases or partial MBT models from use
cases may help to capture the basic flow of events and the alternate flows of events.
Some other practical challenges for the future, particularly for the increased use of MBT
in industry, are:
the portability of models between different MBT tools: even when two tools use similar
modeling notations such as UML, they tend to use different subsets of those notations,
so models are not immediately transferable between MBT tools. This is likely to
improve gradually as certain notations and tools achieve market dominance;
the training of new MBT users in how to model for testing, and how to use MBT
techniques in practice. The proposed ISTQB certified tester extension for MBT is
expected to make a big improvement in this area, but learning how to design good MBT
models is a nontrivial task, so on-going training, shepherding and support systems will
be needed;
the urgency for further studies that compare the use of MBT with other testing
approaches.
References
[1] Lee C. A Practitioners Guide to Software Test Design. Norwood, MA: Artech
House, Inc. 2004.
[2] Hung Q., Michael H., Brent K. Global Software Test Automation: A Discussion
of Software Testing for Executives. Cupertino, USA: Happy About; 2006.
[3] Pretschner A., Prenninger W., Wagner S., Kühnel C., Baumgartner M., Sostawa
B., Zölch R., Stauner T. One evaluation of model-based testing and its
automation. In: Proceedings of the 27th International Conference on Software
Engineering, ICSE 05, St. Louis, MO, USA. New York, NY: ACM; 2005:392–401. doi:10.1145/1062455.1062529.
[4] Utting M., Legeard B. Practical Model-Based Testing: A Tools Approach. San
Francisco, CA: Morgan Kaufmann; 2006. ISBN 0123725011.
[5] Utting M., Pretschner A., Legeard B. A taxonomy of model-based testing
approaches. Softw. Test. Verif. Rel. 2012;22(5):297–312. doi:10.1002/stvr.456.
[6] Jorgensen P.C. Software Testing: A Craftsman's Approach. first ed. Boca Raton, FL: Auerbach
Publications; 2009.
[7] Chow T.S. Testing software design modeled by finite-state machines. IEEE
Trans. Softw. Eng. 1978;4(3):178–187.
[8] Pairwise web site at http://www.pairwise.org/, 2015
[9] Grieskamp W., Kicillof N., Stobie K., Braberman V. Model-based quality assurance of
protocol documentation: tools and methodology. Softw. Test. Verif. Rel. 2011;21(1):55–71.
[10] Binder R., Legeard B., Kramer A. Model-based testing: where does it stand?
Commun. ACM. 2015;58(2):52–56. doi:10.1145/2697399.
[11] Aichernig B.K. Contract-based testing. In: Berlin: Springer; 34–48. Formal Methods at the Crossroads: From Panacea to Foundational Support,
Lecture Notes in Computer Science. 2003;vol. 2757.
[12] Baudin P., Cuoq P., Filliâtre J.-C., Marché C., Monate B., Moy Y., Prevosto V.
ACSL: ANSI/ISO C specification language version 1.7. 2013. http://frama-c.com/download/acsl-implementation-Fluorine-20130601.pdf.
[13] Barnett M., Leino K.R.M., Schulte W. The Spec# programming system: an
overview. In: Proceedings of the International Workshop on Construction and
Analysis of Safe, Secure and Interoperable Smart Devices (CASSIS04),
Marseille, France, Lecture Notes in Computer Science, vol. 3362, Springer;
2004:49–69.
[14] Jacky J., Veanes M. NModel, online 2006. 2015. http://nmodel.codeplex.com/
(last access March 2015).
[15] Cheon Y., Leavens G.T. A simple and practical approach to unit testing: the JML
and JUnit way. In: Magnusson B., ed. 16th European Conference on Object-Oriented Programming, ECOOP 2002, Lecture Notes in Computer Science, vol.
2374, Springer, Berlin; 2002:231–255.
[16] Zimmerman D.M., Nagmoti R. JMLUnit: the next generation. In: Formal
Verification of Object-Oriented Software, Lecture Notes in Computer Science.
Springer; 183–197. 2010;vol. 6528.
[17] Leavens G.T., Baker A.L., Ruby C. JML: a notation for detailed design. In: Kilov
H., Rumpe B., Simmonds I., eds. Behavioral Specifications of Businesses and
Systems. Boston, MA: Kluwer Academic Publishers; 1999:175–188.
[18] Gligoric M., Gvero T., Jagannath V., Khurshid S., Kuncak V., Marinov D. Test
generation through programming in UDITA. In: ICSE (1). 2010:225234.
http://doi.acm.org/10.1145/1806799.1806835.
[19] Visser W., Havelund K., Brat G.P., Park S., Lerda F. Model checking programs.
Autom. Softw. Eng. 2003;10(2):203232.
http://dx.doi.org/10.1023/A:1022920129859.
[20] Heidegger P., Thiemann P. JSConTest: contract-driven testing and path effect
inference for JavaScript. J. Obj. Technol. 2012;11(1):129.
http://dx.doi.org/10.5381/jot.2012.11.1.a6.
[21] Mirshokraie S. Effective test generation and adequacy assessment for JavaScript-based web applications. In: Proceedings of the 2014 International Symposium on
Software Testing and Analysis, ISSTA 2014, San Jose, CA, USA. New York, NY:
ACM; 2014:453–456. doi:10.1145/2610384.2631832.
[22] Enderlin I., Dadeau F., Giorgetti A., Ben Othman A. Praspel: a specification
language for contract-based testing in PHP. In: ICTSS. 2011:64–79.
[23] Baker P., Dai Z.R., Grabowski J., Haugen O., Samuelsson E., Schieferdecker I.,
Williams C.E. The UML 2.0 testing profile. In: Proceedings of the 8th
Conference on Quality Engineering in Software Technology (CONQUEST),
Nuremberg, Germany. 2004:181189.
[24] Dai Z.R., Grabowski J., Neukirchen H., Pals H. From design to test with UML.
In: Groz R., Hierons R.M., eds. Testing of Communicating Systems. Berlin:
Springer; 978-3-540-21219-53349. Lecture Notes in Computer Science.
2004;vol. 2978.
[25] Sawant V., Shah K. Construction of test cases from UML models. In: Shah K.,
Lakshmi Gorty V.R., Phirke A., eds. Technology Systems and Management.
Berlin: Springer; 978-3-642-20208-76168. doi:10.1007/978-3-642-20209-4.
Communications in Computer and Information Science. 2011;vol. 145.
[26] Cantenot J., Ambert F., Bouquet F. Test generation with satisfiability modulo
theories solvers in model-based testing. Softw. Test. Verif. Rel. 1099-
16892014;24(7):499531. doi:10.1002/stvr.1537.
[27] Yue T., Ali S., Briand L. Automated transition from use cases to UML state
machines to support state-based testing. In: France R.B., Kuester J.M., Bordbar
B., Paige R.F., eds. Modelling Foundations and Applications. Berlin: Springer;
978-3-642-21469-1115131. doi:10.1007/978-3-642-21470-7\_9. Lecture Notes
in Computer Science. 2011;vol. 6698.
[28] Nogueira S., Sampaio A., Mota A. Test generation from state based use case
models. Form. Asp. Comput. 0934-50432014;26(3):441490.
doi:10.1007/s00165-012-0258-z.
[29] Pickin S., Jezequel J.-M. Using UML sequence diagrams as the basis for a formal
test description language. In: Boiten E.A., Derrick J., Smith G., eds. Integrated
Formal Methods. Berlin: Springer; 978-3-540-21377-2481500.
doi:10.1007/978-3-540-24756-2\_26. Lecture Notes in Computer Science.
2004;vol. 2999.
[30] Rountev A., Kagan S., Sawin J. Coverage criteria for testing of object
interactions in sequence diagrams. In: Cerioli M., ed. Fundamental Approaches to
Software Engineering. Berlin: Springer; 978-3-540-25420-1289304.
doi:10.1007/978-3-540-31984-9\_22. Lecture Notes in Computer Science.
2005;vol. 3442.
[31] Tripathy A., Mitra A. Test case generation using activity diagram and sequence
diagram. In: Aswatha K.M., Selvarani R., Kumar T.V.S., eds. Proceedings of
International Conference on Advances in Computing. India: Springer; 978-81322-0739-9121129. doi:10.1007/978-81-322-0740-5\_16. Advances in
Intelligent Systems and Computing. 2012;vol. 174.
[32] Panthi V., Mohapatra D. Automatic Test Case Generation Using Sequence
Diagram. In: Kumar A., Ramaiah M.S., Kumar T.V.S., eds. Proceedings of
International Conference on Advances in Computing, Advances in Intelligent
Systems and Computing, vol. 174, Springer, India; 2012:978-81-322-0739-9277
284. doi:10.1007/978-81-322-0740-5\_33.
[33] Kumar R., Bhatia R.K. Interaction diagram based test case generation. In:
Krishna P.V., Babu M.R., Ariwa E., eds. Global Trends in Information Systems
and Software Applications. Berlin: Springer; 978-3-642-29215-6202211.
doi:10.1007/978-3-642-29216-3\_23. Communications in Computer and
Information Science. 2012;vol. 270.
[34] Jena A.K., Swain S.K., Mohapatra D.P. Test case creation from UML sequence
diagram: a soft computing approach. In: Jain L.C., Patnaik S., Ichalkaranje N.,
eds. Intelligent Computing, Communication and Devices. India: Springer; 97881-322-2011-4117126. doi:10.1007/978-81-322-2012-1\_13. Advances in
Intelligent Systems and Computing. 2015;vol. 308.
[35] Reijers A.H., van Wijk S., Mutschler B., Leurs M. BPM in practice: who is doing
what? In: Hull R., Mendling J., Tai S., eds. Business Process Management.
Berlin: Springer; 978-3-642-15617-54560. doi:10.1007/978-3-642-15618-2\_6.
Lecture Notes in Computer Science. 2010;vol. 6336.
[36] Jensen S.H., Thummalapenta S., Sinha S., Chandra S. Test Generation from
Business Rules. IBM Research Report; 2014 Tech. Rep. RI14008.
[37] Mecke C. Automated testing of mySAP business processes. In: Meyerhoff D.,
Laibarra B., van der Pouw Kraan R., Wallet A., eds. Software Quality and
Software Testing in Internet Times. Berlin: Springer; 2002:978-3-540-426325261279. doi:10.1007/978-3-642-56333-1\_17.
[38] Andreas H., Tobias G., Volker G., Holger F. Business process-based testing of
web applications. In: zur Muehlen M., Su J.W., eds. Business Process
Management Workshops. Berlin: Springer; 978-3-642-20510-1603614.
doi:10.1007/978-3-642-20511-8. Lecture Notes in Business Information
Processing. 2011;vol. 66.
[39] Anand S., Burke E.K., Chen T.Y., Clark J., Cohen M.B., Grieskamp W., Harman
M., Harrold M.J., McMinn P. An orchestrated survey of methodologies for
automated software test case generation. J. Syst. Softw. 016412122013;86(8):19782001. doi:10.1016/j.jss.2013.02.061.
[40] Wang Y., Yang N. Test case generation of web service composition based on CPnets. J. Softw. 2014;9(3).
http://ojs.academypublisher.com/index.php/jsw/article/view/jsw0903589595.
[41] Yuan X., Cohen M.B., Memon A.M. GUI interaction testing: incorporating event
context. IEEE Trans. Softw. Eng. 2011;37(4):559–574.
doi:10.1109/TSE.2010.50.
[42] Memon A., Banerjee I., Nguyen B.N., Robbins B. The first decade of GUI
ripping: extensions, applications, and broader impacts. In: 20th Working
Conference on Reverse Engineering (WCRE), 2013. 2013:1120.
doi:10.1109/WCRE.2013.6671275.
[43] Memon A., Nguyen B.N. GUITAR. 2015. http://sourceforge.net/projects/guitar/
(last access March 2015).
[44] Hackner D.R., Memon A.M. Test case generator for GUITAR. In: Companion of
the 30th International Conference on Software Engineering, ICSE Companion
08, Leipzig, Germany. New York, NY: ACM; 2008:978-1-60558-079-1959960.
doi:10.1145/1370175.1370207.
[45] Amalfitano D., Fasolino A.R., Tramontana P., De Carmine S., Memon A.M.
Using GUI ripping for automated testing of android applications. In: Proceedings
of the 27th IEEE/ACM International Conference on Automated Software
Engineering, ASE 2012, Essen, Germany. New York, NY: ACM; 2012:978-14503-1204-2258261. doi:10.1145/2351676.2351717.
[46] Arlt S., Borromeo P., Schäf M., Podelski A. Parameterized GUI Tests. In: Nielsen
B., Weise C., eds. Testing Software and Systems. Berlin: Springer; 247–262. Lecture Notes in Computer Science. 2012;vol. 7641.
[47] Arlt S., Podelski A., Bertolini C., Schaf M., Banerjee I., Memon A.M.
Lightweight static analysis for GUI testing. In: 23rd IEEE International
Symposium on Software Reliability Engineering (ISSRE), 2012. IEEE; 2012:301
310.
[48] Gross F., Fraser G., Zeller A. EXSYST: Search-based GUI testing. In: 34th
International Conference on Software Engineering (ICSE), 2012. 2012:1423
1426. doi:10.1109/ICSE.2012.6227232 ISSN 0270-5257.
[49] Moreira R.M.L.M., Paiva A.C.R., Memon A. A pattern-based approach for GUI
modeling and testing. In: 24th IEEE International Symposium on Software
Reliability Engineering (ISSRE), 2013. 2013:288297.
doi:10.1109/ISSRE.2013.6698881.
[50] Cohen M.B., Huang S., Memon A.M. AutoInSpec: using missing test coverage to
improve specifications in GUIs. In: 23rd IEEE International Symposium on
Software Reliability Engineering (ISSRE), 2012. 2012:251260.
doi:10.1109/ISSRE.2012.33 ISSN 1071-9458.
[51] Bolis F., Gargantini A., Guarnieri M., Magri E., Musto L. Model-driven testing
for web applications using abstract state machines. In: Grossniklaus M., Wimmer
M., eds. Current Trends in Web Engineering. Berlin: Springer; 978-3-642-3562237178. Lecture Notes in Computer Science. 2012;vol. 7703.
[52] Lelli V., Blouin A., Baudry B. Classifying and qualifying GUI defects. In: IEEE
International Conference on Software Testing, Verification and Validation (ICST
2015). IEEE; April 2015:110. doi:10.1109/ICST.2015.7102582.
[53] Zhu H., Belli F. Advancing test automation technology to meet the challenges of
model-based software testing. J. Inform. Softw. Technol. 2009;51(11):14851486.
[54] Dustin E., Garrett T., Gauf B. Implementing Automated Software Testing: How to
Save Time and Lower Costs While Raising Quality. Indianapolis, USA: Addison
Wesley Professional; 2009.0-32-158051-6.
[55] Thimbleby H.W. The directed Chinese postman problem. Soft. Pract. Exp.
2003;33(11):10811096. http://dx.doi.org/10.1002/spe.540.
[56] Davis M., Logemann G., Loveland D. A machine program for theorem-proving.
Commun. ACM. 1962;5(7):394397.
[57] de Moura L., Bjørner N. Z3: an efficient SMT solver. In: 14th International
Conference on Tools and Algorithms for the Construction and Analysis of
Systems (TACAS08), Budapest, Hungary, Lecture Notes in Computer Science,
vol. 4963, Springer, Berlin; 2008:337–340.
[58] Barrett C., Tinelli C. CVC3. In: 19th International Conference on Computer
Aided Verification (CAV07), Berlin, Germany. 2007:298302.
[59] Barrett C., Conway C.L., Deters M., Hadarean L., Jovanovic D., King T.,
Reynolds A., Tinelli C. CVC4. In: 23rd International Conference on Computer
Aided Verification (CAV11), Snowbird, UT, USA; 2011:171177.
[60] Dutertre B. Yices 2.2. In: Berlin: Springer; 737744. Computer-Aided
Verification (CAV14), Lecture Notes in Computer Science. 2014;vol. 8559.
[61] Cimatti A., Griggio A., Schaafsma B., Sebastiani R. The MathSAT5 SMT Solver.
In: International Conference on Tools and Algorithms for the Construction and
Analysis of Systems (TACAS13), Lecture Notes in Computer Science, vol.
7795; 2013:93107.
[62] Tillmann N., de Halleux J. Pex: white box test generation for .NET. In: Berlin:
Springer; 134–153. Tests and Proofs (TAP08), Lecture Notes in Computer
Science. 2008;vol. 4966.
[63] Cantenot J., Ambert F., Bouquet F. Test generation with SMT solvers in modelbased testing. STVR, Softw. Test. Verif. Rel. 2014;24(7):499531.
[64] Arcaini P., Gargantini A., Riccobene E. Optimizing the automatic test generation
by SAT and SMT solving for Boolean expressions. In: 26th IEEE/ACM
International Conference on Automated Software Engineering (ASE11).
Washington, DC: IEEE Computer Society; 2011:388391.
[65] Cristiá M., Frydman C.S. Applying SMT solvers to the test template framework.
In: 7th Workshop on Model-Based Testing (MBT12), Tallinn, Estonia,
Electronic Proc. in Theoretical Computer Science, vol. 80; 2012:28–42.
[66] Cristiá M., Rossi G., Frydman C.S. {log} as a test case generator for the test
template framework. In: 11th International Conference on Software Engineering
and Formal Methods (SEFM13), Madrid, Spain, Lecture Notes in Computer
Science, vol. 8137; 2013:229–243.
[67] Mackworth A.K. Consistency in networks of relations. J. Artif. Intell.
1977;8(1):99–118.
[68] Golomb S.W., Baumert L.D. Backtrack programming. J. ACM. 1965;12(4):516
524.
[69] van Hentenryck P., Dincbas M. Domains in logic programming. In: Nat. Conf. on
Artificial Intelligence (AAAI86). 1986:759765.
[70] Tsang E.P.K. Foundations of constraint satisfaction. Computation in cognitive
science. San Diego, CA: Academic Press; 1993.978-0-12-701610-8.
[71] Mossige M., Gotlieb A., Meling H. Testing robot controllers using constraint
programming and continuous integration. Inform. Softw. Technol. 2015;57:169
185.
[72] Carlsson M., Ottosson G., Carlson B. An open-ended finite domain constraint
solver. In: 9th International Symposium on Programming Languages:
Implementations, Logics, and Programs (PLILP97). London, UK: Springer-
Verlag; 1997:191206.
[73] Dovier A., Piazza C., Pontelli E., Rossi G. Sets and constraint logic
programming. ACM Trans. Program. Lang. Syst. 2000;22(5):861931.
[74] Dovier A., Piazza C., Rossi G. A uniform approach to constraint-solving for lists,
multisets, compact lists, and sets. ACM Trans. Comput. Log. 2008;9(3):130.
[75] Shirole M., Kumar R. UML behavioral model based test case generation: a
survey. SIGSOFT Softw. Eng. Notes. 0163-59482013;38(4):113.
[76] Doungsa-ard C., Dahal K., Hossain A., Suwannasart T. Test data generation from
UML state machine diagrams using GAs. In: International Conference on
Software Engineering Advances, ICSEA 2007. 2007:doi:10.1109/ICSEA.2007.70
pp. 4747.
[77] Lefticaru R., Ipate F. Functional Search-based Testing from State Machines. In:
1st International Conference on Software Testing, Verification, and Validation,
2008. 2008:525528. doi:10.1109/ICST.2008.32.
[78] Ali S., Iqbal M.Z., Arcuri A. Improved heuristics for solving OCL constraints
using search algorithms. In: Genetic and Evolutionary Computation Conference,
GECCO 14, Vancouver, BC, Canada, July 12-16, 2014. 2014:12311238.
[79] Ali S., Zohaib Iqbal M., Arcuri A., Briand L.C. Generating test data from OCL
constraints with search techniques. IEEE Trans. Softw. Eng. 2013;39(10):1376
1402.
[80] Shirole M., Kommuri M., Kumar R. Transition sequence exploration of UML
activity diagram using evolutionary algorithm. In: Proceedings of the 5th India
Software Engineering Conference, ISEC 12, Kanpur, India. New York, NY:
ACM; 2012:97100.
[81] Shirole M., Kumar R. A hybrid genetic algorithm based test case generation
using sequence diagrams. In: Ranka S., Banerjee A., Biswas K., Dua S., Mishra
P., Moona R., Poon S.-H., Wang C.-L., eds. Contemporary Computing. Berlin:
Springer; 5363. Communications in Computer and Information Science.
2010;vol. 94.
[82] Albert E., de la Banda M.J.G., Gómez-Zamalloa M., Rojas J.M., Stuckey P. A
CLP heap solver for test case generation. Theory Pract. Logic Program.
2013;13:721–735 (special issue 4-5).
[83] Godefroid P., Klarlund N., Sen K. DART: directed automated random testing. In:
ACM SIGPLAN Conference on Programming Language Design and
Implementation (PLDI05), Chicago, IL, USA. New York, NY: ACM; 2005:213
223.
[84] Sen K., Marinov D., Agha G. CUTE: a concolic unit testing engine for C. In:
10th European Software Engineering Conference (ESEC05), Lisbon, Portugal.
New York, NY: ACM; 2005:263272.
[85] Tillmann N., De Halleux J. Pex: white box test generation for .NET. In: 2nd
International Conference on Tests and Proofs (TAP08), Prato, Italy. Berlin:
Springer-Verlag; 2008:134153.
[86] Cadar C., Godefroid P., Khurshid S., Păsăreanu C.S., Sen K., Tillmann N., Visser
W. Symbolic execution for software testing in practice: preliminary assessment.
In: 33rd International Conference on Software Engineering (ICSE11), Waikiki,
Honolulu, HI, USA. New York, NY: ACM; 2011:1066–1071.
[87] Cadar C., Sen K. Symbolic execution for software testing: three decades later.
Commun. ACM. 2013;56(2):8290.
[88] Clarke E.M., Emerson E.A., Sistla A.P. Automatic verification of finite-state
concurrent systems using temporal logic specifications. ACM Trans. Program.
Lang. Syst. 1986;8(2):244263.
[89] Callahan J., Schneider F., Easterbrook S. Specification-based testing using model
checking. In: SPIN Workshop, Rutgers University. 1996:10661071 (Tech. Report
NASA-IVV-96-022).
[90] Fraser G., Wotawa F., Ammann P.E. Testing with model checkers: a survey.
Softw. Test. Verif. Rel. 2009;19(3):215261.
[91] Alur R., Dill D.L. A theory of timed automata. Theor. Comput. Sci.
1994;126:183235.
[92] Bengtsson J., Larsen K.G., Larsson F., Pettersson P., Yi W. UPPAAL: a tool suite for
automatic verification of real-time systems. In: Workshop on Verification and
Control of Hybrid Systems III, no. 1066 in Lecture Notes in Computer Science.
Berlin: Springer-Verlag; 1995:232–243.
[93] Bae K., Meseguer J. The linear temporal logic of rewriting Maude model
checker. In: Berlin: Springer; 208225. Rewriting Logic and Its Applications,
Lecture Notes in Computer Science. 2010;vol. 6381.
[94] Buchs D., Lucio L., Chen A. Model checking techniques for test generation from
business process models. In: Berlin: Springer; 5974. Reliable Software
Technologies, Ada-Europe 2009, Lecture Notes in Computer Science. 2009;vol.
5570.
[95] Enoiu E.P., Čaušević A., Ostrand T.J., Weyuker E.J., Sundmark D., Pettersson P.
Automated test generation using model checking: an industrial evaluation. Softw.
Tools Technol. Transf. 2014:1–19.
[96] Larsen K.G., Mikucionis M., Nielsen B. Online testing of real-time systems using
UPPAAL. In: Berlin: Springer; 7994. Formal Approaches to Testing of Software,
Linz, Austria, Lecture Notes in Computer Science. 2004;vol. 3395.
[97] Graf-Brill A., Hermanns H., Garavel H. A model-based certification framework
for the energy bus standard. In: Abraham E., Palamidessi C., eds. Formal
Techniques for Distributed Objects, Components, and Systems. Berlin: Springer;
[111] Xu D., Thomas L., Kent M., Mouelhi T., Le Traon Y. A model-based approach to
automated testing of access control policies. In: Proceedings of the 17th ACM
Symposium on Access Control Models and Technologies, SACMAT 12, Newark,
New Jersey, USA. New York, NY: ACM; 2012:209–218.
[112] Anisetti M., Ardagna C.A., Bezzi M., Damiani E., Sabetta A. Machine-readable
privacy certificates for services. In: Meersman R., Panetto H., Dillon T., Eder J.,
Bellahsene Z., Ritter N., De Leenheer P., Dou D., eds. On the Move to
Meaningful Internet Systems: OTM 2013 Conferences. Berlin: Springer; 434–450. Lecture Notes in Computer Science. 2013;vol. 8185.
[113] Mallouli W., Lallali M., Mammar A., Morales G., Cavalli A. Modeling and
testing secure web applications. In: Paris, France: Atlantis Press; 207–255. Web-Based Information Technologies and Distributed Systems, Atlantis Ambient and
Pervasive Intelligence. 2010;vol. 2.
[114] Masson P.-A., Potet M.-L., Julliand J., Tissot R., Debois G., Legeard B., Chetali
B., Bouquet F., Jaffuel E., Van Aertrick L., Andronick J., Haddad A. An access
control model based testing approach for smart card applications: results of the
POSÉ project. J. Inform. Assur. Secur. 2010;5(1):335–351.
[115] Botella J., Bouquet F., Capuron J.-F., Lebeau F., Legeard B., Schadle F. Model-based testing of cryptographic components: lessons learned from experience. In:
Sixth IEEE International Conference on Software Testing, Verification and
Validation, Luxembourg, Luxembourg, March 18-22, 2013. 2013:192–201.
[116] Dadeau F., Castillos K.C., Ledru Y., Triki T., Vega G., Botella J., Taha S. Test
generation and evaluation from high-level properties for common criteria
evaluations: the TASCCC testing tool. In: Sixth IEEE International Conference
on Software Testing, Verification and Validation, Luxembourg, Luxembourg,
March 18-22, 2013. 2013:431–438.
[117] Bouquet F., Peureux F., Ambert F. Model-based testing for functional and
security test generation. In: Aldini A., Lopez J., Martinelli F., eds. Foundations of
Security Analysis and Design VII. Switzerland: Springer International Publishing;
1–33. Lecture Notes in Computer Science. 2014;vol. 8604.
[118] Pellegrino G., Compagna L., Morreggia T. A tool for supporting developers in
analyzing the security of web-based security protocols. In: Yenigün H., Yilmaz C.,
Ulrich A., eds. Testing Software and Systems. Berlin: Springer; 277–282. Lecture Notes in Computer Science. 2013;vol. 8254.
[119] Fourneret E., Ochoa M., Bouquet F., Botella J., Jürjens J., Yousefi P. Model-based security verification and testing for smart-cards. In: Sixth International
Conference on Availability, Reliability and Security, ARES 2011, Vienna, Austria,
August 22-26, 2011. 2011:272–279.
[120] Aichernig B.K., Weiglhofer M., Wotawa F. Improving fault-based conformance
testing. Electron. Notes Theor. Comput. Sci. 2008;220(1):63–77.
[121] Jia Y., Harman M. An analysis and survey of the development of mutation
testing. IEEE Trans. Softw. Eng. 2011;37(5):649–678.
[122] Traon Y.L., Mouelhi T., Baudry B. Testing security policies: going beyond
functional testing. In: ISSRE 2007, The 18th IEEE International Symposium on
Software Reliability, Trollhättan, Sweden, 5-9 November 2007. 2007:93–102.
[123] Wimmel G., Jürjens J. Specification-based test generation for security-critical
systems using mutations. In: George C., Miao H., eds. Formal Methods and
Software Engineering. Berlin: Springer; 471–482. Lecture Notes in Computer
Science. 2002;vol. 2495.
[124] Dadeau F., Héam P.-C., Kheddam R. Mutation-based test generation from
security protocols in HLPSL. In: Harman M., Korel B., eds. 4th Int. Conf. on
Software Testing, Verification and Validation, ICST 2011, Berlin, Germany. IEEE
Computer Society Press; 2011:240–248.
[125] Schneider M., Großmann J., Schieferdecker I., Pietschker A. Online model-based
behavioral fuzzing. In: Sixth IEEE International Conference on Software Testing,
Verification and Validation Workshops (ICSTW), 2013. 2013:469–475.
[126] Johansson W., Svensson M., Larson U.E., Almgren M., Gulisano V. T-Fuzz:
model-based fuzzing for robustness testing of telecommunication protocols. In:
Seventh IEEE International Conference on Software Testing, Verification and
Validation (ICST), 2014. 2014:323–332.
[127] MITRE. Common Weakness Enumeration. 2015. http://cwe.mitre.org/ (last
accessed April 2015).
[128] Blome A., Ochoa M., Li K., Peroli M., Dashti M.T. VERA: a flexible model-based vulnerability testing tool. In: Proc. of the 6th Int. Conference on Software
Testing, Verification and Validation (ICST13). Luxembourg: IEEE Computer
Society; 2013:471–478.
[129] Bozic J., Wotawa F. Security testing based on attack patterns. In: IEEE Seventh
International Conference on Software Testing, Verification and Validation
Workshops (ICSTW), 2014. IEEE; 2014:4–11.
[130] Wei T., Ju-Feng Y., Jing X., Guan-Nan S. Attack model based penetration test for
SQL injection vulnerability. In: 2012 IEEE 36th Annual Computer Software and
Applications Conference Workshops (COMPSACW). IEEE; 2012:589–594.
[131] Xu D., Tu M., Sanford M., Thomas L., Woodraska D., Xu W. Automated
security test generation with formal threat models. IEEE Trans. Depend. Secure
Comput. 2012;9(4):526–540.
[132] Salva S., Zafimiharisoa S.R. Data vulnerability detection by security testing for
android applications. In: Information Security for South Africa, 2013. IEEE;
2013:1–8.
[133] Vernotte A., Dadeau F., Lebeau F., Legeard B., Peureux F., Piat F. Efficient
Mark Utting is a Senior Lecturer in ICT at the University of the Sunshine Coast.
Previously, he worked as Senior Research Fellow in software engineering at QUT for
several years, developing computer simulations of future Queensland Electricity
Networks, and as Associate Professor at the University of Waikato in New Zealand,
teaching programming and software engineering. He has also worked in industry,
developing next-generation genomics software and manufacturing software. Mark is
coauthor of the book Practical Model-Based Testing: A Tools Approach, as well as more
than 60 publications on model-based testing, verification techniques for object-oriented
and real-time software, and language design for parallel computing.
Fabrice Bouquet studied computer science and received his PhD from the University of
Provence, France in 1999. He is a full Professor of Software Engineering at the University
of Franche-Comté, France. He researches the validation of complex systems from
requirements to models, including operational semantics, testing, model transformation,
functional and nonfunctional properties, with applications in vehicle, aircraft, smart
objects, and energy.
Fabien Peureux received his PhD in Computer Science from the University of Franche-Comté in 2002, where he has worked since 2003 as an Assistant Professor, carrying out his research
activities with the FEMTO-ST Institute. Since 2005, he has also been a senior scientific consultant
for the Smartesting company. His main expertise is focused on the automation of the
validation process in the domains of smartcard applications, information systems, and
embedded software, with a particular interest in Model-Based Testing techniques and agile
approaches.
Alexandre Vernotte received his PhD at the Institut FEMTO-ST, Besançon, in 2015 in
Model-Based Security Testing for Web applications. He recently obtained a postdoc
position at the Department of Industrial Information and Control Systems at the Royal
Institute of Technology (KTH) in Stockholm, Sweden. His research centers on the security
of enterprise system architectures. His interests also include threat, risk, and behavioral
modeling, Model-Based Testing and Model-Based Engineering.
http://utp.omg.org
The Chinese Postman algorithm finds the shortest path that covers all the transitions of a finite state machine.
http://www.avantssar.eu/
http://www.emc.com/emc-plus/rsa-labs/standards-initiatives/pkcs-11-cryptographic-token-interface-standard.htm
http://www.globalplatform.org/
CHAPTER THREE
Abstract
For the last few decades, embedded systems have expanded their reach into major aspects of human lives. Starting
from small handheld devices (such as smartphones) to advanced automotive systems (such as anti-lock braking
systems), usage of embedded systems has increased at a dramatic pace. Embedded software are specialized
software that are intended to operate on embedded devices. In this chapter, we shall describe the unique challenges
associated with testing embedded software. In particular, embedded software are required to satisfy several non-functional constraints, in addition to functionality-related constraints. Such non-functional constraints may include
(but are not limited to) timing or energy-consumption related constraints, reliability requirements, etc. Additionally,
embedded systems are often required to operate in interaction with the physical environment, obtaining their inputs
from environmental factors (such as temperature or air pressure). The need to interact with a dynamic, often non-deterministic physical environment further increases the challenges associated with testing and validation of
embedded software. In the past, testing and validation methodologies have been studied extensively. This chapter,
however, explores the advances in software testing methodologies, specifically in the context of embedded
software. This chapter introduces the reader to key challenges in testing non-functional properties of software by
means of realistic examples. It also presents an easy-to-follow classification of existing research work on this topic.
Finally, the chapter is concluded with a review of promising future directions in the area of embedded software
testing.
Keywords
Non-functional property testing; Performance testing; Energy consumption of software; Search-based
software testing; Symbolic execution
1 Introduction
Over the last few decades, research in software testing has made significant progress. The
complexity of software has also increased at a dramatic pace. As a result, we have new
challenges involved in validating complex, real-world software. In particular, we are
interested in the testing and validation of embedded software. In this modern
world, embedded systems play a major role in human lives. Such software can be found
ubiquitously, in electronic systems ranging from consumer electronics (eg, smartphones, mp3
players, and digital cameras) and household appliances (eg, washing machines and
microwave ovens) to automotive (eg, electric cars and antilock braking systems) and
avionic applications. Software designed for embedded systems has unique features and
constraints that make its validation a challenging process. For instance, unlike desktop
applications, the behavior of an embedded system often depends on the physical
environment it operates in. As a matter of fact, many embedded systems often take their
inputs from the surrounding physical environment. This, however, poses unique
challenges to the testing of such systems, because the physical environment may be non-deterministic and difficult to recreate during the testing process. Additionally, most
embedded systems are required to satisfy several non-functional constraints, such as timing,
energy consumption, and reliability, to name a few. Failure to meet such constraints can result
in varying consequences depending upon the application domain. For instance, if the
constraints on the software are hard real-time, a violation may lead to serious
consequences, such as damage to human life and property. Therefore, it is of utmost
importance that such systems be tested thoroughly before being put to use. In the
following sections, we shall discuss some of the techniques proposed by the software
engineering community that are targeted at the testing and validation of real-life embedded
systems of various application domains and complexities. First, however, we shall
present an example, inspired by a real-life embedded system, that will give the reader an
idea of the nature of constraints commonly associated with embedded systems.
Fig. 1 provides the schematic representation of a wearable fall detection application
[1]. Such an application is used largely in the health care domain to assist frail or
elderly patients. The purpose of the system, as shown in Fig. 1, is to detect a potential fall
of its wearer and to invoke appropriate safety measures. In order to detect a fall, the
system needs to monitor the user's movement. This task is accomplished via a number of
sensors that are positioned at different parts of the patient's body. These sensors detect
physical motions and communicate the information via wireless sensor networks. In the
scenario in which the system detects a potential fall, it activates appropriate safety measures,
such as informing the health care providers over mobile networks. Testing the fall-detection system is essential to ensure its functional correctness, for instance that a potential fall
must not go undetected. However, such testing requires inputs from the sensors. To
properly test the system, its designers should be able to systematically model the inputs
from sensors and the surrounding environment.
Apart from the functional correctness, the fall-detection system also needs to satisfy
several non-functional constraints. For instance, the detection of a fall should meet hard
timing constraints. If such constraints are violated, the patient might get
seriously injured, making the system impractical to use. Moreover, if the application is
deployed on a battery-operated device, its energy consumption should be acceptable to
ensure a graceful degradation of battery life. Finally, due to the presence of unreliable
hardware components (eg, sensors) and networks (eg, sensor and mobile networks), the
application should also guarantee that a potential fall of the patient is detected with
acceptable reliability.
Non-functional properties of embedded software, such as timing and energy, are
extremely sensitive to the underlying execution platform. This makes the testing process
complicated, as the underlying execution platform may not be available during the time of
testing. Besides, if the embedded software is targeted at multiple execution platforms, its
non-functional properties need to be validated for each such platform. To alleviate these
issues, a configurable model for the execution platform might be used during the testing
process. For instance, such a configurable model can capture the timing or energy
behavior of different hardware components. Building such configurable models, however,
may turn out to be challenging due to the complexity of hardware and its (vendor-specific)
intellectual properties.
Over the last two decades, numerous methods in software testing have been proposed.
These include random testing, search-based testing, and directed testing (eg, based on
symbolic execution), among several others. These testing methodologies have focused
primarily on the validation of functional properties. Validation of non-functional software
properties has gained attention only recently. In this chapter, we explore the potential of
different testing methodologies in the context of embedded software. For embedded
software, non-functional aspects play a crucial role in the validation process. We
introduce some salient properties of validating typical embedded systems in Section 2.
Subsequently, we shall explore the recent advances in testing embedded systems in
Section 3. We first categorize all testing methodologies into three broader categories. Such
categories reflect the level of abstraction at which embedded systems are validated. In
particular, our first category captures black-box testing, where the system is abstracted
away and test inputs are generated via sampling of the input space. The remaining
categories either use an abstract model of the system or the actual implementation. We
shall discuss that different testing machineries (eg, evolutionary testing and symbolic
execution) can be employed for such categories. Based on our categorization of testing
embedded systems, we shall argue that no single category can be deemed superior
to the others. In general, the choice of abstraction for testing an embedded system largely
depends on the intention of the designer. For instance, if the designer is interested in
detecting fine-grained events (eg, memory requests and interrupts), it is recommended to
carry out the testing process on the actual implementation (eg, binary code). On the
contrary, testing binary code may reveal non-functional bugs too late in the design
process, leading to a complete redesign of the software.
Through this chapter, we aim to bring the attention of the software engineering community
to the unique challenges involved in embedded software testing. Specifically, testing
of non-functional properties is an integral part of validating embedded software. In order
to validate non-functional properties, software testing methodologies should explicitly
target to discover non-functional bugs, such as the loss of performance and energy.
Moreover, in order to test functional properties of embedded software, the designer should
be able to simulate the interaction of software with the physical environment. We shall
discuss several efforts in recent years to discover functional as well as non-functional bugs
in embedded software. In spite of these efforts, numerous challenges still exist in
validating embedded software. For instance, non-functional behaviors of embedded
software (eg, time and power) can be exploited to discover secret inputs (eg, secret keys in
cryptographic algorithms). Testing of timing and energy-related properties is far from
being solved, not to mention the immaturity of the research field to validate security
constraints in embedded software. We hope this chapter will provide the necessary
background to solve these existing challenges in software testing.
The physical environment (eg, inputs read from sensors) might be made completely
unconstrained during the time of testing. This enables the testing of software under all
operating conditions of the physical environment. However, such an approach might
turn out to be infeasible for complex embedded software. Besides, unconstraining the physical
environment might lead to unnecessary testing for irrelevant inputs. Such inputs may
include sensor readings (such as 300 K for air temperature readings) that may never
appear in the environment where the software is deployed.
The physical environment might instead be simulated by randomly generating synthetic inputs (eg, random temperature readings). However, such an approach may fail to generate relevant inputs. As in traditional software testing, search-based techniques might improve the simulation of the physical environment via evolutionary methods and metaheuristics.
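To make this idea concrete, the following Python sketch (a minimal illustration, not taken from any of the surveyed works) evolves synthetic accelerometer traces toward a hypothetical fall-likeness objective; the trace encoding, the value ranges, and the fitness function are all assumptions made for the example.

import random

TRACE_LEN = 50          # number of accelerometer samples per synthetic trace
POP_SIZE, GENERATIONS = 20, 30

def random_trace():
    # A synthetic acceleration trace (in g); the ranges are illustrative only.
    return [random.uniform(0.0, 3.0) for _ in range(TRACE_LEN)]

def fitness(trace):
    # Hypothetical objective: prefer traces with a sharp spike followed by
    # stillness, ie, traces that "look like" a fall and thus stress the SUT.
    spike = max(trace)
    stillness = -sum(abs(a - 1.0) for a in trace[-10:])  # roughly 1 g at rest
    return spike + stillness

def mutate(trace):
    t = list(trace)
    t[random.randrange(TRACE_LEN)] = random.uniform(0.0, 3.0)
    return t

population = [random_trace() for _ in range(POP_SIZE)]
for _ in range(GENERATIONS):
    population.sort(key=fitness, reverse=True)
    parents = population[: POP_SIZE // 2]
    population = parents + [mutate(random.choice(parents)) for _ in parents]

best = max(population, key=fitness)
# The best trace would then be fed to the fall-detection software under test.

In a real setting, the fitness function would be driven by feedback obtained from executing the software under test on each candidate trace.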
With a clear knowledge of the embedded software, the testing process can be improved.
For instance, in the fall-detection system, it is probably not crucial to simulate the
movement for all possible movement angles. It is, however, important to test the
application for some inputs that indicate a fall of the patient (hence, indicating safety)
and also for some inputs that do not capture a fall (hence, indicating the absence of
false positives). In general, building such abstractions on the input space is challenging
and it also requires a substantial domain knowledge of the input space.
We shall now discuss some non-functional properties that most embedded software are
required to satisfy.
Timing is a prime example: in the fall-detection application, the time frame between the sampling of sensor inputs and the triggering of an alarm must satisfy strict timing constraints. Violating such constraints may mean that a fall is detected too late, making the respective software impractical. Therefore, it is crucial that the validation process explicitly targets the discovery of
the violation of timing-related constraints. It is, however, challenging to determine the
timing behavior of an application, as the timing critically depends on the execution
platform. The execution platform, in turn, may not be available during the testing phase.
As a result, the validation of timing-related constraints may often involve building a
timing model of the underlying execution platform. Such a timing model should be able to
estimate the time taken by each executed instruction. In general, building such timing
models is challenging, because the time taken by each instruction depends on the
specific instruction set architecture (ISA) of the processor, as well as the state of different
hardware components (eg, cache, pipeline, and interconnect). To show the interplay
between the ISA and hardware components, let us consider the program fragment shown
in Fig. 3.
FIGURE 3 The timing interplay between hardware components (eg, caches) and instructions.
In Fig. 3, the true leg of the conditional executes an add instruction and the false leg
of the branch executes a multiply instruction. Let us assume that we want to check
whether this code finishes within some given time budget. In other words, we wish to find
out if the execution time of the branch with the longer execution time is less than the given
time budget. In a typical processor, a multiplication operation generally takes longer than
an addition operation. However, if the processor employs a cache between the CPU and
the memory, the variable z will be cached after executing the statement z := 3. Therefore,
the statement x := x * z can be completed without accessing the memory, but the
processor may need to access the memory to execute x := x + y (to fetch y for the first
time). As a result, even though multiplication is a costly operation compared to addition,
in this particular scenario, the multiplication may lead to a faster completion time. This
example illustrates that a timing model for an execution platform should carefully
consider such interaction between different hardware components.
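The following sketch (with purely illustrative latencies and a single-line cache, assumed only for this example) shows how even a crude timing model must combine ALU latency with cache state to reproduce the add-versus-multiply reversal described above.

# Illustrative latencies (cycles); real values depend on the ISA and platform.
ALU_LATENCY = {"add": 1, "mul": 3}
CACHE_HIT, CACHE_MISS = 1, 20

def leg_time(alu_op, operand_cached):
    """Time of one branch leg: operand fetch from memory plus the ALU operation."""
    mem = CACHE_HIT if operand_cached else CACHE_MISS
    return mem + ALU_LATENCY[alu_op]

# After "z := 3", z's block is cached; y has never been accessed.
t_true = leg_time("add", operand_cached=False)   # x := x + y  -> 21 cycles
t_false = leg_time("mul", operand_cached=True)   # x := x * z  ->  4 cycles
print(t_true, t_false)  # the "cheaper" addition is slower here because of the cache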
Once a timing model is built for the execution platform, the respective software can be
tested against the given timing-related constraints. Broadly, the validation of timing
constraints may involve the following procedures:
The testing procedure may aim to discover violations of such constraints.
Reliability is another non-functional property of concern, and it is especially crucial for safety-critical embedded software, such as a fall detector. Besides, the reliability of a component and its cost have
nontrivial trade-offs. For instance, a more accurate sensor (or a reliable network) might
incur higher cost. Overall, the designer must ensure that the respective software operates
with an acceptable level of reliability. As an example, in the fall detector, the designer
would like to ensure that a physical fall is alarmed with x% reliability. Computing the
reliability of an entire system might become challenging when the system consists of
several components and such components might interact with each other (and the physical
world) in a fairly complex fashion.
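As a deliberately simplified illustration (not part of the surveyed works), the sketch below composes hypothetical component reliabilities under an independence assumption; as noted above, real embedded systems rarely permit such a clean decomposition.

def series_reliability(component_reliabilities):
    # Under the (strong) assumption that components fail independently and the
    # system fails if any single component fails, reliabilities multiply.
    r = 1.0
    for ri in component_reliabilities:
        r *= ri
    return r

# Hypothetical fall-detector components: sensor, network link, detection task.
print(series_reliability([0.99, 0.995, 0.999]))  # about 0.984, below a 99% target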
To summarize, apart from the functionality, most embedded software have several non-functional aspects that must be considered in the testing process. Such non-functional aspects
include timing, energy, and reliability, among others. In general, the non-functional
aspects of embedded software may lead to several complex trade-offs. For instance, an
increased rate of sampling sensor inputs (which capture the data from the physical world)
may increase energy consumption; however, it might increase the reliability of the
software in terms of monitoring the physical environment. Similarly, a naive
implementation to improve the functionality may substantially increase the energy
consumption or it may lead to the loss of performance. As a result, embedded software are
required to be systematically tested with respect to their non-functional aspects. In the
next section, we shall discuss several testing methodologies for embedded software, with a
specific focus on their non-functional properties.
Black-Box Abstraction: Such techniques often consider the SUT as a black box. Test cases are generated by sampling or randomized testing techniques.
Grey-Box Abstraction: Such techniques do not treat the SUT as a black box. The SUT is represented by a model, which captures only the information related to the property of interest. Test cases are generated by exploring the search space of the model.
White-Box Abstraction: Techniques in this category often require the source code or binary of the implemented system for the testing process. In other words, the source code or binary serves as the model of the system. Test cases are generated by searching the input space of the implemented system.
In subsequent sections, we shall elaborate on each of the categories described in the preceding paragraphs.
4 Black-Box Abstraction
One of the simplest (but not necessarily effective) approaches to testing complex systems is to uniformly sample the input space. The goal of such sampling is to generate test inputs. As simple as such a method might seem, the effectiveness of such uniform (or unguided) sampling remains questionable. When testing a system, in general, the objective is to produce test inputs that bear witness to failures of the system. Such a
failure might capture the violation of a property of interest. Besides, such violations
should be manifested within a certain time budget for testing.1 Testing approaches, which
are purely based on uniform random sampling, clearly do not adhere to the
aforementioned criteria. For example, consider a system that expects an integer value as
an input. For such a system, uniform random sampling may blindly continue to generate
test inputs forever without providing any information about the correctness (or incorrectness) of the system. However, there will be systems in the wild that are too
complex to model. Such systems require some sort of mechanism by which they can be
tested to some extent. For such systems, the sampling-based techniques discussed in the following paragraphs might be useful.
The work in [2, 3] proposes sampling based techniques to generate failure-revealing test
inputs for complex embedded systems. In particular, they focus on generating test inputs
that lead to violation of timing-related properties. For these techniques to work, the
essential timing-related properties of the system must be formulated via Metric Temporal
Logic (MTL). An MTL formula can be broadly described as a composition of propositional and temporal operators. Common examples of propositional operators are conjunction, disjunction, and negation, whereas examples of temporal operators are until, always, and eventually. Besides, MTL extends the traditional linear
temporal logic (LTL) with timing constraints. For instance, consider our example in Fig. 1.
Let us consider that a potential fall of the patient must be reported within 100 time units.
Such a criterion can be captured via the following MTL formula:
□ (fall → ◇≤100 alarm)
Here, fall captures the event of a potential fall and alarm captures the event of notifying the health care providers. The temporal operators □ and ◇≤100 capture always and eventually (within 100 time units),
respectively. Once the timing-related properties of the system have been identified and
encoded as MTL formulas, the next step is to identify test inputs (as shown in Fig. 5), for
which the aforementioned formulas do not hold (ie, the system fails).
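A minimal falsification loop in this spirit can be sketched as follows; the simulate function standing in for the fall-detection software, the sensor-value ranges, and the response delay are all assumptions, and, unlike [2, 3], the sketch uses plain uniform random sampling rather than guided (eg, Monte-Carlo or cross-entropy) sampling.

import random

HORIZON, DEADLINE = 1000, 100   # simulation length and MTL deadline (time units)

def simulate(sensor_trace):
    """Placeholder for the system under test: returns, per time step, whether a
    fall occurred and whether the alarm was raised. Assumed for illustration."""
    fall = [a > 2.5 for a in sensor_trace]
    alarm = [False] * len(sensor_trace)
    for i, f in enumerate(fall):
        if f:
            delay = random.randint(1, 150)          # hypothetical response delay
            if i + delay < len(alarm):
                alarm[i + delay] = True
    return fall, alarm

def violates_mtl(fall, alarm):
    # Check the property "always (fall implies eventually-within-DEADLINE alarm)"
    # on the finite trace.
    for i, f in enumerate(fall):
        if f and not any(alarm[i:i + DEADLINE + 1]):
            return True
    return False

for trial in range(1000):                           # uniform random sampling
    trace = [random.uniform(0.0, 3.0) for _ in range(HORIZON)]
    if violates_mtl(*simulate(trace)):
        print("falsifying input found at trial", trial)
        break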
5 Grey-Box Abstraction
This class of techniques works by creating an abstract model of the SUT. As shown in Fig.
6, in general, frameworks discussed in this category require three key components as
follows:
FIGURE 7 Simple example showing (A) timed usage model and (B) timed automata.
Similar to the conventional MCUMs, a TUM has a set of states to capture the feasible
usage of the system. However, in TUM, an additional probability distribution function
(pdf) is associated with each state. This pdf encodes the time for which the SUT will remain in the respective state.
In TUM, each transition between two states is triggered by a stimulus. Additionally,
edges connecting the states are associated with two variables, a transition probability
and a probability distribution function (pdf) of stimulus time. As the name suggests, the
transition probability captures the probability of the respective transition between two
states. Therefore, the transition probability has a similar role to that of conventional
MCUMs. The pdf of the stimulus time represents the duration of execution of the
stimulus on the system, at a given state.
In a deterministic MCUM, there could be at most one transition (from a given state) for a
given stimulus. However, in a TUM, the next state not only depends on the stimulus,
but also on the duration of the execution of the stimulus. This feature is required to
capture timing-related dependencies in the system. Additionally, to maintain
consistency, the pdfs of stimulus time, originating from a state, do not overlap.
Once the model of the system has been created, a variety of model-exploration
techniques can be used to generate test cases. For instance, [5] and [6] perform a simple random walk of the TUM to generate test cases, while other works, such as [7] and [8], have designed coverage metrics to guide the test-generation process. In particular, the works in [7] and [8] combine the usage of TUMs with dependencies between the different
components of the SUT. This allows them to generate test cases that not only represent
different timing scenarios, but also capture dependencies between critical system
components.
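A random walk over a timed usage model can be sketched as follows; the miniature model (states, stimuli, transition probabilities, and exponential stimulus-time distributions) is invented for illustration and is not taken from [5-8].

import random

# A toy timed usage model: state -> list of (stimulus, next_state,
# transition probability, mean stimulus time for an exponential pdf).
TUM = {
    "Idle": [("start_monitoring", "Monitoring", 1.0, 2.0)],
    "Monitoring": [("sensor_event", "Monitoring", 0.8, 0.5),
                   ("fall_suspected", "Alarm", 0.1, 0.2),
                   ("stop", "Idle", 0.1, 1.0)],
    "Alarm": [("acknowledge", "Idle", 1.0, 5.0)],
}

def random_walk(start="Idle", steps=10):
    """Generate one timed test case as a list of (stimulus, duration) pairs."""
    state, test_case = start, []
    for _ in range(steps):
        transitions = TUM[state]
        choices = [(s, nxt, mean) for s, nxt, p, mean in transitions]
        weights = [p for _, _, p, _ in transitions]
        stimulus, next_state, mean = random.choices(choices, weights=weights)[0]
        test_case.append((stimulus, random.expovariate(1.0 / mean)))  # stimulus time
        state = next_state
    return test_case

print(random_walk())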
Another line of work [9] proposes to extend the finite state machine (FSM) model to incorporate timing-related constraints. Such a model is most commonly known as a timed automaton (TA). In a timed automaton, an FSM is augmented with a finite number of clocks. These clocks are used to generate Boolean constraints, and such constraints label the edges of the TA. Additionally, clock values can be manipulated and reset by different transitions of the automaton. The Boolean constraints succinctly capture the criteria for the respective transition to be triggered. Timed automata also provide the facility to label time-critical states. For instance, states marked as Urgent or Committed imply that no time can be spent in these states. Besides, while exploring the model, certain states (such as states marked as Committed) have priority over other states. These additional features make the process of modeling intuitive and also make the model easier to read. Figure 7B provides a simple example of a timed automaton. A major difference between the works that use TUMs as a modeling approach (eg, works in [5-8]) and works that use timed automata (eg, the work in [9]) lies in the model exploration. Whereas the former use either random or guided walks of the model to generate test cases, the latter uses evolutionary algorithms to explore the model and generate test cases.
In such systems, the uncertainty in the environment (eg, the behavior of the sensors and the human user) makes the system complex and makes it challenging to produce the required reliability assurances. However, the work of [11] has shown that an
MDP-based approach can be effectively used to test complex, real life systems in a
scalable and efficient manner.
In an event flow graph (EFG), nodes capture events and edges capture the follows-after relationship between any two events. It is possible (and often the case) that EFGs of mobile applications have cycles (such as the example shown in Fig. 8). Such cycles typically do not have explicit iteration bounds. Therefore, although an EFG has a finite
number of events, an unbounded number of event sequences can be generated from the
same. This further complicates the process of test generation, as any effective testing
technique should not only be able to generate all failure-revealing test cases, but also do so
in a reasonable amount of time.
FIGURE 8 Modern smartphones have a wide variety of I/O and power management utilities,
improper use of which in the application code can lead to suboptimal energy-consumption
behavior. Smartphone applications are usually nonlinear pieces of code, the systematic testing of which requires addressing a number of challenges.
The framework presented in [15] has two key innovations that help it tackle the challenges described in the preceding paragraph. The first innovation is the definition of a metric that captures the energy inefficiency of the system for a given
input. To design such a metric, it is important to understand what exactly qualifies as
energy-inefficient behavior. In other words, let us consider the following question: Does higher energy consumption always imply higher energy inefficiency? As it turns out [15],
the answer to this question is not trivial. For instance, consider a scenario where two
systems have similar energy-consumption behavior but one is doing more work (has a
higher utilization of its hardware components) than the other. In such a scenario, it is quite
intuitive that the system with higher utilization is the more energy-efficient one. Taking
inspiration from this observation, the work in [15] defines the metric of E/U ratio (energy
consumption vs utilization) to measure the energy inefficiency of a system. For a given
input, the framework executes the application on a real hardware device and analyses the
E/U ratio of the device at runtime. An anomalously high E/U ratio, during the execution
of the application, indicates the presence of an energy hotspot. Additionally, a consistently
high E/U ratio, after the application has completed execution, indicates the presence of an
energy bug. In general, energy bugs can cause more wastage of battery power than energy
hotspots and can drastically reduce the operational time of the smartphone. With the
metric of E/U ratio, it is possible to find energy-inefficient behavior in the SUT for a given input. However, another challenge is to generate inputs that stress the energy behavior of the application in the first place.
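The E/U-ratio check itself can be sketched as follows; the power and utilization traces, the sampling granularity, and the anomaly threshold are assumptions made for illustration, since the framework in [15] obtains these measurements from a real device at runtime.

def eu_ratios(power_trace, utilization_trace):
    # E/U per sample: power drawn in the interval divided by hardware utilization.
    return [p / max(u, 1e-6) for p, u in zip(power_trace, utilization_trace)]

def find_anomalies(ratios, threshold):
    return [i for i, r in enumerate(ratios) if r > threshold]

# Hypothetical measurements: power (W) and hardware utilization (0..1) per second.
power = [0.4, 0.5, 2.0, 2.1, 0.4, 1.8, 1.8, 1.8]
utilization = [0.3, 0.4, 0.9, 0.9, 0.3, 0.05, 0.05, 0.05]

ratios = eu_ratios(power, utilization)
print(find_anomalies(ratios, threshold=10.0))
# Samples 5-7 (high power at near-idle utilization after the workload finished)
# are flagged, hinting at an energy bug rather than a mere energy hotspot.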
6 White-Box Abstraction
In this section, we shall discuss software testing methodologies that are carried out
directly on the implementation of an application. Such an implementation may capture the
source code, the intermediate code (after various stages of compilation) or the compiled
binary of an embedded software. Although we classify testing procedures only by the level of abstraction at which they are carried out, we shall observe in the following discussion that several methodologies (eg, evolutionary testing and symbolic execution) can be used to test the implementation of embedded software. The idea of directly testing the
implementation is promising in the context of testing embedded software. In particular, if
the designer is interested in accurately evaluating the non-functional behaviors (eg, energy
and timing) of different software components, such non-functional behaviors are best
observed at the level of implementation. On the flip side, if a serious bug is discovered in the implementation, it may lead to a complete redesign of the respective application.
In general, it is important to figure out an appropriate level of abstraction to run the testing
procedure. We shall now discuss several works to test the implementation of embedded
software and reason about their implications. In particular, we discuss testing
methodologies for timing-related properties in Section 6.1 and for functionality-related
behaviors in Section 6.2. Finally, in Section 6.3, we discuss challenges to build an
appropriate framework to observe and control test executions of embedded software and
we also describe some recent efforts in the software engineering community to address
such challenges.
FIGURE 9 Interrupt latency: (A) single interrupt and (B) nested interrupts.
The work in [17] discusses a genetic algorithm to find the maximum interrupt latency.
In particular, this work shows that a testing method based on genetic algorithm is
substantially more effective compared to random testing. This means that the interrupt
latency discovered via the genetic algorithm is substantially larger than the one discovered
using random testing. An earlier work [18] also uses genetic algorithm to find the WCET
of a program. In contrast to [17], the work in [18] focuses on the uninterrupted execution
of a single program. More specifically, the testing method, as proposed in [18], aims to
search the input space and, more importantly, to direct the search toward WCET-revealing inputs.
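A genetic search for WCET-revealing inputs in the spirit of [18] can be sketched as follows; the routine under test and the host-based timing measurement are stand-ins, since the actual approach measures execution time on the target platform.

import random, time

def program_under_test(data):
    # Stand-in for the embedded routine whose WCET is of interest.
    sorted(data)

def measured_time(data):
    start = time.perf_counter()
    program_under_test(data)
    return time.perf_counter() - start        # fitness: observed execution time

def mutate(ind):
    ind = list(ind)
    ind[random.randrange(len(ind))] = random.randint(0, 1000)
    return ind

def crossover(a, b):
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

POP, GENS, N = 20, 25, 256
population = [[random.randint(0, 1000) for _ in range(N)] for _ in range(POP)]
for _ in range(GENS):
    population.sort(key=measured_time, reverse=True)   # keep the slowest inputs
    parents = population[: POP // 2]
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in parents]
    population = parents + children

print("longest observed execution time:", measured_time(population[0]))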
It is well known that the processing power of CPUs has increased dramatically in the last few decades. In contrast, memory subsystems are several orders of magnitude slower than the CPU. Such a performance gap between the CPU and memory subsystems might be critical for embedded software, when such software is subject to timing-related constraints. More specifically, if the software spends a substantial amount of time accessing memory, then the performance of the application may degrade considerably. In order to investigate such problems, some recent efforts in software testing [19, 20] have explicitly targeted the discovery of memory bottlenecks. Such efforts directly test
the software binary to accurately determine requests to the memory subsystems. In
particular, requests to the memory subsystems might be reduced substantially by
employing a cache. Works in [19, 20] aim to exercise test inputs that lead to a poor usage
of caches. More specifically, the work in [19] aims to discover cache thrashing scenarios.
A cache thrashing scenario occurs when several memory blocks replace each other from
the cache, hence, generating a substantial number of requests to the memory subsystems.
For instance, the code fragment in Fig. 10 may exhibit a cache thrashing when the cache
can hold exactly one memory block. In the code fragment, m1 and m2 replace each other
from the cache, leading to a cache thrashing. This behavior is manifested only for the
program input t.
The work in [19] shows that the absence of such cache thrashing scenarios can be
formulated by systematically transforming the program with assertions. Subsequently, a
search procedure on the software input space can be invoked to find violation of such
assertions. Any violation of an assertion, thus, will produce a cache thrashing scenario.
The methodology proposed in [19] uses a combination of static analysis and symbolic
execution to search the input space and discover inputs that violate the formulated
assertions.
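The assertion-based formulation can be illustrated by the following sketch; the two-block program, the single-line direct-mapped cache, and the brute-force search over a small input domain are simplifications, as the approach in [19] relies on static analysis and symbolic execution rather than exhaustive enumeration.

def memory_trace(t):
    # Stand-in for the program of Fig. 10: for one input value the loop body
    # alternates between blocks m1 and m2; otherwise it touches only m1.
    blocks = []
    for _ in range(100):
        blocks.append("m1")
        if t == 7:                      # hypothetical thrashing-triggering input
            blocks.append("m2")
    return blocks

def conflict_misses(blocks):
    # Cache holding exactly one memory block: every change of block is a miss.
    cached, misses = None, 0
    for b in blocks:
        if b != cached:
            misses += 1
            cached = b
    return misses

THRESHOLD = 10                          # assertion: misses stay below this bound
for t in range(256):                    # brute-force stand-in for symbolic search
    if not conflict_misses(memory_trace(t)) < THRESHOLD:
        print("assertion violated (cache thrashing) for input t =", t)
        break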
The work in [20] lifts software testing of embedded software to massively parallel applications, with a specific focus on general-purpose graphics processing units (GPGPUs).
It is well known that future technology will be dominated by parallel architectures (eg,
multicores and GPGPUs). For such architectures, software testing should take into account
the input space of the application, as well as the non-deterministic nature of scheduling
multiple threads. The work in [20] formally defines a set of scenarios that capture memory
bottlenecks in parallel architectures. Subsequently, a search procedure is invoked to
systematically traverse the input space and the space consisting of all possible scheduling
decisions among threads. Like the approach in [19], the work in [20] also uses a
combination of static analysis and symbolic execution for the search. In summary, both the
works [19, 20] revolve around detecting fine-grained events such as memory requests. In general, such fine-grained events can appropriately be tested only at the implementation level (eg, the software binary), because their occurrence is difficult to predict at intermediate stages of development.
A typical embedded application may contain multiple tasks (eg, programs), and such tasks might be
active simultaneously. For instance, in our fall-detection application, the access to
hardware components (eg, gyroscope and accelerometers) might be controlled by a
supervisory software, such as operating systems (OS). Similarly, sampling signals from
sensors and computation of a potential fall might be accomplished by different tasks that
run simultaneously in the system. The work in [21] argues for the importance of testing
interactions between different hardware/software layers and different tasks. Fig. 11
conceptually captures such interactions in a typical embedded system.
In order to exercise interactions between tasks and different software layers, the authors of [21] have described suitable coverage criteria for testing embedded systems. For
instance, the interaction between application layer and OS layer can happen via system
calls. Similarly, the application might directly access some hardware components via a
predefined set of application programmer interfaces (APIs). The work in [21] initially
performs a static analysis to infer data dependencies across different layers of the
embedded system. Besides, if different tasks of the system use shared resources, such an
analysis also tracks the data dependencies across tasks. For instance, consider the code fragment in Fig. 12, where syscall captures a system call implemented in the kernel
mode. In the code shown in Fig. 12, there exists a data dependency between application
layer variable g and the system call syscall. As a result, it is important to exercise this
data dependency to test the interaction between application layer and OS layer. Therefore,
the work in [21] suggests selecting test cases that can manifest the data dependency
between variable g and syscall. To illustrate the dependency between multiple tasks, let
us consider the code fragment in Fig. 13.
FIGURE 13 Interaction between tasks via shared resources (shared variable s).
The keyword __shared__ captures shared variables. In Fig. 13, there is a potential data
dependency between Task 1 and Task 2. However, to exercise this data dependency, the
designer must be able to select an input that satisfies the condition input == x. The
work in [21] performs static analysis to discover the data dependencies across tasks, as
shown in this example. Once all data dependencies are determined via static analysis, the
chosen test inputs aim to cover these data dependencies.
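A minimal Python analogue of the Fig. 13 scenario is sketched below; the shared variable, the guard input == x, and the two tasks are hypothetical stand-ins, and the point is simply that only a test input satisfying the guard exercises the cross-task data dependency.

import threading

X = 42            # the constant against which the input is compared
s = 0             # shared variable (the __shared__ s of Fig. 13)

def task1(input_value):
    global s
    if input_value == X:      # the dependency is reachable only if input == X
        s = input_value

def task2(results):
    results.append(s)         # reads the shared variable written by task1

def run_test(input_value):
    global s
    s = 0
    results = []
    t1 = threading.Thread(target=task1, args=(input_value,))
    t2 = threading.Thread(target=task2, args=(results,))
    t1.start(); t1.join()     # sequentialized here; real schedules interleave
    t2.start(); t2.join()
    return results[0]

print(run_test(7))    # 0: the inter-task data dependency was never exercised
print(run_test(42))   # 42: a test input satisfying input == X covers it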
Another challenge concerns test oracles, which determine whether an observed execution is faulty. Designing appropriate oracles is difficult even for traditional software testing. In the context of embedded software, designing oracles faces additional challenges. In
particular, as embedded systems consist of many tasks and exhibit interactions across
different hardware and software layers, they may often have nondeterministic output. As a
result, oracles, which are purely based on output, are insufficient to observe faults in
embedded systems. Moreover, it is cumbersome to build output-based oracles for each test
case. In order to address these challenges, the authors in [22] propose to design property-based oracles for embedded systems. Property-based oracles are designed for each
execution platform. Therefore, any application targeting such an execution platform might reuse the oracles, thereby avoiding the substantial manual effort of designing oracles for each test case. The work in [22] specifically targets concurrency and synchronization
properties. For instance, test oracles are designed to specify proper usage of binary
semaphores and message queues, which are used for synchronization and interprocess
communication, respectively. Such synchronization and interprocess communication APIs
are provided by the operating system. Once test oracles are designed, a test case can be
executed, while instrumenting the application, OS and hardware interfaces simultaneously.
Each execution can subsequently be checked for violation of properties captured by an
oracle. Thus property-based test oracles can provide a clean interface to observe faulty
executions. Apart from test oracles, authors in [23] discuss the importance of giving the
designer appropriate tools that control the execution of embedded systems. Since the
execution of an embedded system is often non-deterministic, it is, in general difficult to
reproduce faulty executions. For instance, consider the fall detection application where a
task reads sensor data from a single queue. If new data arrives, an interrupt is raised to
update the queue. It is worthwhile to see the presence of a potential data race between the
routine that services the interrupt and the task which reads the queue. Unfortunately, the
arrival of interrupts is highly non-deterministic in nature. As a result, even after multiple
test executions, testing may not reveal a faulty execution that captures a potential data race. In order to solve this, the authors in [23] design appropriate utilities that give the designer the power to raise interrupts explicitly. For instance, the designer might choose a set of locations where she suspects the presence of data races due to interrupts. Subsequently, a test execution can be carried out that raises interrupts exactly at the locations specified by the designer.
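A property-based oracle of this flavor can be sketched as follows; the trace format and the checked rule (a binary semaphore must not be given while it is not taken) are illustrative assumptions rather than the exact properties used in [22].

def check_binary_semaphore(trace):
    """Oracle over an instrumented execution trace of (operation, semaphore) events.
    Flags a 'give' on a semaphore that is not currently taken."""
    taken = set()
    violations = []
    for step, (op, sem) in enumerate(trace):
        if op == "take":
            taken.add(sem)
        elif op == "give":
            if sem not in taken:
                violations.append((step, sem))   # double give, or give without take
            taken.discard(sem)
    return violations

# Hypothetical trace recorded while running one test case.
trace = [("take", "sem_queue"), ("give", "sem_queue"), ("give", "sem_queue")]
print(check_binary_semaphore(trace))   # [(2, 'sem_queue')]: property violated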
Summary
To summarize, in this section, we have seen efforts to generate test inputs and test oracles
to validate both functional and non-functional aspects of embedded software. A common
aspect of all these techniques is that the testing process is carried out directly on the
implementation. This might be appealing in certain scenarios, for instance, when the
designer is interested in events that are highly sensitive to the execution platform. Such
events include interrupts, memory requests and cache misses, among others.
7 Future Directions
As discussed in this chapter, analysis of non-functional properties is crucial to ensure that embedded systems behave as per their specifications. However, there exists an orthogonal direction of work, where the analysis of non-functional properties, such as power consumption, memory accesses, and computational latencies, has been used for security-related exploits. Such exploits are commonly referred to as side-channel attacks and are designed to extract private keys2 from cryptographic algorithms, such as the algorithms used
in smart cards and smart tokens. The intention of the attacker is not to discover the
theoretical weaknesses of the algorithm. Instead, the attacker aims to break the
implementation of the algorithms through side channels, such as measuring execution time
or energy consumption. In particular, the attacker tries to relate such measurements with
the secret key. For instance, if different secret keys lead to different execution time, the
attacker can perform statistical analysis to map the measured execution time with the
respective key. In general, any non-functional behavior that has a correlation with
cryptographic computation, is capable of leaking information, if not managed
appropriately. For example, the differential power attack, as proposed in [24], uses a simple, yet effective, statistical analysis technique to correlate the observed power-consumption behavior with the private key. Since then, a number of subsequent works have
proposed counter-measures (eg, [25]) against side-channel vulnerabilities and bypasses to
those counter-measures (eg, [26]). Similarly, researchers have also studied side-channel
attacks (and their counter-measures) based on other non-functional behaviors, such as
computational latency [27, 28] and memory footprint [29]. Even though works on side-channel attacks have a very different objective compared to those on non-functional
testing, there exists a number of commonalities. In essence, both lines of work are looking
for test inputs that lead to undesirable non-functional behavior. The definition of the
phrase undesirable non-functional behavior is based on the system under test (SUT). For
instance, in an embedded system that has hard timing-related constraints, an undesirable
input would be the violation of such constraints. On the contrary, for a cryptographic
algorithm, such as implemented in a smart card, an undesirable input may lead to
information leaks via side channels. Undesirable non-functional behavior in one scenario
may lead to performance loss, sometimes costing human lives (such as in an anti-lock
braking system), whereas, in the other scenario undesirable non-functional behavior may
cause information leaks, which, in turn, may often lead to financial losses. Needless to say, testing embedded cryptographic systems for such undesirable non-functional behaviors is crucial. More importantly, testing methodologies for detecting
side-channel attacks need to be automated. However, as of this writing, this line of
research is far from being solved. New works on this topic could draw inspiration from
earlier works on non-functional testing, such as works described in Section 3.
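The following sketch illustrates, in a deliberately simplified setting, how execution time can correlate with secret data; the byte-by-byte comparison with an early exit is a textbook example of a timing leak and is not taken from the cited attacks.

import time

SECRET = b"k3y"

def leaky_compare(guess):
    # Early exit: execution time grows with the length of the correct prefix.
    for a, b in zip(SECRET, guess):
        if a != b:
            return False
        time.sleep(0.001)          # stand-in for per-byte work
    return len(guess) == len(SECRET)

def timed(guess, repetitions=5):
    start = time.perf_counter()
    for _ in range(repetitions):
        leaky_compare(guess)
    return time.perf_counter() - start

# An attacker compares the timings of different first-byte guesses.
for first in (b"a", b"k", b"z"):
    print(first, round(timed(first + b"??"), 4))
# The guess starting with the correct byte 'k' takes measurably longer,
# leaking one byte of the secret through the timing side channel.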
Another, more generic, direction is related to the detection of root causes and the automatic repair of non-functional bugs in embedded systems. In general, the purpose of software testing is to expose suboptimal or unwanted behavior in the SUT. Such suboptimal behaviors, once identified, should be rectified by modifying the system. More specifically, the rectification process can be subdivided into two parts: fault localization3 and repair.
8 Conclusion
Embedded systems are ubiquitous in the modern world. Such systems are used in a wide
variety of applications, ranging from common consumer electronic devices to automotive
and avionic applications. A property common to all embedded systems is that they interact
with the physical environment, often deriving their inputs from the surrounding
environment. Due to the application domains in which such systems are used, their behavior is often constrained by functional properties (such as the input-output relationship) as well as non-functional properties (such as execution time or energy consumption). This makes the
testing and validation of such systems a challenging task. In this chapter, we discussed a
few challenges and their solutions in the context of testing embedded systems. In
particular, we took a closer look at existing works on testing non-functional properties, such as timing, energy consumption, and reliability, of embedded software. To put the
existing works in perspective, we classified them into three distinct categories based on the level of system abstraction used for testing: black-box, grey-box, and white-box abstraction based testing approaches. In general, black-box abstraction
based testing methods use sampling based techniques to generate failure-revealing test
cases for the system under test. Such methods consider the system as a black-box and
hence are equally applicable to simple and complex systems alike. However, such ease of
use usually comes at the cost of effectiveness. In particular, these methods often cannot provide completeness guarantees (ie, a guarantee that, by the time the test-generation process completes, all failure-revealing test inputs have been uncovered). The grey-box abstraction based
approaches are usually more effective than the black-box abstraction based approaches.
This is because such methods often employ an abstract model of the system under test to
generate failure-revealing test cases. The effectiveness of these test-generation methodologies is often dictated by the level of system abstraction being used. White-box abstraction based testing approaches use the actual system implementation to generate failure-revealing test cases and hence are capable of providing the maximum level of guarantee in discovering failure-revealing inputs. We observe that existing techniques vary hugely in terms of complexity
and effectiveness. Finally, we have discussed future research directions related to
embedded software testing. One of these is automated fault localization and repair of bugs related to non-functional properties. Another direction is related to the development of secure embedded systems. In particular, we explored the possibility of using testing techniques to expose vulnerabilities to side-channel attacks. In recent years, there have been a number of works that analyze non-functional behavior to perform side-channel (security-related) attacks. It would be appealing to see how existing testing methodologies can be adapted to test and build secure embedded software.
Acknowledgment
The work was partially supported by a Singapore MoE Tier 2 grant MOE2013-T2-1-115
entitled Energy aware programming and the Swedish National Graduate School on
Computer Science (CUGS).
References
[1] A wearable miniaturized fall detection system for the elderly.
http://www.fallwatch-project.eu/press_release.php.
[2] Nghiem T., Sankaranarayanan S., Fainekos G., Ivančić F., Gupta A., Pappas G.J. Monte-Carlo techniques for falsification of temporal properties of non-linear hybrid systems. In: Proceedings of the 13th ACM International Conference on Hybrid Systems: Computation and Control, HSCC '10; 2010.
[3] Sankaranarayanan S., Fainekos G. Falsification of temporal properties of hybrid
systems using the cross-entropy method. In: Proceedings of the 15th ACM
International Conference on Hybrid Systems: Computation and Control, HSCC '12; 2012.
[4] Annapureddy Y.S.R., Fainekos G.E. Ant colonies for temporal logic falsification
of hybrid systems. In: IECON 2010 - 36th Annual Conference on IEEE Industrial
Electronics Society. 2010.
[5] Siegl S., Hielscher K., German R. Introduction of time dependencies in usage
model based testing of complex systems. In: Systems Conference, 2010 4th
Annual IEEE; 2010:622-627.
[6] Siegl S., Hielscher K., German R., Berger C. Formal specification and
systematic model-driven testing of embedded automotive systems. In: 4th Annual
IEEE Systems Conference, 2010; 2011.
[7] Siegl S., Caliebe P. Improving model-based verification of embedded systems by
analyzing component dependences. In: 2011 6th IEEE International Symposium
on Industrial Embedded Systems (SIES). 2011:51-54.
[8] Luchscheider P., Siegl S. Test profiling for usage models by deriving metrics
from component-dependency-models. In: 2013 8th IEEE International
Symposium on Industrial Embedded Systems (SIES). 2013:196-204.
[9] Hansel J., Rose D., Herber P., Glesner S. An Evolutionary algorithm for the
generation of timed test traces for embedded real-time systems. In: 2011 IEEE
Fourth International Conference on Software Testing, Verification and Validation
(ICST); 2011.
[10] Gui L., Sun J., Liu Y., Si Y.J., Dong J.S., Wang X.Y. Combining model checking
and testing with an application to reliability prediction and distribution. In:
Proceedings of the 2013 International Symposium on Software Testing and
Analysis, ISSTA 2013; 2013.
[11] Liu Y., Gui L., Liu Y. MDP-based reliability analysis of an ambient assisted
living system. In: FM 2014: Formal Methods. Springer International Publishing;
Lecture Notes in Computer Science. 2014;vol. 8442 2014.
[12] Arcuri A., Iqbal M.Z., Briand L. Black-box system testing of real-time embedded
systems using random and search-based testing. In: Proceedings of the 22nd IFIP
[25] Akkar M.-L., Giraud C. An implementation of DES and AES, secure against
some attacks. In: Proceedings of the Third International Workshop on
Cryptographic Hardware and Embedded Systems, CHES '01; 2001.
[26] Mangard S., Pramstaller N., Oswald E. Successfully attacking masked AES
hardware implementations. In: Cryptographic Hardware and Embedded Systems,
CHES 2005, Lecture Notes in Computer Science. 2005.
[27] Kocher P. Timing attacks on implementations of Diffie-Hellman, RSA, DSS, and other systems. http://www.cryptography.com/public/pdf/TimingAttacks.pdf.
[28] Köpf B., Mauborgne L., Ochoa M. Automatic quantification of cache side-channels. In: Proceedings of the 24th International Conference on Computer Aided Verification, CAV '12, Berkeley, CA; Berlin: Springer-Verlag; 2012:564-580. doi:10.1007/978-3-642-31424-7_40.
[29] Jana S., Shmatikov V. Memento: learning secrets from process footprints. In: Proceedings of the 2012 IEEE Symposium on Security and Privacy, SP '12; Washington, DC: IEEE Computer Society; 2012:143-157. doi:10.1109/SP.2012.19.
[30] Jones J.A., Harrold M.J. Empirical evaluation of the Tarantula automatic fault-localization technique. In: Proceedings of the 20th IEEE/ACM International Conference on Automated Software Engineering, ASE '05, Long Beach, CA, USA; New York, NY: ACM; 2005:273-282. doi:10.1145/1101908.1101949.
Otherwise, the testing process should terminate with the assurance that the system behaves as expected under all feasible circumstances.
2
Cryptographic algorithms such as AES and DES are used to encrypt a message in a manner such that only the person
having the private key is capable of decrypting the message.
3
In this context, the word fault refers to all types of suboptimal, non-functional behavior.
CHAPTER FOUR
Abstract
As web applications increase in popularity, complexity, and size, approaches and tools to automate testing the
correctness of web applications must continually evolve. In this chapter, we provide a broad background on web
applications and the challenges in testing these distributed, dynamic applications made up of heterogeneous
components. We then focus on the recent advances in web application testing that were published between 2010
and 2014, including work on test-case generation, oracles, testing evaluation, and regression testing. Through this
targeted survey, we identify trends in web application testing and open problems that still need to be addressed.
Keywords
Web applications; Software testing; Web testing; Test case generation; Oracles; Test effectiveness;
Regression testing
1 Introduction
When you do just about anything on the web through a web browser, you are likely
interacting with a web application. Web applications are applications accessible through
the web that dynamically generate web pages, often based on user interactions, the application's data, or other information (eg, the current time and the user's location). Web
applications are one of the most common ways that people use to interact with other
people (eg, Wordpress, Facebook, Twitter) or businesses (eg, bank accounts, travel,
shopping). Web applications are ideal for such interactions because they are available 24
hours a day to anyone with internet access and a web browser. Maintaining web
applications is simpler for both businesses and clients: since the web application code
resides on the web application server, changes to the application can be updated in one
location and all users see the changes, without needing special software installed on each client's computer.
To maintain the high reliability required of web applications, we must develop effective
testing strategies to identify problems in web applications. However, the dynamic, distributed nature of web applications makes testing difficult, and while there have been advances in web application testing, there are still many open problems in this nontraditional domain [1].
While previous survey papers focused on broader time periods [2, 3] or on specific subfields [4-7], we will focus on web application testing approaches for correctness published between 2010 and 2014. With the number of publications increasing and researchers' abilities to focus on each publication decreasing [8], such focused surveys are
increasingly important.
In this chapter, we describe web application architecture, technologies, and
characteristics in Section 2. Section 3 presents the challenges, common research questions,
and approaches to testing web applications. In Section 4, we present the state of the art in
web application testing, including a distant reading of the papers we covered. We conclude
in Section 5 with the open questions in web application testing.
2 Web Applications
Web applications are an example of a distributed system, specifically a client/server architecture, where the clients are web browsers and the servers are the web application
servers. Fig. 1 shows the simplest, three-tiered version of the web application architecture.
The web application server could be implemented as multiple, load-balanced servers
handling requests from many clients. Similarly, the data store tier could also be
implemented on multiple machines, thus lending to an n-tier architecture. The application
data store could be maintained in databases, the file system, and external services.
The browsers and servers communicate via the HTTP protocol [9], a stateless protocol,
meaning that each request is independent of other requests. Human users make requests using a client browser (eg, Google Chrome, Mozilla Firefox, Microsoft's Internet Explorer, Apple's Safari, and Opera) to the server (eg, Apache [10], Apache Tomcat [11], IBM's WebSphere [12], and Google App Engine [13]).
A simplified HTTP request is shown in Fig. 2. A request has a request type, typically
either GET or POST, a resource (the R in URL), and optional parameter name/value
pairs. The parameters are the data inputs to the web application. A request may also
include cookies [14], which contain data that is passed between the browser and the server
to maintain state for the session.
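For illustration, the following Python snippet issues such a request to a hypothetical application using only the standard library; the URL, parameter names, and cookie value are made up.

from urllib.parse import urlencode
from urllib.request import Request, urlopen

# A GET request to a hypothetical page with two parameters and a session
# cookie; on the wire it corresponds roughly to:
#   GET /?query=testing&page=2 HTTP/1.1
#   Cookie: JSESSIONID=abc123
params = urlencode({"query": "testing", "page": "2"})
request = Request(
    "http://example.com/?" + params,
    headers={"Cookie": "JSESSIONID=abc123"},
    method="GET",
)

with urlopen(request) as response:           # the server's (HTML) response
    html = response.read().decode("utf-8")
print(html[:80])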
The web application server processes the request, serving up the requested resource based on the inputs. The server's response, typically an HTML [15] document, is rendered by the browser for the user.
A web application is typically implemented using a variety of programming languages.
HTML is the standard markup language used to create web pages. The HTML document
often references Cascading Style Sheets (CSS) [16] that define the presentation, style, and
layout of the document. Some HTML documents also include scriptingmost commonly
JavaScriptto allow dynamic user interaction with the web page. Ajaxasynchronous
JavaScript and XML [17]is a set of technologies that allow developers to update parts of
the web page through communication with the web application server without updating
the whole page. The result of using Ajax is that the users experience is more like using a
desktop application. JavaScript libraries (eg, jQuery1) and frameworks (eg, AngularJS2,
Bootstrap3) have been developed to (1) improve the responsiveness of web sites and
applications for the variety of devices on which web sites are viewed, (2) provide crossbrowser compatibility, and (3) allow faster development of dynamic user experiences on
the web.
Many different programming languages can be used to implement the server side of a
web application to generate web responses. According to a survey by Web Technology
Surveys [18], PHP, ASP.NET, and Java are the most common server-side programming
languages for the top 10 million sites where the programming language is known. While
the application server used depends on the programming language used to implement the
web application, the client browser is independent of the server's programming language choice. A more recent development in web applications is to use web services such as RESTful APIs.
Web applications' heterogeneous environment in terms of languages, architectures, components, and platforms gives rise to a number of testing challenges, which we discuss
in detail in the next section.
A test case is made up of input to the web application and the expected output from the
web application. Sometimes, state information is also included in a test case because web
application test cases, in particular, may not be independent of each other and the
underlying session/database state. The inputs and expected outputs depend on the part of
the web application being tested. For example, if server-side code is being tested, the input
is likely an HTTP request and the output is likely the HTTP response, typically an HTML
document, as well as other relevant outputs.
Academic and industry researchers and commercial tools have proposed and evaluated
different forms of these artifacts. For example, the popular web testing tool Selenium IDE
[19] uses a test case format of steps that a user performs on a web site, stored in tabular form and written in a domain-specific language called Selenese. The Selenium test case also
contains the expected output and the oracles are in the form of assertions. Researchers in
the domain of capture-replay testing called user-session-based testing have defined a test
case as a sequence of user actions stored as text files with HTTP requests or using an
XML notation.
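As a concrete illustration, the following Python sketch expresses a similar sequence of user steps with Selenium WebDriver, using an assertion as the oracle; the URL, element locators, and expected title are hypothetical.

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()                      # any WebDriver-supported browser
try:
    # Test steps: the inputs and actions performed by the simulated user.
    driver.get("http://example.com/login")        # hypothetical application URL
    driver.find_element(By.NAME, "username").send_keys("alice")
    driver.find_element(By.NAME, "password").send_keys("secret")
    driver.find_element(By.ID, "submit").click()

    # Oracle: an assertion on the expected output (here, the page title).
    assert "Welcome" in driver.title, "login did not lead to the welcome page"
finally:
    driver.quit()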
To address this open area of web application test case generation, researchers have
asked the following research question:
Research Question 1: How should testers define, model, and generate test cases
(inputs and expected outputs), such that, when executed, the test cases will cover a
large portion of the underlying application code and will expose faults in web
applications?
In Section 4.1, we review the advances in research for test case generation for web
application testing that broadly answer the above question.
Researchers have also defined several oracle comparators based on different formats of
expected output, such as the entire HTML page, or only the structure of the HTML page in
the form of a sequence of HTML tags, etc. Assertions have also been used as oracles in
the literature. Developing oracles is a difficult problem, especially the subareas of
automating the fault detection process and defining oracle comparators that are effective at
identifying faulty executions. To this end, researchers have studied the following research
question:
Research Question 2: Given a test case, how can a tester determine automatically if
an application fails?
In Section 4.2, we review advances in the development of test oracles.
The notion of when to stop testing is one that is often discussed in the software testing
industry. Adequacy criteria are used to determine when to stop testing, as well as to
evaluate the thoroughness of the testing conducted so far. This is an area of research that is
still growing for web applications, with most researchers using the traditional adequacy
criteria of statement, branch, and condition coverage. Researchers are beginning to ask the
question:
Research Question 3: What techniques and criteria can be developed and evaluated
to determine thoroughness of a web application test suite?
To address this question, in our literature search, we found that research has focused
on developing adequacy criteria (Section 4.3.1), operators and techniques for
mutation testing (Section 4.3.2), and fault severity classification (Section 4.3.3).
Another aspect of web application testing refers to the testing and maintenance of
applications as the application evolves and new versions of the web application are
created. Here, additional challenges arise, such as creating new test cases for testing the new and changed parts of the application, repairing and reusing test cases from previous versions of the system, as well as managing the size of a regression test suite to maintain high effectiveness. In this subdomain, researchers have addressed the question:
Research Question 4: How can a tester create test cases for changed/new parts of
the code as well as maintain an existing large regression test suite?
We elaborate on several different ways in which this question is addressed in the
literature in Section 4.4.
representing dynamic pages. The ASM is made up of two components: (1) the unit-level
model, the component interaction model (CIM) and (2) the system-level model, the
application transition graph (ATG). Thummala and Offutt [24] implemented the ASM in a
tool called WASP (Web Atomic Section Project) and evaluated the effectiveness of the
model.
Chen et al. [25] propose modeling a user's navigation of a web application, specifically the browser interactions and so-called Advanced Navigations, ie, requested navigations whose responses depend on the user's or application's state or history. The authors model
the page navigations with an augmented finite state machine (FSM). To generate test
cases, the authors suggest traversing the model with modifications to handle cycles in the
navigations and generating finite sequences.
Torsel [26] models the web application as a directed graph. In addition, they propose to
capture variables of basic types, such as String, in two scopes, permanent and session, and
also a variable type to hold data from external sources, like a database. For test case
generation, they conduct a breadth-first search to explore the directed graph to identify
logical dependencies between navigation paths and build a dependency graph. From the
dependency graph, paths are selected for execution as test cases. They also provide some
annotation in their model that can serve as test oracles.
Song et al. [27] propose the use of a finite state automaton to model the server-side
interactions and user interactions in a web application. They use the notion of synchronous
product of client and server-side FSM models to build the model incrementally. Depth-first traversal of the FSM is used to generate test cases.
Enderlin et al. [28] extend contract-based testing using grammars to generate test cases,
test data, and oracles for testing PHP applications. In prior work [29], the authors
developed the notion of contracts with realistic domains, which are used to represent all
kinds of data and to assign domains to test data for testing PHP applications. In prior
work, they developed the regular expression domain to describe simple textual data and in
this work, they develop the grammar domain to describe complex textual data. To
generate test cases, they first compute test data from contracts using three algorithms for the grammar domain: (a) uniform random, (b) bounded exhaustive, and (c) rule coverage based; then they run the test cases and use runtime assertion checking of the contract as oracles.
WEBMATE is a tool built on top of Selenium [19] by Dallmeier et al. [30]. The tool
considers the server side of the web application as a black box, focusing on the browser
(HTML/CSS/JavaScript) interface. By exploring the application's interface, WEBMATE
creates a usage model, essentially a finite state automaton. To explore forms, the tool
applies heuristics to find values for input fields. The authors performed an experimental
study to show how the tool improves coverage over a traditional web crawler and present
cross-browser compatibility testing as an application of the tool. The tool was an academic
prototype but has been further developed into a commercially available tool [31, 32].
Schur et al. [33, 34] present an approach implemented in a tool, ProCrawl, to crawl the
user interface of a web application and observe behavior of the application to create a
behavior model, and then generate and execute test cases to cover the unobserved
behavior. The behavior model is a finite state automaton (FSA) where the nodes represent
states and the transitions represent actions that users perform to change the state. Their
tool can handle multiple simultaneous users accessing the web system. By using a graph
walk algorithm for path generation, they generate test cases as Selenium scripts to test the
web system.
Artzi et al. [40] expand on their previous work [41] and propose generating test suites
using a combination of static and dynamic analysis. Specifically, the authors propose
combining concrete and symbolic (concolic) execution and constraint solving to automatically and
dynamically discover input values for test cases. The authors focus on two types of web
application failures: (1) runtime crashes or warnings and (2) invalid/malformed HTML
documents. Thus, their approach includes an HTML validator to detect failures. Their
technique involves generating a control-flow predicate based on a given input (perhaps
empty), modifying the predicate to yield a different control-flow path, and determining the
input that will yield this new path, thus creating a new test case. In addition, the approach
maintains shared session state. The authors implement their approach for PHP applications (the most common server-side scripting language) in a tool called Apollo.
Statistical fault localization is an approach to finding the cause of faults in code by
executing test cases and then determining which executed code elements correlate with the
most failed test cases. A limitation to using statistical testing is that a large test suite must
be available. Artzi et al. [42] address this limitation by proposing concolic techniques to
generate test suites designed for fault localization. The techniques include six variations
on the Tarantula [43, 44] algorithm, which is used to predict statements that are most
likely to cause failures based on failed test cases, combined with the authors' proposed
enhanced domains and output mapping. The enhanced domain for conditional statements
allows more accurate localization of errors caused by missing branches. The output
mapping maps program statements to output fragments they generate, which can then be
used to help localize the fault. The authors implemented their techniques in Apollo, a tool
for PHP applications, that automatically finds and localizes malformed HTML errors [41].
The same authors [45] explore the tradeoffs between generated test suite size and localization effectiveness. The authors propose techniques to direct generation of new test cases that are similar to failed test cases using various similarity criteria. Their hypothesis is that similar, failed test cases will be better able to localize failure-causing statements. The approach was implemented in Apollo [41]. The authors found that using path-constraint similarity generated a smaller test suite with the best fault localization.
In addition, the authors combined and expanded on their fault localization in a 2012
journal paper [46]. Beyond their previous variations on Tarantula [42], the authors also
enhanced the fault localization techniques of Ochiai [47] and Jaccard [48] using the
enhanced domain for conditional statements and a source mapping (the renamed output mapping). The authors implemented the new techniques in Apollo [41] and evaluated the techniques in a large experimental study. An enhanced version of Ochiai and the path-constraint similarity-based generation yielded the best results for fault localization
effectiveness.
Matos and Sousa [49] propose using use cases and formal requirements to create Selenium functional test cases and web pages that serve as user interface prototypes. The inputs to their tool are use cases, a system glossary, and user interface specifications written in a controlled natural language that the tool can interpret. Their tool is implemented as an
Eclipse plugin.
Thummalapenta et al. [50] present a new technique to automatically generate test cases
by focusing only on interesting behaviors as defined by business rules, which are a form
of functional specification used in the industry. Formally, a business rule is triple
consisting of an antecedent, a consequent and a set of invariant conditions. In their
technique, they first build an abstract state-transition diagram (STD), where the nodes
represent equivalent states, as they crawl the applications GUI. In the next step of their
technique, for each business rule, they identify abstract paths relevant to the business rule
and refine the paths using a stricter notion of state equivalence until a traversable path is
found. This final set of paths are the test cases that are executed to test the application,
which also cover all the initially identified business rules. Assertion checking on the
consequent condition of the business rule serves as an oracle as well. They implemented
their approach in a tool called WATEG, Web Application Test Case Generator.
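To make the formal definition concrete, a business rule can be represented as a simple triple; the sketch below is our own Java illustration (the class, field names, and example values are ours, not taken from WATEG):

import java.util.List;

// Hypothetical representation of a business rule as a triple:
// an antecedent, a consequent, and a set of invariant conditions.
public class BusinessRule {
    private final String antecedent;        // eg, "cart contains at least one item"
    private final String consequent;        // eg, "checkout button is enabled"
    private final List<String> invariants;  // eg, "user remains logged in"

    public BusinessRule(String antecedent, String consequent, List<String> invariants) {
        this.antecedent = antecedent;
        this.consequent = consequent;
        this.invariants = invariants;
    }

    public String getAntecedent() { return antecedent; }
    public String getConsequent() { return consequent; }
    public List<String> getInvariants() { return invariants; }
}

A test case derived for such a rule drives the application into a state satisfying the antecedent and then asserts the consequent while the invariant conditions continue to hold.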
Alshahwan and Harman [54] propose state-aware regeneration of test cases to improve the coverage and fault detection of an existing web application test suite. The authors implemented their approach for PHP applications in a tool called SART (State Aware Regeneration Tool).
4.2 Oracles
Determining whether a test case passes or fails is a difficult problem, especially for web
applications that have a variety of outputs (eg, web pages, data stores, email messages)
that are sometimes dynamically generated or nondeterministic. Recent advances have
focused on the web page outputs: not simply malformed HTML but more nuanced
failures manifested in the pages.
Dobolyi et al. [58] present techniques to automatically compare test case outputs, ie, XML/HTML documents, during regression testing. Their approach, implemented in the tool Smart, is based on a model that exploits similarities in how web applications fail. The authors first manually inspect a small portion of regression testing output and identify structural and syntactic features in the tree structures of XML documents that indicate differences that humans should investigate for failure. The features are then used to train a comparator that applies a threshold to identify which outputs and test cases require human
inspection. Since the approach focuses on the tree-structured output, the approach misses
faults involving images and the presentation of HTML elements.
de Castro et al. [59] present an extension to Selenium RC, called SeleniumDB, that
allows for testing web applications that interact with databases, such as MySQL and
PostgreSQL. Specifically, their tool allows establishing database connections and comparing test data with data stored in the database. They augmented Selenium RC's core
with six new assert functions that allow for comparing data outputted during testing with
data that exists in the database of the application. For example, an assert statement that
was added to the Selenium RC code checks for the last record inserted in the database.
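As an illustration of this kind of database-backed assertion, the following is a minimal sketch using plain JDBC and JUnit; the connection string, table, column, and method names are hypothetical and do not reproduce the actual SeleniumDB API:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import static org.junit.Assert.assertEquals;

public class DatabaseAssertions {
    // Compares a value observed in the UI during testing with the most recently
    // inserted row. The table and column names ("users", "username", "id") are
    // illustrative only.
    public static void assertLastInsertedUsername(String expected) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:postgresql://localhost/appdb", "tester", "secret");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                 "SELECT username FROM users ORDER BY id DESC LIMIT 1")) {
            if (!rs.next()) {
                throw new AssertionError("no rows found in table users");
            }
            assertEquals(expected, rs.getString("username"));
        }
    }
}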
Mahajan and Halfond developed techniques for finding HTML presentation failures
using image comparison techniques [60] combined with searching techniques to identify
the failure's root cause [61]. Mahajan and Halfond's first take on finding HTML presentation failures [60] leverages image comparison techniques: comparing an image of the expected web page and a screenshot of the actual web page, finding the pixel-level differences between the two images, and mapping the pixel-level differences to the HTML elements that are likely to be the cause of the failure. Mahajan and Halfond's follow-up work [61] improved finding a failure's root cause automatically using a search-based
technique, where possible faulty elements are permuted to a possibly correct value and the
resulting page is compared with the expected image. If the resulting page matches the
expected image, then the permuted faulty element is the likely root cause. Both papers
contain an experimental evaluation of their approach on web pages in wide use and
indicate promising results.
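A minimal sketch of the pixel-level comparison step is shown below, using only standard Java imaging APIs; this is our own illustration, not the authors' implementation, and mapping the differing pixels back to the responsible HTML elements would additionally require layout information from the browser:

import java.awt.Point;
import java.awt.image.BufferedImage;
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import javax.imageio.ImageIO;

public class PixelDiff {
    // Returns the coordinates of pixels that differ between the oracle image and
    // the actual screenshot; both images are assumed to have the same dimensions.
    public static List<Point> diff(File expectedFile, File actualFile) throws Exception {
        BufferedImage expected = ImageIO.read(expectedFile);
        BufferedImage actual = ImageIO.read(actualFile);
        List<Point> differences = new ArrayList<>();
        for (int y = 0; y < expected.getHeight(); y++) {
            for (int x = 0; x < expected.getWidth(); x++) {
                if (expected.getRGB(x, y) != actual.getRGB(x, y)) {
                    differences.add(new Point(x, y));
                }
            }
        }
        return differences;
    }
}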
The authors suggest that future work includes customizing the search space based on
the root cause, handling multiple failures within one page, and handling when the fault
does not visually appear within the faulty HTML element. The authors have two follow-up papers to appear in 2015 [62, 63], which are outside the scope of this survey because they appeared after our submission deadline.
Tappenden and Miller [53] also proposed oracles that do not depend on their cookie-based testing methodology. The authors assert that their proposed structural DOM similarity metrics (similar to Dobolyi et al.'s [58]), HTML content similarity metrics, and a hybrid of the two metrics were critical to complete their cookie-based automated testing.
Alshahwan and Harman [65] propose augmenting test suite effectiveness by increasing output diversity, based on a notion of output uniqueness. The goal of this criterion is to classify the different ways in which the output of HTML web pages can be unique: unique in entire content, unique in tag structure, unique in content, or unique in tags and attributes. Then, they derive new test cases by mutating each test case in an original test suite, executing the new mutated test case, and determining whether the new test case produces a unique output according to one of these four definitions.
Sakamoto et al. [66] address the integration testing problem for web applications. Their
contributions include proposing a new coverage criterion, template variable coverage
criterion, and presenting a technique to generate skeleton test code to improve the
template variable coverage criterion. Templates refer to a templating system used on the
client or server-side that helps the development of reusable HTML elements4. Template
engines replace these template variables with actual values to create HTML pages. The
template variable coverage criterion measures coverage of variables and expressions that
are embedded in HTML templates. Their work is implemented in a tool called POGen.
The tool consists of the following components: an HTML template analyzer, an HTML template transformer, and a test code generator.
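Although [66] defines the criterion precisely, the underlying intuition can be summarized as a coverage ratio (our paraphrase, not the authors' exact formulation):

\[ \mathit{TemplateVariableCoverage} = \frac{|\text{template variables and expressions exercised by the test suite}|}{|\text{template variables and expressions appearing in the templates}|} \]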
Dobolyi and Weimer [69] model consumer-perceived web application fault severities for testing. Based on a fault severity study of 17 subject applications, the authors conclude that traditional fault seeding, which is often used to evaluate the effectiveness of testing techniques, does not generate uniformly severe faults in terms of consumer perception and is therefore not necessarily a good technique for evaluating testing techniques.
Sampath et al. [72] present CPUT, a tool for combination-based prioritization and reduction of user-session-based test cases of web applications. The tool, developed in Java, consists of two main components. The first component is a logging module, developed for the Apache web server, that collects all pertinent usage information needed to create user-session-based test cases. The second component is the main CPUT tool, which allows import of Apache web server logs that can then be converted into user-session-based test cases in XML format. CPUT stores the imported log file and the test cases in a PostgreSQL database. CPUT allows importing new log files, appending to a log file that has previously been imported, and overwriting a previously imported log file. The test cases can then be prioritized or reduced by several experimentally verified criteria, such as combinatorial, length-based, and frequency-based criteria [70]. CPUT also displays statistics about each user-session-based test case, such as the number of requests in the test case and the number of parameter values. The tool writes the prioritized/reduced test suite to a text file, which testers can use to identify test cases to execute during their maintenance testing cycles.
Garg et al. [73] present a two-level approach to prioritize web application test cases.
They build a functional dependence graph (FDG), generated from a UML diagram of the
application, to model functional dependencies between modules in a web application. The
authors also create an interprocedural control graph (ICG) from the source code for each
functional module in the FDG. The test suite for the application is partitioned into
different test sets that can be tied to a functional module/node in the FDG. Test cases
within a test set are tied to submodules modeled in the ICG. The two-level approach to
prioritization proposed in this paper is based on criteria that rely on first assigning
priorities to modules in the FDG and then to submodules in the ICG. Modules in the FDG
are prioritized based on new and modified functionality in the FDG, eg, a newly introduced node in the FDG represents newly added functionality and thus gets the highest priority, modified FDG nodes get the next highest priority, and so on. Modules within the ICG are prioritized based on the degree of modification of the node in the ICG, which is determined by the change in the number of lines of code, eg, a new ICG node gets the highest priority, and the remaining nodes are assigned priorities in decreasing order of degree
of modification. Finally, test cases are prioritized using criteria in increasing and
decreasing order of (a) distance of modified nodes from the root FDG node, (b) number of
functional modules executed, and (c) number of changes (as identified in ICG nodes)
executed. A small experimental study reveals that prioritizing tests based on the shortest
path of modified nodes from the root FDG node has the highest APFD [71] in the first
10% of the test suite. In another work [74], the same authors propose distributing the test
sets on multiple machines and executing them in parallel to reduce test execution time.
Each machine is allocated approximately the same number of functional modules, with approximately equal functional module priorities.
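For reference, the APFD metric [71] used in this and the following evaluations is defined, for a test suite of n test cases that detects m faults, as

\[ \mathrm{APFD} = 1 - \frac{TF_1 + TF_2 + \cdots + TF_m}{n\,m} + \frac{1}{2n} \]

where TF_i is the position, in the ordered suite, of the first test case that detects fault i.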
Garg et al. [75] also proposed a new automated test case prioritization technique that
automatically identifies changes in the database and prioritizes test cases such that
database faults may be detected early. They use a functional dependence graph and a
schema diagram which models the relationship between the database tables and fields in
their prioritization approach. Using log files that capture details of the database, they
identify the FDG modules that are modified as a result of database changes (as captured in
the log files) and assign priorities to the modules, eg, assigning higher priority to FDG
modules whose modifications are due to new tables in the schema diagram. A small
experimental evaluation showed that their approach is able to detect 70% of the seeded
database faults by executing 10% of the test suite.
Sampath et al. [76] propose ordering reduced suites to further increase the effectiveness
of test suite reduction strategies. Test suite reduction is a regression testing strategy to
reduce the size of the test suite that is executed by using several criteria that will allow the
selection of a smaller set of test cases that are comparable in effectiveness to the entire
original suite. Test case prioritization, on the other hand, strives to keep all the test cases in
the original suite, but proposes that they be ordered based on some criteria such that the
ordered test suite can find faults early in the test execution cycle. In their work, Sampath
et al. [76] first reduce test suites using reduction criteria [77] that select a smaller test set
based on characteristics like the actual base requests covered in a test case, the actual
parameter names that are covered in a test case, etc., with the goal of creating a reduced
set of test cases that cover all base requests of the web application, and all parameter
names in the web application, respectively. They then prioritize the reduced suites by applying prioritization criteria (specifically, count-, interaction-, and frequency-based criteria) that are shown to be effective in related work [70]. This approach led to the creation of 40 criteria that order reduced suites, which are empirically evaluated by
Sampath et al. using three web applications, seeded faults and user-session-based test
cases. Another contribution in this work is the development of a new metric that can be
used to compare test suites of unequal lengths. The common metrics used to evaluate the
effectiveness of prioritized test suites, APFD [71] and APFD_C [78] require that the
compared test suites be of the same size. However, the reduced suites compared in
Sampath et al.'s work were of varying sizes and thus could not be compared using the traditional effectiveness metrics. Therefore, they developed a new effectiveness metric, Mod_APFD_C, that allows evaluation of the prioritization effectiveness of
test suites of unequal lengths. The new metric takes into account the number of unique
faults detected, the time to generate the ordered reduced suite, and the time to execute the
ordered reduced suite. Through their empirical study, they find that in several cases, the
ordered reduced suites are more effective than a pure reduced suite and a pure prioritized
suite, thus lending evidence to the creation of a promising new approach to regression
testing.
Dobuneh et al. [79] propose and evaluate a hybrid prioritization criterion for testing
web applications. Their hybrid criterion first orders test cases by number of common
HTTP requests in the test case, then, orders test cases based on the length of HTTP request
chains, and finally, by the dependency of HTTP requests. In their experimental study with
one subject application, they find that the hybrid criterion finds all the seeded faults sooner
than the first and second criteria used in the hybrid, but is comparable to the third criterion
on dependency of HTTP requests. Similar results are observed when evaluating the time
taken by the hybrid and the individual criteria to generate the prioritized test suites.
regression test paths in the new version. Through an experimental study, they find that a
large number of variable constraints can be reused from previous versions. The central
idea in their approach is to compare the definitions and uses of variables between previous
and current versions of the application to determine if the same constraints on variables
can be used.
Andrews and Do [91] consider regression testing with FSMWeb, an FSM-based web testing approach developed in prior work to create test cases. As the application undergoes changes and the model changes, they classify test cases as reusable, obsolete, and re-testable test cases.
Then, they evaluate the cost-benefit tradeoffs of applying brute force regression testing
and selective regression testing by quantifying the costs involved in each case. Further,
they propose two assumptions, first, that the cost of executing and validating a test case is
proportional to the length of the test case or the number of inputs on each edge, and
second, the cost of classification (as obsolete, reusable, or re-testable) is proportional to
the size of the test suite. They conduct a case study comparing the two regression testing
approaches and discuss a decision making process for practitioners.
Hirzel et al. [92] adapt an existing selective regression testing technique developed for Java applications to work in the context of the Google Web Toolkit (GWT) compiler, which converts Java
code into JavaScript. The idea is to select test cases that execute changed parts of the code.
Web test cases execute JavaScript code but the code that is changed is the Java code.
Therefore, when the Java code undergoes changes, the changes need to be traced back to
JavaScript, which is difficult to accomplish because of code obfuscation. Also, additional code originating from libraries, as well as dynamic typing, makes tracing back to Java difficult.
They first build control flow graphs of the two Java application versions and compare
them. Since the test cases are executed on the JavaScript code and not the Java code, they
need a mapping between the test cases and the Java code. To establish a mapping, they
instrument the test cases by introducing a code identifier (CID) for methods, statements or
expressions which can be traced from the JavaScript to the Java code for that code entity.
After comparing the old and new versions of the application, test cases are selected for
reexecution if they touch at least one changed Java code entity as identified by the CID.
Their technique is implemented as an Eclipse plugin and they empirically evaluate their
approach.
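The selection step itself reduces to intersecting the code identifiers (CIDs) covered by each test with the CIDs of Java code entities that changed between versions. The sketch below is our own illustration of that step (CIDs are modeled as plain strings, and the test-to-CID mapping is assumed to come from the instrumentation described above):

import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

public class SelectiveRegression {
    // Selects the tests whose covered code identifiers (CIDs) overlap with the
    // CIDs of Java code entities changed between the old and new versions.
    public static Set<String> selectTests(Map<String, Set<String>> cidsPerTest,
                                          Set<String> changedCids) {
        Set<String> selected = new LinkedHashSet<>();
        for (Map.Entry<String, Set<String>> entry : cidsPerTest.entrySet()) {
            Set<String> covered = new HashSet<>(entry.getValue());
            covered.retainAll(changedCids);   // intersection with the changed entities
            if (!covered.isEmpty()) {
                selected.add(entry.getKey()); // this test must be reexecuted
            }
        }
        return selected;
    }
}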
We also excluded from the analyses words specific to this corpus, alphabetically: acm, application, applications, case, cases, conference, fig, figure, googletag, ieee, international, paper, proceedings, pubad, public, section, software, standard, table, target, test, testing, tests, and web. These text analyses
are not perfect in that they rely on extracting the text from the PDF files. Our spot
checking of the text extraction showed that most text extraction was performed correctly;
however, there are some issues with special characters, such as dashes, quotes, and
ligatures, like ff. The analyses also do not weight words differently, for example, words
found in titles and footnotes are counted the same. Despite these limitations, we believe
these distant readings allow us to see trends in the research.
The word cloud in Fig. 4 shows the proportional prevalence of words in the corpus (in
our case, the surveyed papers), where the most common words are in the largest font. As
expected, the most common words are related to web applications, eg, user, HTML, pages,
request/s, sessions, server, database, browser, form, http, parameter; code, eg, code,
program, source, expressions, statements, line/s, variable/s; and testing, eg, values, suite/s,
fault/s, failure/s, coverage.
FIGURE 4 A word cloud of the most common words in the papers studied.
Distinct characteristics of web applications are reflected in the word cloud, such as navigation (sequence/s, transition/s, control) and user interaction (users, GUI, interactions). In addition, we see words like PHP and Java, which are the languages used to implement the web applications. In our survey, we did not find work that developed techniques specific to other languages/frameworks, such as ASP or Ruby on Rails. Of the words that imply a common web application functionality (eg, search and order), login is distinct because it is unlikely to have meanings other than the web application functionality (as opposed to search-based testing or "in order to" phrasing). Login's prominence seems to imply that it is important functionality that researchers must address.
Words like model/s, technique/s, approach/es, and algorithm as well as result/s and, to a
lesser extent, evaluation words (eg, study, percentage, and average) are all prominent in
the cloud, implying that researchers developed novel approaches and evaluated those
approaches. The prominence of tool/s and automate implies that researchers implemented
their approaches in automated tools. Our survey supports both of these observations.
With respect to words related to the testing research focus, generation, regression,
prioritization, reduction, and localization are all prominent in the word cloud. On the other
hand, the word oracle is much smaller. Again, our survey reflects the relative weights of
these topics.
While the word cloud presents individual words, Fig. 5 presents the proportional
prevalence of 40 topics in the papers (y-axis) by their publication date (x-axis).7 Despite
imperfections in the analysis (eg, two papers published in 2010 are represented in 2009),
the visualization helps to show the research trends. The figure highlights eight of the most
common topics. We chose not to highlight topics that represent the documents' metadata
or topics that represent components of web applications (eg, {user, server, request} or
{function, document, http}). Despite being more prevalent, these topics are listed after the
highlighted topics.
FIGURE 5 A topic model of the most common groupings of words in the papers studied.
In general, the visualization supports our claim that the papers emphasize approaches
and techniques (eg, {fault, sever, model}, {user, model, session}, {user, requir, tool},
{page, navig, model}) over empirical studies. User tools (fourth from bottom) and user
modeling (third bar from the bottom) have remained fairly constant in prevalence over the
time period, perhaps an indication of how important users are with respect to testing and
use of web applications.
While fault localization (bottom bar) has remained relatively constant in prevalence
throughout the time period, other fault-related topics are on the decline. Work on fault
severity models (second bar from top) is sharply declining in prevalence, yet is still an
open problem and could be a direction for future work. The fourth bar from the top
{generat, fault, execut} is also surprisingly on the decline in prevalence; perhaps the
recent increase in prevalence of static analysis techniques (eg, the slight increase in
{model, state, behavior}, top bar, despite not including many papers on static analysis
techniques in our survey) explains the decline.
The topic of test suite selection (represented by {suit, priorit, reduc}, third from bottom)
is on an upward trend. As application sizes, and therefore their test suites, continue to
grow, it is crucial that testers can test in more cost-effective ways, and thus it is important
that research addresses this important problem.
While most of the topics (eg, fault localization) seem to have constant prevalence
throughout the time period, {user, session, model} and {fault, sever, model} are less
prevalent more recently. Research on traditional user session-based testing ({user, model, session}, in the bottom bar) seems to be on the decline, while {session, user, request} is
not. In our search for relevant papers, we found work on JavaScript-based capture-replay,
which is also user session based. We do not include this work in our survey or in this
analysis because it was out of scope and, therefore, is not reflected in the figure. Work on
severity models is lacking and could be a direction for future work.
5 Conclusion
As web applications continue to grow in complexity and in popularity, testing their
correctness will continue to deserve the attention of researchers. In this chapter, we studied
and reported on the advances in web application testing literature between the years of
2010 and 2014. We found that research in this period could broadly be categorized into the areas of test case generation, oracle development, criteria and approaches to evaluate the effectiveness of a test suite, and regression testing.
In this section, we summarize some open areas of research that we see trending in the
next few years based on our analysis of the past 4 years of research.
We found that several tools and techniques developed in the literature are built on top of
Selenium [19], either as an extension to Selenium RC, or built on top of WebDriver.
The end goal of integrating with Selenium is to create test cases that Selenium can
execute. The research community recognizes the prevalent use of Selenium in the
industry and thus has focused on developing add-ons to attract the target audience that
already uses Selenium in their testing process. This leads us to believe that Selenium is
a web testing tool that will become increasingly popular in academia and industry in the
future.
Another trend, or lack thereof, that we observed is the limited research in the area of
developing adequacy criteria for web applications. Several researchers focused on fault
detection as an evaluation measure or on using traditional coverage criteria of
statement, method, and branch coverage. With the different types of web languages
used today, the development and use of frameworks such as Rails and the popularity of
JavaScript in all aspects of web development, there might be scope for development of
new coverage criteria that target unique characteristics of web applications. In
companies where systematic testing is followed, adequacy criteria tend to be the most
common method to determine when to stop testing. Thus, advancements in this domain
and practical applicability of proposed criteria could find widespread acceptance.
In terms of evaluating effectiveness of testing techniques, we found very limited research
in the area of developing fault severity classifications. Fault severity is a commonly
used metric in the industry to prioritize testing and development efforts. Advances in
developing solid, empirically evaluated fault severity criteria for web application testing that address fault severity from the point of view of multiple stakeholders could be of significant importance in both academic and industry circles.
In the domain of test case generation for traditional applications, we find an increasing
use of search-based algorithms. Applying search-based algorithms to test case generation in the web domain has only recently been gaining importance. This is an area of research that could see growth in the near future, as scalable test case generation continues to be a challenge in the web application testing domain.
Another trend that we observed from our analysis is that researchers tend to focus on
developing new approaches and techniques to test web applications, whether in the area
of test generation or oracle development, etc. Though most researchers include a strong
empirical component that evaluates the effectiveness of their approach, there is a lack of
empirical studies that compare and contrast the various approaches to web testing, lack
of studies that replicate existing studies and results, and a lack of surveys and
qualitative studies designed to understand testing practices in small/large companies,
etc. Empirical research can serve as a strong foundation for identifying new research
problems that the community can address in the future. We believe more empirical
research is an area the web testing community can focus on and benefit from in the near
future.
Much of the work surveyed focuses on individual components on the server side or the client side, with strong results. Several authors mention that future work involves combining their work with others' work. We can envision many possible fruitful collaborations. For example, the concolic test case generation work in Section 4.1.3 could be combined with the oracle work that finds differences in HTML from Section 4.2 to yield even better fault localization.
Finally, as a whole genre of applications and services are moving to the cloud, we
believe that testing techniques that can scale to the cloud [94] and the notion of offering
web testing as a service in the cloud are areas of research that could gain prominence in
the coming years.
References
[1] Orso A., Rothermel G. Software testing: a research travelogue (2000-2014). In: Proceedings of the Future of Software Engineering, FOSE 2014, New York, NY, USA; ACM; 2014:117-132. doi:10.1145/2593882.2593885.
[2] Garousi V., Mesbah A., Betin-Can A., Mirshokraie S. A systematic mapping study of web application testing. Inf. Softw. Technol. 2013;55(8):1374-1396. doi:10.1016/j.infsof.2013.02.006. http://www.sciencedirect.com/science/article/pii/S0950584913000396.
[3] Li Y.-F., Das P.K., Dowe D.L. Two decades of web application testing: a survey of recent advances. Inf. Syst. 2014;43:20-54. doi:10.1016/j.is.2014.02.001. http://www.sciencedirect.com/science/article/pii/S0306437914000271.
[4] Alalfi M.H., Cordy J.R., Dean T.R. Modelling methods for web application verification and testing: state of the art. Softw. Test. Verif. Reliab. 2009;19(4):265-296. doi:10.1002/stvr.401. http://onlinelibrary.wiley.com/doi/10.1002/stvr.401/abstract.
[5] Li X., Xue Y. A survey on server-side approaches to securing web applications. ACM Comput. Surv. 2014;46(4):54:1-54:29. doi:10.1145/2541315.
[6] Mesbah A. Chapter five: advances in testing JavaScript-based web applications. In: Memon A.M., ed. Advances in Computers. vol. 97. Elsevier; 2015:201-235. http://www.sciencedirect.com/science/article/pii/S0065245814000114.
[7] Sampath S. Chapter 3: advances in user-session-based testing of web applications. In: Hurson A., Memon A., eds. Advances in Computers. vol. 86. Elsevier; 2012:87-108. http://www.sciencedirect.com/science/article/pii/B978012396535600003X.
[8] Parolo P.D.B., Pan R.K., Ghosh R., Huberman B.A., Kaski K., Fortunato S. Attention decay in science. arXiv:1503.01881 [physics]. 2015. http://arxiv.org/abs/1503.01881.
[9] W3C. HTTP: HyperText Transfer Protocol Overview. 2015. http://www.w3.org/Protocols/.
[10] Apache. Apache HTTP server project. 2015. http://httpd.apache.org/.
[11] Apache. Apache Tomcat. 2015. http://tomcat.apache.org/.
[12] IBM. WebSphere application server. 2015. http://www03.ibm.com/software/products/en/appserv-was.
[13] Google. App engine: run your applications on a fully-managed Platform-as-a-
doi:10.1109/TASE.2010.25.
[26] Torsel A. Automated test case generation for web applications from a domain specific model. In: 2011 IEEE 35th Annual Computer Software and Applications Conference Workshops (COMPSACW); 2011:137-142. doi:10.1109/COMPSACW.2011.32.
[27] Song B., Gong S., Chen S. Model composition and generating tests for web applications. In: 2011 Seventh International Conference on Computational Intelligence and Security (CIS); 2011:568-572. doi:10.1109/CIS.2011.131.
[28] Enderlin I., Dadeau F., Giorgetti A., Bouquet F. Grammar-based testing using realistic domains in PHP. In: 2012 IEEE Fifth International Conference on Software Testing, Verification and Validation (ICST); 2012:509-518. doi:10.1109/ICST.2012.136.
[29] Enderlin I., Dadeau F., Giorgetti A., Othman A.B. Praspel: a specification language for contract-based testing in PHP. In: Wolff B., Zaidi F., eds. Testing Software and Systems, no. 7019 in Lecture Notes in Computer Science. Berlin: Springer; 2011:64-79. http://link.springer.com/chapter/10.1007/978-3-642-24580-0_6.
[30] Dallmeier V., Burger M., Orth T., Zeller A. WebMate: generating test cases for Web 2.0. In: Winkler D., Biffl S., Bergsmann J., eds. Software Quality. Increasing Value in Software and Systems Development, no. 133 in Lecture Notes in Business Information Processing. Berlin: Springer; 2013:55-69. http://link.springer.com/chapter/10.1007/978-3-642-35702-2_5.
[31] Zeller A. We are creating a start-up in web testing. 2013. http://andreaszeller.blogspot.com/2013/03/we-are-creating-start-up-in-web-testing.html.
[32] Testfabrik Consulting + Solutions AG. webmate. 2015. https://app.webmate.io/.
[33] Schur M., Roth A., Zeller A. ProCrawl: mining test models from multi-user web applications. In: Proceedings of the 2014 International Symposium on Software Testing and Analysis, ISSTA 2014, New York, NY, USA; ACM; 2014:413-416. doi:10.1145/2610384.2628051.
[34] Schur M., Roth A., Zeller A. Mining behavior models from enterprise web applications. In: Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering, ESEC/FSE 2013, New York, NY, USA; ACM; 2013:422-432. doi:10.1145/2491411.2491426.
[35] Elbaum S., Rothermel G., Karre S., Fisher M. Leveraging user-session data to support Web application testing. IEEE Trans. Softw. Eng. 2005;31(3):187-202. doi:10.1109/TSE.2005.36.
[36] Sprenkle S., Pollock L., Simko L. A study of usage-based navigation models and generated abstract test cases for web applications. In: Proceedings of the 2011 Fourth IEEE International Conference on Software Testing, Verification and Validation, ICST '11, Washington, DC, USA; IEEE Computer Society; 2011:230-239. doi:10.1109/ICST.2011.34.
[37] Sprenkle S.E., Pollock L.L., Simko L.M. Configuring effective navigation models and abstract test cases for web applications by analysing user behaviour. Softw. Test. Verif. Reliab. 2013;23(6):439-464. doi:10.1002/stvr.1496. http://onlinelibrary.wiley.com/doi/10.1002/stvr.1496/abstract.
[38] Sprenkle S., Cobb C., Pollock L. Leveraging user-privilege classification to customize usage-based statistical models of web applications. In: Proceedings of the 2012 IEEE Fifth International Conference on Software Testing, Verification and Validation, ICST '12, Washington, DC, USA; IEEE Computer Society; 2012:161-170. doi:10.1109/ICST.2012.96.
[39] Sant J., Souter A., Greenwald L. An exploration of statistical models for automated test case generation. In: ACM SIGSOFT Software Engineering Notes. vol. 30. ACM; 2005:1-7. http://dl.acm.org/citation.cfm?id=1083256.
[40] Artzi S., Kiezun A., Dolby J., Tip F., Dig D., Paradkar A., Ernst M.D. Finding bugs in web applications using dynamic test generation and explicit-state model checking. IEEE Trans. Softw. Eng. 2010;36(4):474-494. doi:10.1109/TSE.2010.31.
[41] Artzi S., Kiezun A., Dolby J., Tip F., Dig D., Paradkar A., Ernst M.D. Finding bugs in dynamic web applications. In: Proceedings of the 2008 International Symposium on Software Testing and Analysis, ISSTA '08, New York, NY, USA; ACM; 2008:261-272. doi:10.1145/1390630.1390662.
[42] Artzi S., Dolby J., Tip F., Pistoia M. Practical fault localization for dynamic web applications. In: 2010 ACM/IEEE 32nd International Conference on Software Engineering. vol. 1; 2010:265-274. doi:10.1145/1806799.1806840.
[43] Jones J.A., Harrold M.J. Empirical evaluation of the Tarantula automatic fault-localization technique. In: Proceedings of the 20th IEEE/ACM International Conference on Automated Software Engineering, ASE '05, New York, NY, USA; ACM; 2005:273-282. doi:10.1145/1101908.1101949.
[44] Jones J.A., Harrold M.J., Stasko J. Visualization of test information to assist fault localization. In: Proceedings of the 24th International Conference on Software Engineering, ICSE '02, New York, NY, USA; ACM; 2002:467-477. doi:10.1145/581339.581397.
[45] Artzi S., Dolby J., Tip F., Pistoia M. Directed test generation for effective fault localization. In: Proceedings of the 19th International Symposium on Software Testing and Analysis, ISSTA '10, New York, NY, USA; ACM; 2010:49-60. doi:10.1145/1831708.1831715.
[46] Artzi S., Dolby J., Tip F., Pistoia M. Fault localization for dynamic web applications. IEEE Trans. Softw. Eng. 2012;38(2):314-335. doi:10.1109/TSE.2011.76.
[47] Abreu R., Zoeteweij P., van Gemund A.J.C. An evaluation of similarity coefficients for software fault localization. In: Proceedings of the 12th Pacific Rim International Symposium on Dependable Computing, PRDC '06, Washington, DC, USA; IEEE Computer Society; 2006:39-46. doi:10.1109/PRDC.2006.18.
[48] Chen M.Y., Kiciman E., Fratkin E., Fox A., Brewer E. Pinpoint: problem determination in large, dynamic Internet services. In: Proceedings of the International Conference on Dependable Systems and Networks, DSN 2002; 2002:595-604. doi:10.1109/DSN.2002.1029005.
[49] de Matos E.C.B., Sousa T.C. From formal requirements to automated web testing and prototyping. Innov. Syst. Softw. Eng. 2010;6(1-2):163-169. doi:10.1007/s11334-009-0112-5.
[50] Thummalapenta S., Lakshmi K.V., Sinha S., Sinha N., Chandra S. Guided test generation for web applications. In: Proceedings of the 2013 International Conference on Software Engineering, ICSE '13, Piscataway, NJ, USA; IEEE Press; 2013:162-171. http://dl.acm.org/citation.cfm?id=2486788.2486810.
[51] Alshahwan N., Harman M. Automated web application testing using search based software engineering. In: 2011 26th IEEE/ACM International Conference on Automated Software Engineering (ASE); 2011:3-12. doi:10.1109/ASE.2011.6100082.
[52] Korel B. Automated software test data generation. IEEE Trans. Softw. Eng. 1990;16(8):870-879. doi:10.1109/32.57624.
[53] Tappenden A.F., Miller J. Automated cookie collection testing. ACM Trans. Softw. Eng. Methodol. 2014;23(1):3:1-3:40. doi:10.1145/2559936.
[54] Alshahwan N., Harman M. State aware test case regeneration for improving web application test suite coverage and fault detection. In: Proceedings of the 2012 International Symposium on Software Testing and Analysis, ISSTA 2012, New York, NY, USA; ACM; 2012:45-55. doi:10.1145/2338965.2336759.
[55] Shahbaz M., McMinn P., Stevenson M. Automatic generation of valid and invalid test data for string validation routines using web searches and regular expressions. Sci. Comput. Program. 2015;97(Part 4):405-425. doi:10.1016/j.scico.2014.04.008. http://www.sciencedirect.com/science/article/pii/S0167642314001725.
[56] McMinn P., Shahbaz M., Stevenson M. Search-based test input generation for string data types using the results of web queries. In: 2012 IEEE Fifth International Conference on Software Testing, Verification and Validation (ICST); 2012:141-150. doi:10.1109/ICST.2012.94.
[57] Fujiwara S., Munakata K., Maeda Y., Katayama A., Uehara T. Test data generation for web application using a UML class diagram with OCL constraints. Innov. Syst. Softw. Eng. 2011;7(4):275-282. doi:10.1007/s11334-011-0162-3.
[58] Dobolyi K., Soechting E., Weimer W. Automating regression testing using web-based application similarities. Int. J. Softw. Tools Technol. Transfer. 2010;13(2):111-129. doi:10.1007/s10009-010-0170-x.
[59] de Castro A., Macedo G.A., Collins E.F., Dias-Neto A.C. Extension of Selenium RC tool to perform automated testing with databases in web applications. In: 2013 8th International Workshop on Automation of Software Test (AST); 2013:125-131. doi:10.1109/IWAST.2013.6595803.
[60] Mahajan S., Halfond W.G.J. Finding HTML presentation failures using image comparison techniques. In: Proceedings of the 29th ACM/IEEE International Conference on Automated Software Engineering, ASE '14, New York, NY, USA; ACM; 2014:91-96. doi:10.1145/2642937.2642966.
[61] Mahajan S., Li B., Halfond W.G.J. Root cause analysis for HTML presentation failures using search-based techniques. In: Proceedings of the 7th International Workshop on Search-Based Software Testing, SBST 2014, New York, NY, USA; ACM; 2014:15-18. doi:10.1145/2593833.2593836.
[62] Mahajan S., Halfond W.G.J. Detection and localization of HTML presentation failures using computer vision-based techniques. In: Proceedings of the 8th IEEE International Conference on Software Testing, Verification and Validation (ICST); IEEE; 2015.
[63] Mahajan S., Halfond W.G.J. WebSee: a tool for debugging HTML presentation failures. In: Proceedings of the 8th IEEE International Conference on Software Testing, Verification and Validation (ICST), Tool Track; IEEE; 2015.
[64] Alalfi M.H., Cordy J.R., Dean T.R. Automating coverage metrics for dynamic web applications. In: 2010 14th European Conference on Software Maintenance and Reengineering (CSMR); 2010:51-60. doi:10.1109/CSMR.2010.21.
[65] Alshahwan N., Harman M. Augmenting test suites effectiveness by increasing output diversity. In: 2012 34th International Conference on Software Engineering (ICSE); 2012:1345-1348. doi:10.1109/ICSE.2012.6227083.
[66] Sakamoto K., Tomohiro K., Hamura D., Washizaki H., Fukazawa Y. POGen: a test code generator based on template variable coverage in gray-box integration testing for web applications. In: Cortellessa V., Varró D., eds. Fundamental Approaches to Software Engineering, no. 7793 in Lecture Notes in Computer Science. Berlin: Springer; 2013:343-358. http://link.springer.com/chapter/10.1007/978-3-642-37057-1_25.
[67] Praphamontripong U., Offutt J. Applying mutation testing to web applications. In: 2010 Third International Conference on Software Testing, Verification, and Validation Workshops (ICSTW); 2010:132-141. doi:10.1109/ICSTW.2010.38.
[68] Ma Y.-S., Offutt J., Kwon Y.R. MuJava: an automated class mutation system. Softw. Test. Verif. Reliab. 2005;15(2):97-133. doi:10.1002/stvr.308. http://onlinelibrary.wiley.com/doi/10.1002/stvr.308/abstract.
[69] Dobolyi K., Weimer W. Modeling consumer-perceived web application fault severities for testing. In: Proceedings of the 19th International Symposium on Software Testing and Analysis, ISSTA '10, New York, NY, USA; ACM; 2010:97-106. doi:10.1145/1831708.1831720.
[70] Bryce R.C., Sampath S., Memon A.M. Developing a single model and test prioritization strategies for event-driven software. IEEE Trans. Softw. Eng. 2011;37(1):48-64. doi:10.1109/TSE.2010.12.
[71] Rothermel G., Untch R.H., Chu C., Harrold M.J. Prioritizing test cases for regression testing. IEEE Trans. Softw. Eng. 2001;27(10):929-948. doi:10.1109/32.962562.
[72] Sampath S., Bryce R.C., Jain S., Manchester S. A tool for combination-based prioritization and reduction of user-session-based test suites. In: Proceedings of the 2011 27th IEEE International Conference on Software Maintenance, ICSM '11, Washington, DC, USA; IEEE Computer Society; 2011:574-577. doi:10.1109/ICSM.2011.6080833.
[73] Garg D., Datta A., French T. A two-level prioritization approach for regression testing of web applications. In: 2012 19th Asia-Pacific Software Engineering Conference (APSEC). vol. 2; 2012:150-153. doi:10.1109/APSEC.2012.34.
[74] Garg D., Datta A. Parallel execution of prioritized test cases for regression testing of web applications. In: Proceedings of the Thirty-Sixth Australasian Computer Science Conference, Volume 135, ACSC '13, Darlinghurst, Australia; Australian Computer Society, Inc.; 2013:61-68. http://dl.acm.org/citation.cfm?id=2525401.2525408.
[75] Garg D., Datta A. Test case prioritization due to database changes in web applications. In: 2012 IEEE Fifth International Conference on Software Testing, Verification and Validation (ICST); 2012:726-730. doi:10.1109/ICST.2012.163.
[76] Sampath S., Bryce R.C. Improving the effectiveness of test suite reduction for user-session-based testing of web applications. Inf. Softw. Technol. 2012;54(7):724-738. doi:10.1016/j.infsof.2012.01.007.
[77] Sampath S., Sprenkle S., Gibson E., Pollock L., Greenwald A.S. Applying concept analysis to user-session-based testing of web applications. IEEE Trans. Softw. Eng. 2007;33(10):643-658. doi:10.1109/TSE.2007.70723.
[78] Elbaum S., Malishevsky A., Rothermel G. Incorporating varying test costs and fault severities into test case prioritization. In: Proceedings of the 23rd International Conference on Software Engineering, ICSE '01; 2001:329-338.
[89] Leotta M., Clerissi D., Ricca F., Spadaro C. Improving test suites maintainability with the page object pattern: an industrial case study. In: 2013 IEEE Sixth International Conference on Software Testing, Verification and Validation Workshops (ICSTW); 2013:108-113. doi:10.1109/ICSTW.2013.19.
[90] Leotta M., Stocco A., Ricca F., Tonella P. Reducing web test cases aging by means of robust XPath locators. In: 2014 IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW); 2014:449-454. doi:10.1109/ISSREW.2014.17.
[91] Andrews A., Do H. Trade-off analysis for selective versus brute-force regression testing in FSMWeb. In: Proceedings of the 2014 IEEE 15th International Symposium on High-Assurance Systems Engineering, HASE '14, Washington, DC, USA; IEEE Computer Society; 2014:184-192. doi:10.1109/HASE.2014.33.
[92] Hirzel M. Selective regression testing for web applications created with Google Web Toolkit. In: Proceedings of the 2014 International Conference on Principles and Practices of Programming on the Java Platform: Virtual Machines, Languages, and Tools, PPPJ '14, New York, NY, USA; ACM; 2014:110-121. doi:10.1145/2647508.2647527.
[93] Christophe L., Stevens R., De Roover C., De Meuter W. Prevalence and maintenance of automated functional tests for web applications. In: 2014 IEEE International Conference on Software Maintenance and Evolution (ICSME); 2014:141-150. doi:10.1109/ICSME.2014.36.
[94] Cai J., Hu Q. Analysis for cloud testing of web application. In: 2014 2nd International Conference on Systems and Informatics (ICSAI); 2014:293-297. doi:10.1109/ICSAI.2014.7009302.
https://jquery.com/
https://angularjs.org/
http://getbootstrap.com/
https://developers.google.com/closure/templates/
http://papermachines.org/
6 https://www.zotero.org/
7 We cannot get a more readable figure, eg, a larger font for the legend or grayscale patterns, from the Paper Machines tool. Also, the order of the bars is not the same as the order of the legend.
CHAPTER FIVE
Abstract
The importance of test automation in web engineering comes from the widespread use of web applications and the
associated demand for code quality. Test automation is considered crucial for delivering the quality levels expected
by users, since it can save a lot of time in testing and it helps developers to release web applications with fewer
defects. The main advantage of test automation comes from fast, unattended execution of a set of tests after some
changes have been made to a web application. Moreover, modern web applications adopt a multitier architecture where the implementation is scattered across different layers and runs on different machines. For this reason, end-to-end testing techniques are required to test the overall behavior of web applications.
In recent years, several approaches have been proposed for automated end-to-end web testing, and the choice among them depends on a number of factors, including the tools used for web testing and the costs associated with their adoption. They can be classified using two main criteria: the first concerns how test cases are developed (ie, Capture-Replay and Programmable approaches), while the second concerns how test cases localize the web elements to interact with (ie, Coordinates-based, DOM-based, and Visual approaches), that is, what kind of locators are used for selecting the target GUI components.
For developers and project managers it is not easy to select the most suitable automated end-to-end web testing
approach for their needs among the existing ones. This chapter provides a comprehensive overview of the
automated end-to-end web testing approaches and summarizes the findings of a long-term research project aimed at
empirically investigating their strengths and weaknesses.
Keywords
Web testing; Test automation; Capture-replay web testing; Programmable web testing; DOM-based web
testing; Visual web testing; Page object pattern; Robust locators; Selenium; Sikuli
1 Introduction
Web applications are key assets of our society. A considerable slice of modern software
consists of web applications executed in the user's web browser, running on computers or smartphones. The web has a significant impact on all aspects of our society and in recent years has changed the lives of billions of people. Associations, enterprises, governmental organizations, companies, and scientific groups use the web as a powerful and convenient way to promote activities/products and carry out their core business. People use online services daily as a source of information, a means of communication, a source of entertainment, and a venue for commerce. In a sentence, web applications pervade our lives, being crucial for a
multitude of economic, social and educational activities.
The importance of the web in our lives places a premium on the quality with which these applications are developed and maintained [1]. End-to-end web testing is one of the main approaches for assuring the quality of web applications [2]. The goal of end-to-end web
testing is exercising the web application under test as a whole to detect as many failures as
possible, where a failure can be considered as a deviation from the expected behavior. In
many software projects, end-to-end web testing is neglected because of time or cost
constraints. However, the impact of failures in a web application may be very serious, ranging from simple inconvenience (eg, malfunction and thus user dissatisfaction) and economic problems (eg, interruption of business) up to catastrophic impacts.
The simplest solution is to manually interact with the web application under
development to see if it behaves as expected. Unfortunately, this practice is error prone,
time consuming, and ultimately not very effective. For this reason, most teams automate
manual web testing by means of automated testing tools. The process contains a first
manual step: producing the test code able to instrument the web application. Test code
provides input data, operates on GUI components, and retrieves information to be
compared with oracles (eg, using assertions). The main benefit of test automation comes
from the fast and unattended execution of a test suite after some changes have been made
to the web application under test (ie, for regression purposes).
Existing solutions can be classified according to two main criteria: how test cases are developed and how they perform web page elements localization. Fig. 1 shows a classification grid based on these two criteria that can be applied to existing tools. Concerning the first criterion, we can find two main approaches [4]:
Capture-Replay (C&R) Web Testing consists of recording the actions performed by the
tester on the web application GUI and generating a test script that repeats such actions
for automated, unattended reexecution.
Programmable Web Testing aims at unifying web testing with traditional testing, where
test scripts are themselves software artifacts that developers write, with the help of
specific testing frameworks. Such frameworks allow developers to program the
interactions with a web page and its elements, so that test scripts can, for instance, automatically fill in and submit forms or click on hyperlinks.
An automated end-to-end test case interacts with several web page elements such as
links, buttons, and input fields, and different methods can be used to locate them. Thus,
concerning the second criterion, we can find three different cases [3]:
Coordinate-based localization: the tools implementing this approach just record the screen
coordinates of the web page elements and then use this information to locate the
elements during test case replay. This approach is nowadays considered obsolete,
because it produces test scripts that are extremely fragile. Hence, it is not considered
any further in this work.
DOM-based localization: the tools implementing this approach (eg, Selenium IDE1 and
Selenium WebDriver2 ) locate the web page elements using the information contained in
the Document Object Model (DOM) and, usually, provide several ways to locate web
page elements. For instance, Selenium WebDriver is able to locate a web page element
using: (1) the values of attributes id, name, and class; (2) the tag name of the element;
(3) the text string shown in the hyperlink, for anchor elements; (4) CSS and (5) XPath
expressions. Not all these locators are applicable to any arbitrary web element; eg,
locator (1) can be used only if the target element has a unique value of attribute id,
name, or class in the entire web page; locator (2) can be used if there is only one
element with the chosen tag name in the whole page; and, locator (3) can be used only
for links uniquely identified by their text string. On the other hand, XPath/CSS
expressions can always be used. In fact, as a baseline, the unique path from root to target element in the DOM tree can always be turned into an XPath/CSS locator that uniquely identifies the element (a short WebDriver sketch illustrating these locator strategies follows this list).
Visual localization: the tools implementing this approach have emerged recently. They
make use of image recognition techniques to identify and control GUI components. The
FIGURE 2
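The WebDriver sketch referenced above illustrates the DOM-based locator strategies (1)-(5) using the standard Selenium WebDriver Java API; the attribute values (UserID, Login, and so on) are purely illustrative:

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;

public class LocatorExamples {
    // Demonstrates the DOM-based locator strategies discussed above.
    public static void locate(WebDriver driver) {
        WebElement byId    = driver.findElement(By.id("UserID"));               // (1) attribute id
        WebElement byName  = driver.findElement(By.name("password"));           // (1) attribute name
        WebElement byClass = driver.findElement(By.className("login-form"));    // (1) attribute class
        WebElement byTag   = driver.findElement(By.tagName("form"));            // (2) tag name
        WebElement byLink  = driver.findElement(By.linkText("Login"));          // (3) link text, for anchors
        WebElement byCss   = driver.findElement(
                By.cssSelector("#login > input[name='password']"));             // (4) CSS expression
        WebElement byXPath = driver.findElement(
                By.xpath("//form[@id='login']/input[2]"));                       // (5) XPath expression
    }
}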
(2) Concerning how test cases localize the web element to interact with, we have evaluated
and compared the visual and DOM-based approaches [3] considering: the robustness of
locators, the initial test suite development effort, the test suite evolution cost, and the test
suite execution time. Our empirical assessment of the robustness of locators is quite
general and tool independent, while the developers' effort for initial test suite development
and the effort for test suite evolution were measured with reference to specific
implementations of the two approaches. We have instantiated such analysis for two tools,
Sikuli API and Selenium WebDriver, both adopting the programmable approach but
differing in the way they localize the web elements to interact with during the execution of
the test cases. Indeed, Sikuli API adopts the visual approach, thus using images
representing portions of the web pages, while Selenium WebDriver employs the DOM-based approach, thus relying on the HTML structure. Since visual tools are known to be computationally demanding, we also measured and compared the test suite execution time.
The findings reported in this chapter provide practical guidelines for developers who
want to make an informed decision among the available approaches and who want to
understand which of them could fit more or less well for a specific web development
context.
The chapter is organized as follows: Sections 2 and 3 provide an overview on the main
classical approaches to automated end-to-end web testing and report several examples of
tools instantiating such approaches. Specifically, these sections describe how the test
cases development approaches (ie, capture-replay and programmable) can be combined
with the DOM-based and the visual localization approaches. Section 4 describes how the
evolution of the web application impacts on the test cases created by following each
approach. Section 5 summarizes and discusses the results of the empirical studies we
conducted to analyze the strengths and the weaknesses of various approaches for
automated end-to-end web testing. Section 6 analyses some tools and techniques that have
been recently proposed, overcoming the limitations of the classical approaches to
automated end-to-end web testing. In particular Section 6.1 provides some example of
tools/techniques that go beyond the simple adoption of one approach, ie, solutions that are
able to combine more approaches at the same time (eg, Visual + DOM based or C&R +
Programmable). Then, Section 6.2 analyses a set of techniques that have been proposed in
the literature in order to solve specific problems in the context of automated end-to-end
web testing (robustness, test case repair upon software evolution, page object creation, and
migration between approaches). Section 7 concludes the chapter.
the application evaluates the credentials' correctness. If the credentials are correct, the username (eg, John.Doe), contained in an HTML tag with the attribute ID=LoggedUser,
and the logout button are reported in the upper right corner of the home page. Otherwise,
the login form is still shown in the home.asp page.
As an example, we report a test case for this simple functionality implemented using the
capture/replay facility of Selenium IDE (see Fig. 4). The test script produced by Selenium
IDE performs a valid login, using correct credentials (ie, username=John.Doe and
password=123456) and verifies that in the home page the user is correctly authenticated (assertText, id=LoggedUser, John.Doe). It can be noticed that all web
elements are located using the values of the id attributes that can be found in the DOM.
Specifically, during the test script recording phase, Selenium IDE is able to detect the
actions performed on the web page elements and to automatically generate the locators for
such web elements. Selenium IDE contains a locator generation algorithm that produces
locators using different strategies (implemented by the so-called locator builders) and it
ranks them depending on an internal robustness heuristic.
used open-source solutions for web test automation, and (4) during our previous industrial collaborations, we gained considerable experience in its usage [5, 6].
In Fig. 6, we show an example of a simple WebDriver test case for our running example
application, corresponding to a successful authentication. This automated test case submits
a valid login, using correct credentials (ie, username=John.Doe and password=123456)
and verifies that in the home page the user appears as correctly authenticated (the string
John.Doe is displayed in the top-right corner of the home page, as verified by method
checkLoggedUser).
The first step for building this test case is creating the HomePage.java page object
(see Fig. 7), corresponding to the home.asp web page. The page object HomePage.java
offers a method to log into the application. It takes as input a username and password, inserts them in the corresponding input fields, and clicks the Login button. Moreover, HomePage.java also contains a method that verifies the authenticated username in the
application. As shown in Fig. 7, web page elements can be located using different kinds of
DOM-based locators (eg, ID, LinkText, XPath).
The second step is to develop the test case making use of the page object methods
(see Fig. 6). In the test case, first, a WebDriver object of type FirefoxDriver is created to
control the Firefox browser as a real user does; second, WebDriver (ie, the browser) opens
the specified URL and creates a page object that instantiates HomePage.java; third,
using method login() , the test tries to login in the application; finally, the test case
assertion is checked.
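A minimal sketch of such a test case, approximating Fig. 6, may look as follows; the application URL and the use of JUnit and PageFactory are assumptions made here for illustration.

import org.junit.Test;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxDriver;
import org.openqa.selenium.support.PageFactory;

public class LoginTest {

    @Test
    public void testLogin() {
        // (1) create the WebDriver object controlling a real Firefox instance
        WebDriver driver = new FirefoxDriver();
        try {
            // (2) open the application and instantiate the HomePage page object
            driver.get("http://localhost/home.asp");   // URL is an assumption
            HomePage home = PageFactory.initElements(driver, HomePage.class);

            // (3) perform the login through the page object method
            home.login("John.Doe", "123456");

            // (4) check the test case assertion: the user is reported as authenticated
            home.checkLoggedUser("John.Doe");
        } finally {
            driver.quit();
        }
    }
}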
The following steps are basically the same in Sikuli API and Selenium WebDriver, the
only differences being that in Sikuli API the driver is not a parameter of the HomePage
constructor and the assertion checking method does not need any string parameter. On the
contrary, Sikuli API's page object is quite different from Selenium WebDriver's. As
shown in Fig. 9, the command locate is invoked to search for the portion of a web page
that looks like the image representing the rendering of the web element to be located. The
image must have been previously saved in the file system as a file or must be available
online. Once the web element has been located, a ScreenRegion is returned by method
locate, which can be used to perform operations such as clicking and typing into it (see,
eg, method type in Fig. 9).
Thus, in Sikuli API locators are images. While using DOM-based tools it is possible to
verify whether an HTML element contains textual information (see the last line in Fig. 7),
with visual tools it is necessary to check that the page contains an image displaying such
text (see Fig. 9, method checkLoggedUser). Moreover, some useful and quite general
Selenium WebDriver methods are not natively available in Sikuli API (eg, click() and
sendKeys()). Thus, when using Sikuli API, they must be implemented explicitly in the
page object class as auxiliary methods (eg, methods click() and type()).
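As an illustration, the following rough sketch shows how such auxiliary helpers may be written with the sikuli-api library; it uses find(), which plays the role of the locate call mentioned above, and the image file names (and the class name, renamed here to avoid confusion with the WebDriver page object) are assumptions, so this is only indicative of the structure, not a copy of the chapter's page objects.

import java.io.File;
import org.sikuli.api.DesktopScreenRegion;
import org.sikuli.api.ImageTarget;
import org.sikuli.api.ScreenRegion;
import org.sikuli.api.Target;
import org.sikuli.api.robot.Keyboard;
import org.sikuli.api.robot.Mouse;
import org.sikuli.api.robot.desktop.DesktopKeyboard;
import org.sikuli.api.robot.desktop.DesktopMouse;

public class VisualHomePage {
    private final ScreenRegion screen = new DesktopScreenRegion();
    private final Mouse mouse = new DesktopMouse();
    private final Keyboard keyboard = new DesktopKeyboard();

    // Auxiliary method: locate the screen region that looks like the given image and click it
    private void click(String imagePath) {
        Target target = new ImageTarget(new File(imagePath));
        ScreenRegion element = screen.find(target);   // visual locator
        mouse.click(element.getCenter());
    }

    // Auxiliary method: click the field identified by the image, then type the text
    private void type(String imagePath, String text) {
        click(imagePath);
        keyboard.type(text);
    }

    public void login(String username, String password) {
        type("usernameField.png", username);   // image file names are assumptions
        type("passwordField.png", password);
        click("loginButton.png");
    }

    // Visual assertion: no string parameter is needed, since the expected username is
    // represented by an image saved beforehand (file name is an assumption)
    public void checkLoggedUser() {
        ScreenRegion found = screen.find(new ImageTarget(new File("loggedUserJohnDoe.png")));
        if (found == null) {
            throw new AssertionError("Logged user not displayed");
        }
    }
}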
script shown in Fig. 5, in particular by removing line 6 and recording the new additional
steps.
C&R Approach + structural change. The tester modifies the locators or the assertion
values used in the test script. In the case of Selenium IDE, she runs the test script and
finds the first broken command (ie, the Selenese command that is highlighted in red
after test case execution), which can be an action command (eg, type or click) or an
assertion. At this point, the tester repairs the broken command and then reexecutes the
test script, possibly finding the next broken command (if any). For example, if (CR3) is
implemented then the test script shown in Fig. 4 needs to be repaired. The tester has to
replace UID with UserID in the command used to insert the username in the input field.
The repair process is similar in the case of Sikuli IDE. It is interesting to note that a structural change can affect DOM-based and visual test scripts differently. Indeed, in case (CR3) is implemented, no modifications are required to the Sikuli IDE test script shown in Fig. 5, while (CR2) requires modifying both Selenium IDE and Sikuli IDE test scripts.
Programmable Approach + logical change. Depending on the magnitude of the executed
maintenance task, the tester has to modify the broken test cases and/or the
corresponding page objects. In some cases, new page objects have to be created. For
example, if (CR1) is implemented then the tester has to create a new page object for the
web page providing the additional authentication question. Moreover, she has to repair
the testLogin test case in Fig. 6 (and similarly the one shown in Fig. 8), adding a new
Java statement that calls the method offered by the new page object.
Programmable Approach + structural change. The tester modifies one or more page
objects that the broken test case links to. For example, in the case of Selenium
WebDriver, if (CR2) is implemented, the tester has to repair the line:
@FindBy(linkText = "Login") in the HomePage.java page object (see Fig. 7).
Similarly, in the case of Sikuli API, the tester has to update the image login.png in the
HomePage.java page object (see Fig. 9).
Description                               Web Site
Bug tracking system                       sourceforge.net/projects/mantisbt/
Password manager                          sourceforge.net/projects/ppma/
Collaborative learning environment        sourceforge.net/projects/claroline/
Address/phone book, contact manager       sourceforge.net/projects/php-addressbook/
Meeting rooms multisite booking system    sourceforge.net/projects/mrbs/
Collaboration software                    sourceforge.net/projects/collabtive/
+ Visual). The DOM-based test suites were developed for our first work [4], the Sikuli
API test suites for the following work [3], while the Sikuli IDE test suites have been
specifically built for this work.
All the test suites have been developed following well-known best practices. For
instance, regarding the Selenium WebDriver and Sikuli API test suites (programmable approach), the page object pattern was used; concerning the Selenium IDE and WebDriver test suites (DOM-based localization), ID locators were preferred whenever possible (ie, when HTML tags are provided with IDs), otherwise Name, LinkText, CSS, and XPath locators were used.
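As a small illustration of this preference order with Selenium WebDriver, only the first locator below comes from the running example (the Login link text was already mentioned); the other values are purely illustrative assumptions.

import org.openqa.selenium.By;

public class LocatorPreferenceExample {
    // ID when the HTML tag has one, otherwise Name, LinkText, CSS, or XPath
    static final By BY_ID        = By.id("LoggedUser");
    static final By BY_NAME      = By.name("username");                    // assumption
    static final By BY_LINK_TEXT = By.linkText("Login");
    static final By BY_CSS       = By.cssSelector("form input[type='submit']"); // assumption
    static final By BY_XPATH     = By.xpath("//form//input[1]");           // assumption
}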
For each test suite, we measured the number of locators produced and the development
effort for the implementation as clock time. Each test suite is equivalent to the others
because the included test cases test exactly the same functionalities, using the same
sequences of actions (eg, locating the same web page elements) with the same input data
and oracle.
(3) Each test suite has been executed against the second release of the web application.
First, we recorded the failed test cases. We also checked that no real regression bugs were
found and that all the failures were due to broken locators or to modifications to the test
case logic. Then, in a second phase, we repaired the broken test cases. We measured the
number of broken locators and the repair effort as clock time. Finally, for comparing the
efficiency of the various localization techniques, we executed each Sikuli API and Selenium WebDriver test suite 10 times (to average over random fluctuations of the execution time) and recorded the execution times.
Varying the localization approach also influences the test suite development time. Fig.
10 clearly shows that DOM-based test suites require less time for their development.
Focusing on the two tools adopting the programmable approach (ie, Selenium WebDriver
and Sikuli API), we found that in all six cases, development of the WebDriver test suites
required less time than the Sikuli test suites (with a reduction between 22% and 57%).
Summary: Employing C&R tools and adopting DOM-based locators contribute to
reducing the overall development time.
Adopting a different localization approach also influences the test suite evolution. Fig.
11 shows that DOM-based test suites require less time for their evolution. Focusing on the
two tools implementing the programmable approach we found that results depend on the
respective robustness of the two kinds of locators (DOM based vs Visual) employed by
the two tools and thus follow the same trend: in four cases out of six, repairing the
Selenium WebDriver test suites required less time (from 33% to 57% less) than repairing
Sikuli API test suites, and in one case slightly more. In just one case (ie, Collabtive) Selenium WebDriver required substantially more effort (about 10 times as much) than Sikuli API.
Summary: Employing a programmable tool contributes to reducing the evolution costs for repairing automated end-to-end test suites. Concerning the web element localization approach, the DOM-based approach contributed to reducing the evolution costs in most
cases.
of the cases the cumulative cost of initial development and evolution of programmable test
cases (ie, Selenium WebDriver test suites) is lower than that of C&R test cases (ie,
Selenium IDE test suites) after a small number of releases (more precisely, between 1 and
3 releases).
We estimated that programmable test cases are more expensive to write from scratch
than C&R test cases, with a median ratio between the two costs equal to 1.58. During
software evolution, test suite repair is substantially cheaper for programmable test cases
than for C&R test cases, with a median ratio equal to 0.65. Such a cost/benefit trade-off becomes favorable to the programmable test suites after a small number of releases, the median of which is 1.94. The most important practical implication of these results is that for any software project which is expected to deliver two or more releases over time, programmable test cases offer an advantageous return on the initial investment. In fact,
after two or more releases, the evolution of the test suites will be easier and will require
less effort if a programmable approach (such as WebDriver) is adopted. However, specific
features of a given web application might make the trade-off more or less favorable to
programmable tests. In particular, the possibility to capture reusable abstractions in page
objects plays a major role in reducing the test evolution effort for programmable test
cases. In the following, we analyze each factor that might affect the trade-off between
C&R and programmable test cases.
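To make the origin of the 1.94 break-even figure explicit (our notation, not taken from the original study): let D_P and D_CR denote the initial development costs, and M_P and M_CR the per-release repair costs, of the programmable and C&R test suites, respectively. The cumulative cost of the programmable suite falls below that of the C&R suite after N releases as soon as

D_P + N * M_P <= D_CR + N * M_CR,   that is,   N >= (D_P - D_CR) / (M_CR - M_P).

Computing this threshold for each application and taking the median is consistent with the value of 1.94 reported above.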
Summary: According to our estimate, after two major releases, programmable test
cases become more convenient than C&R ones. Of course, the actual benefits may
depend on specific features of the web application under test.
The degree of reuse may amplify or reduce the benefits of adopting the page object pattern and thus the programmable approach.
For instance, in the case of MantisBT (ratio of page objects to test cases equal to 0.73), 14 page objects (out of 17) are used by only one or two of the test cases that have been repaired. In this case, we have few advantages in terms of maintenance effort reduction from adopting the page object pattern, since each repair activity on a page object, done just once, affects only one or at most two test cases. On the other hand, in our study we found three applications out of six that have a ratio of page objects to test cases of about 0.25 or lower. Thus, in these cases, the potential advantages of adopting the PO pattern are higher.
It is interesting to note that, at the beginning of the test suite development, many page objects have to be created. For instance, the first test case could require creating even four or five page objects. But usually, as new test cases are added to the existing test suite, the number of new page objects that the tester has to create decreases. Indeed, the tester has to create a page object for each logical page of the web application (eg, login, home, user details), while he could potentially develop a test for each path that could be followed to reach a specific page. Thus, probably, comprehensive test suites (ie, testing the web application in depth) benefit more from the page object pattern, since the level of page object reuse is higher.
Summary: The web page modularity of the web application under test affects the
benefits of programmable test cases. Web applications with well modularized
functionalities, implemented through reusable web pages, are associated with reusable
page objects that are maintained just once during software evolution.
the same locators. Actually, the number of broken test cases is the same for each pair of
test suites, but the number of repaired test cases is lower with Selenium WebDriver
because of the adoption of the page object pattern. With the page object pattern each
offered method can be reused more times in a test suite. Thus, a change at the level of the
page object can repair more than one test case at once (see the example in Fig. 13).
Clearly, the reduction of the number of repaired test cases is related to the number of
times a method in a page object is (re-)used.
Let us consider a specific example. In Claroline, between the two considered releases a
modification of the part of the application managing the login process occurred. Since this
modification involved also the attribute used to locate the user credentials submission
button, all the test cases were impacted (since all of them start with the authentication). In
the Selenium WebDriver test suite we repaired only the page object offering method
DesktopPage login(String user, String pass). In this way, we automatically
resolved the problem for the entire test suite. On the contrary, in the Selenium IDE test
suite, we had to modify all the test cases (ie, 40 test cases).
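As a hedged illustration of this single-point repair (the attribute values below are hypothetical, since the actual Claroline locators are not reported here), the fix may amount to updating one locator inside the page object offering the login method, while the 40 test cases remain untouched.

import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.support.FindBy;
import org.openqa.selenium.support.PageFactory;

public class LoginPage {
    private final WebDriver driver;

    @FindBy(name = "login")        // hypothetical locator for the username field
    private WebElement usernameField;

    @FindBy(name = "password")     // hypothetical locator for the password field
    private WebElement passwordField;

    // The only locator that had to be repaired between the two releases
    // (both attribute values are hypothetical):
    // @FindBy(name = "submitAuth")   // release 1 locator, broken in release 2
    @FindBy(name = "submitLogin")     // repaired locator for release 2
    private WebElement submitButton;

    public LoginPage(WebDriver driver) {
        this.driver = driver;
    }

    // Every test case starts by calling this method, so the single repair above fixes
    // all of them at once. DesktopPage is the page object of the page reached after
    // the login, defined elsewhere in the suite.
    public DesktopPage login(String user, String pass) {
        usernameField.sendKeys(user);
        passwordField.sendKeys(pass);
        submitButton.click();
        return PageFactory.initElements(driver, DesktopPage.class);
    }
}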
Summary: Page object reuse dramatically reduces the test case repair effort.
the broken test cases due to structural changes (respectively 727 out of 2735 locators
changed with IDE vs 162 out of 487 locators changed with WebDriver).
Summary: Adopting the page object pattern avoids the duplication of locators as well
as the need for their repeated, consistent evolution.
Note that for each target web element, the locators used by Selenium IDE and WebDriver are exactly the same, and so is their robustness. Thus the problem of the C&R approach is only due to locator duplication.
Summary: Additional benefits of programmable test cases (eg, parametric test cases)
should be taken into account when choosing between programmable and C&R web
testing.
locators). The visual approach requires more locators in the following situations: (1) web
elements changing their state, (2) elements with complex interaction, and (3) data-driven
test cases.
Web Elements Changing their State. When a web element changes its state (eg, a check
box is checked or unchecked, see the example in Fig. 14), a visual locator must be created
for each state, while with the DOM-based approach only one locator is required. This
occurred during the development of all the visual test suites (eg, Sikuli API) and it is one
of the reasons why, in all of them, we have more locators than in the DOM-based test
suites (eg, Selenium WebDriver). As a consequence, more effort is required, both during development and maintenance, for the visual test suites (quite often more than one locator had to be created and later repaired for each web element).
Web Elements with Complex Interaction. Complex web elements, such as drop-down
lists and multilevel drop-down menus, are quite common in modern web applications. For
instance, let us consider a form that asks to select the manufacturer of the car (see Fig. 15).
Typically, this is implemented using a drop-down list containing a list of manufacturers. A
DOM-based tool like Selenium WebDriver can provide a command to select directly an
element from the drop-down list (in the example only one ID-based locator is required).
On the contrary, when adopting the visual approach the task is much more complex. One
could, for instance: (1) locate the drop-down list (more precisely the arrow that shows the
menu) using an image locator; (2) click on it; (3) if the required list element is not shown,
locate and move the scrollbar (eg, by clicking the arrow); (4) locate the required element
using another image locator; and, finally, (5) click on it. All these steps together require more LOCs in the page objects and more locators. Actually, in this case the visual approach performs exactly the same steps that a human tester would do.
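For comparison, a minimal DOM-based sketch of this interaction with Selenium WebDriver could look as follows; the element id "manufacturer" and the selected entry are assumptions.

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.support.ui.Select;

public class DropDownExample {
    // A single ID-based locator plus the Select helper is enough to pick an item
    static void chooseManufacturer(WebDriver driver) {
        Select manufacturers = new Select(driver.findElement(By.id("manufacturer")));
        manufacturers.selectByVisibleText("Fiat");   // entry name is an assumption
    }
}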
Data-driven Test Cases. Often in the industrial practice [5], to improve the coverage
reached by a test suite, test cases are reexecuted multiple times using different values. This
is very well supported by a programmable testing approach. However, benefits depend on
the specific programmable approach that is adopted (eg, visual vs DOM-based). For
instance, in Selenium WebDriver it is possible to use data from various sources, such as
CSV files or databases, or even to generate them at runtime. In Sikuli it is necessary to
have images of the target web elements, so even if we can use various data sources (eg, to
fill input fields), when assertions are evaluated, images are still needed to represent the
expected data. For this reason, in the visual approach it is not possible to create complete
data-driven test cases (ie, including both data driven inputs and assertions). This happens
because in the DOM-based approach there is a clear separation between the locator for a
web element (eg, an ID value) and the content of that web element (eg, the displayed
string), so that we can reuse the same locator with different contents (eg, test assertion
values). On the contrary, using a visual tool, the locator for a web element and the
displayed content are the same thing, thus if the content changes, the locator must be also
modified. Moreover, it is important to highlight that, if necessary, parameterizing the creation of DOM-based locators is usually an easy task (eg, .//*[@id="list"]/tr[X]/td[1] with X=1..n), while this is not the case with visual locators.
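A minimal sketch of such a parameterized DOM-based locator in Selenium WebDriver follows; the table id and column come from the example above, while the rest (method name, expected values) is assumed for illustration.

import java.util.List;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import static org.junit.Assert.assertEquals;

public class DataDrivenCheckExample {
    // The same XPath template is reused for every row by substituting the row index X
    static void checkFirstColumn(WebDriver driver, List<String> expectedValues) {
        for (int x = 1; x <= expectedValues.size(); x++) {
            String xpath = String.format(".//*[@id='list']/tr[%d]/td[1]", x);
            String actual = driver.findElement(By.xpath(xpath)).getText();
            assertEquals(expectedValues.get(x - 1), actual);
        }
    }
}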
In our case study, we experienced this limitation of the visual approach, since we had,
in each test suite, at least one test case that performs multiple, repeated operations that
differ only in the data values being manipulated, such as: insert/remove multiple different
users, projects, addresses, or groups (depending on the considered application). In such
cases we used: (1) a single parameterized locator in Selenium WebDriver, and (2) several
different image locators in Sikuli API (eg, for evaluating the assertions), with the effect
that, in the second case, the number of locators required is substantially higher.
Summary: Adopting the Visual approach requires generating more locators than with
the DOM-based approach. This happens in the following situations: (1) web elements
changing their state, (2) web elements with complex interaction, and (3) data-driven test
cases.
Summary: DOM-based locators proved to be in general slightly more robust than
Visual locators. However, much depends on the specific characteristics of the web
application.
Changes behind the Scene. Sometimes the HTML code is modified without any
perceivable impact on how the web application appears. An extreme example is changing
the layout of a web application from the deprecated table-based structure to a div-based
structure, without affecting its visual aspect in any respect. In this case, the vast majority
of the DOM-based locators (in particular the navigational ones, eg, XPath) used by DOM-based tools may be broken. On the contrary, this change is almost insignificant for visual test tools. A similar problem occurs when autogenerated ID locators (eg, id1, id2, id3, …, idN) are used in DOM-based locators. In fact, these tend to change across different
releases, while leaving completely unaffected the visual appearance of the web page
(hence, no maintenance is required on the visual test suites). For example, the addition of
a new link in a web page might result in a change of all IDs of the elements following the
new link. Such changes behind the scene occurred in our empirical study and explain
why, in the case of Collabtive, the Sikuli test suite has required by far a lower maintenance
effort. In detail, across the two considered releases, a minor change has been applied to
almost all the HTML pages of Collabtive: an unused div tag has been removed. This little
change impacted quite strongly several of the XPath locators (XPath locators were used
because IDs were not present) in the WebDriver test suite. The majority of the 36 locators (all of them XPaths) were broken and had to be repaired (an example of repair is from /div[2]/ to /div[1]/). No change was necessary on the Sikuli visual test suite for this structural change. Overall, in Sikuli, only a few locators were broken. For this reason,
there is a large difference in the maintenance effort between the two test suites. A similar
change across releases occurred also in MantisBT, although it had a lower impact in this
application.
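As a hedged illustration of this kind of repair (the XPath expressions below are hypothetical; only the index shift reflects the /div[2]/ to /div[1]/ example above):

import org.openqa.selenium.WebElement;
import org.openqa.selenium.support.FindBy;

public class ChangeBehindTheSceneExample {
    // Removing an unused div shifts the positional indices of navigational XPath locators
    // @FindBy(xpath = "//div[2]/form/input[1]")   // release 1 locator, broken in release 2
    @FindBy(xpath = "//div[1]/form/input[1]")      // repaired locator for release 2
    private WebElement usernameField;
}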
Summary: DOM-based test suites require less time to complete their execution than visual test suites. However, the difference is not large.
one web element (eg, the label), while Solution (ii) requires calculating a distance in pixels (similarly to first-generation tools), which is not simple to determine. Both solutions have problems in case of variations of the relative positions of the elements in the next releases of the application. Thus, this factor has a negative effect on both the development and maintenance of the visual test suites.
Summary: Creating Visual locators could be difficult when a web page contains
multiple instances of the same kind of web element (eg, a list of input boxes in a form).
Summary: DOM-based test suites can interact with all the DOM-elements regardless
of whether they are displayed on the screen or not.
Summary: Choosing a tool that provides specific commands for managing web page
loading is useful for creating test cases easily. If equipped with such commands, test
cases are also more robust and faster to execute.
Summary: For the tester, Visual locators are usually simpler to match with the actual
web page elements than DOM-based locators.
Summary: The DOM-based and the Visual approaches have different strengths: the
former creates portable test suites, while the latter can test the correct rendering of web
pages across platforms.
The advantages of this approach are twofold: (1) locators based on contextual clues
proved to be robust [9], and (2) test cases are very simple to read even for a nonexpert, see
for instance the example in Fig. 18, showing an ATA-QV test case for our running example.
6.1.2 Ranorex
Ranorex is a GUI test automation framework for testing desktop, web-based and mobile
applications. Concerning automated end-to-end web testing, Ranorex is able to record the
steps performed by the tester on the web application and to create an executable test case
from them. The creation of the assertions is aided and the tester can choose among a set of
possible proposals. Thus Ranorex behaves as a C&R tool but, on the other hand, it also provides some functionalities typical of programmable tools, such as: (1) Code Modularization:
once the test cases have been recorded, it is possible to group sequences of steps in order
to create reusable procedures (as done with the PO pattern); (2) Data Driven: it is possible
to reexecute the same test case using different values stored in internal (simple data tables)
or external (Excel or CSV files, SQL Databases) data sets; (3) Module Development: it is
possible to develop test code modules using for instance the C# and VB.NET languages
and then to integrate them with the recorded test cases.
Concerning the localization methods, the tool supports both the DOM-based and the
Visual approaches. Indeed, Ranorex employs the RanoreXPath language, an expression
language similar to XPath, providing a search mechanism for finding single or multiple
web elements within a web page. At the same time, it provides also the capability of
defining visual locators both for localizing web elements to interact with and for defining
assertions.
6.1.3 JAutomate
JAutomate is a commercial tool able to create test scripts similarly to how they can be
produced using Selenium IDE. Indeed, the tester clicks the record button, performs the test
case steps and finally completes the test script by inserting assertions. Test script recording
is fully automatic, since the tool is able to (1) detect the actions executed on the user
interface (eg, click on a button, write in an input form or scroll the web page using the
mouse) and (2) generate the locators. JAutomate is based on the visual localization of the
target web elements but it is also able to provide functionalities that go beyond the typical
visual approach [10], like verifying that a text is displayed in a web page by means of: (a)
runtime generation of a visual locator representing such text, and (b) an OCR (Optical
Character Recognition) algorithm, both of which are very useful for creating data-driven
test cases. In case of several identical images on the screen, it is possible to specify which
has to be selected by using an index position, similarly to how this is done in an XPath
expression (eg, //input[3]). Moreover, JAutomate employs two complementary image
recognition algorithms [11], which, once combined, can identify images with inverted
colors or images with transparent backgrounds. JAutomate tries to overcome some
limitations typical of existing C&R tools by integrating/combining features of the
programmable approach [10], for instance, by providing constructs to (1) implement
loops, (2) create parametric test cases (eg, by loading data values from CSV files) and (3)
call/include other test scripts. Moreover, it provides an API for developing test scripts
directly in Java. A JAutomate test case that has a behavior close to the ones shown in the
previous sections is shown in Fig. 19.
(3) automating the generation of page objects, and (4) automating the migration across
different approaches. The provided solutions, strongly based on static and dynamic analyses and on code transformations [12, 13], can be very useful for reducing the test case development and
maintenance effort.
Java files, representing a code abstraction of the web application, organized using the Page
Object and Page Factory design patterns, as supported by the Selenium WebDriver
framework. A preliminary study comparing the generated page objects with the ones
created manually by a human tester shows promising results.
7 Conclusions
In this chapter we have provided an overview of the most relevant approaches and tools to
automated end-to-end web testing. First, for each approach we have given a detailed
description based on existing implementations; second, we have deeply analyzed their
strengths and weaknesses by discussing the results of a series of empirical studies. Third,
we have described some recent techniques and tools that try to overcome the limitations of
the existing approaches by combining them into hybrid methods. Finally, we have
analyzed a set of techniques that have been proposed in the literature to solve specific problems in the context of automated end-to-end web testing.
Concerning the methods used for developing and maintaining web test cases, we found
that programmable tests involve higher development effort (between 32% and 112%) but
lower maintenance effort (with a saving between 16% and 51%) than C&R tests. We have
estimated that, on average, after two major releases, programmable test cases become
more convenient than C&R ones. However, the actual benefits depend on specific features
of the web application, including its degree of modularity, which maps to reusable page
objects that need to be evolved only once, when programmable test cases are used.
Moreover, there are useful features of programmable test cases, such as the possibility to
define parametric and repeated test scenarios, which might further amplify their
advantages.
Concerning the approach used for localizing the web elements to interact with, we
found that DOM-based locators are generally more robust than visual locators, and that
DOM-based test cases can be developed from scratch at lower cost. Most of the times they
are also evolved at lower cost. However, on specific web applications visual locators were
easier to repair, because the visual appearance of such applications remained stable across
releases, while their structure changed a lot. DOM-based test cases required a lower
execution time than visual test cases, due to the computational demands of image
recognition algorithms used by the visual approach, although the difference is not
dramatic. Overall, the choice between DOM-based and visual locators is application-specific and depends quite strongly on the expected structural and visual evolution of the application. Other factors may also affect the tester's decision, such as the
availability/unavailability of visual locators for web elements that are important during
testing and the presence of advanced, RIA functionalities which cannot be easily tested
using DOM-based locators. Moreover, visual test cases are definitely easier to understand,
which, depending on the skills of the involved testers, might also play a role in the
decision.
References
[1] Ricca F., Tonella P. Detecting anomaly and failure in web applications. IEEE
Multimed. 2006;13(2):44–51. doi:10.1109/MMUL.2006.26.
[2] Ricca F., Tonella P. Analysis and testing of web applications. In: Proceedings of
the 23rd International Conference on Software Engineering, ICSE 2001; IEEE;
2001:25–34.
[3] Leotta M., Clerissi D., Ricca F., Tonella P. Visual vs. DOM-based web locators:
an empirical study. In: Proceedings of the 14th International Conference on Web
Engineering, ICWE 2014, Toulouse, France; LNCS vol. 8541; Springer;
2014:322–340. doi:10.1007/978-3-319-08245-5_19.
[4] Leotta M., Clerissi D., Ricca F., Tonella P. Capture-replay vs. programmable
web testing: an empirical assessment during test case evolution. In: Proceedings
of the 20th Working Conference on Reverse Engineering, WCRE 2013, Koblenz,
Germany; IEEE; 2013:272–281. doi:10.1109/WCRE.2013.6671302.
[5] Leotta M., Clerissi D., Ricca F., Spadaro C. Improving test suites maintainability
with the Page Object pattern: an industrial case study. In: Proceedings of the 6th
IEEE International Conference on Software Testing, Verification and Validation
Workshops, ICSTW 2013; IEEE; 2013:108–113. doi:10.1109/ICSTW.2013.19.
[6] Leotta M., Clerissi D., Ricca F., Spadaro C. Comparing the maintainability of
Selenium WebDriver test suites employing different locators: a case study. In:
Proceedings of the 1st International Workshop on Joining AcadeMiA and
Industry Contributions to testing Automation, JAMAICA 2013; ACM; 2013:53–58.
doi:10.1145/2489280.2489284.
[7] van Deursen A. Beyond page objects: testing web applications with state
objects. ACM Queue. 2015;13(6):20:20–20:37.
doi:10.1145/2791301.2793039.
[8] Mirzaaghaei M. Automatic test suite evolution. In: Proceedings of the 19th ACM
SIGSOFT Symposium and the 13th European conference on Foundations of
Software Engineering, ESEC/FSE 2011, Szeged, Hungary; ACM; 2011:396–399.
[9] Yandrapally R., Thummalapenta S., Sinha S., Chandra S. Robust test automation
using contextual clues. In: Proceedings of the 2014 International Symposium on
Software Testing and Analysis, ISSTA 2014, San Jose, CA, USA; ACM;
2014:304–314. doi:10.1145/2610384.2610390.
[10] Swifting AB. JAutomate Manual. 2014.
[11] Alegroth E., Nass M., Olsson H.H. JAutomate: a tool for system- and
acceptance-test automation. In: Proceedings of the 6th IEEE International
Conference on Software Testing, Verification and Validation, ICST 2013; IEEE;
2013:439–446. doi:10.1109/ICST.2013.61.
[12] Tonella P., Ricca F., Marchetto A. Recent advances in web testing. Adv. Comput.
2014;93:1–51.
[13] Ricca F., Tonella P., Baxter I.D. Web application transformations based on
rewrite rules. Inf. Softw. Technol. 2002;44(13):811–825. URL http://dblp.uni-trier.de/db/journals/infsof/infsof44.html#RiccaTB02.
[14] Leotta M., Stocco A., Ricca F., Tonella P. Reducing web test cases aging by
means of robust XPath locators. In: Proceedings of the 25th IEEE International
Symposium on Software Reliability Engineering Workshops, ISSREW 2014;
IEEE; 2014:449–454. doi:10.1109/ISSREW.2014.17.
[15] Leotta M., Stocco A., Ricca F., Tonella P. ROBULA+: an algorithm for
generating robust XPath locators for web testing. J. Softw. Evol. Process. (under
review).
[16] Montoto P., Pan A., Raposo J., Bellas F., Lopez J. Automated browsing in AJAX
websites. Data Knowl. Eng. 2011;70(3):269–283.
doi:10.1016/j.datak.2010.12.001. URL
http://www.sciencedirect.com/science/article/pii/S0169023X10001503.
[17] Leotta M., Stocco A., Ricca F., Tonella P. Using multi-locators to increase the
robustness of web test cases. In: Proceedings of 8th IEEE International
Conference on Software Testing, Verification and Validation, ICST 2015; IEEE;
2015:1–10. doi:10.1109/ICST.2015.7102611.
[18] Mirzaaghaei M., Pastore F., Pezze M. Automatic test case evolution. Softw. Test.
Verif. Reliab. 2014;24(5):386–411. doi:10.1002/stvr.1527.
[19] Daniel B., Dig D., Gvero T., Jagannath V., Jiaa J., Mitchell D., Nogiec J., Tan
S.H., Marinov D. ReAssert: a tool for repairing broken unit tests. In: Proceedings
of the 33rd International Conference on Software Engineering, ICSE 2011; IEEE;
2011:1010–1012. doi:10.1145/1985793.1985978.
[20] Choudhary S.R., Zhao D., Versee H., Orso A. WATER: web application test
repair. In: Proceedings of the 1st International Workshop on End-to-End Test
Script Engineering, ETSE 2011, Toronto, Ontario, Canada; ACM; 2011:24–29.
[21] Elbaum S., Rothermel G., Karre S., Fisher II M. Leveraging user-session data to
support web application testing. IEEE Trans. Softw. Eng. 2005;31(3):187–202. doi:10.1109/TSE.2005.36.
[22] Harman M., Alshahwan N. Automated session data repair for web application
regression testing. In: Proceedings of the 1st International Conference on
Software Testing, Verification, and Validation, ICST 2008; 2008:298–307.
doi:10.1109/ICST.2008.56.
[23] Stocco A., Leotta M., Ricca F., Tonella P. Why creating web page objects
manually if it can be done automatically? In: Proceedings of 10th IEEE/ACM
International Workshop on Automation of Software Test, AST 2015, Florence,
Maurizio Leotta is a research fellow at the University of Genova, Italy. He received his
PhD degree in Computer Science from the same University, in 2015, with the thesis
Automated Web Testing: Analysis and Maintenance Effort Reduction. He is author or
coauthor of more than 30 research papers published in international conferences and
workshops. His current research interests are in Software Engineering, with a particular
focus on the following themes: Web Application Testing, Functional Testing Automation,
Business Process Modelling, Empirical Software Engineering, Model-Driven Software
Engineering.
Diego Clerissi is a PhD student in Computer Science at the University of Genova, Italy. In
2015 he received his master's degree from the same University, with the thesis: Test Cases
Generation for Web Applications from Requirements Specification: Preliminary Results.
At the time of writing he is coauthor of 10 research papers published in international
conferences and workshops. His research interests are in Software Engineering, Model-Based Testing, Software Testing, Web Applications, and System Modeling.
Filippo Ricca is an associate professor at the University of Genova, Italy. He received his
PhD degree in Computer Science from the same University, in 2003, with the thesis:
Analysis, Testing and Re-structuring of Web Applications. In 2011 he was awarded the
ICSE 2001 MIP (Most Influential Paper) award, for his paper: Analysis and Testing of
Web Applications. He is author or coauthor of more than 100 research papers published
in international journals and conferences/workshops. He was Program Chair of
CSMR/WCRE 2014, CSMR 2013, ICPC 2011, and WSE 2008. Among others, he
served on the program committees of the following conferences: ICSM, ICST, SCAM,
CSMR, WCRE, and ESEM. From 1999 to 2006, he worked with the Software
Engineering group at ITC-irst (now FBK-irst), Trento, Italy. During this time he was part
of the team that worked on Reverse engineering, Re-engineering, and Software Testing.
His current research interests include: Software modeling, Reverse engineering, Empirical
studies in Software Engineering, Web applications, and Software Testing. The research is
mainly conducted through empirical methods such as case studies, controlled experiments,
and surveys.
Paolo Tonella is head of the Software Engineering Research Unit at Fondazione Bruno
Kessler (FBK), in Trento, Italy. He received his PhD degree in Software Engineering from
the University of Padova, in 1999, with the thesis: Code Analysis in Support to Software
Maintenance. In 2011 he was awarded the ICSE 2001 MIP (Most Influential Paper)
award, for his paper: Analysis and Testing of Web Applications. He is the author of
Reverse Engineering of Object Oriented Code, Springer, 2005. He participated in
several industrial and EU projects on software analysis and testing. His current research
interests include code analysis, web and object oriented testing, search-based test case
generation.
http://seleniumhq.org/projects/ide/
http://seleniumhq.org/projects/webdriver/
http://www.sikuli.org/
https://code.google.com/p/sikuli-api/
http://martinfowler.com/bliki/PageObject.html
https://code.google.com/p/selenium/wiki/PageObjects
https://code.google.com/p/selenium/wiki/PageFactory
http://docs.seleniumhq.org/docs/04_webdriver_advanced.jsp
10 http://ohmap.virtuetech.de/
11 https://github.com/dzharii/swd-recorder
12 https://github.com/wiredrive/wtframework/wiki/WTF-PageObject-Utility-Chrome-Extension
Author Index
Subject Index