Trends in Software Testing

Hrushikesha Mohanty · J.R. Mohanty · Arunkumar Balakrishnan
Editors
Hrushikesha Mohanty
School of Computer and Information Sciences
University of Hyderabad
Hyderabad, Telangana, India

J.R. Mohanty
School of Computer Applications
KIIT University
Bhubaneswar, Odisha, India

Arunkumar Balakrishnan
Mettler Toledo Turing Softwares
Coimbatore, Tamil Nadu, India
Preface
managing manually. The sooner these tasks are automated, the faster software devel-
opers reap financial gains. This has initiated unbelievable growth in the software
industry. Software business houses compete with each other to reach out to users with
the promise of deliverables as quickly as possible. In the process, developers take short
cuts to meet deadlines. Short cuts can be made in each phase of the software-
development cycle. Among all of the phases, the testing phase is most prone to short cuts
being taken. Because quality software requires exhaustive testing, the process is time
consuming. Although taking short cuts ensures that users’ delivery expectations are met, it opens the door for problems to creep in that appear at a later time. These problems may make things messy and even expensive for a company to fix. This phenomenon is
termed “test debt.” Test debt occurs due to inadequate test coverage and improper test
design. The chapter, Understanding Test Debt, by SG Ganesh, Mahesh Muralidharan,
and Raghu Kalyan Anna, takes up this issue and elaborates on defining technical debt as
a whole and test debt specifically. As the authors say, the causes for technical debt could
be many including work-schedule pressure, lack of well-trained personnel, lack of
standards and procedures, and many other similar issues that bring ad hoc practices into
software development. Test debt can occur during the requirement-engineering, design,
coding, and testing phases. This chapter considers test-debt management from the
authors’ experience. The concepts presented here include quantifying test debt, paying
off debt periodically, and preventing test debt. The best method to avoid test debt is to
take appropriate measures, such as adopting best practices in coding and testing, so that
test debt is prevented. Finally, the authors illustrate their ideas by presenting two case studies and end with some suggestions, such as the development of tools that can
detect smells when debt occurs in software development, particularly during the testing
phase.
Early detection of test smells is better because it restricts test debt from
increasing. Thus, the idea of testing should start at an early stage of development;
however, testing traditionally occurs at the end of coding. The recent idea is to
develop in small steps and then test; the cycle is repeated to complete development
of the software. Thus, test smell detection happens early in development. This
process is known as “agile testing.” One chapter, Agile Testing by Janakirama Raju Penmetsa, discusses the methodology and presents case studies to familiarize
beginners with this new concept. The hallmark of agile testing is collaboration,
transparency, flexibility, and retrospection. In traditional testing methodology, the
developer and the tester “chase” each other with negative attitudes: Developers find
“devils” in testers, and testers try to pin down developers. This attitude creates an unpleasant working atmosphere. The agile method works on the principle of
co-operation and transparency and thus results in a congenial working atmosphere.
The chapter on agile testing presents a brief discussion on the agile process by
citing extreme programming and scrum as examples. A reader new to this concept gets a basic
idea of the agile process and then explores the use of this concept in the realm of
testing. The author, with the help of an illustration, draws a picture of the
agile-testing process, which is based on an iterative and collaborative approach in
software development. An application is divided into smaller atomic units with
respect to user requirements. Each unit is developed, tested, and analysed by all
concerned including a designer, a code developer, and a tester. In case there is any
problem, issues are sorted out then and there instead of carrying test debt forward.
Advantages and disadvantages of agile testing are also identified. The success of
agile testing greatly depends on a new class of testing tools that automate the
process of testing on the basis of the agile-testing process.
Traditionally functional testing has been a prime concern in software testing
because this ensures users that requirements will be met. Among nonfunctional issues,
responsiveness testing takes the first priority because users usually want a system to
perform as quickly as possible. Recently the security of software systems has become
the main priority because it is a prime need, particularly as systems become complex, distributed, and loosely coupled. The integration of third-party
software has become prevalent, even more so in the case of large systems. This introduces a great degree of vulnerability to systems: it could open many doors for hackers to sneak through and create havoc in terms of system-output accuracy and
performance. This underscores the urgency of security testing to ensure that such
doors are closed. Security testing minimizes the risk of security breach and ensures the
confidentiality, integrity, and availability of customer transactions. Another chapter,
Security Testing by Faisal Anwer, Mohd. Nazir, and Khurram Mustafa, takes up the
issue of security testing. After discussing the challenges involved, the chapter proceeds
to detail the significance of security testing. The authors explore the life cycle of
software development and argue that security concerns must be addressed at every
single phase because security is not a one-time activity. Security in requirement,
design, coding, and testing is to be ensured in order to minimise system vulnerability
to security threats. Static security testing involves an automatic check of system
vulnerability without the execution of programs, whereas dynamic security testing
involves testing programs while the system is running. Static-testing techniques
include code review, model checking, and symbolic checking. Techniques such as
fuzz testing, concolic testing, and search-based testing techniques come under the
category of dynamic testing. The authors present a case study to illustrate the ideas of
security testing. They argue that security testing must be embedded in the
software-development process. Software security should be checked in a two-layer
approach including checking at each phase of the software-development life cycle and
again in an integrated manner throughout the development life cycle. A discussion of industry practices followed in security testing is also provided, and the authors go on to explore industry requirements and future trends. The chapter concludes with a
positive remark expecting a bright future in software-quality assurance, particularly in
the context of security testing, by adopting phase-based security testing instead of
demonstration-based conventional testing.
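As a rough illustration of the dynamic category, the sketch below shows naive fuzz testing in Python; the target function and all parameters are hypothetical, chosen only to make the idea concrete.

```python
# A minimal fuzzing sketch (hypothetical target): bombard a parser with random
# inputs and flag any crash that is not a graceful, expected rejection.
import random
import string

def parse_age(text: str) -> int:
    """Toy target: should reject bad input gracefully, never crash."""
    if not text.isdigit():
        raise ValueError("not a number")
    return int(text)

def fuzz(target, runs=1000):
    crashes = []
    for _ in range(runs):
        data = "".join(random.choices(string.printable,
                                      k=random.randint(0, 20)))
        try:
            target(data)
        except ValueError:
            pass                  # expected, graceful rejection
        except Exception as exc:  # anything else hints at a vulnerability
            crashes.append((data, exc))
    return crashes

print(f"unexpected crashes: {len(fuzz(parse_age))}")
```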
Testing itself is a process consisting of test selection, test classification, test
execution, and quality estimation. Each step of testing is dependent on many factors, some of which are probabilistic, e.g., the selection of a test case from a large set of
candidate test cases. In that case, there could be a compromise in test quality. Thus,
uncertainty in software testing is an important issue to be studied. At each phase of
software development, it is required to estimate uncertainty and take appropriate
measures to contain it: the greater the uncertainty, the farther away the actual product could be from the desired one. Because this issue is serious and has not been sufficiently addressed in software engineering, this book includes a
chapter—Uncertainty in Software Testing by Salman Abdul Moiz—on the subject.
The chapter starts with a formal specification of a test case and introduces some
sources of uncertainty. It discusses at length what contributes to uncertainty. Uncertainty starts with uncertainty in requirement engineering and in prioritising user requirements. Uncertainty, if not eliminated early on, affects other activities, such as prioritising test cases and their selection, at a later stage of software development.
In the case of systems with inherent asynchrony, the results differ from one test run
to another. Uncertainty plays a significant role in assuring the quality of testing for such systems. Currently, many advanced systems adopt heuristics for problem solving,
and some use different machine-learning techniques. These also contribute to
uncertainty regarding software output because these techniques depend on variables
that could be stochastic in nature.
In order to manage uncertainty, there must be a way to measure it. The chapter
reviews some of the important techniques used to model uncertainty. Based on the
type of system, an appropriate model to measure uncertainty should be used. The
models include the Bayesian approach, fuzzy logic, hidden Markov models (HMMs), and rough sets.
Machine-learning techniques are also proposed to determine uncertainty from
previous test runs of similar systems. The chapter also provides rules of thumb for
making design decisions in the presence of uncertainty.
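As a rough illustration of the Bayesian approach mentioned above (all probabilities hypothetical), Bayes' rule can update the belief that a module is defective as failing test runs accumulate:

```python
# A minimal sketch (hypothetical numbers): using Bayes' rule to update the
# belief that a module is defective after observing failing test runs.
def bayes_update(prior, p_fail_given_defect, p_fail_given_ok):
    """P(defect | test failed) via Bayes' rule."""
    evidence = p_fail_given_defect * prior + p_fail_given_ok * (1 - prior)
    return p_fail_given_defect * prior / evidence

belief = 0.10                      # prior: 10% of modules are defective
for _ in range(3):                 # three consecutive failing test runs
    belief = bayes_update(belief, p_fail_given_defect=0.9, p_fail_given_ok=0.05)
    print(f"P(defect | failures so far) = {belief:.3f}")
```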
Measuring uncertainty, ensuring security, adopting agile testing, and many such
software-engineering tasks mostly follow a bottom-up approach. That is the reason
why modularisation in software engineering is an important task. Modularisation
aims at isolation with minimal dependence among modules. Nevertheless, the
sharing of memory through pointers creates inconsistency when one module changes the value at a location without the knowledge of other modules that share the location.
This book has a chapter, Separation Logic to Meliorate Software Testing and
Validation by Abhishek Kr. Singh and Raja Natarajan, on this issue.
This chapter explores the difficulties in sharing memory locations by presenting some examples. It introduces the notion of separation logic. This logic is
an extension to Hoare logic. The new logic uses a programming language that has
four commands for pointer manipulation. The commands perform the usual heap
operations such as lookup, update, allocation, and deallocation. Formal syntax for
each command is introduced. Sets of assertions and axioms are defined to facilitate
reasoning for changes due to each operation. For this purpose, basic structural rules
are defined; then, for specific cases, a set of derived rules is also defined. This helps in fast reasoning because these rules are used directly instead of being derived from basic rules at the time of reasoning. Before and after each operation, the
respective rules are annotated in the program to detect anomalies, if any, created by
the operation. The idea is illustrated by a case study of list traversal. The chapter,
while introducing separation logic, also proposes a formal language that naturally
defines separation in memory usage. The technique proposed here presents a basic
method to assure separation among modules while using common pointers. This helps in building systems from modules and reasoning about their correctness.
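As a rough illustration in standard separation-logic notation (not reproduced from the chapter itself), the heap-update axiom and the frame rule show how reasoning stays local to the memory an operation touches:

```latex
% Heap-update axiom: executing [x] := v changes only the cell that x points to.
\{\, x \mapsto - \,\}\ [x] := v\ \{\, x \mapsto v \,\}

% Frame rule: any assertion R about separate memory (joined by the separating
% conjunction *) is preserved, which is what enables modular reasoning.
\frac{\{P\}\ C\ \{Q\}}
     {\{P \ast R\}\ C\ \{Q \ast R\}}
\qquad \text{($C$ does not modify variables free in $R$)}
```

The separating conjunction asserts that its two operands hold on disjoint parts of the heap, so modules that share no locations can be verified independently.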
Hrushikesha Mohanty
J.R. Mohanty
Arunkumar Balakrishnan
About the Book
This book aims to highlight the current areas of concern in software testing. The
contributors hail from academia as well as the industry to ensure a balanced per-
spective. Authors from academia are actively engaged in research in software
testing, whereas those from the industry have first-hand experience in coding and
testing. This book attempts to identify trends and some recurring issues in software
testing, especially pertaining to test debt, agile testing, security testing, uncertainty in testing, separation of modules, the design structure matrix, and testing as a
service.
Taking short cuts in system testing results in test debt, and the process of system development must aim to reduce such debt. Keeping this in view, the agile testing
method is explored. Considering user concerns on security and the impact that
uncertainty makes in software system development, two chapters are included on
these topics. System architecture plays an important role in system testing.
Architectural components are required to be non-interfering yet work together to make a stable system. Keeping this in view, chapters on separation logic and the design structure matrix are included in the book. Finally, a chapter on testing as a service is
also included. In addition to introducing the concepts involved, the authors have
made attempts to provide leads to practical realization of these concepts. With this
aim, they have presented frameworks and illustrations that provide enough hints
towards system realization.
This book promises to provide insights to readers with varied interests in software testing. It covers an appreciable spread of the issues related to software testing. Every chapter intends to motivate readers on the specialties and the
challenges that lie within. Of course, this is not a claim that each chapter deals with
an issue exhaustively. But we sincerely hope that both conversant and novice
readers will find this book equally interesting.
We hope this book is useful for students, researchers, and professionals looking to explore the frontiers of software testing.
Understanding Test Debt
Keywords Software testing · Technical debt · Test debt · Test smells · Test code · Refactoring test cases · Test design · Effective testing · Testing approaches · Test debt case studies
G. Samarthyam (✉)
Bangalore, India
e-mail: [email protected]

M. Muralidharan
Siemens Research and Technology Center, Bangalore, India
e-mail: [email protected]

R.K. Anna
Symantec Software, Pune, India
e-mail: [email protected]

1 Introduction
Technical debt is the debt that accrues when we knowingly or unknowingly make
wrong or non-optimal technical decisions [1]. Ward Cunningham coined the term
“technical debt” [2] wherein long term software quality is traded for short-term
gains in development speed. He used the metaphor of financial debt to indicate how
incurring debt in the short run is beneficial but hurts badly in the long run if not
repaid.
We incur financial debt when we take a loan. The debt does not cause problems
as long as we repay it. However, if we do not repay the debt, the interest keeps building over a period of time, and finally we may land in a situation where we are not in a position to repay the accumulated financial debt, leading to financial bankruptcy. Technical debt is akin to financial debt. When we take
shortcuts during the development process either knowingly or unknowingly, we
incur debts. Such debts do not cause problems when we repay the debt by per-
forming refactoring to address the shortcut. If we keep accumulating technical debts
without taking steps to repay it, the situation leads the software to “technical
bankruptcy”.
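To make the metaphor's arithmetic concrete, here is a toy sketch (all numbers hypothetical): the longer a shortcut stays in the code, the more surrounding code comes to depend on it, so the repayment cost compounds like interest.

```python
# Toy illustration (hypothetical numbers) of technical-debt "interest":
# the effort to repay a shortcut grows multiplicatively per release it
# remains unpaid, because more code comes to depend on it.
repay_now = 2.0               # person-days to fix the shortcut today
interest_per_release = 0.25   # 25% extra effort per release left unpaid

cost = repay_now
for release in range(1, 7):
    cost *= 1 + interest_per_release
    print(f"after release {release}: repayment costs {cost:.1f} person-days")
```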
A high technical debt in a software project impacts the project in multiple ways.
Ward Cunningham talks about the impact of technical debt on engineering orga-
nizations in his original paper [2]: “Entire engineering organizations can be
brought to a stand-still under the debt load …”. Israel Gat warns about technical
debt [3]: “If neglected, technical debt can lead to a broad spectrum of difficulties,
from collapsed roadmaps to an inability to respond to customer problems in a
timely manner.” Specifically, the Cost of Change (CoC) increases with increasing technical debt [4], which leads to poor productivity. In addition, it starts a vicious cycle where poor productivity leads to more focus on features (rather than associated quality) in the limited time available, and that leads to more technical debt due to time constraints. Apart from technical consequences, high technical debt impacts the
morale and motivation of the development teams in a negative manner [1].
There are several well-known causes that lead to technical debt, including:
• Schedule pressures.
• Lack of skilled engineers.
• Lack of documentation.
The chapter has already discussed a few typical causes of technical debt, and the same contribute to test debt as well. Software teams are typically under tremendous
schedule pressure to quickly deliver value to their stakeholders. Often, in such
situations, due to lack of knowledge and awareness about test debt, the teams
sacrifice testing best practices and incur test debt.
Lack of skilled engineers also contributes to test debt. Developers are typically
not trained in unit testing during their university degree and often learn to write unit tests “on-the-job”. It has been observed that “the quality of unit tests is mainly
dependent on the quality of the engineer who wrote the tests” [10]. The observation
indicates that unit tests written by inexperienced engineers are harder to maintain and
contribute to test debt.
“Number obsession” is another not-so-evident but common cause of test debt
that occurs when software teams focus solely on the numbers associated with tests
such as coverage and total test count. Their obsession is just to satisfy their
management; however, in reality they sacrifice the test quality while achieving
“numbers” and contribute to test debt.
Inadequate test infrastructure and tools also contribute to test debt. For instance, consider a test team that uses an out-of-date test automation framework (i.e., a framework that is nearing end-of-life or is obsolete). The more the team postpones replacing the framework with a new one, the more tests are written in the older set-up. It then costs more to upgrade to the newer framework, and hence this contributes to accumulating test debt.
Additionally, there are many other causes of test debt, such as inadequate test
planning and estimation errors in test effort planning.
There are numerous aspects that contribute to test debt. We propose a classification of test debt based on the scope and type of testing performed (Fig. 2). (Note that the categories in this classification are neither mutually exclusive nor jointly exhaustive.)
There are numerous short-cuts in unit testing that contribute to test debt (many of
them covered by Deursen et al. [11]). Here we cover a few important aspects
relating to unit testing that contribute to test debt.
• Inadequate unit tests. Consider the scenario in which unit tests are inadequate
or missing in a project. In such cases, defects remain latent for a longer time and may get caught only in higher-level tests (such as system tests). This leads to
more time to find, debug, and fix the defect.
• Obscure unit tests. Unit tests act as live documentation—they help understand
the functionality of the underlying code. When unit tests are obscure, it becomes
difficult to understand the unit test code as well as the production code for which
the tests are written.
• Unit tests with external dependencies. Unit tests should execute independently—
they should isolate external dependencies while testing the subject under test.
Failing to isolate real dependencies leads to many problems such as slow unit
tests, unit tests that fail for the wrong reason, and the inability to execute the tests on
a build machine. For example, consider a case where a unit test directly accesses
a database instead of isolating that dependency. The requirement to connect to
the database and fetch the data will slow down the unit test. Further, the test may
fail even though the underlying functionality is correct due to external reasons
such as network failure.
• Improper asserts in unit tests. There are many situations where wrong or
non-optimal usage of asserts results in test debt. For instance, many a time a “number-obsessed” team adopts a bad practice wherein unit tests are written without the necessary asserts to boost coverage. Similarly, assert conditions are repeated that do not add any value to the test. Further, in many
situations, the provided assert conditions are wrong—in that case, it takes more
time to debug the code to figure out the cause of the failure. Yet another case of
improper asserts occurs when the test conditions are expected to be checked
manually instead of providing them in assert statements. Such tests require manual intervention and effort to check the condition(s) every time the test is run, which negates the benefit of having automated tests.
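The following minimal sketch (hypothetical names, using Python's built-in unittest) contrasts a coverage-boosting test without asserts, a properly asserted test, and a test that isolates an external dependency with a test double:

```python
# Sketch of assert-related test debt (hypothetical example).
import unittest
from unittest import mock

def total_price(items, tax_rate):
    """Production code under test: sums item prices and applies tax."""
    return round(sum(items) * (1 + tax_rate), 2)

class PriceTests(unittest.TestCase):
    def test_without_asserts(self):      # a "number obsession" smell:
        total_price([10.0, 20.0], 0.1)   # boosts coverage but can never fail

    def test_with_proper_asserts(self):
        self.assertEqual(total_price([10.0, 20.0], 0.1), 33.0)
        self.assertEqual(total_price([], 0.1), 0.0)   # boundary case

class OrderServiceTests(unittest.TestCase):
    def test_isolates_database(self):
        # A real repository would hit a database; a mock keeps the unit
        # test fast and independent of network or database failures.
        repo = mock.Mock()
        repo.fetch_prices.return_value = [10.0, 20.0]
        self.assertEqual(total_price(repo.fetch_prices(), 0.1), 33.0)

if __name__ == "__main__":
    unittest.main()
```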
The exploratory testing (or freeform testing) approach does not rely on a formal test
case definition. In this form of testing, testers come up with tests and run them
based on their intuition and knowledge instead of formally designing test cases [12].
Practitioners find exploratory testing attractive because of its creative, uncon-
strained, and cost-effective characteristics. Exploratory testing is also effective for
certain kinds of tests. For example, security and penetration testing require con-
siderable expertise and creativity from the test engineers to unearth security
vulnerabilities.
However, overreliance on exploratory testing can undermine the overall effec-
tiveness of the testing approach; some of the factors are:
• Exploratory testing typically increases effort and cost as the complexity of the
application grows because it is harder to cover a large application with a few
tests that the testers come up with rather than adopting a structured test man-
agement approach.
• Test managers have difficulty understanding and monitoring testing progress.
• Re-execution of tests is both difficult and expensive because the test cases are
not documented. When test cases are not documented, the likelihood of some functionality not being tested is high, and hence the possibility of defects slipping into the field is also high.
Hence, we find that depending exclusively on exploratory testing contributes to
test debt. However, when exploratory testing complements manual tests and
automated tests, it is beneficial. In this context, the following aspects that contribute to test debt can be identified.
• Inaccurate assessments. Test-result assessment in exploratory testing suffers from the missing-oracle problem, which results in additional rework due to more residual defects, directly affecting the maintenance costs.
• Inexperienced testers. The main strength of exploratory testing derives from the
experience of the testers and their domain knowledge. However, when inex-
perienced testers are employed to perform exploratory testing, the testing is
ineffective. Due to suboptimal tester fitness and non-uniform test accuracy over the whole system, this leads to an accumulation of residual defects.
• Poor documentation. Because there is no documentation maintained when
exploratory testing is performed, it jeopardizes knowledge management in the
company, resulting in higher maintenance costs. For instance, without docu-
mentation, testers new to the team will find it difficult to get up to speed and
contribute effectively.
In addition, when proper logs of past testing activities are not maintained while
performing exploratory testing, it can lead to other problems such as estimation
errors in effort planning. Such improper effort estimation causes test debt because it
encourages test engineers to take short-cuts in testing to meet deadlines.
In manual testing, test engineers execute test cases by following the steps described
in a test specification. Here is a list of common contributors to test debt that relate to manual testing.
• Limited test execution. Many a time, the available resources and time to carry out complete and comprehensive testing are not sufficient. Hence, testers execute
only a subset of tests for a release (i.e., a shortcut), thereby increasing the
possibility of residual defects.
• Improper test design. Manual testing is a time consuming and arduous process.
It is the tester’s responsibility to carry out all the documented tests and record
their outcomes. A test case could be designed with many variations using different data combinations, but then it is necessary for the tester to execute all of these combinations. Since execution of all combinations of test cases is a laborious process for testers, they restrict themselves to a few happy-path scenarios, taking a short cut in test design. This increases the risk of residual defects in the
System Under Test (SUT).
• Missing test reviews. Test reviews help improve the quality of test cases and also help in finding problems earlier. Due to time pressures, or a process that does not mandate test reviews, testers take a short cut by skipping test reviews. Missing test reviews could delay finding defects or increase the maintenance effort of test cases.
Automation testing involves executing pre-scripted tests and checking the results without manual intervention. Depending purely on manual testing, and thus lacking automation, itself contributes to test debt. For instance, it is
essential to perform regression testing whenever the code is changed, and auto-
mated tests are key to effective regression testing. When automated tests are
missing, it makes regression testing harder and increases the chances of residual
defects, thereby decreasing the quality of the software and increasing the cost of
fixing defects when they are reported later by the customers.
Here we consider contributors to test debt in projects where automated testing is
performed. Since unit tests are developer centric, we consider debt related to unit
tests different from automation test debt here.
• Inappropriate or inadequate infrastructure. Tests conducted on infrastructure that does not resemble the customer's infrastructure could lead to unpredictable behavior of the software, resulting in reduced confidence in the system. Relevant
hardware components or devices must be available for testing. For instance, in
the context of embedded systems where both hardware and software compo-
nents are involved, testing is often performed using emulators or simulators.
Though it simplifies testing, skipping tests on actual hardware devices or
components is a form of short-cut and it contributes to test debt.
• Lack of coding standards. Automated tests need to adhere to a common coding standard. When a common coding standard is not adopted and followed, it
increases the effort to maintain the test code.
• Unfixed (broken) tests. Not fixing broken tests, or not updating tests when functionality changes, reduces confidence in the tests, decreases quality, and increases maintenance effort.
• Using record and replay. It is easy to use record and replay tools for testing
Graphical User Interface (GUI) applications. However, using record and replay
has numerous drawbacks—for example, even a slight change in a UI can result
in the need for updating the test(s). Better alternatives may be available to use instead of record and replay tools. Hence, when record and replay tools are used for GUI testing because such tests are easy to create (a form of shortcut), in the long run this increases maintenance effort and contributes to test debt.
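A minimal, framework-agnostic sketch (hypothetical names; a fake driver stands in for a real GUI driver) of why recorded scripts are brittle and how a page-object abstraction localizes UI changes:

```python
# Sketch contrasting record-and-replay brittleness with a page object.

class FakeDriver:
    """Stand-in for a real GUI/web driver, keeping the sketch self-contained."""
    def __init__(self, ids):
        self.ids = ids
    def click(self, element_id):
        if element_id not in self.ids:
            raise LookupError(f"no element with id {element_id!r}")

# Recorded script: the raw id "btn_submit_v2" is hard-coded at every call
# site, so a single UI rename breaks every recorded test.
def recorded_test(driver):
    driver.click("btn_submit_v2")

# Page object: the id lives in exactly one place; a UI change means a
# one-line fix instead of re-recording all of the tests.
class LoginPage:
    SUBMIT_ID = "btn_submit_v2"
    def __init__(self, driver):
        self.driver = driver
    def submit(self):
        self.driver.click(self.SUBMIT_ID)

def page_object_test(driver):
    LoginPage(driver).submit()

page_object_test(FakeDriver({"btn_submit_v2"}))  # passes; rename = 1-line fix
```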
Incurring debt is inevitable given the realities of software development projects that
operate under schedule pressures with resource and cost constraints. The most
important aspect in managing test debt (and technical debt in general) is to track the
debt and periodically repay it to keep it under control. The key is to adopt a diligent
and pragmatic approach towards evaluating and repaying technical debt.
However, it is important to note that sometimes incurring debt is beneficial because it empowers the team to meet its goals. For example, it may not be possible to address all of the findings from a test review because of an immediate release deadline that must be met. Not addressing the findings from a test review
incurs test debt, but it is acceptable so long as a plan is in place to address the
findings in the upcoming releases and the plan is properly tracked and executed to
repay the debt.
Another important aspect towards managing test debt is to “prevent” accumu-
lation of test debt. Prevention can be achieved by increasing awareness of test debt in the development and test teams. Introducing relevant processes can also
help stop accumulation of debt. For example, the focus given for “clean coding”
[13] practices for software code needs to be given for test code as well. When
existing processes are not adhered to or followed well, they can be strengthened
again. For example, regular test reviews provide a feedback loop to improve the
quality of tests.
The key to managing test debt is to “repay” it. Often, software projects have accumulated a considerable amount of test debt over a period of time. In such situ-
ations, it is important to evaluate the extent of test debt and plan to repay it. In
real-world projects, it is not feasible or practical to stop writing new tests and focus
exclusively on repaying test debt. Hence, a balanced and pragmatic approach is
needed to repay debt that takes into account the project and organizational
dynamics. With this in mind, we discuss steps for repaying debt based on our
experience in industrial projects.
1. Quantify the test debt, get buy-in from management, and perform large-scale
repayment
When the debt is really high, it may be necessary for a project team to spend many months of dedicated effort to repay test debt. Buy-in (i.e.,
acceptance and support) from management is often necessary for repaying test
debt in these cases. To get buy-in, it is important to quantify the extent of test debt
in the projects. For software code bases, numerous tools (such as SonarQube [14]) are already available that quantify technical debt along the code and design debt dimensions. Since test debt is an emerging topic, tools that quantify test debt are yet to emerge. However, quantification can still be performed manually using simple tools such as Excel sheets. Note that the term “technical debt” itself was
coined as a metaphor to communicate the costs of taking short-cuts in devel-
opment projects; hence it is important to be specific about the kinds of test debt
incurred in the project and quantify it (to the extent possible) to present the extent
of debt to the management. The presentation to management, made to get their buy-in for repaying the debt, should also include a plan for how the team will repay it. Such a plan would mainly cover the time schedule, the resources required, and how they would be coordinated.
2. Pay technical debt periodically
In projects where the debt has not reached critical levels, repaying debt in the form of small installments, the equivalent of Equated Monthly Installments (EMIs) in the financial domain, is desirable and practical. In this approach, test debt repayment is made a part of every release, along with the
regular work of the test team. The Boy Scout rule “always leave the camp-
ground cleaner than you found it” is applicable for software development as
well: “check-in the code better than you checked-out”. Whenever developers or
test engineers touch the tests, encourage them to improve the tests by refactoring
them before checking them in.
3. Avoid increasing test debt
Strategic approaches to managing test debt need to be taken to avoid increasing test debt in the future. To give an example, consider introducing Test Driven
Development (TDD) [15] to the development team (in a team that hasn’t
adopted it yet). In TDD, the developer first writes an (initially failing) automated
test case that defines a desired improvement or new function, then produces the
minimum amount of code to pass that test, and finally refactors the new code
and test to acceptable standards. In TDD, code for automated unit tests always
gets written, executed and refactored. Hence, introducing TDD in a develop-
ment team is a strategic approach towards managing test debt.
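To make the TDD cycle concrete, here is a minimal sketch (a hypothetical example, not from the chapter): the test is written first and initially fails; just enough code is then written to make it pass; both can be refactored afterwards with the test acting as a safety net.

```python
# A minimal TDD sketch (hypothetical example): the test is written first,
# then just enough code is written to make it pass.
import unittest

# Step 2: the minimum code that makes the initially failing test pass.
def fizzbuzz(n):
    if n % 15 == 0:
        return "FizzBuzz"
    if n % 3 == 0:
        return "Fizz"
    if n % 5 == 0:
        return "Buzz"
    return str(n)

# Step 1: this test is written before fizzbuzz() exists, so it fails first.
class FizzBuzzTest(unittest.TestCase):
    def test_multiples(self):
        self.assertEqual(fizzbuzz(3), "Fizz")
        self.assertEqual(fizzbuzz(5), "Buzz")
        self.assertEqual(fizzbuzz(15), "FizzBuzz")
        self.assertEqual(fizzbuzz(7), "7")

if __name__ == "__main__":
    unittest.main()
```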
It is sometimes surprising to observe that software teams don’t give the same importance to test code as they give to the application code shipped to customers. Towards
rectifying this situation, it is imperative to follow best practices for managing code
and design debt for reduction of test debt.
• Pair programming. Pair programming involves two programmers collaborating
by working side-by-side to develop software; one developer is a driver who is
involved in writing code and another one is a navigator who observes and helps
the driver [16]. Benefits of pair programming include reduced defects, enhanced technical skills, and improved team communication [17]. Test engineers writing tests in pairs are also likely to realize the same benefits.
• Clean coding. As Robert C Martin observes [13]: “Even bad code can function.
But if code isn’t clean, it can bring a development organization to its knees.
Every year, countless hours and significant resources are lost because of poorly
written code.” Clean coding practices are now widely practiced in the spirit of craftsmanship for developing high-quality software [18]. In the same
vein, even bad tests can find defects, but it is important that the tests are clean.
Hence, it is important to adopt clean testing practices as a strategic approach for
managing test debt.
• Refactoring. The traditional meaning of refactoring—“behavior preserving
program transformations” [19]—is applicable for code in unit tests as well.
Refactoring for unit tests is defined as transformation(s) of unit test code that do
not add or remove test cases, and make test code better understandable or
maintainable [11]. Refactoring is best performed as part of the development
process, i.e., whenever the code is being changed (for fixing or enhancing code).
In the same vein, test refactoring is best performed whenever the tests are being
touched. The longer refactoring is delayed, and the more code is written to use the current form, the more debt piles up. Hence, refactoring unit tests and automated tests is the key practice for keeping test debt under control.
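As a minimal sketch of such a behavior-preserving test refactoring (hypothetical names): duplicated fixture set-up is moved into a shared setUp method, making each test shorter and more maintainable without adding or removing test cases.

```python
# Sketch of a behavior-preserving unit-test refactoring (hypothetical example).
import unittest

class ShoppingCart:
    def __init__(self):
        self.items = []
    def add(self, price):
        self.items.append(price)
    def total(self):
        return sum(self.items)

# Before: every test repeated the same set-up lines (an obscure-test smell).
# After (below): the shared set-up lives in setUp(), run before each test.
class CartTest(unittest.TestCase):
    def setUp(self):
        self.cart = ShoppingCart()
        self.cart.add(10.0)
        self.cart.add(5.0)

    def test_total(self):
        self.assertEqual(self.cart.total(), 15.0)

    def test_add_updates_total(self):
        self.cart.add(2.5)
        self.assertEqual(self.cart.total(), 17.5)

if __name__ == "__main__":
    unittest.main()
```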
An important observation here is that these are not isolated practices—they are
interdependent. For instance, adopting clean coding practices requires
making changes to the code for addressing static analysis violations, adopting and
then ensuring that the whole codebase is written following a common coding style,
etc. In this process, the developer making changes also naturally finds improvement
opportunities in designs and performs refactoring. Any code refactoring needs
corresponding unit tests to pass. When unit tests fail, typically, it is the code that is
the culprit and needs to be fixed. However, in many cases, the tests may also be the ones that need fixing.
There are many testing practices that software teams can adopt for strategic man-
agement of test debt.
• Ten minute builds. Quick and complete builds are important for reducing the
turn-around time from making changes to the code on a developer’s machine to moving it to production. A rule of thumb is that it should take only 10 min to
build, test, and deploy the software: starting from compiling the code, running
tests, configuring the machines (registry setting, copying the files etc.), firing up
the processes, etc. should take only 10 min or so and not hours [22]. In legacy
software, it is not uncommon to see builds and deployment processes taking
many hours—speeding it up to the order of 10 min is feasible and can be
achieved by adopting relevant practices (such as incremental compilation) [23].
• Scriptless test automation. The conventional approach for GUI testing is to use record-and-playback tools: such tests are quick to create but harder to maintain (e.g., even small changes in the GUI can easily break the tests). GUI testing can be
automated with scripts, but this is effort intensive and requires considerably skilled test engineers; further, scripted tests also have maintenance problems. An emerging
approach is to perform scriptless automation in which tests are composed from
objects and actions through a GUI. The scriptless approach is less susceptible to changes in the GUI, is technology agnostic, accommodates late changes, and is relatively easier to maintain [24].
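Returning to the ten-minute-build rule of thumb in the first bullet above, a minimal sketch of enforcing the budget in a CI step might look as follows (the build and test commands are hypothetical):

```python
# A minimal sketch (hypothetical commands) of enforcing the ten-minute-build
# rule of thumb in a CI step: run the build and tests, fail if over budget.
import subprocess
import sys
import time

BUDGET_SECONDS = 10 * 60           # the "ten minute build" budget [22]
STEPS = [                          # hypothetical build/test commands
    ["make", "build"],
    ["make", "test"],
]

start = time.monotonic()
for step in STEPS:
    subprocess.run(step, check=True)
elapsed = time.monotonic() - start
print(f"build+test took {elapsed:.0f}s (budget {BUDGET_SECONDS}s)")
sys.exit(0 if elapsed <= BUDGET_SECONDS else 1)
```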
4 Case Studies
In this section, we discuss two case studies relating to test debt in industrial pro-
jects. These two case studies are derived from the consultancy experience of one of this chapter's co-authors. In documenting these case studies, we have taken care not to mention any organisation- or project-specific details.
This case study concerns a software product consisting of around 30 KLOC (Kilo
Lines of Code) in a very large organization. It is C++ software that has evolved over a period of eight years. Though the size of the code is small, it exposes more
than a thousand Application Programming Interface (API) functions and can also be
invoked through command line. It is a critical software used by a large number of
teams within the organization. The application has to work on different platforms
(Linux flavours, Windows, and Mac). The team consists of four developers and three testers.
A release is made every three to six months, with around 500–600 lines changed or
added in every release.
The development process used was the Waterfall model. Only manual testing was performed on this software when feature additions or bug fixes were made.
There was no unit testing performed in this software. No tools or processes were
used by either developers or the testers to improve code quality. Test machines and
test infrastructure used were obsolete. There were delays to get new hardware for
testing because of arduous and time-consuming procurement processes followed in
the organization.
The main problem that concerned the project team and the management was that
the internal users were not satisfied because of the numerous defects in the software.
Developers in the team found it difficult to make changes to the software (though the software is small) because there was no automated regression testing in place.
Since the test team had to perform testing manually, it took a considerable amount of time and effort from them.
One could observe that the lack of automated (unit and functional) tests contributed to test debt. Further, the processes in place were not conducive to creating high-quality software given the criticality of the software, for instance, the lack of tools and processes for improving code quality.
The project team, with buy-in from management, took steps to address test debt
in this project. The first step was to automate the tests, and the team used GTest and
Perl scripts. Automation was not challenging because the software was exposed as
an API. Unit tests were implemented using CppUnit. For the command-line ver-
sion, the team introduced tools such as Cucumber and Robot. The team also
introduced BullsEye code coverage tool to track the extent of code covered by tests.
These changes took more than six months to implement, and they delayed the next release of the software. But the changes proved beneficial to the product, as the number of defects that slipped through was reduced. Further, the development team
had more confidence in making changes because of the safety net available in the
form of automated regression tests. The test team had a difficult time learning how
to automate, but after automation was complete, they had more time to focus on
finding hard-to-find defects using manual testing. The team now plans to integrate
code quality tools as part of builds and plans to modernize its test infrastructure.
This case study illustrates how test debt can have significant impact even in a
small code base. A tool-driven approach and getting buy-in from management were strategic to repaying the debt in this project.
Before the end of knowledge transfer for the product, the new test team per-
formed a detailed analysis to estimate the effort required for testing. Since it was a critical product, all tests needed to be executed, and it was estimated that complete execution of the tests would require two calendar years. However, given the requirement that the next release should happen in a one-year timeframe, the team had to find
a way to optimize the tests.
Since many of the integration and system tests were the same due to duplication, the new test team removed redundant tests after careful analysis. Further, within the tests, many test steps were duplicated. Hence, the team involved
the product owner for analysis and review. Careful optimization of the tests reduced
the integration tests from 1000 to 400 and system tests from 2000 to 700. The
product owner gave considerable inputs on the correctness of these condensed tests.
To further optimize the time required for testing, the test team used a risk-based
approach to prioritize the tests based on the criticality of the functionality. With that, the tests were assigned one of three priority levels. Priority 1 and 2 tests were considered mandatory for execution, whereas priority 3 tests could be executed opportunistically. After completing knowledge transfer (which took 6 months), it took 8 more months to complete these steps and reach a state where testing could actually begin.
Reflecting on this experience, we find that test debt accumulated over a decade
had the impact of considerably slowing down the testing process. Further, a huge amount of effort was wasted due to the accumulated debt. The
specific kind of debt accumulated in this case relates to extensive test duplication.
This shows that test debt is a key concern that software teams must care about.
Though automated tests were present in this software project, they were not in a usable state. Whenever automated tests are broken due to changes in the software, it
is important to get them working; hence, continuous maintenance and refactoring is
important for automation testing.
In the upcoming release, the new test team is taking many steps to improve the test scenario, such as bringing in smoke tests and introducing exploratory tests. The
team also plans to get automated testing in place and improve the test review
processes.
5 Future Directions
Technical debt as a metaphor has been gaining wider acceptance in academia and
industry in the last decade. There are books [25], workshops [26], numerous
research papers and conference talks covering technical debt. Some dimensions of
technical debt—such as code and architecture debts—have received the attention of the software engineering community. Though there are some papers, blogs, and articles
that cover test debt, it is yet to gain wide attention from the community. In this
section we outline some of the areas that need further exploration.
• The concept of “test smells” has been widely used for aspects that indicate test
debt. Meszaros [27] differentiates between three kinds of test smells: code smells
are bad smells that occur within the test code; behaviour smells affect the
outcome of tests as they execute; project smells are indicators of the overall
health of a project. Of these three kinds, code smells in unit testing has received
considerable attention from the community (such as [11, 27–30]). However,
other kinds of test smells such as behavioral smells and project smells haven’t
yet received much attention from the community.
• Mainstream Integrated Development Environments (IDEs) including Visual
Studio, Eclipse, and IntelliJ IDEA support automated code refactoring. Though
there are test smell detection tools available (such as TestLint [31]), we are not
aware of any tools or IDEs that support automated refactoring for test smells.
• There are technical debt quantification tools that cover code and design
dimensions (for example, SonarQube tool [14]). Tools that quantify test debt are
yet to emerge.
• Test debt is an excellent metaphor to communicate the cost of short-cuts taken
during the testing process to the management. In this context, test maturity
assessment frameworks and methods (such as [32, 33]) can take test debt into
consideration.
Given the importance of managing test debt in industrial projects, we hope that the software testing community will address these areas in the future.
References
11. A. van Deursen, L. Moonen, A. van den Bergh, G. Kok, Refactoring test code. Technical report (CWI, Amsterdam, The Netherlands, 2001)
12. S.M.A. Shah, M. Torchiano, A. Vetro, M. Morisio, Exploratory testing as a source of technical debt. IT Prof. 16 (2014)
13. R.C. Martin, Clean code: a handbook of agile software craftsmanship (Prentice Hall, USA,
2009)
14. G. Campbell, Patroklos P. Papapetrou, SonarQube in action (Manning Publications Co., USA,
2013)
15. K. Beck, Test-driven development: by example (Addison-Wesley Professional, USA, 2003)
16. L. Williams, R.R. Kessler, Pair programming illuminated (Addison-Wesley Professional,
USA, 2003)
17. A. Cockburn, L. Williams, The costs and benefits of pair programming. Extreme
Programming Examined (2000)
18. S. Mancuso, The software craftsman: professionalism, Pragmatism, Pride (Prentice Hall,
USA, 2014)
19. W.F. Opdyke, Refactoring object-oriented frameworks, Ph.D. thesis (University of Illinois at
Urbana-Champaign, Illinois, 1992)
20. A. van Deursen, L. Moonen, The video store revisited—thoughts on refactoring and testing.
Proceedings of International Conference on eXtreme Programming and Flexible Processes in
Software Engineering (XP) (Alghero, Italy, 2002), pp. 71–76
21. O. Hazzan (ed.), Agile processes in software engineering and extreme programming.
Proceedings of XP 2011 (Springer, Berlin, 2011)
22. K. Beck, C. Andres, Extreme programming explained: embrace change, 2nd edn.
(Addison-Wesley, USA, 2004)
23. J. Shore, S. Warden, The art of agile development (O’Reilly Media, USA, 2007)
24. V. Moncompu, Agile test automation: transition challenges and ways to overcome them.
Pacific NW Software Quality Conferences (2013)
25. C. Sterling, Managing software debt: building for inevitable change (Addison-Wesley Professional, USA, 2010)
26. Workshop Series on Managing Technical Debt, Carnegie Mellon—Software Engineering Institute, http://www.sei.cmu.edu/community/td2014/series/. Last accessed 29 Aug 2015
27. G. Meszaros, xUnit test patterns: refactoring test code (Addison-Wesley, USA, 2007)
28. B. Van Rompaey, et al., On the detection of test smells: a metrics-based approach for general
fixture and eager test. IEEE Transac. Softw. Eng. (2007)
29. H. Neukirchen, M. Bisanz, Utilising code smells to detect quality problems in TTCN-3 test
suites. Test. Softw. Commun. Syst. (Springer, Berlin, 2007)
30. G. Bavota, et al., An empirical analysis of the distribution of unit test smells and their impact
on software maintenance. 28th IEEE International Conference on Software Maintenance
(ICSM, UK, 2012)
31. S. Reichhart, T. Girba, S. Ducasse, Rule-based assessment of test quality. J. Object Technol. 6
(9), 231–251 (2007)
32. I. Burnstein, A. Homyen, R. Grom, C.R. Carlson, A model to assess testing process maturity.
CROSSTALK (1998)
33. J. Andersin, TPI–a model for test process improvement. Seminar on Quality Models for
Software Engineering (2004)
Agile Testing
Abstract In recent times, software development has to be flexible and dynamic due
to ever-changing customer needs and high competitive pressure. This competitive
pressure increases the importance of Agile methods in software development and
testing practices. Traditional testing methods treat development and testing as a
two-team, two-step process. The process discovers bugs in software at a later stage of
development. Further, the process frequently leads to an internal division between
teams. Agile testing combines test and development teams around the principles of
collaboration, transparency, flexibility, and retrospection. This testing enables the
organization to be nimble about uncertain priorities and requirements. It helps to
achieve higher quality in software products. This chapter, after a brief introduction to Agile-based software engineering, deals with Agile-based testing. Agile testing
focuses on test-first approaches, continuous integration (CI), and build–test–release
engineering practices. The chapter also explains advantages and disadvantages of
Agile testing practices. The process is explained with an example.
Keywords Agile testing · Testing in parallel · Team collaboration · Continuous improvement · Test engineering
1 Introduction
among developers and testers. The cost of change to fix a bug increases expo-
nentially with the time delay between the introduction of a bug and its discovery.
This phenomenon is described in Fig. 1.
Testing as close to development as possible is the key to Agile testing. Critics often describe traditional methods as heavily regulated, regimented, and micro-managed. New Agile and light-weight methods evolved in the mid-1990s to avoid the shortcomings of traditional methods. Once the Agile manifesto was defined in 2001, the practice came to prominence. The common aspect of all Agile methodologies is delivering software in iterations, keeping user priority in mind. Changes in requirements may change iteration priorities, but it is easy to reschedule the iterations as the situation demands. The majority of successful Agile teams used the best development
practices to come up with their own flavor of agility. Scrum and XP are two such
popular Agile methods.
Agile testing means testing within the context of an Agile workflow. An Agile
team is usually a cross-functional team, and Agile testing involves all of them in the
process. The tester’s particular expertise is used to embed quality into the deliverables
at every iteration (Fig. 2).
Fig. 2 As per the CHAOS Manifesto 2012, a project is “Successful” if it is on budget and on time as initially specified, “Challenged” if completed but over budget or over time, and “Failed” if canceled at some point. Source: The CHAOS Manifesto, The Standish Group, 2012
According to the CHAOS Manifesto, Agile methods are about three times more successful than traditional methods, as shown in Fig. 2 [1]. Keeping this in view, Agile testing has gained acceptance among practitioners.
In this chapter, the Agile testing process is described in detail using a simple illustration. Section 2 gives a short description of the traditional testing process. Sections 3 and 4, respectively, present discussions on Agile-based software engineering and Agile-based testing. Section 5 presents an example explaining the Agile testing process. The next section briefs on the engineering process in Agile testing. Section 7 presents a brief analysis highlighting the advantages and disadvantages of Agile testing. The chapter ends with a concluding remark.
Traditional testing practices revolve around the testing center of excellence model
[1]. A group of seasoned testers, with the best customer focus, motivation, and intuition, forms a separate testing team. The development team handles code delivery and bug fixing.
Typically, after developers deliver code, the testing team tries to find as many bugs as possible. Developers pay little attention to quality while developing code. The consequences are harder and more costly to fix.
Traditional testing teams keep costs low by using tools to document test cases
and bugs, and sometimes through labor outsourcing. However, this does not reduce systemic issues or shift costs back upstream into the development cycle. It results
in higher levels of scrap and rework.
Preparing detailed test cases appears to help optimize testing, but it exacerbates the problem whenever requirements change. Changes in requirements are almost unavoidable. The extensive effort of testing activities slows down delivery. Even throwing a phalanx of testers at the problem is not very efficient. In reality, the overhead of daily builds and the functional and nonfunctional testing cadence nullifies any gains achieved from specialized testing teams and practices.
Usually, testing is pushed to the end of projects and gets squeezed. Unfortunately,
when projects fall behind schedule, teams compress and sacrifice testing time to
make up for delays in other processes. Thus, quality is always compromised.
Last-minute discovery of defects results in waste and high rates of rework. The longer it takes to provide feedback to developers, the longer it takes to fix defects.
Developers take additional time to re-engage with the context of the code as they
move on to the new project and new problems. The problem is worse if that
last-minute bug is an architecture or design issue. A simple misunderstood fun-
damental requirement can cause havoc to timelines if discovered last minute.
Teams build up too much technical debt (also known as code debt or design
debt). Technical debt is a metaphor referring to the possible future work of any
system architecture, design, and development within a codebase [2]. The solution to these problems is to push testing to earlier phases of the development cycle.
The Agile manifesto, principles, and values of Agile software engineering were formed to reveal better ways of producing software. The Agile Manifesto uncovers better ways of developing software by giving value to:
• Individuals and interactions over processes and tools,
• Working software over comprehensive documentation,
• Customer collaboration over contract negotiation,
• Responding to change over following a plan.
that could possibly work. Testing is pushed to the extreme of test-driven devel-
opment and continuous integration (CI).
Managers, customers, developers, and testers are all equal participants in a
collaborative team. XP enables teams to be highly productive by implementing a
simple, yet effective self-organizing team environment [4]. XP improves a software
deliverable in five essential ways: simplicity, communication, respect, courage, and
feedback. Extreme Programmers regularly communicate with their fellow devel-
opers and customers. They keep their design clean and simple.
Testers start testing on day one and developers get feedback immediately. Team
delivers the system as soon as possible to the customers and implements suggested
changes. Every small success increases their respect for the individual contributions
of team members. With this foundation, Extreme Programmers can courageously
respond to changing technology and requirements [4].
3.4 Scrum
The sprint ends with a sprint retrospective and review. And the whole process
repeats, with the next chunk of work from the product backlog [5] (Fig. 4).
After understanding Agile processes in detail, it is time to explore Agile-based testing (at times referred to as “Agile testing”) and how it helps to achieve higher quality.
4 Agile-based Testing
Agile-based testing means testing within the context of an Agile process such as
Scrum or XP. Agile suggests development and testing, two essential functions of
software development, proceed concurrently. Agile testing emphasizes the positive desire to implement a solution that passes the test (confirmation of user stories), whereas traditional testing methods focus on the negative desire to break the solution (falsification of a given solution).
Development and testing teams are united based on Agile principles to perform Agile testing. The illusory divide between code breakers (testers) and code creators (developers) is reduced. The necessity of both development and testing roles is
respected. The testing team works to provide feedback to developers as soon as
possible. This integration implies both that developers cultivate the skills of testers and that testers understand the logic of development.
Applying Agile concepts to the testing and QA process results in a more efficient
testing process. Pair programming is encouraged to provide instant feedback, and test-driven methodologies also come in handy, where tests can be written before the actual development. Lack of automation is the principal barrier to providing instant feedback to developers. An end-to-end automated testing process integrated into the development process achieves continuous incremental delivery of features.
In short, Agile requires teamwork and constant collaboration among software
development teams and stakeholders. Quality is “baked into” the product at every
phase of its development through continuous feedback. Everyone in the team holds
a vision of the final product.
Working software is preferred over documentation and is produced frequently in small iterations. However, every iteration delivered is in line with the demands of the final system. Every iteration has to be fully integrated and carefully tested like a final production release.
In Agile testing, there is no strict adherence to requirement documents and checklists. The goal is always to do what is necessary to complete the customer's requests. Documentation is often replaced with constant collaboration through in-person meetings and automated tests; only essential and minimal documentation is maintained. Unified self-organizing project teams work closely together to achieve a high-quality software product. The notion of a separate testing team disappears entirely. In some cases, separate release-candidate testing might still be necessary, not to find problems or bugs but for verification, audit-trail completion, and regulatory compliance.
5 Illustration
Let us consider a use case, the design of a small messaging service, to understand the process better. This new service sends a message to the user if the user is online; otherwise, the message persists until the user comes online. Assume the Scrum process is being followed to implement this use case.
Developers and testers discuss the customer requirements and identify the tests that need to be designed for the acceptance criteria. A lot of questions are raised by the testing team during this discussion to drive clarity in the requirements (e.g., message persistence duration, limits on the number of messages, and the priority of messages). Development teams focus on identifying challenges in the solution design.
These discussions usually happen in pre-iteration planning meetings (also referred to as backlog grooming meetings). The work is also divided into several small user
stories. User stories are one of the primary development artifacts for Scrum project
teams. A user story is a simplified representation of a software feature from the customer's point of view. It describes what the customer wants in terms of acceptance criteria and explains why the customer needs that feature, which helps in understanding the business value.
For example, the following user stories are identified for the “small messaging service”:
The division of the user stories is completely at the discretion of the team and the business owner, based on what they think is an appropriate way to divide the work. Prioritization of the stories is also done during the meeting, based on the effort involved and the priority set by the business owner.
After prioritization of the stories is completed and a backlog is prepared, a sprint/iteration can be started by taking a small chunk from the top of the backlog. In the sprint, developers and testers work in parallel. Every day, the team meets and discusses what has to be done that day and what testing has to be done to verify the work. Quality is thus embedded into the working software delivered by the team through constant collaboration, negotiating better designs for easy testability.
Testers participate from day one and insist on writing code that is testable. This emphasis often leads to extensive use of design patterns and achieves better code layout, architecture, and quality. A design pattern is a reusable solution to a commonly occurring problem within a given context in software design.
In the present illustration, as testers and developers approach the first user story in Table 1, they have to agree on a standard API or interface against which to write their code and tests, so that they can start their work in parallel, as in the interface-based programming pattern. The team agrees upon an interface “FetchUndeliveredUserMessageResource” that contains a “fetchUndeliveredUserMessages” method accepting “userId” as the parameter, as described in Table 2. Testers start writing their integration tests, and developers start their implementation in parallel.
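Table 2 is not reproduced here; the following is a minimal Java sketch of such an agreed contract, in which the return type and the Message placeholder are illustrative assumptions rather than the chapter's actual definition.

import java.util.List;

// Placeholder payload type; the real message shape is not specified in the text.
class Message {
    String body;
}

// Contract named in the text; testers write integration tests against it while
// developers implement it, so both can proceed in parallel.
interface FetchUndeliveredUserMessageResource {
    // Returns all messages not yet delivered to the given user.
    List<Message> fetchUndeliveredUserMessages(String userId);
}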
Interface-based programming is claimed to increase the modularity of the application and hence its maintainability. However, to guarantee low coupling and high cohesion, we need to follow a couple of guiding principles. The first is the single responsibility principle. It suggests that every class should have responsibility for a single functionality or behavior and should encapsulate the implementation of that responsibility. This principle keeps tedious, bug-prone code contained. The second is the interface segregation principle. It suggests that no client should be forced to depend on methods it does not use. In cases where the interfaces come from a third party, the adapter pattern keeps the code well organized. The basic idea of the adapter pattern is to change one interface into the desired interface.
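As an illustration of the adapter pattern in this setting, the following hedged sketch (all names hypothetical) converts a third-party client's interface into the one the team wants to program against.

// Desired interface in the team's codebase (hypothetical).
interface MessageSender {
    void send(String userId, String text);
}

// Third-party class with an incompatible interface (hypothetical).
class ThirdPartyPushClient {
    void push(String deviceToken, byte[] payload) { /* vendor code */ }
}

// Adapter: presents the third-party client through the desired interface.
class PushClientAdapter implements MessageSender {
    private final ThirdPartyPushClient client;

    PushClientAdapter(ThirdPartyPushClient client) {
        this.client = client;
    }

    public void send(String userId, String text) {
        // Translate the call into the vendor's API; token lookup is stubbed.
        client.push("token-for-" + userId, text.getBytes());
    }
}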
In the present illustration, as testers and developers approach the second user story in Table 1, the team identifies the need to separate database retrieval so that it can be tested independently and defines a “PersistMessage” interface to access data from the database, as described in Table 3. The repository design pattern comes in handy here to separate business logic, the data model, and data retrieval. This pattern centralizes the data-access or Web-service access logic, provides a substitution point for the unit tests, and makes the design flexible.
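Table 3 is likewise not reproduced; a minimal sketch of such a repository contract, with method names and plain String payloads assumed for illustration, might be:

import java.util.List;

// Repository-style contract named in the text; business logic depends only on
// this interface, so unit tests can substitute an in-memory implementation
// for the real database.
interface PersistMessage {
    // Store a message for later delivery.
    void saveMessage(String userId, String messageBody);

    // Retrieve all messages not yet delivered to the user.
    List<String> loadUndeliveredMessages(String userId);
}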
In the present illustration, as testers and developers approach the third user story in Table 1, a “SendMessageResource” interface is identified, as described in Table 4. As the testing team writes tests for these user stories, they design them to run in isolation and to test each story independently. This reduces test complexity and forces development to reduce complex dependencies between components. As far as possible, no test is made dependent on another test to run. Also, to reduce interdependencies among different components and classes, the dependency injection pattern is used.
Dependency injection is a software design pattern that implements inversion of control for software libraries: a component delegates to external code (the injector) the responsibility of providing its dependencies, and the component is not allowed to call the injector itself. This separation makes clients independent and makes it easier to write unit tests using stubs or mock objects, which simulate other objects that are not under test. This ease of testing is the first benefit noticed when using dependency injection. More testable code often produces a simpler and more efficiently architected system.
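A minimal sketch of constructor injection with a hand-rolled stub, reusing the hypothetical PersistMessage contract from above, shows why this eases testing:

import java.util.List;

// The component receives its dependency through the constructor (constructor
// injection); it never constructs or locates the repository itself.
class UndeliveredMessageService {
    private final PersistMessage repository;

    UndeliveredMessageService(PersistMessage repository) {
        this.repository = repository;
    }

    int countUndelivered(String userId) {
        return repository.loadUndeliveredMessages(userId).size();
    }
}

// A stub standing in for the real database-backed repository in a unit test.
class InMemoryPersistMessage implements PersistMessage {
    public void saveMessage(String userId, String messageBody) { /* no-op */ }

    public List<String> loadUndeliveredMessages(String userId) {
        return List.of("hello", "world"); // canned data for the test
    }
}

A unit test can then construct new UndeliveredMessageService(new InMemoryPersistMessage()) and assert that countUndelivered returns 2, without touching a database.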
Agile methods do not provide actual solutions, but they do a good job of surfacing problems early. The motto is to “catch issues fast and nip them in the bud.”
The Agile testing team has to test every iteration carefully, as if it were a production release, and integrate it into the final product. Because of the repeated integration tests, the testing team needs to design and implement test infrastructure and rapid development and release tools. The success of Agile testing depends on the team's ability to automate test-environment creation, release-candidate validation, and the reporting of test metrics.
As teams deliver fully functional features every iteration in Agile practice, they need to find ways to reduce the overhead of builds, deployments, and test runs. CI is a practice that requires developers to integrate code several times a day; each time, the integration is verified by an automated build. An automated build system thus enables the team to do CI, and it has become a cornerstone of Agile development.
An automated build system reduces the cost associated with bug fixing exponentially and improves quality and architecture. Several sophisticated continuous build systems are available, such as Jenkins [7], Hudson [8], Team Foundation Server [9], and Apache Continuum [10].
Build systems need to be optimized for performance, ease of use by developers, and incremental execution of tests, and they should be software-language agnostic.
In the above illustration, the team used Jenkins as their automated build system. Every time a developer checks in code or a tester checks in a test, a new build is made, and all the unit and integration tests are executed. If any issues are found, a build-failure email is sent to the team. Jenkins packages the deployment artifact as a Red Hat Package Manager (RPM) file. These RPMs are deployed to the team environment automatically by deployment scripts, so that testers can perform post-deployment verification as needed. The next section makes a comparative study of Agile testing and its advantages and disadvantages.
Let us briefly review the distinction between the Agile and Spiral models, as there is an apparent similarity between the two. Then, an analysis is made showing both the advantages and the disadvantages of Agile testing [11].
The common aspect of the Agile process and the Spiral model is iterative development; they differ in most other aspects. In the Spiral model, the length of an iteration is not specified, and an iteration can run for months or years. In the Agile model, the length must be small, usually about 2–3 weeks. In the Spiral model, iterations are often well planned out in advance and executed in order of risk. In the Agile process, the focus is on one iteration at a time, and the priority of user stories, rather than risk, drives the execution order. Within each iteration, the Spiral model follows the traditional
waterfall methodology. Agile recognizes the fact that people build technology, and it focuses on “test-first” approaches, collaboration, and interactions within the team. Knowledge transfer happens naturally between developers and testers as they work together with constant feedback on each iteration. Junior team members benefit from active feedback, and testers perceive the job environment as comfortable. Resilience is built into the teams, and this nourishes the team atmosphere.
Small teams with excellent communication require small amounts of documentation. This collaboration facilitates learning from and understanding of each other. There is no practical way to comprehend or measure this particular advantage, but it produces an excellent environment and team.
Requirements volatility is reduced because projects and timelines are small. The wastage of unused work (requirements documented, unused components implemented, etc.) is reduced, as the customer provides feedback on developed features and helps prioritize and regularly groom the product backlog. Customers appreciate active participation in projects.
Small manageable tasks and CI help process control and transparency and increase quality. Very few stories are in progress at any point in time, and they are handled in priority order; this reduces stress on teams. The risk associated with unknown challenges is well managed. Frequent feedback makes problems and successes transparent and provides strong incentives for developers to deliver high quality.
As bugs are found close to their development, they can be fixed with very limited extra overhead, which increases the actual time spent on designing and developing new features (Fig. 5).
The quality of work life improves as the social job environment is trustful,
peaceful, and responsible.
Implementation starts very early in the process, and sometimes enough time is not spent on design and architecture. Poor design choices might lead to rework. Sometimes, dependencies buried inside implementation details are hard to identify and often impede the team. More people may be required to manage the increased number of builds, releases, and environments.
8 Conclusion
Agile testing unites test and development teams around the principles of collaboration, flexibility, simplicity, transparency, and retrospection. Its focus is on the positive desire to implement a solution that passes the tests, rather than the negative desire to break a solution. Agile practices do not provide actual solutions, but they do a good job of surfacing problems early. Developers and testers are empowered to work toward solving problems.
Agile has already won the battle for mainstream acceptance, and its general precepts are going to remain viable for some time. Just as the empowerment of people is key to good governance, so it is for software development, and Agile approaches empower everyone involved.
References
1. http://www.computerweekly.com/feature/Why-agile-development-races-ahead-of-traditional-testing
2. https://en.wikipedia.org/wiki/Technical_debt
3. http://www.agilemanifesto.org/
4. http://www.extremeprogramming.org/
5. https://www.scrumalliance.org/why-scrum
6. https://www.scrumalliance.org/scrum/media/ScrumAllianceMedia/Files%20and%20PDFs/Why%20Scrum/ScrumAlliance-30SecondsFramework-HighRes.pdf
7. https://jenkins-ci.org/
8. http://hudson-ci.org/
9. https://www.visualstudio.com/en-us/products/tfs-overview-vs.aspx
10. https://continuum.apache.org/
11. http://www.sciencedirect.com/science/article/pii/S0164121209000855
12. http://www.ambysoft.com/essays/agileTesting.html
13. http://www.agilemanifesto.org/principles.html
14. https://www.mountaingoatsoftware.com/blog/agile-succeeds-three-times-more-often-than-waterfall
Security Testing
Keywords: Application security testing · Secure software development life cycle · Phase embedded security testing

F. Anwer
Department of Computer Science, Aligarh Muslim University, Aligarh, India
e-mail: [email protected]

Mohd. Nazir · K. Mustafa (✉)
Department of Computer Science, Jamia Millia Islamia (A Central University), Jamia Nagar, New Delhi, India
e-mail: [email protected]

Mohd. Nazir
e-mail: [email protected]
1 Introduction
Nowadays, industry produces software for almost every domain, and this software is generally complex and huge in size. The majority of software is used openly in the prevailing distributed computing environment, leading to a range of security issues in the application domain. These issues are easily exploited by attackers, so the number of attack incidents on software applications is growing rapidly. As the number of vulnerabilities increases enormously, serious threats and challenges loom that can cause a severe setback to applications. Security threats also arise from the distributed and heterogeneous nature of applications, their multilingual and multimedia features, their interactive and responsive behavior, ever-evolving third-party products, and rapidly changing versions. Recent reports reveal that security threats surrounding financial and banking applications have increased dramatically, and they continue to evolve.
It is now imperative to develop software applications with strong security attributes to ensure reliability. In view of the prevailing circumstances, security is one of the major concerns for many software systems and applications. Industry practitioners assume that installed security perimeters such as antivirus software and firewalls are secure enough, and they regard them as effective measures for dealing with the security aspect of an application. Despite the growing number of security products, there is still a lack of attention to resolving security issues in online banking systems, payment gateways, insurance covers, etc. An IDG survey of IT and security executives reports that nearly 63 % of applications remain untested for critical security vulnerabilities [1].
Security is a process, not a product. It cannot be stapled onto an application at the end, as an afterthought. Security testing defines tests for the security-specific and sensitive requirements of software. Security requirements are functional as well as non-functional and thus require a different way of testing compared to that applied for primary functional requirements. Security testing is an effort to reveal the vulnerabilities that may violate state integrity, input validity, and logic correctness, along with the attack vectors that exploit these vulnerabilities. Further, security testing minimizes the risk of a security breach and ensures the confidentiality, integrity, and availability of customer transactions. For a tester, it is hard to devise exhaustive anti-security inputs and even harder to prove that an application is secure against this infinite set of inputs. Hence, it is a necessary prerequisite to understand the security challenges and debatable issues in order to comprehend the existing methodologies and techniques available for security testing.
Nor is security a one-time activity; it is a series of activities performed throughout the SDLC, from the very first phase to the last, which leads to the SSDLC (secure software development life cycle). Security testing is the most important aspect of the SSDLC, as it is an important means to expose the serious security-related issues of applications. Researchers and practitioners advocate security testing as a series of activities performed throughout the system development
process. If security testing is taken care of in the early stages of the SDLC, security issues can be tackled easily and at a lower cost. Hence, it would be both effective and efficient to embed security testing in each phase of the SSDLC. Security testing should start from the requirement phase, where security requirements are tested against security goals. At the design phase, security testing may be carried out using a threat model and an abuse case model [2]. Secure coding in the coding phase may be validated using static analysis methods/tools [3] such as Flawfinder and FindBugs. Security testing at the testing phase is carried out as functional security testing as well as risk-based security testing [3]; testing at this stage is carried out from an attacker's perspective. Finally, the production environment needs to be properly tested to ensure that the application is protected against real-time attacks in the live environment. All these activities together form an approach called phase embedded security testing (PEST).
This chapter illustrates the significance and relevance of security testing in the present context and is organized as follows: Sect. 2 elucidates the current security challenges faced by applications. Section 3 highlights the significance of security testing. Section 4 gives an overview of the SSDLC, while Sect. 5 discusses the security issues and related concerns. The approaches to security testing are discussed in Sect. 6, and the phase embedded security testing (PEST) approach is explained in Sect. 7. Section 8 discusses industry practices, while Sect. 9 highlights industry requirements and future trends. Finally, Sect. 10 concludes the chapter with some important directions for research in this area.
Software, both as a process and as a product, is becoming more and more complex due to prevailing practices in industry. Primitive software had limited functionality and limited users, but due to advances in technology and increasing dependence on software, software is now not only highly complex but also huge in size, with a virtually unbounded operational environment. Complex software has more lines of code and more interactions among modules and with external components.
According to the report of Coverity [15], companies use a major part of their software code from multiple third parties, and this code is not tested with the same intention and rigor as internally developed software. The conclusion is that if third-party code is being used, it must be properly tested for security flaws.
Industry generally presumes that security testing will not return the investment: functional testing is complete, and the environment is fitted with a firewall and antivirus software, so there seems to be no need to invest extra cost in security testing. But the facts say otherwise. A recent study reveals that three out of four applications are at risk of attack, and a large number of these attacks are performed on the applications themselves, where firewalls and SSL cannot do anything [18]. In the long run, industry will actually gain the following benefits through security testing.
A well-tested application goes through less downtime and enhances the brand image of the company. It adds value to the brand and indirectly attracts other clients as well. Kirk Herath, chief privacy officer, assistant vice president, and associate general counsel at Nationwide Insurance in Columbus, Ohio, says, “Any breach has the tendency to dampen greatly whatever you are spending around your brand” [19]. On the other side, brand damage can have serious consequences for companies: they may lose the trust of existing clients, and new clients will be reluctant to use their products.
A properly and thoroughly tested application will face fewer failures; otherwise, the cost to fix the application at later stages will be high, and development cost will consequently be higher in these cases. For example, fixing a requirement error found after an application has been deployed costs 10–100 times as much, compared to 10 times when the error is found at the testing stage [20]. Similarly, fixing a design error found after an application has been deployed costs 25–100 times as much, compared to 15 times when the error is found at the testing stage.
There is no doubt that the client is the ultimate and direct beneficiary of a well-tested quality product. A client could itself be an industry such as retail, banking, or insurance. Untested software may be exploited by attackers, with direct influence on the client: clients may lose important data and may be victims of financial loss. Therefore, clients are most concerned about the products and the vendors who supply them. A well-tested product has great significance for the client, as follows.
Obviously, incorporating security into the product adds quality to the product. Security testing ensures that security aspects have indeed been added to the product. A properly security-tested product will face minimum service downtime and hence will improve the quality of the product. On the other hand, an untested or improperly tested product will lack this quality aspect.
A small flaw in any part of the application can hamper the performance of critical services. Customers may need other hardware or software to compensate for this loss, which in turn needs more investment. A single security flaw in an application may cost the customer millions. A study carried out by Ponemon reports that a security violation sometimes costs businesses an average of $7.2 million per breach [21]. This shows that a huge cost is associated with a single security failure. In some cases, industry may need to pay millions of dollars in compensation.
End users are those who benefit from an industry's IT environment, such as the customers of an online bank. If the software is vulnerable to attack, then personal data, credentials, and more may be stolen by an attacker. End users may face interrupted services, and they may be victims of financial loss as well. So it is very important that end users are provided with properly protected and well-tested applications. The following describe the significance of a well-tested application from the end-user perspective.
An insecure product may fall victim to a DoS attack, which may completely block the application; end users will be unable to access it because of this type of attack. A more sophisticated attack, distributed denial of service (DDoS), may also be used by attackers. Well-known companies such as Twitter and Facebook have faced DoS attacks in the past [22]. Security testing ensures that applications are not exploitable by these types of attack and that they are protected from them at a very early stage.
Attackers are very interested in users' credentials as well as personal data, which they may use for their ill intentions. According to a recent CBS news article [23], the personal data of over 21 million individuals was stolen in a widespread data breach. It is very important for an application to guard against such data breaches; applications should be properly tested and protected against these types of breach.
(Figure: phases of the SSDLC: security requirements, security design, secure coding, security testing, and secure deployment and maintenance.)
At the very first stage, company-wide security policies are established that cover role-based permissions, access-level controls, password controls, and so on. The requirement specification is thoroughly examined to decide where security can be incorporated; these places are called security points. Security guidelines, standards, rules, and regulations meaningful for the particular company are considered at this stage, and these, along with the security policies and security points, decide the complete security requirements.
Some sample security requirements are as follows:
• The application should be usable only by legitimate users.
• Each user should have a valid account, and their level of privilege should be properly defined.
• The application sends confidential data to different stakeholders over the communication network, so encryption techniques should be applied.
Experts have proposed several approaches to accommodate security during the requirement phase, such as [28, 29]. A comparative study of different security requirement engineering methods can be found in the paper by Fabian et al. [30].
At the design stage, security experts must identify all possible attacks [31] and design the system accordingly. Threats can be systematically identified through a popular technique called threat modeling [32]. It helps the designer develop mitigation techniques for the possible threats and guides them to concentrate on the parts of the system at risk. Another technique for modeling software design is Unified Modeling Language (UML) diagrams, which help to visualize the system and include activities, individual components and their interactions, interactions between entities, and so on. UMLsec [33], an extension of UML for secure system development, has been proposed in the literature.
After security has been successfully applied at each phase, the product should be installed in a secure environment. The software and its environment need to be monitored continuously, and if security issues are found in the product or environment, a decision should be made as to whether a patch is required [34]. The software and its environment also need to be updated periodically so that they are not affected by security vulnerabilities. Attackers tend to exploit outdated or older versions of software, since the chances of vulnerabilities are high in these cases.
This is a typical kind of loophole through which malicious scripts are injected via client-side script; such a script can access confidential data, and the content of an HTML page may be rewritten by it. This vulnerability is considered one of the most popular hacking techniques for exploiting Web-based applications. An attacker who exploits XSS might access cookies and session tokens and can even execute malicious code on other users' systems [37]. In the past, several popular Web sites were affected by XSS; prominent among these is a recent instance in which Twitter was exploited through this vulnerability [38], enabling someone to open third-party Web sites in users' browsers just by hovering the mouse over a link.
A denial of service (DoS) attack, which targets a service to make it unavailable to its intended clients, poses a serious threat to Internet security [42]. DoS can result from various causes such as application crash, data destruction, and resource depletion, e.g., depletion of memory, CPU, bandwidth, or disk space [43]. DoS attacks can be carried out with only a few resources and thus pose a serious threat to organizations. A more sophisticated form of DoS is distributed DoS (DDoS), in which several systems, possibly distributed across the globe, are used to carry out an attack.
Buffer overflow is a very common and easy-to-exploit vulnerability. It can be exploited to crash an application, and in the extreme case an attacker can take control of the program. In a classic buffer overflow, an attacker enters large-sized data into a relatively small stack buffer, which results in overwriting of the call stack, including the function's return pointer [44]. In a more sophisticated attack, the function's return pointer is overwritten with the address of malicious code. Both Web servers and application servers can be exploited through BOF.
Remote file inclusion (RFI) can lead to information disclosure, DoS, and execution of malicious code on the Web server or client side [45]. According to an Imperva report, RFI was ranked as one of the most prevalent Web application attacks performed by hackers in 2011 [46].
Code review involves testing an application by reviewing its code. Its purpose is to find and fix errors introduced in the initial stages of software development, thereby improving the overall quality of the software. Code review can be done manually as well as through automated tools. Manual code review is very time-consuming, as it requires a line-by-line scan of the code, whereas automated code review scans source code automatically to check whether a predefined set of best practices has been applied.
A model checker automatically checks whether a given model of a system meets the system's specification. The specification may contain safety, security, or other requirements whose violation would cause the system to behave abnormally. Systems are generally modeled as finite-state machines, which act as the input for the model checker along with a collection of properties, generally expressed as formulas of temporal logic [49]. The model checker checks whether the properties hold or are violated by the system.
A model checker works in the following manner: let M be a model, i.e., a state-transition graph, and let p be a property expressed in temporal logic. Model checking then consists of finding all states s such that M satisfies p at state s.
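In the standard notation, with S denoting the state set of M, this is the set (a routine formalization of the sentence above, not drawn verbatim from the source):

{s ∈ S | M, s ⊨ p}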
An important tool based on model checking, Java PathFinder (JPF) [50], is a widely used research tool with various execution modes and extensions. Several testing tools based on JPF, such as Symbolic PathFinder, Exception Injector, and others, have been proposed in the literature. A complete list of such methods/tools can be found on the JPF site.
This technique generates test cases for every path by generating and solving a path condition (PC), which is a constraint on the input symbols. Each branch point of the program has its own path condition, and the PC at a deeper level is the conjunction of the current branch condition with the branches at previous levels. Classical symbolic execution internally contains two components:
• Path condition generation: Suppose a statement is given as if (cond) then S1 else S2. The PC updates for the branches of this statement are PC → PC ∧ cond (for the true branch) and PC → PC ∧ ¬cond (for the false branch). For instance, for if (x > 0) starting from PC = true, the two resulting paths carry PC = (x > 0) and PC = ¬(x > 0).
• Path condition solving: A path condition solver, or constraint solver, is used for two purposes: first, to check whether a path is feasible, and second, to generate inputs for the program that satisfy the constraints. Constraint solving is the main challenge of symbolic execution, since its cost dominates everything else; the reason is that constraint solving is an NP-complete problem.
Classic symbolic execution provides a sound foundation for dynamic testing techniques such as concolic testing, and plenty of work based on concolic testing has been proposed in the literature. Concolic testing is discussed in a later section.
To demonstrate the actual working of a testing tool, we consider a case study of a Java package, namely BareHTTP [51]. We applied the popular static testing tool Findbugs [47], a static analysis tool for Java code, to the BareHTTP package; it produces the results mentioned in Table 1. Findbugs takes Java programs as input and reports different types of bugs, such as empty database passwords, JSP reflected XSS vulnerabilities, and so on.
Not every bug found by Findbugs is a security issue. A careful analysis of the bug report generated for BareHTTP shows that some bugs are reported because an IO stream object is not closed. This may result in a file descriptor leak, and in the extreme case the application may crash. The tool generates a wide list of bugs, categorized under bad practices, performance issues, dodgy code, malicious code vulnerabilities, and so on. Not all bugs reported by Findbugs are actual errors; the tool may generate several false positives.
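The Findbugs report for BareHTTP is not reproduced in the text; the following minimal sketch shows the general unclosed-stream pattern that such tools flag, together with the conventional fix.

import java.io.FileInputStream;
import java.io.IOException;

public class StreamExample {

    // The pattern the tool flags: if read() throws, close() is never reached,
    // the stream stays open, and the file descriptor leaks.
    static int leaky(String path) throws IOException {
        FileInputStream in = new FileInputStream(path);
        int first = in.read();
        in.close(); // skipped entirely on an exception
        return first;
    }

    // The fix: try-with-resources closes the stream on every path.
    static int safe(String path) throws IOException {
        try (FileInputStream in = new FileInputStream(path)) {
            return in.read();
        }
    }
}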
Dynamic security testing involves testing programs for security in their running state. It can expose flaws or vulnerabilities that are too complicated for static analysis to reveal. Dynamic testing exhibits real errors, but it may miss certain errors because certain paths are missed. Dynamic testing involves a variety of techniques, of which the most popular are fuzz testing, concolic testing, and search-based testing. Each one is valuable from a certain perspective, such as test effectiveness, test case generation, or vulnerability coverage. Each of these techniques is described next, along with the latest research in these directions.
(Figure: fuzz testing cycle, in which the program alternates between normal runs and crashes, restarting after each crash.)
Fuzz testing can be broadly divided into black box and white box fuzz testing. Black box fuzz testing generates inputs without knowledge of program internals; white box fuzz testing, on the other hand, also considers the internals of the program. As far as coverage is concerned, fuzz testing can uncover several security issues such as buffer overflows, memory leaks, DoS, and so forth [53]. The fuzz testing process is depicted in Fig. 3, which shows that the fuzz test generator repeatedly generates test inputs, and the program is executed with these inputs with the help of the test executor. A log is also maintained to store the fault/crash data.
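A minimal black box fuzzer, sketched below around a hypothetical targetUnderTest method, makes the generate-execute-log loop of Fig. 3 concrete; real fuzzers add input mutation, corpus management, and process restart logic.

import java.util.Arrays;
import java.util.Random;

public class TinyFuzzer {

    // Stand-in for the real program; crashes on inputs starting with 'A'.
    static void targetUnderTest(String input) {
        if (input.startsWith("A")) {
            throw new IllegalStateException("crash");
        }
    }

    public static void main(String[] args) {
        Random rng = new Random(42); // fixed seed for reproducibility
        for (int run = 0; run < 100_000; run++) {
            byte[] bytes = new byte[1 + rng.nextInt(16)];
            rng.nextBytes(bytes);
            String input = new String(bytes);
            try {
                targetUnderTest(input); // normal run
            } catch (RuntimeException crash) {
                // Log the crashing input, then "restart" with the next input.
                System.out.println("crash on input: " + Arrays.toString(bytes));
            }
        }
    }
}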
Researchers have proposed a number of fuzz testing techniques, such as [54–58], to test programs for bugs that lead to program crashes. Among these, popular methods are JCrasher [55], an automated testing tool that tests for undeclared runtime exceptions that, if triggered, crash the program, and CnC [56], another random testing method that combines static checking and automatic test generation to find program crash scenarios.
Concolic testing is a type of testing in which, while running the program, the path constraints along the executed paths are collected, and these collected constraints are solved to obtain new inputs for alternative paths [59]. This process is repeated until all the coverage criteria are met. Basically, it is an extension of classic symbolic execution that dynamically generates inputs by solving the path constraints.
Concolic testing has been very successful in security testing, since it executes each and every path in the program and makes sure that the program does not contain dark paths (untested and neglected paths). This testing technique automatically detects corner cases where the programmer has not handled an exception properly or has failed to allocate memory, which may lead to security vulnerabilities. Several methods/tools have been developed using concolic execution, such as CUTE and jCUTE [60], Symbolic JPF [48], and others. These methods are very effective in the security testing of programs, since they test all paths and hence avoid untested paths that may lead to security vulnerabilities.
We take the case of the program given in Listing 1, which generates an uncaught divide-by-zero exception. Uncaught or improperly handled exceptions can lead to severe security issues [63]: they may allow an attacker to cause a program crash, which finally leads to DoS. Several instances of divide-by-zero exceptions have been reported in vulnerability databases such as OSVDB [36] and NVD [65]. As per database records, 9 such cases were reported in 2015 and 15 in 2014 [36]; one such instance is documented in [66].
class symexe {
    int i, j, z;

    void foo(int i, int j) {
        if (i > j) {
            System.out.println("i is greater than j");
            z = i / (j - 8);
        } else {
            System.out.println("j is greater than i");
            z = j / (i - 4);
        }
        if (z > 5)
            System.out.println("z is greater than limit");
        else
            System.out.println("z is less than the limit");
    }

    public static void main(String[] args) {
        symexe s = new symexe();
        s.foo(12, 6);
    }
}
If we construct the symbolic tree of the above program, it generates six paths, of which two generate uncaught exceptions. Symbolic PathFinder generates the constraints of these six paths; one of them is given below:

(j_2_SYMINT / (i_1_SYMINT - CONST_4)) <= CONST_5 &&
(i_1_SYMINT - CONST_4) != CONST_0 && i_1_SYMINT <= j_2_SYMINT

The constraints for each path are solved subsequently using the different constraint solvers embedded in Symbolic Java PathFinder, and finally the following results are produced:
[foo(54,11)]
[foo(92,51)]
[foo(86,-77)]
[(expected = java.lang.ArithmeticException.class), foo(21,8), ##EXCEPTION## "java.lang.ArithmeticException: div by 0..."]
[foo(21,8)]
[foo(11,54)]
[foo(0,0)]
[(expected = java.lang.ArithmeticException.class), foo(4,49), ##EXCEPTION## "java.lang.ArithmeticException: div by 0..."]
[foo(4,49)]
In a normal setup, generating values for these cases is usually difficult, since a program may have an enormous number of paths. The results of Symbolic PathFinder on the above program are summarized in Table 2.
In the above results, two test cases, foo(21,8) and foo(4,49), generate divide-by-zero exceptions. The other test cases execute the remaining paths, i.e., those that did not generate the exceptions. This type of testing is very effective in the sense that it needs only a few test cases to find bugs in the program. We have taken the case of improper handling of a divide-by-zero exception and tested it using a concolic testing tool. There are other types of exploitable exceptions that can crash a program, such as the null pointer exception, that have not been considered in this study.
The security testing techniques discussed in Sect. 6 are further categorized in Table 3, which gives a brief description of each technique, including the attack types it covers, such as BOF, DoS, and the others mentioned in the table.
In previous sections, we discussed that security is not a one-time activity after implementation but a series of activities throughout the development life cycle. Just as security is a phase-based activity, we propose that security testing should also be carried out from the very first stage, i.e., the requirement stage, through the deployment and maintenance phase.
Software security should be applied as a two-layer approach: in the first layer, security is carried out throughout the development life cycle, and in the second layer, security testing is embedded in each phase of the SDLC. Figure 5 elaborates this concept. In the very first phase, security requirements such as security rules, standards, policies, and goals are collected and formalized; test cases are created and analyzed against these security requirements.
(Fig. 5: Security testing embedded in each phase of the SSDLC. Requirements: formalize security requirements; test cases against functional security requirements; risk analysis; discover missing security requirements. Design: derive security test cases from attack scenarios; test security features. Coding: static security testing; compile-time detection; analysis of secure coding practices. Testing phase: security test plans; static and dynamic testing; test defect review; functional and non-functional security testing. Live environment: testing of secure configuration; testing of secure environment and platform; secure testing of modules delivered as patches.)
In the second phase, security test cases are derived from attack scenarios and the security features are tested; one such attack scenario could be finding ways to crash the application. Model-based design verification is also helpful for testing a secure design. During security testing at the coding phase, secure coding practices are analyzed, and static security testing is applied to detect security issues. At the testing stage, functional and non-functional security issues are tested using static testing, dynamic testing, or a combination of both. Finally, security testing of the live environment is carried out by testing the secure configuration, environment, and platform being used. Modules delivered as patches should also be tested for security issues. The production environment needs to be properly tested, as this ensures that the application is protected against real-time attacks in the live environment.
As a case study, we consider an organization that has formed a software development team to build a Web-based application. Knowing the significance of security for the application, the software development team made every effort to incorporate security into it. Security experts were involved with the development team from the initial requirement gathering phase through deployment and maintenance.
(Fig. 7: Threat model diagram of bypass/weak authentication, adapted from the work of Gencer Erdogan [69]. The threat (Threat 1: bypass/weak authentication) arises when user data is not properly validated in a page of the concerned Web site (1.c.a) AND parameterized queries are not used (1.c.b).)
Similarly, several scenarios were created to test for malicious activities. Security requirements can also be formalized so that the requirements can be tested automatically. We have not formalized the security requirements in this case, but several studies have used this technique [67, 68].
• Security Testing of Design: Testing in this phase refers to exposing security vulnerabilities that may enter the application because of design errors such as improper error handling and weak authentication. We modeled security threats through threat modeling diagrams that help experts discover threats that might enter the system. We include a threat model of the bypass/weak authentication threat, adopted from the work of Gencer Erdogan [69], as depicted in Fig. 7. This helps experts identify whether the online shopping application suffers from these threats. Similar to this threat model, several component-based threat models were created to test for vulnerabilities.
• Security Testing of Coding: Security testing in this phase is done to expose any missing secure coding practices or common vulnerability patterns. Security experts walked through the whole code to identify any security flaws or missing secure practices. We applied static analysis tools such as find-security-bugs [70] to detect known vulnerabilities. This tool can be added as a plug-in to Findbugs, and it can be used to test for XSS, SQLi, DoS, and other issues in a given program. It is very effective in finding potential XSS, such as in Code 2 below.
<%
String taintedInput = (String) request.getAttribute("input");
%>
[...]
<%= taintedInput %>
In the above code, a potential XSS vulnerability has been found: the vulnerable code could be exploited to execute malicious JavaScript in a client's browser. A list of static analysis tools can be found at the OWASP site [71].
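One common remediation, given here as a sketch rather than the authors' prescribed fix, is to escape the value on output, for example with the JSTL <c:out> tag, which escapes XML characters by default:

<%@ taglib prefix="c" uri="http://java.sun.com/jsp/jstl/core" %>
<%-- c:out HTML-escapes the request attribute, neutralizing injected script --%>
<c:out value="${requestScope.input}"/>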
• Security Testing in the Testing Phase: After the review and testing of secure coding, the next step is to apply a dynamic security testing tool to identify vulnerabilities in the running state of the application. These tools find potential vulnerabilities in applications, such as input/output validation issues and application-specific problems. A dynamic security testing tool such as IBM Security AppScan [72] can be applied to uncover security issues. We downloaded the trial version of this tool from the IBM site and tested a dummy Web site, “http://www.altoromutual.com”, for security issues. The tool created several test cases, and some of them were able to exploit the Web site through XSS, SQLi, and other vulnerabilities.
Similar tools based on other security testing techniques, such as search-based security testing and concolic testing, were applied in the study. A list of such tools can be found at the OWASP site [73].
• Security Testing of Deployment and Maintenance: After deployment of the application, health checks of the application and its platform need to be performed periodically to ensure that no new security threats have been introduced into the system. If the system needs any updates or new patches, these should also be properly tested for security vulnerabilities. Experts have proposed several runtime monitoring techniques to monitor applications in their running state; a survey of such techniques can be found in the work of Shahriar and Zulkernine [74].
We take the live case of the Virtual Freer v1.57 Web application, in which an SQLi vulnerability was recently found. According to Vulnerability Laboratory [75], an auth-bypass session vulnerability was discovered in the official Virtual Freer v1.57 content management system (CMS). Remote attackers may exploit this vulnerability to access the administrator panel or Web user interface. The vulnerability, found in the login.php file, allows remote attackers to use a 1=1 SQL payload to bypass the validation script on the login.php page, which results in access to the administrator panel without a valid account.
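The countermeasure named in the threat model of Fig. 7 (1.c.b, parameterized queries) can be illustrated with a minimal Java JDBC sketch; the table and column names are assumptions, and a real system would compare hashed credentials rather than plaintext passwords.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class LoginCheck {

    // User input is bound as data, never concatenated into the SQL text,
    // so a payload such as "' OR 1=1 --" cannot change the query structure.
    static boolean isValidLogin(Connection conn, String user, String pass)
            throws SQLException {
        String sql = "SELECT 1 FROM users WHERE username = ? AND password = ?";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, user);
            ps.setString(2, pass);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next(); // true only if a matching account exists
            }
        }
    }
}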
Had the developers followed the PEST approach discussed above, they might have caught this vulnerability at a very early stage. We have discussed how a test scenario for the SQLi vulnerability can be created. Even if it had been missed in requirement testing, it would have been caught at the design phase using the threat model shown in the security testing of design.
So the idea behind PEST is to find and fix security issues at the early stages. These activities may seem to impose overheads in terms of cost, but they prove to be an advantage and a valuable asset in the long run, giving the company a cutting edge in the competitive market.
10 Conclusion
A sound security testing strategy leads us in the right direction and to the target effectively and efficiently. Moreover, the present state of the art in security testing appears to fall short of checking the root cause, i.e., inherent vulnerabilities, and hence of assuring security. A framework for phase-embedded security testing is therefore proposed as a continued activity rather than a piecemeal, afterthought approach.
This work is useful for security practitioners who wish to test their applications for security vulnerabilities and apply protection accordingly. It also provides a future direction for security testing researchers, as the techniques discussed here are good topics to be worked upon, especially search-based security testing and hybrids of different techniques.
In fact, security threats surrounding financial applications have increased dramatically, and they continue to evolve. Hence, it is now essential to make software applications highly secure and reliable. Security testing is an effort to reveal those vulnerabilities that may violate state integrity, input validity, and logic correctness, along with the attack vectors that may exploit these vulnerabilities. This minimizes the risks of a security breach and ensures the confidentiality, integrity, and availability of customer transactions. Modern applications face several security challenges, prominently including software complexity, third-party code, and dynamic security policies. These challenges often have a significant impact on application security.
At the same time, software is becoming highly complex and huge in size, with virtually unbounded operational environments. Thus, it is harder to categorically understand applications and more difficult to diagnose problems and to test, leading to the possibility of bugs in the system that are likely to introduce security vulnerabilities. Because of market pressure and strict deadlines to produce and deliver software quickly and at a lower cost, third-party code is widely used in the software industry. It is a well-recognized and obvious fact that third-party code creates many security-related issues and concerns that may lead to serious security risk for companies. Therefore, if third-party code is being used, it must be properly and effectively tested for security flaws. An attempt is made in this chapter to evolve such a mechanism in order to test and assure a high level of security at each phase through PEST.
Information security policies specify a set of policies adopted by an organization to protect its information, agreed upon with its stakeholders and management. They include the scope of the policy, information classification, management goals for securely handling information of each class, and more. Modern systems interact with an external environment in which security policies cannot be known well in advance and applied; security policies are thus dynamic in such cases, for instance when deciding the authority of users depending on the type of message received from the external environment. Therefore, a mechanism is essentially needed that can allow security-critical decisions at runtime based on dynamic observations of the environment.
Security is one of the prime aspects of software quality, owing to the prevailing circumstances in the software industry. Software can be functionally correct yet lack quality in terms of stability, security, or usability. Technology is evolving at an incredibly fast pace and is spectacularly affecting almost every domain. Therefore, it is indispensable to realize the need for, and significance of, a mechanism with comprehensive testing capabilities that is adaptable to the evolving dynamic and heterogeneous nature of the Web domain. The increasing rate of security breaches has made security testing a vital part of the software life cycle. Software security testing is instrumental in exposing possible vulnerabilities and associated threats. It provides assurance that the application fulfills the current need for security when exposed to malicious input. All theories are useless unless they yield better tools and techniques. The practice should start right from the beginning, at the requirement level, and continue until the application is finally developed; this would uncover vulnerabilities at the very time they are introduced.
Security testing automation tools need continuous updates to keep pace with ever-evolving and changing technologies. This is one of the major challenges, and such tools require adequate integration into existing development workflows. Continuous monitoring and assessment will surely safeguard against upcoming possible threats. The chapter argues, and concludes with the fact, that security testing is dependent on technology. Hence, it is now necessary to explore testing techniques that can keep track of all possible issues while developing meticulous test designs and strategies. The future direction demands that we adapt to the secure, dynamic, pervasive nature of the Web.
In view of the current scenario and projections, a bright future appears ahead for the quality assurance and software testing domain in the form of automated, symbolic, formal, phase-based security testing, rather than demonstration-based conventional testing. This in turn will lead to dependable software components in their true spirit, fulfilling future software requirements and secure functionalities. Software testers need to be prepared and ready to grab the emerging opportunities in the multibillion-dollar software security testing industry.
References
1. IDG, Idg study: why application security is a business imperative. White paper, IDG Research
Services, 2014
2. Testing guide introduction. https://www.owasp.org/index.php/Testing_Guide_Introduction#
Deriving_Functional_and_Non_Functional_Test_Requirements. Accessed 23 Oct 2015
3. B. Potter, G. McGraw, Software security testing. Secur. Priv. IEEE 2(5), 81–85 (2004)
4. G. Dave, S. Keri, J. Jon, Driving quality, security and compliance in third-party code. http://
softwareintegrity.coverity.com/register-for-the-coverity-and-blackduck-webinar.html. Accessed
30 July 2015
5. A. Girard, C. Rommel, Software quality and security challenges growing from rapid rise of
third-party code. White paper, VDC Research, 2015
6. H. Janicke, L. Finch, The role of dynamic security policy in military scenarios. in Proceedings
of the 6th European Conference on Information Warfare and Security, pp. 2007–2009 (2007)
7. I. Magazine and Accenture, Managing complexity still top security challenge, prompting
increase in security spend, according to security survey. https://newsroom.accenture.com/
news/managing-complexity-still-top-security-challenge//-prompting-increase-in-security-
spend-according-to-security-survey.htm. Accessed 30 July 2015
8. B. Schneier, Beyond fear: thinking sensibly about security in an uncertain world (Springer
Science & Business Media, 2003)
Uncertainty in Software Testing
1 Introduction
available to model some of these uncertainties, and Sect. 7 concludes the chapter by
highlighting future scope of work.
2 Uncertainty Preliminaries
The quality of each of the deliverables produced during software development needs to be assessed. This is done by testing the observed behavior against the expected outcome. To test the quality of each of the artifacts produced, a set of test cases is needed.
A test case is represented as a triplet [I, S, O], where I is the set of input data given to the system, S is the state of the system, and O is the output or outcome produced by the system (Fig. 1).
State S represents the artifacts of software development produced at the end of each phase. Given a finite input space I, an output space O, and a function f(i) that maps a given input to an output, a test case can be written as [i, s, f(i)] with i ∈ I and f(i) ∈ O.
The set of test cases with which the quality of a given deliverable is tested forms a test suite. In a test oracle, the expected outcomes are known. If the output produced by a deliverable or an artifact is the expected one, then the state of the system is certain.
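As an illustration, the triplet view of a test case can be put directly into code. The following Python sketch (the names TestCase and run_suite are ours, not from any particular framework) pairs each input and state with the oracle's expected outcome and flags outputs that deviate from it:

from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class TestCase:
    """A test case as the triplet [I, S, O] described above."""
    inputs: Any      # I: input data given to the system
    state: Any       # S: state of the system under test
    expected: Any    # O: expected outcome recorded in the test oracle

def run_suite(suite, system: Callable[[Any, Any], Any]) -> None:
    """Compare the observed behavior against the oracle's expected outcome."""
    for tc in suite:
        observed = system(tc.inputs, tc.state)
        status = "certain" if observed == tc.expected else "investigate"
        print(f"inputs={tc.inputs!r}: observed={observed!r} -> {status}")

# Usage with a trivial system under test (absolute value):
suite = [TestCase(-3, "unit", 3), TestCase(4, "unit", 4)]
run_suite(suite, lambda i, s: abs(i))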
Risk arises when we are unaware of the exact outcome but do know the distribution of outcomes, i.e., the expected outcome is not known but the probability of each outcome is. For a given input "i," if the outcome is unknown but is expected to lie within a known range or distribution of the output space, it leads to risk.
Uncertainty prevails when neither the outcome nor the distribution is known. The probability of an undesired incident is known or justifiable in the case of risk, whereas in uncertainty the probability of any such event cannot be computed. If, for a given input, the outcome produced falls outside the known output space, it leads to uncertainty. Uncertainty also exists when the outcome produced varies for the same given input.
There are two types of uncertainty as proposed by Luqi and Cooke [9]. The first concerns verifying whether the given description is the actual specification of the software to be produced; this forms the basis of requirement validation. A huge amount of code generated without knowing the correctness of the specifications leads to time and cost overruns.
The second type relates to the lifetime of valid specifications. According to Lehman [10], there are two categories of programs, viz., those based on a given specification that remains valid forever, and those whose specifications change over a period of time.
According to the IEEE software engineering standard [11], there are two sources of uncertainty. The first arises from a poor understanding of the problem domain; domain experts are needed for a better understanding, as not every problem domain can be understood by every person. The second concerns insufficient emphasis on maintaining the quality of the software and on understanding its users' needs. The end user expects a quality product satisfying the requirements, and non-conformance of the quality of artifacts at any stage of software development may lead to uncertainty.
3 Sources of Uncertainty
As uncertainty prevails in the real world, a system that models real-world problems also inherits their uncertainties, resulting in domain uncertainties. When the assumptions and constraints emanating from the requirements of the system are not reflected consistently in the model, it may lead to risks and uncertain outcomes. The models are used to build the system, but as the understanding of requirements can be chaotic or stochastic, mapping the specifications to models is itself uncertain. Models built for real-time applications and for safety-critical systems suffer from uncertainty introduced by approximation techniques. Models built on historical data may not remain valid, as the structure of the model can change over time due to varied characteristics and dynamic changes in the data set.
Uncertainty also exists in stakeholders' requests and priorities, and it arises from differences between the users' view of the requirements and the developers' view. There could be alternative architectural solutions for a particular problem; however, if the consequences of the various alternatives are not known or cannot be decided, it leads to uncertainty in freezing the architecture of a system [14].
The common paradigms in the solution domain are cyclical and concurrent debugging. Traditionally, the testing process is cyclic, comprising two stages, viz., error finding and error correction; the process repeats until the program is bug-free. This type of debugging is often referred to as cyclical debugging [15]. Such a process is not well suited to testing parallel programs. As concurrency is inherent in such environments, every run produces a different execution trace for the same inputs, and recreating an execution scenario is probabilistic because of the many alternative scenarios concurrency allows.
The cyclical debugging approach cannot be applied to concurrent programs because the behavior of the program for the same inputs varies when the application is re-executed. The characteristics of concurrency increase the complexity of parallel programs, so the effort required to debug them is higher than for debugging sequential programs. A concurrent or parallel program does not always produce an identical outcome when executed several times with the same inputs. If one process attempts to write to a shared memory location while a second process reads the same data from that location, the outcome of the read depends on whether it acquires the new or the old value.
If undesirable behavior or an unacceptable outcome arises for a given input, there is a chance that the program may not be able to recreate the error situation. This unintended behavior is often referred to as Heisenberg's uncertainty principle [16] or the probe effect [17, 18]; the probe effect may not exist if there are no synchronization errors. The non-determinism in concurrent programming stems from race conditions, and it is difficult to deal with because the program has little or no control over it. The resolution of such races depends on other attributes; for example, it may depend on network traffic or on CPU load. Heisenberg's uncertainty principle states that "the presence of an observer may affect scientific observation such that absolute confidence in observed results is not guaranteed [12]."
Software testing is defined as "the search for discrepancies between the outcome produced by software versus what the user expects it to do" [22]. The process of software testing includes planning, execution, evaluation, and quality estimation. The outcome of testing may be uncertain if the selected test cases are not verified for their correctness. Uncertainty is inherent in software testing but is rarely managed; the primary reason is that humans dominate every activity of software testing.
The test execution process refers to executing the source code against a given set of inputs. The system under test may exhibit uncertainties because the outcomes of testing in a simulated environment differ from those in the live environment. Test results may also vary with respect to personnel, timing, synchronization, and other dynamic issues.
Uncertainties also emanate from external software and hardware. According to the report on the Therac accidents [24], software error was the primary cause of the accidents, and such errors lead to hazards especially in mission-critical systems. The malfunctioning of a device can be caused by the unexpected outcome of its device-driver program. Correcting errors once they are found can address safety issues in mission-critical systems: if the causes of a flaw or bug are properly addressed, future accidents are likely to be prevented. In this scenario, however, the real cause of the malfunctioning of the devices is unknown, thereby leading to uncertainty.
A better way to manage uncertainty in testing is to develop more sophisticated oracles [25]. In the presence of uncertainty, it is difficult to determine whether a test case is successful or unsuccessful because the outcome may not be judged properly. There is inherent risk in a test oracle if its distribution is relaxed; however, relaxing the oracle too far may become a barrier to locating faults.
A test oracle has two parts, viz., the oracle information and the oracle procedure [26]. The oracle information states the condition or statement of correct behavior, and the oracle procedure verifies the test execution against the corresponding oracle. Oracles captured from a formal specification will mostly reflect the intended behavior, provided the specifications are correct. Effective test oracles can be reused directly during regression testing.
Test oracles may be identified directly from the user acceptance criteria or directly from the specifications; they determine whether a system behaves correctly, as expected. The correctness of testing depends on the input set, which in turn depends on the type of system being tested. In general, there are two kinds of systems: sequential and reactive [26]. In sequential systems, only the functional requirements are considered, whereas in reactive systems both functional and quality requirements are judged.
The major challenge in testing is error tracing. When the outcome of testing an artifact is ambiguous, the cause of the erroneous outcome has to be traced; the errors might have emanated from the current phase or from any of the previous phases. Effective error tracing requires that the software artifacts be interrelated so that errors can be traced across them. This is also referred to as the discovery task [27].
5 Prioritization of Uncertainty
When multiple mobile hosts access shared data from a fixed host at the same time, it may lead to concurrent access and synchronization issues. When a data item is locked by one mobile host, the other mobile hosts have to wait till the resource is unlocked. Consistency of data items must be assured in the presence of the inevitable characteristics of mobile environments, such as disconnections and mobility.
Open-loop systems are traditional: once the software fails to give an appropriate solution, it is replaced. In a closed loop, the system responds dynamically to unexpected outcomes and changes.
In the distributed-transactions example, frequent transaction restarts may affect the throughput of the system. A mechanism is adopted to keep uncommitted transactions in a queue so that they can be executed later. When a transaction has not completed its execution within the specified time period, the timer value is dynamically increased by a step factor "∂," enabling the transaction to complete when it is scheduled again. The system is closed loop because the timer value is dynamically adjusted based on a decision about the state of the transaction's execution.
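A minimal Python sketch of this closed-loop adjustment is given below; the execute callback, the initial timer, and the step value are illustrative assumptions, not part of the cited mechanism:

def run_transaction(execute, timeout: float, step: float, max_retries: int = 5):
    """Closed-loop scheduling sketch: an uncommitted transaction is queued
    and retried with its timer increased by the step factor each time."""
    for attempt in range(max_retries):
        if execute(timeout):        # True if the transaction commits in time
            return attempt, timeout
        timeout += step             # dynamically adjust the timer by the step "∂"
    raise RuntimeError("transaction aborted after retries")

# Usage: a transaction that needs 2.4 time units to commit.
attempts, final_timeout = run_transaction(lambda t: t >= 2.4, timeout=1.0, step=0.5)
print(attempts, final_timeout)      # commits once the timer reaches 2.5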
6 Modeling Uncertainties
Uncertainty creeps into the software testing activity because of the failure to measure certain characteristics of software systems and their artifacts. If these characteristics are quantified, the uncertainty can be addressed in a better way. According to Laplante [30], uncertainty reflects the inability to measure quality.
Uncertainties that arise out of a few characteristics of the artifacts under test need to be modeled and managed explicitly. According to Elbaum [25], the three requirements for uncertainty handling in testing are (a) test frameworks that provide automated support for generating inputs from those distributions that help in identifying uncertainty, (b) oracles that compare the expected behavior with the computed behavior probabilistically, and (c) models that represent the uncertainties of the artifacts under test, which also help in implicit uncertainty quantification.
Traditional software systems run in an open loop [20]: applications are created, deployed, and used by the stakeholders, and once the system stops behaving correctly it is replaced with a new system. Uncertainty modeling is needed for open-loop systems to save the effort of migrating to a new system and/or environment. In a closed-loop environment, the run-time behavior is continuously monitored and the system improves its behavior by dynamically adapting to it. Uncertainty can be handled using closed-loop systems, as they allow the system to react rapidly to requirement changes, variations in resources, unexpected faults, etc.
Even though it is a known fact that human involvement is the common cause of uncertainty, very few attempts have been made to model it. Risk management has been researched for over three decades with significant contributions [31, 32], but barring software dependability and software reliability models [33, 34], uncertainty is often overlooked. The modeling and management of uncertainty can be realized using soft computing techniques [35–37].
Bayesian belief networks are used to model uncertainties in software testing [5, 12] and also in the validation of safety-critical systems [12]. A Bayesian network is represented as a triplet [V, E, P] forming a directed acyclic graph (DAG): V is the set of vertices representing (input) variables, E is the set of edges representing causal inferences, and P is the set of probabilities. A Bayesian network thus represents the probabilistic relationships among random variables.
Each edge of the DAG has an associated matrix of probabilities depicting the cause-effect relationship, i.e., how each value of one variable (the cause) affects the probability of each value of the other variable (the effect). It is basically a parent-child relationship: a probabilistic model of the effect that a change in the value of one variable has on another.
Each variable Vi of the DAG takes one value and is assigned a vector of probabilities labeled Belief(Vi). A probability in Belief(Vi) represents the chance that variable Vi takes a particular value, and a directed edge (si, di) represents a causal relationship between the source and target nodes. The causal influence is characterized by the conditional probability distribution P(di | si). The Bayesian network structure is established in consultation with domain experts, and the probabilities of the edge matrices can be derived using statistical techniques.
For testing each artifact of software development, a prior belief of the confidence is to be determined. A directed edge from A to B represents that B implements A, and the probabilities of correctness are determined by domain experts. The belief vector is revised with repeated computation, and once the correctness of a value is established, a message is sent to its parent.
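The following Python sketch illustrates both directions of inference on the smallest possible network, a single edge A → B ("B implements A"); the prior belief and the edge matrix are invented for illustration:

# Two-node network: A -> B, where B implements A.
p_a = 0.9                                 # prior belief that artifact A is correct
p_b_given_a = {True: 0.95, False: 0.20}   # edge matrix: P(B correct | A)

# Causal (forward) inference: Belief(B) before testing B.
p_b = p_a * p_b_given_a[True] + (1 - p_a) * p_b_given_a[False]

# Diagnostic revision: B is observed correct, so a message is sent to the
# parent A and its belief is revised by Bayes' rule.
p_a_given_b = p_a * p_b_given_a[True] / p_b

print(f"Belief(B) = {p_b:.3f}")                      # 0.875
print(f"Belief(A | B correct) = {p_a_given_b:.3f}")  # 0.977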
Effective test case selection and classification help in the proper testing of a system. The objective of test selection is to select a minimum number of test cases such that effort, cost, and uncertainty are minimized while the desired testing effectiveness is still achieved [23].
To improve the quality of testing and to minimize the cost and effort of software testing, multi-faceted concepts [38, 39] such as test case fitness evaluation, selection, and classification are to be treated together. Fuzzy logic is a mechanism that helps in deriving definite outcomes from imprecise, vague, or ambiguous information; it helps in finding a precise solution. A fuzzy-logic-based multi-faceted measurement framework [23] is used to generate test cases by filtering, prioritization, and selection. The fuzzy synthesis evaluation algorithm considers multiple criteria to evaluate each test case; the concurrent consideration of fuzzy parameters and testing objectives introduces uncertainty.
The multi-faceted measurement is used by the tester to generate a fitness score for each test case. Grades are assigned to each test case based on the fitness score, and the test cases are then classified using the final grade. The final grades are assigned on a seven-point scale [40]: worst, worse, bad, average, good, better, and excellent. A Gaussian membership function is used to derive the relationship between the grades and the fitness parameters by assigning fitness values, and a fuzzy feature weighting mechanism is used to address uncertainty.
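A rough Python sketch of such Gaussian grading is shown below; the grade centers and the spread are chosen arbitrarily for illustration, since the cited framework [23] does not fix these values:

import math

GRADES = ["worst", "worse", "bad", "average", "good", "better", "excellent"]
CENTERS = [i / 6 for i in range(7)]   # grade centers spread over fitness in [0, 1]
SIGMA = 0.12                          # spread of each Gaussian membership function

def gaussian(x: float, c: float, sigma: float = SIGMA) -> float:
    return math.exp(-((x - c) ** 2) / (2 * sigma ** 2))

def grade(fitness: float) -> str:
    """Assign the grade whose Gaussian membership is highest for this fitness."""
    memberships = [gaussian(fitness, c) for c in CENTERS]
    return GRADES[memberships.index(max(memberships))]

for fitness in (0.05, 0.48, 0.93):
    print(fitness, "->", grade(fitness))   # worst, average, excellent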
Multi-objective functions are used to optimize conflicting test objectives subject to given constraints; the goal is to find an acceptable solution by selecting a subset of test cases. Test cases are generated using a multi-objective function combining fuzzy logic with a weight-assignment method [23]. The fuzzy feature weighting approach considers several parameters, including the fitness value, the weight of the fitness value, the frequency of parameters in a particular module, the total number of parameters, and the size of the test case repository. These parameters vary randomly from one application to another, so the time taken to evaluate the appropriate test cases can be much higher; further, any minor change in the code requires all the parameters to be reassessed. The scale of grades chosen to select test cases also needs to be standardized.
A Hidden Markov Model (HMM) [25] is a probabilistic state machine that can identify veiled properties of a system. The elements of an HMM include a set of states and a set of outcomes, represented by "S" and "O," respectively, with n states and n outcomes in total. Whenever the system is in a state of S, it produces an observation from O before moving to the next state. The transition probability X = {xi,j} between each pair of states i and j is assumed to be constant; similarly, the emission probability Y = {yi,j} represents the chance of making observation j in state i and is also assumed constant. The states and outcomes of a system can be known, but for some applications of HMMs, some of the probabilities in X and Y may be unknown; the sequence of outcomes helps in estimating these probabilities.
The vagueness in inferring the expected outcome is quantified by a confusion matrix. Let the system be trained to identify and classify n states and n outcomes; the training of the classifiers produces a confusion matrix [25] whose values serve as the emission probabilities.
The tester then randomly selects a subset of sample activities such that each succeeding activity is performed with equal probability; based on the number of instances of a state producing an outcome, faults are predicted. The outcomes may also be modeled to specify the degree of correctness with a precision value. In an HMM, estimating the emission probabilities is a challenge, and as the number of states grows, the transition probabilities change.
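To make the mechanics concrete, the following Python sketch evaluates the likelihood of an outcome sequence with the standard forward algorithm. The two states, two outcomes, and all probabilities are illustrative assumptions, with the emission matrix standing in for a trained classifier's confusion matrix:

import numpy as np

# Two states S = {stable, faulty} and two outcomes O = {pass, fail}.
X = np.array([[0.8, 0.2],      # transition probabilities x[i, j]
              [0.3, 0.7]])
Y = np.array([[0.9, 0.1],      # emission probabilities y[i, j], e.g., taken
              [0.4, 0.6]])     # from a trained classifier's confusion matrix
start = np.array([0.5, 0.5])   # initial state distribution

def forward(observations):
    """Probability of an outcome sequence under the model (forward algorithm)."""
    alpha = start * Y[:, observations[0]]
    for o in observations[1:]:
        alpha = (alpha @ X) * Y[:, o]
    return alpha.sum()

# Likelihood of observing pass, pass, fail (0 = pass, 1 = fail):
print(forward([0, 0, 1]))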
Rough set theory has been used in varied applications; it can be used to model and measure uncertainty, its purpose being to arrive at a proper decision in the presence of uncertainty. The approach can be applied to many activities of software engineering, since for practical reasons a developer may not be certain of the end product throughout the software development life cycle.
Rough set theory, which extends conventional set theory to manage incomplete information, was introduced by Pawlak [41]. A rough set represents a function used for making decisions based on multiple attributes or over an uncertain information space. Information system development depends on many attributes whose values can be uncertain; hence, the use of rough sets for dealing with uncertainty in the development of software systems is promising.
Rough sets can be described informally in the context of software testing as follows. Let E be a set specifying the expected outcome(s) that an oracle determines for a particular artifact, for a given set of input test cases I. The conformance of the quality of the end product or artifact, in concurrence with the given set of inputs, can be defined using the following sets:
The elementary sets formed by a relevant outcome (decision) are called concepts (C) [42]. For decision attribute 1, C1 = {r3, r6}, and for decision attribute 0, C2 = {r1, r2, r4, r5, r7}. The executions of transactions r3 and r4 lead to confusion because, under the same conditions, the transaction commits in r3 but aborts in r4. Rough computing is applied to manage these uncertainties.
The lower and upper approximations of the output produced (decision attribute = 1) are evaluated as follows. Only the atom A5 = {r6} is distinguishable from the others in C1; therefore, the lower approximation of C1 is R(C1) = {r6}. The upper approximation is the union of R(C1) and those atoms that are indistinguishable; {r3, r4} are indistinguishable in C1, so the upper approximation of C1 is R̄(C1) = {r3, r4, r6}. The boundary region of C1 is defined as R̄(C1) − R(C1). Though the boundary region can be computed manually, when the information space grows it becomes difficult to identify the uncertainty in producing an outcome. This scalability issue can be addressed by automating the generation of decision rules when the condition attributes are given.
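Such automation is straightforward to sketch. The Python fragment below builds a decision table consistent with the example (rows r3 and r4 share their condition attributes but differ in decision) and computes the approximations of C1; the concrete attribute values are our reconstruction, since Table 1 itself is not reproduced here:

# Condition attributes (a1, a2) and decision d for rows r1..r7.
rows = {
    "r1": ((0, 0), 0), "r2": ((1, 0), 0), "r3": ((0, 1), 1),
    "r4": ((0, 1), 0), "r5": ((0, 0), 0), "r6": ((1, 1), 1), "r7": ((1, 0), 0),
}

def approximations(concept):
    """Lower/upper approximation of a concept w.r.t. the condition attributes."""
    atoms = {}                         # indiscernibility classes
    for name, (cond, _) in rows.items():
        atoms.setdefault(cond, set()).add(name)
    lower = set().union(*(a for a in atoms.values() if a <= concept))
    upper = set().union(*(a for a in atoms.values() if a & concept))
    return lower, upper

C1 = {"r3", "r6"}                      # rows whose decision attribute is 1
lower, upper = approximations(C1)
print(sorted(lower))                   # ['r6']
print(sorted(upper))                   # ['r3', 'r4', 'r6']
print(sorted(upper - lower))           # boundary region: ['r3', 'r4']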
The decision rules can be generated using various available mechanisms, viz., exhaustive, LEM2, genetic, and covering algorithms [43]. LEM2 (Learning from Examples Module, Version 2) is a rule induction algorithm that uses the idea of multiple attribute-value pairs. The requirement information space is given as an input to the RSES (Rough Set Exploration System) tool [44] to generate rules; when there are many test cases, the outcome of rule generation is useful in identifying the uncertain outcomes.
Table 1 is given as an input to RSES, and the LEM2 algorithm is used to generate rules. The rules generated are as follows:
(a) IF (a2 = 0) THEN (d = 0)
(b) IF (a1 = 0) and (a2 = 1) THEN (d = 0)
(c) IF (a1 = 0) and (a2 = 1) THEN (d = 1)
(d) IF (a1 = 1) and (a2 = 1) THEN (d = 1)
Rules (b) and (c) specify an uncertain outcome: if a1 = 0 and a2 = 1, then the outcome may be d = 0 or d = 1, which is uncertain. One way to resolve this uncertainty is to include a derived attribute, since the uncertainty in the output produced may depend on characteristics not considered in the information system. Table 2 modifies Table 1 by introducing a derived attribute with the intention of resolving the uncertainty.
In the distributed transaction example, the device characteristics (memory capacity, processing speed) are introduced as an additional attribute in an attempt to address the uncertainty.
By introducing the derived attribute a3, the rules are generated again using LEM2:
(a) IF (a2 = 0) THEN (d = 0)
(b) IF (a2 = 1) and (a3 = 1) THEN (d = 1)
(c) IF (a1 = 0) and (a2 = 1) and (a3 = 0) THEN (d = 0)
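Mechanically, detecting such uncertainty amounts to finding antecedents that map to more than one decision. The following Python sketch encodes the rules from the first run in an ad hoc form (it is not RSES output) and reports the conflicting antecedent:

# Rules as (conditions, decision); rules (b) and (c) from the first run share
# the antecedent {a1: 0, a2: 1} but conclude different decisions.
rules = [
    ({"a2": 0}, 0),
    ({"a1": 0, "a2": 1}, 0),
    ({"a1": 0, "a2": 1}, 1),
    ({"a1": 1, "a2": 1}, 1),
]

def uncertain_antecedents(rules):
    """Return the antecedents that lead to more than one decision."""
    seen = {}
    for cond, decision in rules:
        seen.setdefault(frozenset(cond.items()), set()).add(decision)
    return [dict(c) for c, ds in seen.items() if len(ds) > 1]

print(uncertain_antecedents(rules))    # [{'a1': 0, 'a2': 1}]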
Fig. 2 Quantification of attributes of software processes & tools [47]
The unknown characteristics of the artifact under test can be computed using B. A subset of such characteristics, in addition to the quantified properties considered earlier, can help in reducing the uncertainty.
The challenges in managing uncertainty with rough sets, with respect to software testing, are as follows:
• To predict the derived attribute, related characteristics of the parameters need to be considered; these may be domain specific and may vary from one artifact to another. The derived attribute helps in addressing the uncertainty arising from variation in the expected outcome of the quality requirements of a product.
• A similar concept can be extended to all phases of development, provided they are quantified properly.
• The approach to resolving uncertainty is application specific. An attempt can be made to at least provide generic guidelines for applications of certain domains.
• The concept of the reduct set in rough set theory might help in answering questions such as "when to stop testing," as there is a possibility of reducing the information space.
According to Frank [45], the information extracted from models must correspond to the information extracted from the corresponding operations in reality. Software design decisions depend on multiple goals that are difficult to characterize, thereby leading to conflicts; these goals include non-functional concerns such as security, reliability, and performance. Classifying the goals and evaluating the trade-offs of each alternative helps in making software design decisions. To handle the complexity of system design decisions, Letier et al. [14] proposed the following process:
The first step is to define a multi-objective architecture decision model consisting of design decisions, dependency constraints, model parameters, and optimization goals; the complexity of the design increases as the number of objectives grows.
The second step is to define a cost-benefit model that allows system designers to map design decisions and satisfaction criteria to the financial goals that matter most to customers. In reality, these are very hard to quantify.
In the third step, design-decision risk is addressed and quantified; measures related to net benefit can be of interest to an architect, and in [14] both goal-failure and project-failure risks are addressed. Finally, a candidate architecture is selected by comparing the risks of the possible alternative architectures.
A design decision can be open or closed. A design decision is said to be closed if all the shortlisted architectures agree on the option selected for that decision; it is said to be open if the shortlisted architectures allow alternative options. When multiple objectives play a role in design decisions, standard methods for multi-criteria decision making can be applied.
7 Conclusion
References
1. W. Royce, Measuring agility and architectural integrity. Int. J. Softw. Inform. 5(3), 415–433
(2011)
2. R.B. Paramkushan, A decision theoretic model for information technology. Manag. Stud. 1(1),
47–63 (2013)
3. B. Kitchenham, S. Linkman, Estimates, uncertainty and risk. IEEE Softw. 14(3), 69–74 (1997)
4. H. Madsen, P. Thyregod, B. Burtschy, G. Albeanu, F. Popentiu, A fuzzy logic approach to software testing & debugging, in Safety & Reliability for Managing Risk (ESREL 2006), vol. 11, ed. by C. Guedes Soares, E. Zio (Taylor & Francis Group, Abingdon, 2006), pp. 1435–1442
5. H. Ziv, D.J. Richardson, Bayesian-network confirmation of software testing uncertainties, in
Proceedings of Sixth European Software Engineering Conference (ESEC), 1997
6. R. Buxton, Modeling uncertainty in expert systems. Int. J. Man Mach. Stud. 31(4), 415–476
(1989)
7. W.A. Wright, G. Ramage, D. Cornford, I.T. Nabney, Neural network modeling with input
uncertainty: theory & applications. J. VLSI Signal Process. Syst. Signal Image Video Technol.
26(1), 169–188 (2000)
8. V.B. Robinson, A.U. Frank, About kinds of uncertainty in collection of spatial data, in
Proceedings of AUTO-CARTO 7 (1985), pp. 440–449
9. Luqi, D. Cooke, The management of uncertainty in software development, in Proceedings of
16th International Conference on Computer Software & Applications (1992), pp. 381–386
10. M.M. Lehman, Programs, life cycles and laws of software evolution. Proc. IEEE 68(9), 1060–
1075 (1980)
11. Computer Society IEEE, IEEE guide for the use of IEEE standard dictionary of measures to
produce reliable software, in IEEE Standards Collection Software Engineering (IEEE
Computer Society Press, New York, 1994)
12. H. Ziv, D.J. Richardson, The uncertainty principle in software engineering, in Proceedings of
19th International Conference on Software Engineering, 1996
13. A. Gemmer, Risk management: moving beyond process. IEEE Comput. 30(5), 33–43 (1997)
14. E. Letier, D. Stefan, E.T. Barr, Uncertainty, risk and information value in software
requirements & architecture, in Proceedings of 36th International Conference on Software
Engineering (2014), pp. 883–894
15. C.E. McDowell, D.P. Helmbold, Debugging concurrent programs. ACM Comput. Surv. 21(4),
593–628 (1989)
16. C.H. Ledoux, D.S. Parker Jr., Saving traces for Ada debugging, in Proceedings of the Ada
International Conference ACM (1985), pp. 97–108
17. J. Gait, A probe effect in concurrent programs. Softw. Pract. Exp. 16(3), 225–233 (1986)
18. J. Gait, A debugger for concurrent programs. Softw. Pract. Exp. 15(6), 539–594 (1985)
19. C.V. Ramamoorthy, D.E. Cooke, The correspondence between methods of artificial intelligence and the production and maintenance of evolutionary software, in Proceedings of the 3rd International IEEE Conference on Tools for Artificial Intelligence (1991), pp. 114–118
20. D. Garlan, Software engineering in an uncertain world, in ACM Proceedings of FOSER
(2010), pp. 125–128
21. D. Garlan, B. Schmerl, The RADAR architecture for personal cognitive assistance. Int.
J. Softw. Eng. Knowl. Eng. 17(2) (2007)
22. A.L. Goel, Software reliability models: assumptions, limitations and applicability. IEEE Trans. Softw. Eng. SE-11(12), 1411–1423 (1985)
23. M. Kumar, A. Sharma, R. Kumar, Multi-faceted measurement framework for test case classification & fitness evaluation using fuzzy logic based approach. Chiang Mai J. Sci. 39(3), 486–497 (2012)
24. N.G. Leveson, C.S. Turner, An investigation of the Therac-25 accidents. IEEE Comput. 26(7),
18–41 (1993)
25. S. Elbaum, D.S. Rosenblum, Known unknowns: testing in the presence of uncertainty, in Proceedings of FSE (2014), pp. 833–836
26. D.J. Richardson, S.L. Aha, T.O. O’Malley, Specification based test oracles for reactive
systems, in Proceedings of 14th International Conference on Software Engineering (ICSE)
(1992), pp. 105–118
27. P.T. Devanbu, R.J. Brachman, P.J. Selfridge, B.W. Ballard, LaSSIE: a knowledge-based software information system, in Proceedings of 12th International Conference on Software Engineering (1990), pp. 249–261
28. A.M. Davis, Tracing: a simple necessity neglected. IEEE Softw. 12(5), 6–7 (1995)
29. S.A. Moiz, R. Lakshmi, Single lock manager approach for achieving concurrency in mobile
environments, in 14th IEEE International Conference on High Performance Computing
(HiPC) (Springer LNCS 4873, 2007), pp. 650–660
30. P.A. Laplante, C.J. Neill, Modeling uncertainty in software engineering using rough sets. Innovations Syst. Softw. Eng. 1, 71–78 (2005)
31. B.W. Boehm, Software Risk Management (IEEE Computer Society Press, Washington, D.C.,
1989)
32. B.W. Boehm, Software risk management: principles and practices. IEEE Softw. 8(1), 32–41
(1991)
33. B. Littlewood, How to measure software reliability and how not to. IEEE Trans. Reliab. R-28
(2), 103–110 (1979)
34. B. Littlewood, L. Strigini, Validation of ultrahigh dependability for software-based systems.
Commun. ACM 36(11), 69–80 (1993)
35. J. Pearl, Probabilistic reasoning in intelligent systems: networks of plausible inference
(Morgan Kaufmann Publishers, San Mateo, 1988)
36. R.E. Neapolitan, Probabilistic reasoning in expert systems: theory and algorithms (Wiley,
New York, 1990)
37. D.E. Heckerman, A. Mamdani, M.P. Wellman, Real-world applications of Bayesian networks.
Uncertainty AI Commun. ACM 38(3), 24–26 (1995)
38. M. Kumar, A. Sharma, R. Kumar, Towards multi-faceted test cases optimization. J. Softw.
Eng. Appl. 4(9), 550–557 (2011)
39. M. Kumar, A. Sharma, R. Kumar, Soft computing-based software test cases optimization: a
survey. Int. Rev. Comp. Softw. 6(4), 512–526 (2011)
40. Y. Singh, A. Kaur, B. Suri, Test case prioritization using ant colony optimization. ACM SIGSOFT Softw. Eng. Notes 35(4), 1–7 (2010)
41. Z. Pawlak, Rough sets. Int. J. Comput. Inform. Sci. 11(5), 341–356 (1982)
42. L.P. Khoo, S.B. Tor, L.Y. Zhai, A rough-set-based approach for classification and rule
induction. Int. J. Adv. Manuf. Technol. 15(7), 438–444 (1999)
43. J.W. Grzymala-Busse, A new version of the rule induction system LERS. Fundamenta Informaticae 31(1), 27–39 (1997)
44. Rough Set Exploration System (RSES). http://rses.software.informer.com/
45. A. Frank, Conceptual framework for land information system—a first approach, in
Proceedings of Commission 3 of the FIG (1982)
46. K. Forsberg, H. Mooz, The relationship of system engineering to the project cycle, in
Proceedings of First Annual Symposium of National Council on System Engineering (1991),
pp. 57–65
47. P.A. Laplante, C.J. Neill, Uncertainty: a meta-property of software, in Proceedings of the IEEE/NASA 29th Software Engineering Workshop (2005), pp. 228–233
Separation Logic to Meliorate Software
Testing and Validation
Abstract The ideal goal of any program logic is to develop logically correct programs without the need for predominant debugging. Separation logic is considered to be an effective program logic for proving programs that involve pointers. Reasoning with pointers becomes difficult especially because of the way they interfere with the modular style of program development. For instance, aliasing often arises from several pointers to a given cell location; in such situations, any alteration to that cell in one program module may affect the values of many syntactically unrelated expressions in other modules. In this chapter, we explore these difficulties through some simple examples and introduce the notion of separating conjunction as a tool to deal with them. We introduce separation logic as an extension of Hoare Logic using a programming language that has four pointer-manipulating commands. These commands perform the usual heap operations such as lookup, update, allocation and de-allocation. The new set of assertions and axioms of separation logic is presented in a semi-formal style, and examples are given to illustrate the unique features of the new assertions and axioms. Finally, the chapter concludes with proofs of some real programs using the axioms of separation logic.
Keywords Modular testing · Formal methods · Separation logic · Pointer aliasing · Hoare logic · Programming methodology
1 Introduction
The ideal goal of any program logic is to help in developing logically correct programs without the need for predominant debugging. Separation logic [1] is one such program logic; it can be seen as an extension of standard Hoare Logic [2]. The aim of this extension is to simplify reasoning with programs that involve low-level memory access, such as pointers.
– Reasoning with pointers becomes very difficult because of the way they interfere
with the modular style of program development.
Structured programming approaches provide the freedom to develop a large
program by splitting it into small modules. Hence, at any given time, a programmer
can only concentrate on developing a particular module against its specification. In
the absence of pointer commands, the proof of correctness of these small modules
can easily be extended to a global proof for the whole program.
– For example, consider the following specification expressed in the form of a Hoare triple:
{x = 4} x := 8 {x = 8}
It can be extended with an unrelated proposition:
{x = 4 ∧ y = 4} x := 8 {x = 8 ∧ y = 4}
Here, the proposition y = 4 should remain true in the postcondition, since the value of variable y is not affected by the execution of the command x := 8.
– The reasoning above is an instance of a more general rule of Hoare logic, viz. the rule of constancy:

{p} c {q}
--------------------------------
{p ∧ r} c {q ∧ r}

where no free variable of r is modified by c. It is easy to see how this rule follows from the usual meaning of the assignment operator. The following sequence of state transitions can be used to illustrate the above idea:
However, if the program modules use data structures such as arrays, linked lists or trees, which involve addressable memory, then extending local reasoning is not so easy using the rule of constancy.
– For example, consider a similar specification which involves mutation of an array element:
{a[i] = 4 ∧ a[j] = 4 ∧ a[k] = 4} a[i] := 8 {a[i] = 8 ∧ a[j] = 4 ∧ a[k] = 4}
In this case, two more clauses, i.e., i ≠ j and i ≠ k, are needed in the precondition to make it a valid specification. These are the extra clauses that programmers often forget to mention. However, they are necessary, since they assure that the three propositions a[i] = 4, a[j] = 4 and a[k] = 4 refer to mutually disjoint portions of heap memory, and hence that mutating one will not affect the others. Thus, although {a[i] = 4} a[i] := 8 {a[i] = 8} is a valid specification, applying the rule of constancy in these cases can lead us to an invalid conclusion.
These kinds of non-sharing are often assumed by programmers. However, in classical logic non-sharing needs explicit mention, which results in a program specification that looks clumsy.
– Separation logic deals with this difficulty by introducing a separating conjunction, P ∗ Q, which asserts that P and Q hold for disjoint portions of addressable memory.
– In this sense, it is closer to the programmer's way of reasoning.
Since non-sharing is the default in separating conjunction, the above specification can be written succinctly as:
{(a + i ↦ 4) ∗ (a + j ↦ 4) ∗ (a + k ↦ 4)} [a + i] := 8 {(a + i ↦ 8) ∗ (a + j ↦ 4) ∗ (a + k ↦ 4)}
Although the normal rule of constancy is no longer valid, we have the following analogous rule, called the "frame rule":

{p} c {q}
--------------------------------
{p ∗ r} c {q ∗ r}
2 Background
– Allocation.
Command: x := cons(e1, ..., en)
Meaning: The command x := cons(e1, ..., en) reserves n consecutive cells in the memory, initialized to the values of e1, ..., en, and saves in x the address of the first cell. Note that, for successful execution of this command, the addressable memory must have n uninitialized, consecutive cells available.
– Lookup.
Command: x := [e]
Meaning: It saves the value stored at location e in the variable x. Again, for the successful execution of this command, location e must have been initialized by some previous command of the program; otherwise, the execution will abort.
– Mutation.
Command: [e] := e′
Meaning: The command [e] := e′ stores the value of expression e′ at the location e. Again, for this to happen, location e must be an active cell of the addressable memory.
– Deallocation.
Command: free(e)
Meaning: The instruction free(e) deallocates the cell at the address e. If e is not an active cell location, then the execution of this command shall abort.
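For intuition, the four commands can be simulated over a heap modeled as a partial map from addresses to values. The following Python sketch is only a toy model: real allocation chooses fresh addresses nondeterministically, whereas this sketch hands them out sequentially:

class Heap:
    """A toy model of the four heap-manipulating commands of the language L."""

    def __init__(self):
        self.cells = {}        # active cells: address -> value
        self.next_addr = 1

    def cons(self, *values):   # x := cons(e1, ..., en)
        addr = self.next_addr
        for i, v in enumerate(values):
            self.cells[addr + i] = v
        self.next_addr += len(values)
        return addr

    def lookup(self, e):       # x := [e]
        if e not in self.cells:
            raise RuntimeError("abort: lookup at inactive cell")
        return self.cells[e]

    def mutate(self, e, v):    # [e] := e'
        if e not in self.cells:
            raise RuntimeError("abort: mutation of inactive cell")
        self.cells[e] = v

    def free(self, e):         # free(e)
        if e not in self.cells:
            raise RuntimeError("abort: freeing an inactive cell")
        del self.cells[e]

h = Heap()
x = h.cons(4, 7)               # two consecutive cells at x and x + 1
h.mutate(x, 8)
print(h.lookup(x), h.lookup(x + 1))   # 8 7
h.free(x); h.free(x + 1)       # a second h.free(x) would abort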
The structure of commands in the language L can also be described by the following abstract syntax:
where aexp and bexp stand for arithmetic and boolean expressions, respectively.
The syntax of these is as follows:
– Note that the notations cons and [−], which refer to the heap memory, are absent from the syntax of aexp. Therefore, the evaluation of an arithmetic or boolean expression depends only on the contents of the store at any given time. We use the notation s ⊨ e ⇓ v to assert that the expression e evaluates to v with respect to the contents of store s. For example, let s = {(x, 2), (y, 4), (z, 6)}; then s ⊨ x × (y + z) ⇓ 20. For our discussion, we assume that this evaluation relation is already defined.
Operational Semantics: We now define a transition relation, represented as ⟨c, (s, h)⟩ ⇝ (s′, h′), between states. It asserts that, if the execution of command c starts in a state (s, h), then it will end in the state (s′, h′). The following set of rules, SEMANTICS-I and SEMANTICS-II, describes the operational behavior of every command in the language L, using the transition relation ⇝.
s ⊨ e ⇓ v
--------------------------------   (Assign)
⟨x := e, (s, h)⟩ ⇝ (s[x : v], h)

s ⊨ e ⇓ false
--------------------------------   (WF)
⟨while e do c, (s, h)⟩ ⇝ (s, h)
The operational semantics of language L can be used to prove any valid specification of the form ⟨c, (s, h)⟩ ⇝ (s′, h′). However, this form of specification is not the most useful one: usually, we do not wish to specify programs for single states. Instead, we would like to talk about a set of states and how the execution may transform that set. This is possible using a Hoare triple {p} c {q}, where p and q are assertions that may evaluate to either true or false in a given state.
– Informally, it says that if the execution of program c begins in a state that satisfies proposition p, then if the execution terminates, it will end in a state that satisfies q.
We will use the notation ⟦p⟧ s h to represent the value to which p evaluates in the state (s, h). Therefore, {p} c {q} is valid if, whenever ⟦p⟧ s h = true and ⟨c, (s, h)⟩ ⇝ (s′, h′), we also have ⟦q⟧ s′ h′ = true.
Now, we can use Hoare triples to give rules of reasoning for every individual command of the language L; this is sometimes also called the axiomatic semantics of the language. These rules in a way give an alternative semantics to the commands.
Axiomatic Semantics: Consider the following set of axioms (AXIOMS-I) and structural rules for reasoning with commands that do not use pointers. Here, Q[e/x] represents the proposition Q with every free occurrence of x replaced by the expression e; Mod(c) represents the set of variables modified by c, and Free(R) represents the set of free variables in R.
AXIOMS-I

{Q[e/x]} x := e {Q}   (assign)

{I ∧ e} c {I}
--------------------------------   (while)
{I} while e do c {I ∧ ¬e}

STRUCTURAL RULES

{P} c {Q}
--------------------------------   (extractE), provided x ∉ Free(c)
{∃x. P} c {∃x. Q}

{P} c {Q}
--------------------------------   (constancy), provided Mod(c) ∩ Free(R) = ∅
{P ∧ R} c {Q ∧ R}
– For the subset of language L that does not use pointers, these axioms and structural rules are known to be sound as well as complete [3] with respect to the operational semantics.
– The benefit of using these axioms is that we can work at a more abstract level, specifying and proving program correctness in an axiomatic way without bothering about the low-level details of states.
– Note that the last four commands of the language L, which manipulate pointers, are different from the normal variable assignment command: their right-hand side is not an expression. Therefore, the assign rule of Hoare logic is not applicable to them.
Array Revisited: Now, let us go back to the array assignment problem discussed in the introduction. Let Q be the postcondition for the command a[i] := 8, where a[i] refers to the ith element of the array.
– In our language L, this is the same as the command [a + i] := 8; however, for convenience, we will continue with the usual notation of arrays.
The command a[i] := 8 looks similar to the variable assignment command, but we cannot apply the Hoare assign rule to get Q[8/a[i]] as the weakest precondition. One should not treat a[i] as a local variable, because the assertion Q may contain references such as a[j] that may or may not refer to a[i]. Instead, we can model the above command as a := update(a, i, 8), where update(a, i, 8)[i] = 8 and update(a, i, 8)[j] = a[j] for j ≠ i.
– The effect of executing a[i] := v is the same as assigning to the variable a an altogether new array value "update(a, i, v)".
– In this way, a acts like a normal variable; hence, we have the following rule for array assignment:

{Q[update(a, i, v)/a]} a[i] := v {Q}   (array assign)
Let us try to prove the following specification using the above rule:
{i ≠ j ∧ a[i] = 4 ∧ a[j] = 4} a[i] := 8 {a[i] = 8 ∧ a[j] = 4 ∧ i ≠ j}
Applying the rule to the postcondition yields the precondition update(a, i, 8)[i] = 8 ∧ update(a, i, 8)[j] = 4 ∧ i ≠ j; since update(a, i, 8)[i] = 8 always holds and, for i ≠ j, update(a, i, 8)[j] = a[j], this is implied by i ≠ j ∧ a[i] = 4 ∧ a[j] = 4.
Hence, we have a correct rule for deducing valid specifications about array
assignments. However, the approach looks very clumsy.
– We still need to fill in all minute details of index disjointness in the specification.
– Moreover, it seems very artificial to interpret a local update to an array cell as a
global update to the whole array. At least, it is not the programmer’s way of
understanding an array element update.
– The idea of separation logic is to embed the principle of such local actions in the separating conjunction. It helps in keeping specifications succinct by avoiding the explicit mention of memory disjointness.
It is important to note that the meaning of these new assertions depends on both the store and the heap.
– emp
The heap is empty.
– e1 ↦ e2
The heap contains a single cell, at address e1 with contents e2.
– p1 ∗ p2
The heap can be split into two disjoint parts in which p1 and p2 hold, respectively.
– p1 −∗ p2
If the current heap is extended with a disjoint part in which p1 holds, then p2 holds for the extended heap.
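The separating conjunction can be animated by brute force: p1 ∗ p2 holds exactly when some split of the current heap into two disjoint parts satisfies the two conjuncts. The following Python checker is purely illustrative (it is not part of any separation-logic tool) and makes the aliasing discussion from the introduction concrete:

from itertools import combinations

def emp(store, heap):
    """emp: the heap is empty."""
    return len(heap) == 0

def points_to(e1, e2):
    """e1 ↦ e2: the heap is exactly the one cell e1(store) holding e2(store)."""
    return lambda store, heap: heap == {e1(store): e2(store)}

def sep(p1, p2):
    """p1 ∗ p2: some split of the heap into disjoint parts satisfies both."""
    def check(store, heap):
        addrs = list(heap)
        for r in range(len(addrs) + 1):
            for part in combinations(addrs, r):
                h1 = {a: heap[a] for a in part}
                h2 = {a: v for a, v in heap.items() if a not in part}
                if p1(store, h1) and p2(store, h2):
                    return True
        return False
    return check

# (x ↦ 4) ∗ (y ↦ 4) demands that x and y name distinct cells.
p = sep(points_to(lambda s: s["x"], lambda s: 4),
        points_to(lambda s: s["y"], lambda s: 4))
print(p({"x": 10, "y": 11}, {10: 4, 11: 4}))   # True
print(p({"x": 10, "y": 10}, {10: 4}))          # False: aliasing, no disjoint split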
For convenience, we introduce some more notation: e ↦ e1, ..., ek abbreviates (e ↦ e1) ∗ ⋯ ∗ (e + k − 1 ↦ ek), and e ↦ − abbreviates ∃v. e ↦ v. The following properties of the new assertions are also useful:
p ∗ emp ⟺ p
p1 ∗ p2 ⟺ p2 ∗ p1
(p1 ∗ p2) ∗ p3 ⟺ p1 ∗ (p2 ∗ p3)
(p1 ∨ p2) ∗ q ⟺ (p1 ∗ q) ∨ (p2 ∗ q)
(p1 ∧ p2) ∗ q ⟹ (p1 ∗ q) ∧ (p2 ∗ q)
(∃x. p) ∗ q ⟺ ∃x. (p ∗ q), where x is not free in q
(∀x. p) ∗ q ⟹ ∀x. (p ∗ q), where x is not free in q
New axioms for pointers: The axioms needed to reason about pointers are given below; there is one axiom for every individual command.
AXIOMS-II

{x = X ∧ emp} x := cons(e1, ..., ek) {x ↦ e1[X/x], ..., ek[X/x]}   (alloc)

{e ↦ v ∧ x = X} x := [e] {x = v ∧ e[X/x] ↦ v}   (lookup)

{e ↦ −} [e] := e′ {e ↦ e′}   (mut)

{e ↦ −} free(e) {emp}   (free)
Allocate The first axiom, called alloc, uses the variable X in its precondition to record the value of x before the command is executed. It says that if execution begins with an empty heap and a store with x = X, then it ends with k contiguous heap cells holding the appropriate values.
Lookup The second axiom, called lookup, again uses X to refer to the value of x before execution. It asserts that the content of the heap is unaltered; the only change is in the store, where the new value of x becomes the value at the old location e.
Mutate The third axiom, called mut, says that if e points to something beforehand, then it points to e′ afterward. This resembles the natural semantics of mutation.
Free The last axiom, called free, says that if e is the only allocated memory cell before execution of the command, then in the resulting state there will be no active cell. Note that the singleton-heap assertion is necessary in the precondition to assure emp in the postcondition.
Frame The last rule among the structural rules, called frame, says that one can extend local specifications to include arbitrary claims about variables and heap segments that are not modified or mutated by c. The frame rule can be thought of as a replacement for the rule of constancy when pointers are involved.
STRUCTURAL RULES-II

P ⟹ P′   {P′} c {Q′}   Q′ ⟹ Q
--------------------------------   (conseq)
{P} c {Q}

{P} c {Q}
--------------------------------   (extractE), provided x ∉ Free(c)
{∃x. P} c {∃x. Q}

{p} c {q}
--------------------------------   (frame), provided Mod(c) ∩ Free(r) = ∅
{p ∗ r} c {q ∗ r}
– Note that the expressions are intentionally kept free of the cons and [−] operators. The reason for this restriction is that the power of the above proof system depends strongly on the ability to use expressions in place of variables in an assertion.
– In particular, a tautology should remain a valid assertion when variables are replaced with expressions.
The local axioms above can be recast in equivalent global and backward-reasoning forms, one for each command:
– Mutation
• Global reasoning
{(e ↦ −) ∗ r} [e] := e′ {(e ↦ e′) ∗ r}
• Backward reasoning
{(e ↦ −) ∗ ((e ↦ e′) −∗ p)} [e] := e′ {p}
– Free
• Global (backward) reasoning
{(e ↦ −) ∗ r} free(e) {r}
– Allocation
• Global reasoning (forward)
{r} x := cons(e1, ..., ek) {∃x′. (x ↦ e1[x′/x], ..., ek[x′/x]) ∗ r[x′/x]}
• Backward reasoning
{∀x″. ((x″ ↦ e1, ..., ek) −∗ p[x″/x])} x := cons(e1, ..., ek) {p}
– Lookup
• Global reasoning
{∃x″. (e ↦ x″) ∗ r[x″/x′]} x := [e] {∃x′. (e[x′/x] ↦ x) ∗ r}
Here, x, x′ and x″ are distinct, x′ and x″ do not occur free in e, and x is not free in r.
• Backward reasoning
{∃x″. (e ↦ x″) ∗ ((e ↦ x″) −∗ p[x″/x])} x := [e] {p}
4 Annotated Proofs
Proof outlines: We have already used assertions in Hoare triples to state what is
true before and after the execution of an instruction. In a similar way, an assertion
can also be inserted between any two commands of a program to state what must be
true at that point of execution. Placing assertions in this way is also called anno-
tating the program.
For example, consider the following annotated program that swaps the value of
variable x and y using a third variable z. Note the use of X and Y to represent the
initial values of variable x and y, respectively.
{x = X ∧ y = Y}
z := x;
{z = X ∧ x = X ∧ y = Y}
x := y;
{z = X ∧ x = Y ∧ y = Y}
y := z;
{x = Y ∧ y = X}
The validity of each Hoare triple in the above program can easily be checked using the axiom for assignment; hence, one can conclude that the program satisfies its specification.
– A program together with an assertion between each pair of statements is called a
fully annotated program.
– One can prove that a program satisfies its specification by proving the validity of
every consecutive Hoare triple which is present in its annotated version. Hence,
a fully annotated program provides a complete proof outline for the program.
Now, we consider another annotated program that involves assertions from separation logic. Note that the assertion (x ↦ a, o) ∗ (x + o ↦ b, −o) can be used to describe a circular offset list. Here is a sequence of commands that creates such a cyclic structure:
1 {emp}
x := cons(a, a);
2 {x ↦ a, a}
t := cons(b, b);
3 {(x ↦ a, a) ∗ (t ↦ b, b)}
[x + 1] := t − x;
4 {(x ↦ a, t − x) ∗ (t ↦ b, b)}
[t + 1] := x − t;
5 {(x ↦ a, t − x) ∗ (t ↦ b, x − t)}
6 {∃o. (x ↦ a, o) ∗ (x + o ↦ b, −o)}
Also, note the use of ∗ in assertion 3. It ensures that x + 1 is different from t and t + 1, and hence the assignment [x + 1] := t − x cannot affect the t ↦ b, b clause. A similar reasoning applies to the last command as well.
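Running this annotated program on a concrete heap model confirms the final assertion. In the Python sketch below, the heap is a dictionary from addresses to values, the abstract values a and b are replaced by strings, and the allocator is a simplified sequential one:

heap, next_addr = {}, 1

def cons(*vals):
    """Allocate len(vals) consecutive cells and return the first address."""
    global next_addr
    addr = next_addr
    for i, v in enumerate(vals):
        heap[addr + i] = v
    next_addr += len(vals)
    return addr

x = cons("a", "a")       # 1 {emp}  ->  2 {x ↦ a, a}
t = cons("b", "b")       # 3 {(x ↦ a, a) ∗ (t ↦ b, b)}
heap[x + 1] = t - x      # [x + 1] := t − x, giving assertion 4
heap[t + 1] = x - t      # [t + 1] := x − t, giving assertion 5

o = heap[x + 1]          # 6 {∃o. (x ↦ a, o) ∗ (x + o ↦ b, −o)}
assert heap[x + o] == "b" and heap[x + o + 1] == -o
print("offset o =", o)   # following the offset twice leads back to x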
Inductive definitions: When reasoning about programs which manipulate data structures, we often need inductively defined predicates describing such structures. For example, in any formal setting, if we wish to reason about the contents of a linked list, we would like to relate it to the abstract mathematical notion of sequences.
– Consider the following inductive definition of a predicate list α i, asserting that i points to a linked list with contents (sequence) α:
list ε i ⟺ emp ∧ i = ⊠
list (a·α) i ⟺ ∃j. (i ↦ a, j) ∗ list α j
Here, ⊠ represents the null pointer, ε the empty sequence, a·α the sequence with head a and tail α, and α† the reversal of α. Now consider the standard in-place reversal of a list whose initial contents are α₀:
j := ⊠; while i ≠ ⊠ do (k := [i + 1]; [i + 1] := j; j := i; i := k)
On a careful analysis of the code, it is easy to see that:
– At any iteration of the while loop, variables i and j point to two disjoint list segments with contents α and β, such that concatenating β to the end of α† always results in α₀†.
– Thus, we have the following loop invariant:
∃α, β. (list α i ∗ list β j) ∧ α₀† = α†·β
When the loop terminates, the invariant together with i = ⊠ yields the desired postcondition:
8 {∃α, β. (list α i ∗ list β j) ∧ α₀† = α†·β ∧ i = ⊠}
8a {∃β. (list ε i ∗ list β j) ∧ α₀† = ε†·β}
8b {∃β. (list β j) ∧ α₀† = β}
8c {list α₀† j}
where the loop invariant can be verified using the following proof outline:
1 {∃α, β. (list α i ∗ list β j) ∧ α₀† = α†·β ∧ i ≠ ⊠}
2 {∃α′. ((i ↦ a, p) ∗ list α′ p ∗ list β j) ∧ α₀† = (a·α′)†·β}
k := [i + 1];
3 {∃α′. ((i ↦ a, k) ∗ list α′ k ∗ list β j) ∧ α₀† = (a·α′)†·β}
[i + 1] := j;
4 {∃α′. ((i ↦ a, j) ∗ list α′ k ∗ list β j) ∧ α₀† = (a·α′)†·β}
5 {∃α′, β′. (list α′ k ∗ list β′ i) ∧ α₀† = α′†·β′}
j := i;
6 {∃α′, β′. (list α′ k ∗ list β′ j) ∧ α₀† = α′†·β′}
i := k;
7 {∃α′, β′. (list α′ i ∗ list β′ j) ∧ α₀† = α′†·β′}
7a {∃α, β. (list α i ∗ list β j) ∧ α₀† = α†·β}
Steps 1-to-2 and 4-to-5 are themselves justified by the following intermediate assertions:
1 {∃α, β. (list α i ∗ list β j) ∧ α₀† = α†·β ∧ i ≠ ⊠}
1a {∃a, α′, β. (list (a·α′) i ∗ list β j) ∧ α₀† = (a·α′)†·β}
1b {∃a, α′, β, p. ((i ↦ a, p) ∗ list α′ p ∗ list β j) ∧ α₀† = (a·α′)†·β}
2 {∃α′. ((i ↦ a, p) ∗ list α′ p ∗ list β j) ∧ α₀† = (a·α′)†·β}
4 {∃α′. ((i ↦ a, j) ∗ list α′ k ∗ list β j) ∧ α₀† = (a·α′)†·β}
4a {∃α′. (list α′ k ∗ (i ↦ a, j) ∗ list β j) ∧ α₀† = (a·α′)†·β}
4b {∃α′. (list α′ k ∗ list (a·β) i) ∧ α₀† = α′†·(a·β)}
5 {∃α′, β′. (list α′ k ∗ list β′ i) ∧ α₀† = α′†·β′}
– Explanations:
• Most of the proof steps, especially those around a command, comprise Hoare triples which can easily be verified using the axioms for the corresponding commands.
• Note the use of ∗ instead of ∧ in assertion 3. It ensures that i + 1 is different from k and j; hence, an attempt to mutate the location i + 1 does not affect the remaining two clauses list α′ k and list β j.
• We can obtain 1a from 1 by using the definition of list α i together with the fact that i ≠ ⊠. Then, we unfold the definition of list (a·α′) i to obtain 1b from 1a. Finally, instantiating the existential quantifiers takes us to 2.
• 4a is a simple rearrangement of 4. Since list (a·β) i is shorthand for (i ↦ a, j) ∗ list β j, we can obtain 4b from 4a. Finally, we obtain 5 by generalizing a·β in 4b as β′ using the existential quantifier.
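Finally, the proved reversal can be executed on the same kind of concrete heap model. In this Python sketch a node occupies two consecutive cells (value, pointer) and the address 0 plays the role of ⊠:

heap, next_addr = {}, 1     # address -> value; 0 plays the role of ⊠

def cons(value, nxt):
    """Allocate a two-cell node (value, pointer) and return its address."""
    global next_addr
    addr, next_addr = next_addr, next_addr + 2
    heap[addr], heap[addr + 1] = value, nxt
    return addr

def reverse(i):
    """The reversal program proved above, run on the model heap."""
    j = 0
    while i != 0:
        k = heap[i + 1]     # k := [i + 1]
        heap[i + 1] = j     # [i + 1] := j
        j, i = i, k         # j := i; i := k
    return j

head = 0
for v in (3, 2, 1):         # build the list 1 -> 2 -> 3
    head = cons(v, head)

head = reverse(head)
out = []
while head != 0:
    out.append(heap[head]); head = heap[head + 1]
print(out)                  # [3, 2, 1]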
5 Conclusion
In this article, we reviewed some of the important features of separation logic, which first appeared in [1, 6, 7]. We illustrated the difficulties that arise in the reasoning about and development of modular programs due to unwanted aliasing caused by pointer data structures. We introduced the notion of separating conjunction as a tool to deal with this and presented separation logic as an extension to Hoare Logic using a programming language that has four pointer-manipulating commands.
References
1. J.C. Reynolds, Separation logic: a logic for shared mutable data structures, in Proceedings
Seventeenth Annual IEEE Symposium on Logic in Computer Science (Los Alamitos,
California, IEEE Computer Society, 2002) pp. 55–74
2. C.A.R. Hoare, An axiomatic basis for computer programming. Commun. ACM 12(10), 576–580, 583 (1969)
3. G. Winskel, The Formal Semantics of Programming Languages (MIT Press, USA, 1993)
4. H. Yang, Local Reasoning for Stateful Programs. Ph. D. dissertation (University of Illinois,
Urbana-Champaign, Illinois, 2001)
5. J.C. Reynolds, An Introduction to Separation Logic (Preliminary Draft). http://www.cs.cmu.
edu/jcr/copenhagen08.pdf. 23 Oct 2008
6. J.C. Reynolds, Intuitionistic reasoning about shared mutable data structures, in Millennial
Perspectives in Computer Science, ed. by J. Davies, B. Roscoe, J. Woodcock (Houndsmill,
Hampshire, Palgrave, 2000) pp. 303–321
Separation Logic to Meliorate Software Testing and Validation 109
7. P.W. O’Hearn, J.C. Reynolds, H. Yang, Local reasoning about programs that alter data
structures, in Proceedings of 15th Annual Conference of the European Association for
Computer Science Logic: CSL 2001, Lecture Notes in Computer Science (Springer, Berlin,
2001)
8. R.M. Burstall, Some techniques for proving correctness of programs which alter data
structures, in Machine Intelligence, ed. by B. Meltzer, D. Michie, vol 7 (Edinburgh University
Press, Edinburgh, Scotland, 1972), pp 23–50
9. R. Bornat, Proving pointer programs in Hoare logic. Math. Program Constr. (2000)
10. H. Yang, An example of local reasoning in BI pointer logic: the Schorr-Waite graph marking
algorithm, in SPACE 2001: Informal Proceedings of Workshop on Semantics, Program
Analysis and Computing Environments for Memory Management, ed by F. Henglein,
J. Hughes, H. Makholm, H. Niss (IT University of Copenhagen, Denmark, 2001), pp. 41–68
11. Peter W. O’Hearn, David J. Pym, The logic of bunched implications. Bull. Symbol. Logic 5
(2), 215–244 (1999)
mDSM: A Transformative Approach
to Enterprise Software Systems Evolution
Note: The earlier results of this study (Sects. 2.4, 3.1, and 4.1) were published in Proceedings of
the Twenty-sixth IEEE International Symposium on Software Reliability Engineering,
Gaithersburg, MD, November 2–5, 2015.
D. Threm
University of Arkansas—Little Rock, Little Rock, AR, USA
e-mail: [email protected]
L. Yu
Computer Science and Informatics, Indiana University South Bend,
South Bend, IN, USA
e-mail: [email protected]
S.D. Sudarsan
Industrial Software Systems, ABB Corporate Research Center, Bangalore, India
e-mail: [email protected]
S. Ramaswamy (✉)
ABB Inc., Cleveland, OH, USA
e-mail: [email protected]
1 Introduction
From a very early age, we are taught to break apart problems, to fragment the world. This apparently makes complex tasks and subjects more manageable, but we pay a hidden, enormous price… after a while, we give up trying to see the whole altogether. – Senge

A software system needs to adapt, but only by the appropriate amount for timeliness and completeness of solution.
This chapter is arranged as follows: Sect. 2—Background concepts on software
design, testing, interaction of design and testing, software stability, and the design
structure matrix (DSM); Sect. 3—mDSM and Evolutionary stability; Sect. 4—Case
Studies; Sect. 5—Advantages of mDSM and Design and Testing Rationalization;
and Sect. 6—Summary and Conclusions.
2 Background Concepts
Designing the tests for a software system is different from testing the design of a software system. Designing test scenarios for a software system generally includes unit testing, integration testing, system testing, system load testing, and user-defined test cases. Testing the design of a system may include similar tests but generally focuses on usability-related aspects of the system. Designing test scenarios can happen in parallel in both the design and development phases.
Testing, development, and design are not necessarily independent activities. The lines between which techniques are specific to design, testing, and development are starting to blur. In the Agile methodology, test-driven development (TDD) incorporates refactoring and test-first development (TFD). Refactoring is making small and incremental changes to the design or code without impact on the external behavior of a software system. TFD is the development of a test scenario first and then writing the code so that the code meets the minimal pass criteria of that test scenario; a hypothetical example follows.
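As an illustration of TFD (a hypothetical xUnit-style example of ours, not from the chapter), the test below is written first and defines the pass criteria; the minimal production code to satisfy it comes second and can later be refactored without changing the external behavior.

```csharp
// Hypothetical test-first development (TFD) example using xUnit.
// Step 1: write the test before any production code; it fails until
// Step 2 supplies a minimal implementation that passes it.
using Xunit;

public class PriceCalculatorTests
{
    [Fact]
    public void Total_AppliesTenPercentDiscount_Above100()
    {
        var calc = new PriceCalculator();
        Assert.Equal(90m, calc.Total(100m));   // expected behavior, defined first
    }
}

// Step 2: the minimal code that meets the test's pass criteria.
public class PriceCalculator
{
    public decimal Total(decimal amount) =>
        amount >= 100m ? amount * 0.9m : amount;
}
```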
The software stability model (SSM) of Fayad and Singh is applied to software product lines [17], where the SSM could bring multiple benefits to software architecture, design, and development. The SSM is shown to be significantly better than existing approaches, but the evaluation criteria are not defined. In other research, Xavier and Naganathan [18] presented a
two-dimensional probabilistic model based on random processes to enhance the
stability of enterprise computing applications. Their model enables software sys-
tems to easily accommodate changes in business policies [19].
Other work in this area includes the identification of stable and unstable software
components. Grosser et al. [20] utilized case-based reasoning (CBR) methods to
predict stability in object-oriented (OO) software. Bevan and Whitehead [21] used
configuration history to locate unstable software components. Wang et al. [22]
presented a stability-based component identification method to support
coarse-grained component identification and design. Hamza applied the formal
concept analysis (FCA) method to identify evolutionary stable components [23].
There are basically two ways to measure software stability. One way is to analyze a
single program and generate its stability metrics based on interdependencies between
software modules [24, 25]. These stability metrics reflect the degree of probability
that changes made to other modules could require corresponding changes to this
module. If a software product has weak dependencies between components, the
software is more stable, i.e., change propagation is less likely to happen.
Another way to measure software stability is based on software evolution his-
tory, which is why it is also called evolutionary stability. Stability is then repre-
sented by differences between two versions of an evolving software product. To
measure the differences between two versions of an evolving software product,
architecture-level metrics and program-level metrics have been used, which are
described below.
Using architecture-level metrics, Nakamura and Basili [26] studied the differ-
ences between versions of the same program. In their research, graph kernel theory
is used to represent software structure. Comparing the graph difference can provide
insight into the version difference. If two systems have different structures, their
measurements will always be different. However, architecture-level metrics work
best for measuring the stability of an entire software product at the architectural
level, but is not readily applicable to measuring the stability of a single component
at the source code level.
Using program-level metrics, Kelly [27] introduced the concept of version
distance, which measures the differences/similarities between different versions of a
software product. In similar research, Yu and Ramaswamy [28] introduced the
concept of structural distance and source code distance. In both of these studies,
program-level metrics, such as the number of lines of code, the number of variables,
the number of common blocks, and the number of modules, are used as the
underlying measures for version distance.
To our knowledge, software evolutionary stability has not been formally mea-
sured with information-level metrics based on Kolmogorov complexity. This
chapter intends to fill this research gap, i.e., using the normalized compression
distance (NCD) between two versions of an evolving software product to measure
software evolutionary stability.
$$d(x, y) = \frac{\max\{K(x \mid y),\, K(y \mid x)\}}{\max\{K(x),\, K(y)\}} \qquad (2)$$
Since the Kolmogorov complexity K is not computable, it is approximated in practice by a real compressor C, giving the normalized compression distance [32]:

$$NCD(x, y) = \frac{C(xy) - \min\{C(x),\, C(y)\}}{\max\{C(x),\, C(y)\}} \qquad (3)$$

In Eq. 3, xy denotes the concatenation of strings x and y. The NCD has been applied in many fields such as the construction of phylogeny trees, music clustering, handwriting recognition, and distinguishing images [32, 33].
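As an illustration, NCD can be approximated with any off-the-shelf compressor. The following C# sketch of ours uses the framework's GZipStream in place of the 7-Zip compressor used later in this chapter, so the absolute values are illustrative only; the interpretation on [0, 1] is the same.

```csharp
// Minimal NCD sketch (ours): C(s) is approximated by the gzip-compressed
// length of s. GZipStream stands in for the 7-Zip compressor used in the
// chapter's measurements.
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class Ncd
{
    static long CompressedSize(byte[] data)
    {
        using var output = new MemoryStream();
        using (var gzip = new GZipStream(output, CompressionLevel.Optimal))
            gzip.Write(data, 0, data.Length);
        return output.Length;
    }

    public static double Distance(string x, string y)
    {
        long cx = CompressedSize(Encoding.UTF8.GetBytes(x));
        long cy = CompressedSize(Encoding.UTF8.GetBytes(y));
        long cxy = CompressedSize(Encoding.UTF8.GetBytes(x + y));  // xy = concatenation

        // NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), per Eq. 3
        return (cxy - Math.Min(cx, cy)) / (double)Math.Max(cx, cy);
    }
}
```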
Arbuckle et al. [34] were the first to apply NCD in the software evolution field. In a series of publications, Arbuckle demonstrated how NCD could be used to visualize software changes [35] and study software evolution [36–38]. Arbuckle's research inspired the work presented in this chapter, i.e., using normalized distance to measure software evolutionary stability.
As the size of a project or a process increases, so does the complexity of the system
as a whole. When a system reaches a certain size or complexity, it becomes more
difficult to see how the system functions as a whole. The DSM was created to
specifically address this problem. The decomposition of a large or complex system
into less complex, smaller subsystems allows one to break down the problem into
smaller and more approachable components. A term coined in the 1970s by Don
Steward [39], the DSM’s purpose is to give a compact view of an otherwise
complicated system.
Building a DSM breaks a complex system down in three basic steps.
First, the whole system will be broken down into subsystems of similar parts. Each
one of these parts should be easier to address, and each part should almost always
function autonomously, not strongly influenced by or heavily reliant on other
subsystems. The next step is to pinpoint all of the existing relationships between
subsystems and how they interact. This will systematically show how the entire
system works as a whole, broken down into steps. The third and final step includes
noting all external outputs and inputs used to drive the system. Annotation of how
each impacts the system is vital in order to understand how the system works and
how it is affected by outside forces.
The DSM could break the entity up even further, taking subsystems and
breaking them down into specific elements. From those elements, the system can be
broken down further into system chunks. These chunks are then assigned to either
different individuals or different teams, based on the specificity of the chunk and the
skill set of the party assigned to it. Breaking the system down in such a way creates
a skeleton and defines its architecture. It also divides the work load. The matrix
itself is a square matrix with rows and columns for each of the elements in the
system. The following DSM, Fig. 2, illustrates a simplified example.
The matrix shows us the dependencies of each element in the system. Reading
the matrix by row reveals each element that the chosen row provides to. For
example, element B provides to the elements A, C, D, and F. Reading the individual
columns will give you what each element depends on. Reading the column for
element C, it is clear that the element depends on B, E, and F.
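As an illustration (our sketch, not from the chapter), a DSM can be held as a square boolean matrix whose rows and columns are read exactly as just described; the dependencies below reproduce the example (B provides to A, C, D, and F; C depends on B, E, and F).

```csharp
// A DSM as a square matrix: entry [r, c] means "element r provides to
// element c", so reading row r lists what r provides to, and reading
// column c lists what c depends on. Sketch of ours.
using System;
using System.Collections.Generic;
using System.Linq;

public class Dsm
{
    private readonly string[] _elements;
    private readonly bool[,] _provides;

    public Dsm(string[] elements)
    {
        _elements = elements;
        _provides = new bool[elements.Length, elements.Length];
    }

    private int Index(string e) => Array.IndexOf(_elements, e);

    public void AddDependency(string provider, string consumer) =>
        _provides[Index(provider), Index(consumer)] = true;

    // Reading a row: everything this element provides to.
    public IEnumerable<string> ProvidesTo(string e) =>
        _elements.Where(other => _provides[Index(e), Index(other)]);

    // Reading a column: everything this element depends on.
    public IEnumerable<string> DependsOn(string e) =>
        _elements.Where(other => _provides[Index(other), Index(e)]);
}
```

With B→{A, C, D, F}, E→C, and F→C entered via AddDependency, ProvidesTo("B") and DependsOn("C") reproduce the two readings described above.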
Furthermore, the DSM can be broken up into two different subsets based on
linearity. These two different kinds of DSM are known as the static DSM and the
time-based DSM. From there, each category can be broken down further into two
more applications of DSM. The static DSM can be defined as either
component-based (architectural) or team-based (organizational), while the
time-based DSM can be broken down into either activity-based (schedule) or
parameter-based (low-level schedule). Figure 3 graphically represents the break-
down of the DSM. Each subcategory and application will be explained in further
detail in the following sections.
Fig. 2 A simplified example DSM over elements A–F; row entries mark the elements a given element provides to, and column entries mark what it depends on
(Figure: the example DSM with rows and columns reordered as A, C, F, D, E, B.)
In a time-based DSM, the ordering of the rows and columns coincides with the flow
of time. Activities are marked as either upstream or downstream.
2.5.5 Activity-Based
This type of DSM is appropriate when a system is built on a group of activities with
a specific schedule. The matrix is then designed chronologically by order of the
activity. This makes more sense with a time-sensitive project or process and is why
this DSM is also referred to as process based.
One can visualize what the DSM would look like based on the previous
exemplar DSMs. The difference lies in the elements. Now elements are replaced
with time-sensitive processes that take chronological precedence over one another. In that sense, the processes are automatically grouped by dependence, since a process most likely depends on the feedback of the previous process. The goal of creating the DSM and analyzing the process is typically to reduce duration.
2.5.6 Parameter-Based

3 mDSM and Evolutionary Stability

In this section, we introduce evolutionary stability metrics for software systems and the mDSM methodology.
(Definition) Version stability: Given a version (say m) of a software artifact that has p subsequent versions (n_1, n_2, …, n_p) in a given period of time, its version stability VS(m) is defined as 1 minus the average version distance between m and its subsequent versions:

$$VS(m) = 1 - \sum_{i=1}^{p} VD(m, n_i) \,/\, p \qquad (4)$$

Consider the stability of Version 1.1 in Fig. 7. Assume that within one year after the release of Version 1.1, two subsequent versions based on 1.1 are released: 1.5 and 1.8. The stability of Version 1.1 in this period can then be calculated as VS(1.1) = 1 − (VD(1.1, 1.5) + VD(1.1, 1.8))/2.
The concept of version distance has an inverse relation with the general concept of software stability: long distance means low stability and short distance means high stability. This relation is reflected in Eq. 4. The value of version stability then falls in the range [0, 1], where 0 means the lowest stability and 1 means the highest stability.
(Definition) Inter-component version stability: Given that one version of a software artifact may have multiple interacting software components (say c), the inter-component version stability ICVS(c) is defined as the average version stability over the artifact's interacting components $m_{c_1}, m_{c_2}, \ldots, m_{c_p}$.
$$ICVS(c) = \sum_{i=1}^{p} VS(m_{c_i}) \,/\, p \qquad (5)$$
The value of inter-component version stability is in the range of [0, 1], where 0
means the lowest stability and 1 means the highest stability among interacting
components.
(Definition) Branch stability: Given one branch (say b) of a software artifact that has released p versions in a given period of time, its branch stability BS(b) in this period is defined as 1 minus the average version distance between every two releases in p.
$$BS(b) = 1 - \sum_{m \in p}\ \sum_{n \in p,\, n \neq m} VD(m, n) \,\big/\, C_p^2 \qquad (6)$$
Consider the stability of Branch 2.0 in Fig. 7, which contains three versions (releases). According to Eq. 6, its stability can be calculated as BS(2.0) = 1 − (VD(2.0, 2.4) + VD(2.0, 2.9) + VD(2.4, 2.9))/3. In our definition
of branch stability, we measure the distance between every two releases instead of
the distance between two consecutive releases. Our definition captures the evolution
of the entire branch as a whole. The version distance between every pair of con-
secutive releases can be small, but the earlier release and later release can be
dramatically different if every change is performed on different parts of the artifact.
(Definition) Structure stability: Given an artifact (say a) of a software product that has p major releases in a given period of time, its structure stability SS(a) in this period is defined as 1 minus the average version distance between every two major releases in p.
$$SS(a) = 1 - \sum_{m \in p}\ \sum_{n \in p,\, n \neq m} VD(m, n) \,\big/\, C_p^2 \qquad (7)$$
In the definition of structure stability, major releases refer to those versions that
experienced dramatic changes compared with their previous versions. For example,
in Fig. 7, Versions 1.0, 2.0, 3.0, and 4.0 can be considered as major releases in a
specified period. The assumption behind this definition is that major releases most
likely will include some structure changes.
(Definition) Aggregate stability: Given an artifact (say a) of a software product that has released p versions in a given period of time, its aggregate stability AS(a) in this period is defined as 1 minus the average version distance between every two releases in p.
$$AS(a) = 1 - \sum_{m \in p}\ \sum_{n \in p,\, n \neq m} VD(m, n) \,\big/\, C_p^2 \qquad (8)$$
Aggregate stability measures the entire evolution of the artifact. The computing process might be expensive and time-consuming. It might be a feasible metric for a product with a small number of releases; however, if a system contains too many versions, such as Linux with over 700 releases, structure stability is a better substitute for aggregate stability.
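Assuming a version-distance function VD in [0, 1] is available (for instance, the NCD of Eq. 3), Eqs. 4 and 6 translate directly into code; the following C# sketch is ours. Structure stability (Eq. 7) and aggregate stability (Eq. 8) are the same pairwise computation applied to the major releases and to all releases, respectively.

```csharp
// Sketch (ours) of the stability metrics, assuming a version-distance
// function vd in [0, 1] is supplied (e.g., NCD between two releases).
using System;
using System.Collections.Generic;
using System.Linq;

public static class StabilityMetrics
{
    // Eq. 4: VS(m) = 1 - average VD between m and its p subsequent versions.
    public static double VersionStability(
        string m, IReadOnlyList<string> subsequent, Func<string, string, double> vd)
        => 1.0 - subsequent.Average(n => vd(m, n));

    // Eq. 6: BS(b) = 1 - average VD over every pair of releases in branch b.
    public static double BranchStability(
        IReadOnlyList<string> releases, Func<string, string, double> vd)
    {
        var pairDistances =
            from i in Enumerable.Range(0, releases.Count)
            from j in Enumerable.Range(i + 1, releases.Count - i - 1)
            select vd(releases[i], releases[j]);
        return 1.0 - pairDistances.Average();  // divisor is C(p, 2), the pair count
    }
}
```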
To make these definitions more understandable, we make the following remarks. First, the above definitions are applicable to any software artifacts that can be documented as pieces of text or strings, such as source code and requirements documents.
Decomposition of the software system is simply the breaking down or factoring of the software system into smaller and more manageable elements, subsystems, units, modules, or components. This is the first step in the mDSM process and will be clearly shown in Sect. 4. Modularization is a beneficial technique for reducing interdependencies among the components of a system [43]. Removing unnecessary dependencies and systems redundancy is critical to an efficiently running system. The mDSM can reduce unnecessary dependencies through effective cluster analysis, by reordering the rows and columns in the matrix [40].
Step 2 in the development of the mDSM is to identify and document the interaction
and integration of the subsystems or components. Understanding the direct inter-
actions, or dependencies, among systems elements or subsystems is essential to
developing a complete mDSM. There are two distinct types of direct data flow
interactions, dependencies, in mDSM: inputs and outputs. Inputs are traditionally
annotated by an “X” in a standard DSM at the row and column input pairing at the
end of each rule. Outputs are traditionally annotated by an “X” in a standard DSM
at the column and row output pairing at the end of each rule. Defining inputs and
outputs allows for the establishment of semantic rules to describe the systems
interactions to be mapped within the mDSM.
The mDSM is built from the decomposition of the system. It is populated by utilizing the semantic rules that define the system's interactions, established in Sect. 4.2. The original DSM convention is In Rows (IR) notation, which places inputs in rows and outputs in columns [39]. The same IR convention is used for the mDSM, yet in place of a traditional "X" notation or simple quantification scheme, a value for the selected evolutionary stability metric is used.
3.2.5 Clustering
The mDSM utilizing evolutionary stability metrics will render as a matrix containing values in the range [0, 1], where 0 pertains to low evolutionary stability and 1 to high evolutionary stability. Figure 8 shows a simple mDSM, where A through F are system modules and the values, across direct information-level interactions, are the inter-component version stability of the interactions. Each module is annotated at the row and column levels.
Fig. 8 A simple mDSM over modules A–F; the entries shown (0.78, 0.91, 0.55) are inter-component version stability values on directly interacting pairs
4 Case Studies
The Apache Ant and Apache HTTP studies introduce a method of measuring software evolutionary stability utilizing information-level metrics based upon Kolmogorov complexity. These open-source case studies, in Sects. 4.1.1 and 4.1.2, establish the feasibility and methodology of computation and are critical to the refinement of the mDSM. The closed-source system case study in Sect. 4.2 builds upon the evolutionary stability metrics, applying them to a much more complex system. The mDSM provides a holistic view of a complex system's evolutionary stability.
Table 4 Summary of Apache Ant and Apache HTTP (up to September 2011)

                      Ant                     HTTP
  Category            Build automation tool   Web server
  Language            Java                    C
  Source code file    *.java                  *.c, *.h
  Initial release     2000                    1995
  Number of branches  8                       5
  Total releases      21                      84
Figure 9 illustrates the evolution tree of Apache Ant. It is worth noting that in
Branches 1.1, 1.2, and 1.3, there is only one release in each of them.
To study version stability according to our definition, there must be some
subsequent releases based on this version. In the case of Apache Ant, we used the
time frame of two subsequent releases to calculate the version stability. Eight out of
the 21 releases satisfy the time frame requirement. Their version stability is shown
in Table 5. It can be seen that in our study period, the most stable version is 1.6.3,
which has version stability 0.998; the most unstable version is 1.6.1, which has
version stability 0.914. These values can be interpreted as follows: from Version 1.6.3 to Versions 1.6.4 and 1.6.5, 99.8 % of the source code remains unchanged; from Version 1.6.1 to Versions 1.6.2 and 1.6.3, 91.4 % of the source code remains unchanged.
The measurement of version stability is expected to decrease as the evolution time frame increases. This observation is illustrated in Fig. 10, where the evolution
stabilities of Versions 1.5.0, 1.6.0, and 1.6.1 are measured with different time
frames. Figure 10 further indicates that software evolution stability is a
time-dependent concept. We can only measure a software product’s evolutionary
stability within a certain time period. There is no absolute measurement of a pro-
duct’s evolutionary stability.
From Fig. 9, we can see that the two longest evolved branches in Ant are 1.5 and
1.6. To see which of these two branches is relatively stable, we calculate their
branch stability. The result shown in Table 6 illustrates that Branch 1.5 is relatively
more stable than Branch 1.6. The result is the same if we compare all five releases
of Branch 1.5 with (a) the first five releases of Branch 1.6 and with (b) the total six
releases of Branch 1.6. Our definition of branch stability (Eq. 6) is based on the
measurement of version distance of every pair of releases in each branch. Therefore,
it reflects the stability of the entire branch as a whole.
Figure 11 illustrates the evolution tree of Apache HTTP. There are five branches
and a total of 84 releases.
Because Branches 1.3, 2.0, and 2.2 have more releases than other branches (2.1
and 2.3), they are selected to study the general trend of version distance. Figure 12
shows the version distance between the first release and its subsequent releases, and
the last release and its previous releases in Branches 1.3, 2.0, and 2.2. It can be seen
that major changes appear in the earlier releases while minor changes appear in the later releases, as reflected by the growth trend of version distances, i.e., the version distance stabilizes as time increases.
To further verify our observations, the version stability of selected releases is studied and the result is shown in Fig. 13. The version stability of these releases is calculated using the subsequent 10-release time frame. It can be seen in all three branches that the selected versions become more stable with successive revisions over time.
Figure 14 illustrates the branch stability of the five branches of
Apache HTTP. Two measures of branch stability are calculated: (1) using all
releases in each branch and (2) using the first five releases in each branch. The latter
case is in accordance with the total number of releases in Branch 2.1. It can be seen
Fig. 12 Version distance between the first release and its subsequent releases and the last release
and its previous releases in Branches a 1.3; b 2.0; and c 2.2
that Branch 2.2 is most stable in the first five releases while Branch 2.1 is most
stable for all the releases. Branch 1.3 is the most unstable in both measurements.
We studied the version stability and branch stability of Apache Ant and
Apache HTTP in the previous two subsections. In this subsection, we compare the
structure stability and aggregate stability of these two products.
Figures 15 and 16 show the heat map of version distance between major releases
of Apache Ant and Apache HTTP, respectively. Some long version distances are
found between earlier releases and later releases in Apache Ant. Table 7 further
shows that Apache HTTP is structurally more stable than Apache Ant.
Fig. 15 Heat map of version distances between major releases of Apache Ant
The aggregate stability in Table 7 also shows that Apache HTTP is a bit more
stable than Apache Ant despite the fact that Apache HTTP has more releases than
Apache Ant.
Fig. 16 Heat map of version distance between major releases of Apache HTTP
toolsets used, the stakeholders did not accurately assess the complexities of their systems. A new tool is needed to rationalize and aid future architectural decisions.
This particular complex software system was chosen for the case study for three reasons: the system meets the definition of a complex system, the number of system implementation failures, and the authors' ability to study and access the system of systems. The systems interactions of interest for this study are data flow interactions. The data flow interactions are of particular interest because the organization is managed via these data flow interactions. A system within the system either inputs data to a subsystem or outputs data to a subsystem.
The ERP system of this organization is modular: loosely coupled between
modules and highly cohesive within modules. The technological landscape is very
diverse—it includes DB2 on IBM i-Series (AS400), Microsoft SQL Server on
Microsoft Windows, and Oracle on Unix. Users interact with the data through Java
and .Net web applications, dumb terminals (green screen), and various reporting
technologies. While this case study's subject is ERP, the same process is applicable to any system, natural or artificial. The metrics chosen and the interactions explored would merely be defined according to the system's requirements.
ERP systems are complex software systems that support organizations. ERP
systems support an organization's functional areas such as operations, engineering,
finance, human resources, and trade compliance. ERP system implementations are
difficult, costly, and prone to failure. High-profile ERP implementation failures are
well documented across multiple industries [1, 4]. This highlights the need for the
rationalization of software systems during design and testing.
Figure 17 shows the major revisions evolution tree of the case ERP over the last
10 years. In order to extract and infer additional information from each component
interaction, we calculate the inter-component version stability (Eq. 5) of each
inter-component interaction. As in Sect. 4.1, we again utilize a program to measure
the NCD between two component files as implemented using C++ with the 7-Zip
compressor [39] in the Windows environment. For example, using semantic rule number 1, we calculate the version stability of row 1, column 18, capacity planning (CP), and the version stability of fixed assets (FA), using Eq. 4; the results are then combined with Eq. 5. Using Eq. 4, VS of CP(11.0.2.0) = 1 − (VD(11.0.2.0, 11.0.3.2) + VD(11.0.2.0, 11.1.1.2))/2 and VS of FA(11.0.2.0) = 1 − (VD(11.0.2.0, 11.0.3.2) + VD(11.0.2.0, 11.1.1.2))/2. Using Eq. 5, (VS of CP(11.0.2.0) + VS of FA(11.0.2.0))/2 gives us the inter-component version stability of the capacity planning and fixed assets components. The result is a number in [0, 1], with 0 exhibiting the lowest stability and 1 the highest. This
is calculated for every interaction within the mDSM.
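For concreteness, with hypothetical distance values of ours (not measured ones): if VD(11.0.2.0, 11.0.3.2) = 0.04 and VD(11.0.2.0, 11.1.1.2) = 0.10 for CP, then VS of CP(11.0.2.0) = 1 − (0.04 + 0.10)/2 = 0.93; if the corresponding calculation for FA yields 0.89, Eq. 5 gives an inter-component version stability of (0.93 + 0.89)/2 = 0.91, a relatively stable interaction.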
Table 8 (continued)
23. Shipping outputs to Inventory Management (c:10, r:2)
24. Shipping outputs to Account Receivables (c:10, r:16)
47. Skills Matrix inputs into Learning Management System (r:25, c:24)
48. Succession Planning inputs into Skills Matrix (r:26, c:25)
Overall, the case ERP is very stable, as realized in the mDSM in Fig. 18. The use of the evolutionary stability metric, inter-component version stability, in the mDSM provides information on where we should focus design and testing efforts. The evolutionary stability of components within the operations module is very high. This is the result of manufacturing processes that have not changed in nearly 20 years. The case ERP's high level of stability may explain the difficulty of replacing the overall ERP. Next, we examine the less stable interactions.
Column 14, showing the interaction of the trade compliance component with order processing (row 6), procurement (row 7), and shipping (row 10), exhibits low evolutionary stability over the last 10 years of the system's evolution. The trade compliance component's interaction with the above-stated components is where systems design and testing efforts should be focused.
Compliance in trade had become a very important issue for the organization, and
efforts were made to address new trade regulations, automate compliance, and to
avoid fines from the US government. That led to multiple changes in the trade
compliance component and to instability with highly evolved system components
in the operations module.
In rows 11 and 12, CAD and PLM, respectively, we find lower than average evolutionary stability; again, a place where design and testing should focus.
As CAD programs were upgraded over the last 10 years, the requirements of the
data captured from the CAD drawings were expanded and required structural
changes at both the application and data interaction level.
In row 20, the Payroll component, we see lower than average evolutionary
stability with the timekeeping and benefits components. Nearly continuous design,
development, and testing efforts take place to manage the interaction among these
components. Yearly changes to benefits providers and worldwide tax law changes
contribute to the lack of stability in these components.
4.4.1 Abstraction
Abstraction also provides a clear view of potential integration and interface issues
that can drive testing and design decisions.
4.4.2 Traceability
Traceability refers to how changes to one artifact impact other artifacts and
therefore set in motion a cascade of changes [43]. Understanding downstream
impact is important to future state systems and change management, especially
when understanding impact of systems replacement. The mDSM provides devel-
opers, testers, designers, and stakeholders with the knowledge of how systems
control will be impacted and upfront knowledge to potential downstream system
impacts.
4.4.3 Optimization
The mDSM provides a matrix view that can be optimized through clustering. The
case ERP is pre-clustered by its higher level modules. Optimization of effort for testing and design is key in this case. The mDSM utilizing an evolutionary
stability metric provides clarity to where development, testing, and design efforts
must be focused.
4.4.4 Boundaries
Viewing the lower triangular of the mDSM, we expect to see higher values for evolutionary stability; on the upper triangular, we expect to see generally lower values. The expectation in the case ERP is that the system in its entirety is highly evolutionarily stable and very modular. We expect high cohesion within the higher level modules of operations, engineering, trade compliance, finance, human resources, and CRM, and loose coupling between the higher level modules: high stability in the highly cohesive lower triangular and lower stability in the loosely coupled upper triangular.
Model-based approaches are able to detect design and testing defects [45], thereby allowing testers and designers to validate the model. Given the use of the mDSM with the case ERP, we can quickly determine impacts to the model. The mDSM provides a simple view of a complex software system: a way to quickly validate that the software system will meet the desired criteria and where design and testing should focus.
Testing and designing software without any iteration are challenges yet to be overcome! Bugs and issues are reported all the time from software that is in use. Software evolves with the aim of fixing bugs, resolving issues, and adding features. The mDSM approach offers a meaningful way to optimize the number of test cases while fixing bugs and issues. In fact, one can even quantify the required number of test cases in most cases.
As stated earlier, the first step in mDSM is to decompose software components.
The evolutionary metrics will highlight dependencies or interaction among com-
ponents. Stated in another way, evolutionary metrics will outline the coupling and
cohesion between components. The same can be extended to component-based
DSM as well. If components are loosely coupled and autonomous, then testing should focus on a "black box" approach, as the functionality of the component in terms of input and output remains unchanged. Once the approach is confined to black box testing, only the test cases relevant to the component that has evolved or changed need to be executed. As mentioned earlier, one can apply appropriate
techniques to quantify and identify the relationship between software evolution and
test cases that need to be executed.
If software evolution results in additional features or enhancements to functionality, then only the test cases for the additional or "delta" functionality need to be executed. If this results in any change to the interaction matrix, then the appropriate test cases need to be executed as well.
From a management point of view, version stability as defined by the mDSM approach provides a rule of thumb for estimating testing effort. Version stability score and test effort are inversely proportional with mDSM: the higher the version stability score, the lower the test effort needed. Since our version stability metric identifies the percentage of source code that changed between versions, fewer changes in source code should result in lower test effort. This provides an excellent insight, since it is not uncommon to find case studies where all test cases of a module are run if any change is made to the module in question, with little attention, if any, paid to the amount of change and its impact.
Rationalization of design and test scenarios can be realized through the mDSM
model without making changes to the software system. Change impacts can be
viewed through the model. We can easily validate and rationalize design and test
scenarios based upon interactions with low stability. This provides direction for
where test and design utilization should be focused.
The more quickly we can rationalize a software system and address interactions that exhibit low evolutionary stability, the greater our ability to advance software to market. In order to be innovative, transformational, and disruptive, we must be able to quickly rationalize architectural and informational level decisions. By zeroing in on problem areas, we can pinpoint where testing and design efforts should be focused.
6 Conclusions
Future work includes utilizing the mDSM with evolutionary stability for establishing baselines for design, testing, development, and deployment; exploring the mDSM with alternative metrics and at different levels of abstraction; combining and complementing information-level metrics with architectural-level and program-level metrics; and determining impact dependent on the audience: user, developer, designer, or tester.
References
17. M.E. Fayad, S.K. Singh, Software stability model: software product line engineering
overhauled, in Proceedings of the 2010 Workshop on Knowledge-Oriented Product Line
Engineering (ACM Press, New York, 2010), Article 4
18. P.E. Xavier, E.R. Naganathan, Productivity improvement in software projects using
2-dimensional probabilistic software stability model (PSSM). ACM SIGSOFT Softw. Eng.
Notes 34(5), 1–3 (2009)
19. E.R. Naganathan, P.E. Xavier, Architecting autonomic computing systems through
probabilistic software stability model (PSSM), in Proceedings of International Conference
on Interaction Sciences (IEEE Computer Society Press, Washington, DC, 2009), pp. 643–648
20. D. Grosser, H.A. Sahraoui, P. Valtchev, Predicting software stability using case-based
reasoning, in Proceedings of the 17th IEEE International Conference on Automated Software
Engineering (ACM Press, New York, 2002), pp. 295–298
21. J. Bevan, E.J. Whitehead, Identification of software instabilities, in Proceedings of the 10th
Working Conference on Reverse Engineering (IEEE Computer Society Press, Washington,
DC, 2003), pp. 134–145
22. Z. Wang, D. Zhan, X. Xu, STCIM: a dynamic granularity oriented and stability based
component identification method. ACM SIGSOFT Softw. Eng. Notes 31(3), 1–14 (2006)
23. H.S. Hamza, Separation of concerns for evolving systems: a stability-driven approach, in
Proceedings of 2005 Workshop on Modeling and Analysis of Concerns in Software (ACM
Press, New York, NY, 2005), pp. 1–5
24. S.S. Yau, J.S. Collofello, Some stability measures for software maintenance. IEEE Trans.
Softw. Eng. 6(6), 545–552 (1980)
25. S.S. Yau, J.S. Collofello, Design stability measures for software maintenance. IEEE Trans.
Softw. Eng. 11(9), 849–856 (1985)
26. T. Nakamura, V.R. Basili, Metrics of software architecture changes based on structural
distance, in Proceedings of IEEE International Software Metrics Symposium (IEEE Computer
Society Press, Washington, DC, 2005), pp. 54–63
27. D. Kelly, A Study of design characteristics in evolving software using stability as a criterion.
IEEE Trans. Softw. Eng. 32(5), 315–329 (2006)
28. L. Yu, S. Ramaswamy, Measuring the evolutionary stability of software systems: case studies
of Linux and FreeBSD. IET Softw 3(1), 26–36 (2009)
29. L. Fortnow, Kolmogorov complexity, in Aspects of Complexity, Minicourses in Algorithmics,
Complexity, and Computational Algebra (Walter De Gruyter Incorporation, 2001)
30. C. Bennett, P. Gacs, M. Li, P. Vitányi, W. Zurek, Information distance. IEEE Trans. Inf.
Theory 44(7), 1407–1423 (1998)
31. M. Li, X. Chen, X. Li, B. Ma, P. Vitányi, The similarity metric. IEEE Trans. Inf. Theory 50
(12), 3250–3264 (2004)
32. R. Cilibrasi, P. Vitányi, Clustering by compression. IEEE Trans. Inf. Theory 51(4), 1523–
1545 (2005)
33. N. Tran, The normalized compression distance and image distinguishability, in Human Vision
and Electronic Imaging, vol. XII (2007), 64921D
34. T. Arbuckle, A. Balaban, D.K. Peters, M. Lawford, Software documents: comparison and
measurement, in Proceedings of the 19th International Conference on Software Engineering
& Knowledge Engineering (Knowledge Systems Institute Graduate School, Skokie, 2007),
pp. 740–745
35. T. Arbuckle, Visually summarizing software change, in Proceedings of the 12th International
Conference on Information Visualisation (IEEE Computer Society Press, Washington, DC,
2008), pp. 559–568
36. T. Arbuckle, Measure software and its evolution using information content, in Proceedings of
the Joint International and Annual ERCIM Workshops on Principles of Software Evolution
and Software Evolution Workshops (ACM Press, New York, 2009), pp. 129–134
37. T. Arbuckle, Studying software evolution using artefacts’ shared information content. Sci.
Comput. Program. 76(12), 1078–1097 (2011)
38. T. Arbuckle, Measuring multi-language software evolution: a case study, in Proceedings of the
12th International Workshop on Principles of Software Evolution and the 7th Annual ERCIM
Workshop on Software Evolution (ACM Press, New York, 2011), pp. 91–95
39. S.D. Eppinger, T.R. Browning, Design Structure Matrix Methods and Applications, 1st edn. (MIT Press, Cambridge, 2012)
40. T.R. Browning, Applying the design structure matrix to system decomposition and integration problems: a review and new directions. IEEE Trans. Eng. Manage. 48(3), 292–306 (2001)
41. A. Thurimella, S. Ramaswamy, On adopting multi-criteria decision-making approaches for
variability management in software product lines, in First International Workshop on
Requirements Engineering Practices On Software Product Line Engineering, 16th
International Software Product Line Conference, Salvador, Brazil, 2–7 Sept 2012
42. A. Engel, T.R. Browning, Designing systems for adaptability by means of architecture
options. Syst. Eng. 11, 125–146 (2008). doi:10.1002/sys.20090
43. T.B. Callo Arias, P. Spek, P. Avgeriou, A practice-driven systematic review of dependency analysis solutions. Empirical Softw. Eng. 16(5), 544–586 (2011)
44. R.R. Yager, Intelligent control of the hierarchical agglomerative clustering process. IEEE
Trans. Syst. Man Cybern. B 30, 835–845 (2000)
45. D. Aceituna, H. Do, G.S. Walia, S.-W. Lee, Evaluating the use of model-based requirements verification method: a feasibility study, in 2011 First International Workshop on Empirical Requirements Engineering (EmpiRE), 30 Aug 2011, pp. 13–20
46. Frequently Asked Questions. Apache Ant. N.p. (n.d.). Web. 6 Dec 2015
47. Apache Release History, Apache Release History. N.p. (n.d.). Web. 6 Dec 2015
48. A. Causevic, D. Sundmark, S. Punnekkat, An industrial survey on contemporary aspects of
software testing, in Software Testing, Verification and Validation (ICST), 2010 Third
International Conference on, 6–10 April 2010, pp. 393, 401. doi:10.1109/ICST.2010.52
49. M. Venkataraman, Testing services, 29 June 2015. Retrieved 2 Aug 2015
Testing as a Service
Keywords TaaS · Test optimization · Testing as a service · TaaS on cloud · Outsourcing · Test outsourcing · Test service · TaaS enablers · Benefits of TaaS · TaaS accelerators · Traditional testing to TaaS · TaaS limitations · TaaS disadvantages · TaaS architecture · TaaS working model · TaaS framework · TaaS demo · TaaS example · TaaS infrastructure · Industry case study in TaaS · TaaS cost model · Pricing test service · Industry leaders in TaaS · TaaS provider · TaaS consumer
1 Introduction
helps us commute, ATMs for money withdrawal, smart phones for communication, and televisions for entertainment: almost everything runs on software.
Software plays such an important role in our lives, so it is of utmost importance to ensure software quality. Testing is the process of verifying the quality of any product. Let us consider an example from the textile industry. While buying clothes, we expect them to be durable, have the appropriate size, and hold long-lasting color. Before selling a garment, it is the seller's responsibility to test these basic features; in short, the seller is responsible for the quality of the garment.
Software testing is the process whereby software engineers assure the quality and correctness of any software before selling it to customers. Any compromise in software testing can cause huge monetary loss, sometimes even loss of human lives; e.g., in Scotland, a Chinook helicopter crashed due to a software defect, and all 29 passengers were killed [3]. The Ariane-5 rocket was set to deliver a payload of satellites into Earth's orbit, but a software defect caused the launcher to fail within a minute of liftoff, resulting in mission failure; more than 370 million dollars were lost [4]. A study by the National Institute of Standards and Technology, USA, found that every year software defects cost the US economy $59.5 billion [5].
To elaborate software testing, let us consider a flight-booking portal. It is the portal
owner’s responsibility to ensure correctness of the information on the portal.
Every software development company hires specially skilled testers for testing products. Testing accounts for about 25 % of the software life cycle [6]. It is the tester's responsibility to
1. Set up the test environment by positioning the right set of machines. For
example, for the flight-booking portal, we need various devices like laptop,
tablet, and mobile of various screen sizes to make sure user interfaces (UI) work
for all the devices.
2. Write expected behavior for various features and tests to verify. This phase is
termed as creating a test plan. For the flight-booking portal, we ensure that all
the flight information including seat availability is correct. Based on given
information, a test plan should cover test cases to verify all the functionalities
the portal provides.
3. Once a product is ready, it has to be tested under various setups and environ-
ments. For the flight-booking portal, we need to test its usability, scalability, and
performance on various device setups.
Software testing can be of various types [7]; a tester mainly considers the following testing types while creating test plans:
1. Functional Testing—Test the software output and verify whether it is the same
as mentioned in the requirements specification.
2. Integration testing—Test various software modules to verify combined func-
tionality after integration. For the flight-booking portal, we have various mod-
ules like portal UI to display the flight information and database to store flight
information. It is important to verify that the information displayed on the portal
is same as that stored in the database.
3. Sanity testing—In sanity testing, we test just the basic feature and ignore the
secondary features. Sanity testing is the subset of functional testing and we
perform sanity testing, when we do not have enough time for complete testing.
For flight-booking portal, displaying correct flight name and seat availability are
primary features. In sanity testing, we might overlook if some flights are missing
from the portal but there should be no wrong information.
4. Regression testing—Test the complete product after modification of a module.
For a flight-booking portal, it is important to ensure that addition of information
like adding a new flight detail or modification to a module should not risk the
correctness of the portal.
5. Load/Stress testing—Test under heavy loads to determine at what point the
system’s response time degrades or fails. This is accomplished by making a
system respond to both fast evolving data stream and large volume of data as
inputs. In the case of a flight-booking portal, if at any point in time thousands of
users are trying to book flights, the portal should be able to handle them with
reasonable response time.
6. Usability testing—To test whether a new user can use the software easily. The
product should have help or tips whenever needed. A flight-booking portal
needs to have help and warning notes wherever needed.
As discussed above, in traditional testing model, testers set up a testing envi-
ronment, create test plans, and execute the test plans. We discussed various testing
types. Overall, the model looks promising. However, the traditional testing model
has a few challenges such as:
1. High cost: Usually, the cost associated with building test expertise, buying or creating the best automation tools, and maintaining test infrastructure is very high, especially for small and medium businesses. Moreover, we do not use test resources during the complete software product life cycle. At times, we do not perform any testing but still need to invest in test infrastructure maintenance. For a flight-booking portal, we do not need the testing devices all the time, but the maintenance associated with them is a recurring cost.
2. Lack of resource flexibility: Traditional models do not allow staffing up testing experts quickly when needed and releasing them when not required. There are certain testing experts that software companies need during stress testing but may not need for other testing types. As a result, a person has to be hired permanently for limited work. In the case of a flight-booking portal, you might want to test it against security threats and hacks and need to hire specialized testers to perform security testing. These testers are helpful only after the product is ready.
3. Expense for Advanced Test Environments: Advanced tools, latest devices,
and platforms need frequent updates and are expensive. In case of a
flight-booking portal, with new devices and versions releasing almost every
month, it becomes a bottleneck to have all the latest devices in-house for testing
purpose.
2 Testing as a Service
"As a Service" solutions are becoming popular as they enable better ROI without infrastructure investments. In testing, advanced testing efforts require expertise, a large amount of resources, and expensive testing tools. Pay-per-use models allow software companies to get what they need, when they need it. As a result, TaaS solutions are becoming increasingly popular. Let us understand what goes into TaaS delivering a testing service. In this context, we next present an infrastructure that enables the delivery of TaaS.
A service provider provides testing services to any company. The key services
include test consultation, planning, automation, execution, and management.
Figure 2 illustrates the responsibilities of a service provider.
Some of the key driving objectives while planning the testing strategy are:
1. Quality: A number of users, at times millions, interact with the applications or
products. With such high usage and dependency on the data correctness, quality
Fig. 3 Responsibilities of a service provider
A service consumer uses the outsourcing model to get a service and pays for it accordingly [8]. Automated regression testing, security testing, performance testing, monitoring and testing of cloud-based applications, and testing of major ERP software such as SAP are the most suitable candidates for the TaaS model, since they require expertise and infrastructure. TaaS is also a promising solution for smaller business houses that do not have the resources to invest in testing skills and want to focus entirely on product development.
When a service consumer moves to the TaaS model, the consumer needs to establish a well-defined process, identify the specific testing areas where TaaS is applicable, and align development teams to the TaaS model. Therefore, preparing a high-level transformation road map is important to achieving the benefits of the TaaS model. A consumer cannot simply jump to an end-point but must plan a series of phases. These include
1. Identify current test model and verify whether to move complete testing or any
specific tests to TaaS.
2. Select the service provider based on the testing requirements for better ROI.
3. Start the transition with smaller release cycles and pilot programs.
4. Clearly define the testing requirements before release cycles.
Figure 4 highlights role of consumers in TaaS model.
After the pilot program, the consumer assesses the effectiveness of the current model. The service consumer considers the following points before deciding to continue with the service provider:
• Communication is the key so it is important to establish an engagement model.
• Identifying how well the service provider aligns to the process.
• A mechanism to monitor the performance of services through service level
agreements (SLAs).
• The payment mechanism suits the consumer needs with no hidden cost.
Service consumer initially shares the test requirements with the provider. The
requirements include product details, type of testing needed, and timelines. Based
on the requirements, service provider calculates the total cost and creates a test plan.
Based on prices, the consumer can negotiate with the provider regarding the total
cost. The consumer can also choose among the services offered by the provider.
After finalizing the contract, the provider performs the testing and shares the results
with the consumer. Figure 5 shows the flow of interaction between the service
provider and the consumer.
The service provider typically interacts with the consumer over e-mail. Initially, the service provider should have a few in-person meetings and communications with the consumer to speed up the process. This also helps to establish initial trust between the provider and the consumer. The most common model, however, is service providers having their own Web portals, which service consumers can use for any kind of communication.
Now that we understand the TaaS architecture and the various roles involved in TaaS, let us look at the factors favorable to TaaS:
1. In the past, software companies looked at testing as investments done for pro-
duct quality. Today, software companies explore options to reduce testing costs
and consider external experts to perform testing.
As described above, TaaS looks promising. Let us consider the efforts involved in
moving from traditional testing to TaaS. This transition [9] from traditional testing
to TaaS involves positive risks. Let us understand the starting point, key steps, and challenges involved in moving from traditional testing to TaaS. Figure 6 outlines the systematic flow [10].
• Identify goals: The service consumer needs to understand the goal when moving to TaaS. This goal can be to focus on the core business, cost savings, the service of experts, etc.
3 TaaS Infrastructure
The proposed infrastructure covers test plan creation, pricing, test execution, and interactions between a service provider and a consumer.
Figure 8 gives an overview of the proposed TaaS framework.
As shown in Fig. 8, the service provider exposes an interface to the service
consumer. The consumer places a request, by sharing all the testing requirements
through the interface. The service provider internally processes the requirements
and provides a test plan. After the service consumer confirms the test plan, the
service provider shares pricing details. Once the service consumer accepts the
pricing, the service provider executes the test cases and shares the test results with
the consumer.
The proposed TaaS framework has three main components: interface, test
consultation unit, and test execution unit. Let us look at these units in detail.
3.1 Interface
The service provider exposes the interface unit to a service consumer. The unit contains
the user interface and keeps internal processing abstract. Therefore, the consumer
need not worry about the internal processing.
As shown in Fig. 9, the service provider receives test requirements from the service consumer through the interface using the ReceiveTestingRequirements method. After receiving the test requirements, the interface invokes the GetTestPlan method to get the test plan from the test consultation unit. Through the ShareAndFinaliseTestPlan method, the interface shares this test plan with the service consumer. Once the consumer approves the test plan, the interface invokes the GetPricingModel method to get the pricing from the test consultation unit. The interface shares this pricing model with the service consumer in the ShareAndFinalisePricingModel method. After the consumer agrees upon the pricing model, the interface invokes the InstructTestCaseExecution method of the test execution unit. After test execution, the interface invokes the ShareTestResults method to share the test results with the service consumer. Figure 10 shows the class diagram for the interface.
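As a concrete illustration of this flow, the following C# sketch of ours wires the named methods together in the order just described; all types, signatures, and method bodies are assumptions for illustration, not the chapter's class diagram.

```csharp
// Sketch (ours) of the interface unit; method names come from the text,
// while the types and signatures are our assumptions.
public class TestRequirements { }
public class TestPlan { }
public class PricingModel { }
public class TestResults { }

public class TestConsultationUnit
{
    public TestPlan GetTestPlan(TestRequirements r) => new TestPlan();
    public PricingModel GetPricingModel(TestPlan p) => new PricingModel();
}

public class TestExecutionUnit
{
    public TestResults InstructTestCaseExecution(TestPlan p) => new TestResults();
}

public class ServiceInterface
{
    private readonly TestConsultationUnit _consultation = new TestConsultationUnit();
    private readonly TestExecutionUnit _execution = new TestExecutionUnit();

    public void HandleRequest(TestRequirements requirements)
    {
        ReceiveTestingRequirements(requirements);                  // 1. consumer shares requirements
        var plan = _consultation.GetTestPlan(requirements);        // 2. consultation unit drafts plan
        ShareAndFinaliseTestPlan(plan);                            // 3. consumer approves the plan
        var pricing = _consultation.GetPricingModel(plan);         // 4. consultation unit prices it
        ShareAndFinalisePricingModel(pricing);                     // 5. consumer accepts the price
        var results = _execution.InstructTestCaseExecution(plan);  // 6. execution unit runs the tests
        ShareTestResults(results);                                 // 7. results go back to consumer
    }

    private void ReceiveTestingRequirements(TestRequirements r) { /* validate and store */ }
    private void ShareAndFinaliseTestPlan(TestPlan p) { /* send to consumer, await approval */ }
    private void ShareAndFinalisePricingModel(PricingModel p) { /* negotiate and agree */ }
    private void ShareTestResults(TestResults r) { /* deliver the report */ }
}
```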
Test consultation unit mainly creates the test plan based on the consumer’s
requirements and generates the price accordingly. Figure 11 shows an overview of
test consultation unit.
As shown in Fig. 11, test consultation unit does not directly interact with the
consumer. It gets the instructions and details from the interface.
Test consultation unit mainly offers two functionalities:
Test plan generator: The framework provides support for test plan creation based on the requirements specified by the consumer. It fetches test cases from an existing repository and updates the test plan. When the test plan generator does not find any test cases in the test case repository, it updates the test plan with a "no test cases found" message.
Pricing calculator: The framework provides generic support to calculate the price. Section 5 of this chapter provides more details on how the service provider and the consumer decide the price. The agreed-upon test service cost is stored in the pricing repository.
Let us look at these functionalities in detail.
Figure 12 shows the flowchart for the test plan generator. The interface first invokes the ProcessTestRequirement method. This method reads the test requirements shared by a consumer and converts the requirements into a standard format understandable by the framework. Based on the requirements, the GenerateTestScenarios method generates a set of user test scenarios. The GetScenarioDependents method identifies the devices (phone/tablet/laptop, etc.) and the operating system versions on which the product needs to be tested. Simultaneously, the FetchApplicableTestCases method gets the required test cases from the test case repository. These test cases can be manual or semiautomated.
Fig. 13 TestPlanGenerator class
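The flow just described might be sketched as follows (our illustration; the repository lookup, types, and helper bodies are assumptions, not the chapter's Fig. 13):

```csharp
// Sketch (ours) of the test plan generator flow: normalize requirements,
// derive scenarios and their dependents, and fetch test cases from the
// repository, falling back to a "no test cases found" entry.
using System.Collections.Generic;
using System.Linq;

public class TestPlanGenerator
{
    private readonly Dictionary<string, List<string>> _testCaseRepository;

    public TestPlanGenerator(Dictionary<string, List<string>> repository) =>
        _testCaseRepository = repository;

    public List<string> GenerateTestPlan(string rawRequirements)
    {
        var requirements = ProcessTestRequirement(rawRequirements);
        var scenarios = GenerateTestScenarios(requirements);
        var dependents = GetScenarioDependents(scenarios);   // devices, OS versions
        var plan = new List<string>();
        foreach (var scenario in scenarios)
        {
            var cases = FetchApplicableTestCases(scenario);
            plan.AddRange(cases.Any()
                ? cases.SelectMany(c => dependents.Select(d => $"{c} on {d}"))
                : new[] { $"{scenario}: no test cases found" });
        }
        return plan;
    }

    List<string> ProcessTestRequirement(string raw) =>       // to a standard format
        raw.Split(';').Select(s => s.Trim()).ToList();
    List<string> GenerateTestScenarios(List<string> reqs) => reqs;
    List<string> GetScenarioDependents(List<string> s) =>
        new List<string> { "phone", "tablet", "laptop" };
    List<string> FetchApplicableTestCases(string scenario) =>
        _testCaseRepository.TryGetValue(scenario, out var tc) ? tc : new List<string>();
}
```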
After identifying the testing units, the interface fetches the price per unit and gets its dependents. As mentioned earlier for the test consultation unit, these dependents represent the devices and software platforms on which an application needs to be tested. In case a platform is not available with a service provider, the cost of building/hiring the new platform is also included. After all these calculations, the GeneratePricingModel method generates the final price for testing an application.
Figure 15 shows the class for the price calculation unit.
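A minimal sketch of the pricing flow follows (ours; the unit prices, platform set, and new-platform cost figure are assumed for illustration, not the chapter's Fig. 15):

```csharp
// Sketch (ours): price per testing unit plus the cost of building/hiring
// any platform the provider does not already have.
using System.Collections.Generic;

public class PricingCalculator
{
    private readonly Dictionary<string, decimal> _pricePerUnit;   // pricing repository
    private readonly HashSet<string> _availablePlatforms;
    private const decimal NewPlatformCost = 500m;                 // assumed figure

    public PricingCalculator(Dictionary<string, decimal> prices,
                             HashSet<string> platforms)
    {
        _pricePerUnit = prices;
        _availablePlatforms = platforms;
    }

    public decimal GeneratePricingModel(
        IEnumerable<(string Unit, string Platform)> testingUnits)
    {
        decimal total = 0m;
        foreach (var (unit, platform) in testingUnits)
        {
            total += _pricePerUnit.TryGetValue(unit, out var p) ? p : 0m;
            if (!_availablePlatforms.Contains(platform))
                total += NewPlatformCost;   // missing platform must be built/hired
        }
        return total;
    }
}
```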
The test execution unit is responsible for automating or reusing the test cases, executing the required test cases, and sharing the test results based on the test plan shared by the interface. Figure 16 shows an overview of the test execution unit. Let us look at the operations performed by the test execution unit in detail:
• Automate test cases: The framework supports adding new test cases and modifying or reusing existing ones. As described in Sect. 3.2, if a new test case is added, the tester needs to automate it and add it to the test case repository.
• Execute test cases: The executor fetches all the applicable test cases and executes them one by one. It understands various test case formats and executes the test cases accordingly, taking care of all the setup required for a test case.
• Generate the test results: Test result generation is the most critical part of the framework. In the report generator, we implement the logic to convert the results of test cases into a readable and user-friendly format. The report uses simple English or the local language and, for better readability, inserts tables and graphs.
In this section, we present a framework that supports the above three operations.
This is a generic framework, and we can add any number of test cases to it.
The TaaS framework below is implemented in C#; users can implement similar
solutions in other languages as well.
To automate a test case, we ensure that all test files follow a well-defined
convention, as the generic framework recognizes the files based on this convention.
A new test case adds a new test class instead of modifying an existing one. The
tester needs to define a class, inherited from ITestCase, called a test class.
ITestCase specifies the methods that a test class needs to implement. Figure 17
shows ITestCase.
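Since Fig. 17 is not reproduced here, the following is a minimal sketch of what ITestCase and a conforming test class might look like; the member names and the example test are our assumptions.

// Contract that every test class implements so the executor can run it.
public interface ITestCase
{
    string Name { get; }
    void Setup();      // prepare environment, devices, and data
    bool Execute();    // run the test; true means pass
    void TearDown();   // release resources
}

// Example of a new test class a tester might add to the repository.
public class LoginPageLoadsTest : ITestCase
{
    public string Name => "Login page loads";
    public void Setup() { /* e.g., launch a browser and open the portal */ }
    public bool Execute() { /* e.g., verify the login form renders */ return true; }
    public void TearDown() { /* e.g., close the browser */ }
}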
After fetching the existing and the newly added test classes, we need methods that
support the execution of these test classes. For this purpose, we implement a test
case executor, shown in Fig. 18.
The executor fetches all the test classes and executes them on the test machine, one
by one. Figure 19 shows a code snippet for the method that fetches all the test
classes. This method reads all the C# classes that inherit from ITestCase. As shown
in Fig. 19, the scenarios object fetches all the test classes for a test case category
and returns all the scenarios pertaining to a given test case.
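Figure 19 itself is not reproduced here. One common way to find all classes inheriting from ITestCase in C# is reflection; the sketch below is our assumption of how GetAllTestCases could be written, and it requires each test class to have a parameterless constructor.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

public static class TestCaseLoader
{
    // Scans the current assembly for concrete classes implementing
    // ITestCase and instantiates one object per test class.
    public static List<ITestCase> GetAllTestCases() =>
        Assembly.GetExecutingAssembly()
                .GetTypes()
                .Where(t => typeof(ITestCase).IsAssignableFrom(t)
                            && t.IsClass && !t.IsAbstract)
                .Select(t => (ITestCase)Activator.CreateInstance(t))
                .ToList();
}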
Figure 20 shows the basic functionality of the test case executor: it first fetches all
the test classes and then executes them one by one. We can extend the same method
to execute test classes across multiple devices.
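A minimal executor loop in the spirit of Fig. 20, reusing the ITestCase and TestCaseLoader sketches above (the error-handling policy shown is our assumption), might be:

using System;
using System.Collections.Generic;

public class TaaSExecutor
{
    // Runs every discovered test class once and records pass/fail by name.
    public Dictionary<string, bool> RunAll()
    {
        var results = new Dictionary<string, bool>();
        foreach (ITestCase test in TestCaseLoader.GetAllTestCases())
        {
            test.Setup();
            try
            {
                results[test.Name] = test.Execute();
            }
            catch (Exception)
            {
                results[test.Name] = false; // an unhandled error counts as a failure
            }
            finally
            {
                test.TearDown();
            }
        }
        return results;
    }
}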
Now we have a basic understanding of creating and executing test classes. After
executing all the test cases, it is very important to generate the test results, which
the service provider needs to share with the consumer. The results include the
number of tests executed, the outcome, error details for any failure, the time taken,
and so on. The code shown in Fig. 21 generates the test results.
We use the GetAllTestCases method implemented in Fig. 19 to get the list of all the
test cases. We read the results of all the test cases and store them in an HTML
object, and we use .NET libraries in the GenerateReport method to create the
HTML file. Figure 21 presents a code snippet for generating the test result document.
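Figure 21 is not reproduced here; as a hedged sketch, a report generator that turns the executor's results into a simple HTML file using standard .NET libraries could look like this:

using System.Collections.Generic;
using System.IO;
using System.Text;

public static class ReportGenerator
{
    // Renders pass/fail results as an HTML table and writes it to disk.
    public static void GenerateReport(Dictionary<string, bool> results, string path)
    {
        var html = new StringBuilder();
        html.Append("<html><body><h1>Test Results</h1><table border=\"1\">");
        html.Append("<tr><th>Test case</th><th>Outcome</th></tr>");
        foreach (var entry in results)
        {
            html.Append("<tr><td>" + entry.Key + "</td><td>"
                        + (entry.Value ? "Pass" : "Fail") + "</td></tr>");
        }
        html.Append("</table></body></html>");
        File.WriteAllText(path, html.ToString());
    }
}

Tying the sketches together, ReportGenerator.GenerateReport(new TaaSExecutor().RunAll(), "results.html") would produce the consumer-facing document; a full report would also record error details and execution time per test.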
In this section, we have presented an abstract implementation of the proposed
generic framework that can be used for a TaaS solution. We can add test cases to the
test case repository and extend the executor to execute tests on various devices;
accordingly, we would need to modify the test result generation. In the next section,
we present a case study illustrating the steps involved in availing a test service
through the proposed framework.
4 An Experiment
Consider a flight-booking portal that lists flights from many carriers. In this
scenario, adding a new flight carrier requires testing coordination between the flight
carrier and the company (the flight-booking portal). The company needs to maintain
a dedicated team to coordinate with the flight carrier; the team creates the required
test plan and executes the test cases as per that plan.
Suppose there is a service provider that takes care of the testing needed to add a
new flight carrier. Figure 23 shows the interaction between the portal and the
service provider; the service consumer here is the flight-booking portal.
As shown in Fig. 23, the service consumer (the portal) shares the testing require-
ments. The service provider assesses the requirements and identifies the compo-
nents involved in fulfilling them, and the pricing model is decided. We discuss the
TaaS pricing model in Sect. 5. At this point, both the service provider and the
service consumer understand the requirements and agree on the cost.
From here on, the service provider coordinates with the new flight carrier for all
the required testing. The company can now focus on its core business, enriching its
database and providing a better user experience.
Figure 24 explains the interaction between the flight carrier and the service provider.
As shown in Fig. 24, the service provider validates the flight details sent by the
carrier. The provider performs these validations through automated test cases, and
whenever a new flight carrier needs to be added, the service provider fulfills the
testing needs. The flight-booking portal (the company) therefore need not maintain
the testing infrastructure continuously, which results in cost savings because the
company pays on a pay-per-use model. The service provider reuses the same
automated test framework for any new carrier.
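To tie this back to the framework of Sect. 3, one such validation can be expressed as a test class implementing the ITestCase sketch given earlier; the flight fields and validation rules below are illustrative assumptions.

using System.Collections.Generic;
using System.Linq;

// Illustrative validation of carrier-supplied flight data.
public class FlightDetailsValidationTest : ITestCase
{
    private List<(string FlightNo, string Origin, string Destination)> flights;

    public string Name => "Validate carrier flight details";

    public void Setup()
    {
        // In practice, loaded from the carrier feed (or from cloud storage).
        flights = new List<(string, string, string)> { ("AB123", "DEL", "BOM") };
    }

    public bool Execute() =>
        // Every flight needs a flight number and distinct endpoints.
        flights.All(f => !string.IsNullOrEmpty(f.FlightNo)
                         && f.Origin != f.Destination);

    public void TearDown() => flights = null;
}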
Hence, TaaS benefits both the service provider and the consumer if chosen wisely.
Taking one more step, we can host this TaaS solution on the cloud and use cloud
storage to hold the data from the carrier before moving it to the database.
Fig. 24 Interaction between the flight carrier and the service provider
4.2 Implementation
In parallel, the interface executes the FetchTestCase method, which fetches all the
applicable test cases for these scenarios from the existing repository. Figure 28
shows the output of the FetchTestCase method.
Next is the ReviewAndUpdate method, which is a manual step. Here, the tester
reviews the test cases and adds test cases for test scenario 2, for which none are
present in the repository. Figure 29 shows the test cases added by the tester.
Finally, the test plan is generated using the GenerateTestPlan method. Figure 30
shows a test plan with different test scenarios to be executed on different platforms.
The service provider shares the test plan with the consumer for review. The
consumer seeks clarifications as needed and finalizes the test plan. The interface
then calls the pricing calculator module to estimate the test cost and propose it to
the service consumer. The test service cost depends on the testing units each test
case consists of; we discuss test service cost in the next section.
As explained in Sect. 3.2, the service provider performs the testing after the
consumer agrees on the cost of the service. GetAllTestCases() (Fig. 19) fetches the
test case code available in the test case repository; for the remaining test cases, the
tester writes test code. TaaSExecutor (Fig. 18) executes all the test cases according
to the test plan, and the corresponding test result report is generated.
In this section, we have extended the TaaS framework of Sect. 3 to support an
industry example. We hope this brings clarity to the framework implementation,
which can be extended further to support multiple service providers.
5 Pricing a Test Service
A test service provider needs to price its service in a logical manner, not only for its
own cost benefit but also so that a service consumer can agree to it. There are many
cost models for assessing software development cost; these models primarily
consider human effort, problem complexity, and related issues in pricing software.
A similar approach can be taken for costing a test service.
Costing methods can be either linear or nonlinear. A linear model calculates cost
from the estimated effort and a rate of cost per unit of effort, whereas in a nonlinear
pricing model the cost depends on various factors and varies over the product life
cycle. The nonlinear model gives consumers flexibility: the service provider and the
consumer explore various options for the task at hand while costing it. Costing for
test services can adopt either of the two models, based on the quantum of the
service. A relatively small testing service requiring little time can adopt the linear
model, whereas testing that is larger in scope and spans a long time over varied test
conditions can adopt the nonlinear model. So, for test service costing, the size of the
software, its complexity, and the varied test conditions are the points of
consideration.
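As a quick, hypothetical illustration of the linear model: if the provider estimates 40 person-hours of testing at an agreed rate of $50 per hour, the service costs 40 × 50 = $2,000. In code, the linear model is a single multiplication:

public static class LinearCostModel
{
    // Linear pricing: cost = estimated effort (hours) x rate per hour.
    public static decimal Cost(decimal effortHours, decimal ratePerHour) =>
        effortHours * ratePerHour;
}

A nonlinear model would replace this with a function of test conditions, platforms, and life-cycle stage, negotiated case by case.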
The costing steps start with the service consumer sharing the test requirements
with the test service provider. For example, to update a flight-booking portal, the
portal agency needs to specify not only the desired updates but also the devices,
operating systems, and browsers on which the updated system is required to
perform. Further, security, language support, and other domain-specific issues are
to be specified.
6 TaaS on Cloud
6.1 Benefits
While the TaaS model allows a testing service to use several hardware and software
platforms, a provider may not own all such resources. Further, as technology
changes often, there can be several versions of these resources, and acquiring all of
them is an expensive proposition for a test service provider. Cloud technology meets
this challenge: a service provider can find the required resources on the cloud.
A cloud model provides a simplified solution [12] for managing resource utilization
costs, since the servers, hardware, and their applications are on the cloud. Depending
on the enterprise's requirements, the amount of infrastructure utilized can be
increased or decreased to manage the existing load, with the benefits of a
pay-per-use model. Consider an application with both web and mobile interfaces
that must be tested on various laptops, desktops, tablets, and mobiles: hardware
purchase and recurring maintenance costs are significant. In such scenarios, it is
advisable to move to the cloud and use virtual machines instead of physical
hardware for a better return on investment.
7 Conclusion
A consumer wishing to avail such a service first has to sign a well-defined SLA;
the required testing service is then provided as per the agreed SLA. IBM, Wipro,
and HP are a few leading test service providers [14] that offer tools enabling
software developers to avail their test services.
Recognizing these recent TaaS developments, this chapter has identified the issues
involved and presented a mechanism for implementing a generic TaaS framework.
We have explained the behavior of the proposed framework with the help of an
example drawn from a real-life application; industry and academia can research it
further and improve upon it. In this chapter, we have presented the TaaS
architecture, TaaS enablers, the factors involved in moving from traditional testing
to TaaS, and its benefits and limitations. By moving traditional testing to TaaS, it is
possible to optimize testing costs by opting for an outsourced service instead of
making a permanent investment. The approach has inherent potential for quality
because of the well-orchestrated collaboration of developers and testers. We have
further explored how TaaS on the cloud presents an opportunity to optimize cost
and effort. TaaS is nascent but promising; as with any emerging solution,
universally agreed standards are still missing. Big technology players can research
and innovate more in TaaS to realize its true potential and to standardize the
available solutions and frameworks.
References
1. http://www.academia.edu/8477067/Testing_as_a_Service_on_Cloud_A_Review
2. https://en.wikipedia.org/wiki/Software
3. http://en.wikipedia.org/wiki/Chinook_helicopter
4. https://www.ima.umn.edu/~arnold/disasters/ariane5rep.html
5. www.nist.gov/director/planning/upload/report02-3.pdf
6. www.cs.toronto.edu/~jm/340S/02/PDF2/Lifecycles.pdf
7. http://www.softwaretestinghelp.com/types-of-software-testing/
8. http://blog.qatestlab.com/2011/02/26/testing-as-a-service-outsourcing-your-specialized-software-testing/
9. http://www.nasscom.in/software-testing-emerging-opportunities
10. https://en.wikipedia.org/wiki/Service_(economics)
11. www.cognizant.com/InsightsWhitepapers/Taking-Testing-to-the-Cloud.pdf
12. http://finance.flemingeurope.com/webdata/4899/LeanApps_White_Paper_Testing_as_a_Service_ENG.pdf
13. http://www.csc.com/independent_testing_services/offerings/82906/83166-testing_as_a_service_taas?ref=Ic
14. http://www.softwaretestinghelp.com/software-testing-service-providers/