Software Testing Techniques: Finding the Defects that Matter
LIMITED WARRANTY AND DISCLAIMER OF LIABILITY
CHARLES RIVER MEDIA, INC. (“CRM”) AND/OR ANYONE WHO HAS BEEN INVOLVED IN THE WRITING, CREATION OR PRODUCTION OF THE ACCOMPANYING CODE IN THE TEXTUAL MATERIAL IN THE BOOK, CANNOT AND DO NOT WARRANT THE PERFORMANCE OR RESULTS THAT MAY BE OBTAINED BY USING THE CONTENTS OF THE BOOK. THE AUTHOR AND PUBLISHER HAVE USED THEIR BEST EFFORTS TO ENSURE THE ACCURACY AND FUNCTIONALITY OF THE TEXTUAL MATERIAL AND PROGRAMS DESCRIBED HEREIN. WE, HOWEVER, MAKE NO WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, REGARDING THE PERFORMANCE OF THESE PROGRAMS OR CONTENTS. THE BOOK IS SOLD “AS IS” WITHOUT WARRANTY (EXCEPT FOR DEFECTIVE MATERIALS USED IN MANUFACTURING THE BOOK OR DUE TO FAULTY WORKMANSHIP).
THE SOLE REMEDY IN THE EVENT OF A CLAIM OF ANY KIND IS EXPRESSLY LIMITED TO REPLACEMENT OF THE BOOK, AND ONLY AT THE DISCRETION OF CRM.
Scott Loveland
Geoffrey Miller
Richard Prewitt Jr.
Michael Shannon
CHARLES RIVER MEDIA
No part of this publication may be reproduced in any way, stored in a retrieval system of any
type, or transmitted by any means or media, electronic or mechanical, including, but not limited
to, photocopy, recording, or scanning, without prior permission in writing from the publisher.
Scott Loveland, Geoffrey Miller, Richard Prewitt Jr., Michael Shannon. Software Testing Techniques: Finding the Defects that Matter.
ISBN: 1-58450-346-7
All brand names and product names mentioned in this book are trademarks or service marks of
their respective companies. Any omission or misuse (of any kind) of service marks or trademarks
should not be regarded as intent to infringe on the property of others. The publisher recognizes
and respects all marks used by companies, manufacturers, and developers as a means to distinguish their products.
CHARLES RIVER MEDIA titles are available for site license or bulk purchase by institutions,
user groups, corporations, etc. For additional information, please contact the Special Sales Department at 781-740-0400.
Notices
This book represents the personal opinions of the authors and they are solely responsible for its
content, except where noted. Use of the information contained in this book is at the user’s sole
risk. The authors will not be liable for damages, direct, indirect, consequential or otherwise arising out of or in connection with the use of this book or its content.
Reprinted with permission from Loveland, Miller, Prewitt, Shannon, “Testing z/OS: The premier operating system for IBM’s zSeries server,” IBM Systems Journal 41, No 1, 2002, portions
as denoted in the text. © 2002 by International Business Machines Corporation.
Reprinted with permission from IEEE Std 610.12-1990, IEEE Standard Glossary of Software Engineering Terms, definitions of complexity, test case, and test procedure, Copyright 1990, by IEEE, and ANSI/IEEE Std 792-1983, IEEE Standard Glossary of Software Engineering Terms, definition of software complexity. The IEEE disclaims any responsibility or liability resulting from the placement and use in the described manner.
Enterprise Systems Architecture/390, eServer, IBM, Lotus, Notes, Parallel Sysplex, System/360,
System/370, z/OS, z/VM, zSeries, and zArchitecture are trademarks of International Business
Machines Corporation in the United States, other countries or both.
Intel and Pentium are trademarks of the Intel Corporation in the United States, other countries,
or both.
Microsoft and Windows are trademarks of Microsoft Corporation in the United States, other
countries, or both.
UNIX is a registered trademark of The Open Group in the United States and other countries.
Other company, product, and service names may be trademarks or service marks of others.
Preface
Computer software runs the world. If it crashes, even briefly, the impact to a company’s bottom line can often be measured in millions. But crash it does. Well-publicized failures from both industry and government have underscored the need for mission-critical software to withstand harsh conditions. Before it sees the light of day, software must prove its reliability under a white-hot beam of rigorous interrogation. It must be poked and prodded, squeezed and stretched, trashed and torn. In other words, it must be tested.
If you are reading this book, then you are probably already interested in software
testing. Perhaps, like many testers, you learned how to write programs, but found
you had more fun breaking things than creating them. We did too. So we kept at it,
fighting in the trenches to kill software bugs before they could escape into some unsuspecting user’s system. We wouldn’t have it any other way. To a tester, there’s
nothing quite as satisfying as ripping into databases, crushing transaction monitors,
blasting Web application servers, or forcing operating systems to their knees.
This book will help you succeed in your own hunt for bugs. We’ve distilled
decades of experience across a spectrum of test disciplines into concrete guidelines,
recommendations, and techniques for test practitioners. For newbies, it’s a practical guide to the fabulous world of software testing. For veterans, it offers additional perspective and new ideas, and may challenge some long-held beliefs. Additionally, there is useful information included for beleaguered test managers and those intrigued by test processes. But at its core, this book is a survival guide written by
testers, for testers.
We will zoom in on approaches that have demonstrated their effectiveness over
time. The book’s concepts are aimed at testing the kind of software that companies
bet their business on. Key methods and techniques are illustrated through case
studies of actual product tests drawn from a software development laboratory.
These case studies reveal how software is tested on what many consider to be
the gold standard for reliable, robust computing: the mainframe. But this is by no
Someone once said that if you don’t know what you want, you’ll probably never get it. Likewise, unless you understand the nature of industrial-strength software and the environments in which it operates, your testing may miss the mark. That’s because computers are no longer just tools for making organizations more productive and efficient. The use of computer technology has evolved to the point where small businesses, corporations, and governments simply cannot survive without computers and software. But with that dependence comes a certain fear: that a software bug will strike at the worst possible moment and cause a devastating outage. Fortunately, one brave group stands between enterprises that rely on software and bugs that lurk within it. They are called testers.
This section looks at why we need to worry about testing. It starts on the people
side of the equation, and explores why different personalities play different roles in
the software development process. It examines the attributes of people willing to
take on the testing challenge and the mindset needed to be successful. The section
then turns its attention to technology and puts a microscope on the kind of software that has become so ingrained in business and government that it has fostered such dependence. It studies the software’s characteristics, the nature of the computing environments in which it must survive, and the tremendous challenges involved in its testing. The next two chapters set the stage for the rest of the book, and
will help you base your testing on insight, rather than guesswork. Now let’s get
started.
1
The Testing Challenge and
Those Who Take It On
In This Chapter
■ The challenging nature of software testing
■ Why test tools do not supply the complete answer
■ Typical users of large-systems software
■ Personality traits of good testers
As a software tester, the fate of the world rests on your shoulders. This statement is not an exaggeration. If you accept the dual premises that computer software runs the modern world and that all software has bugs, then you will quickly reach the inescapable conclusion that unless the most disruptive bugs are removed, the world as we know it will grind to a halt. You are the one in the software development process who has the role of unearthing those bugs.
Ensuring that software is production-ready is the goal of testing. That sounds
easy enough, because after all, anyone can test—it’s just an annoyance a software
developer has to put up with at the end of a product development cycle. If you agree
with this, you have demonstrated one challenge testers face. On the surface, testing
seems to be such an easy concept to grasp that performing it “must be simple.”
However, most software is not simple. It is complex, the demands placed upon it
are immense, and the breadth of the environments into which it is deployed is
tremendous. Once you see this, you can begin to understand the scope of challenges
faced by those who choose to toil within the test discipline.
The testing challenge stretches beyond software complexity. It reaches into the very
nature of the software engineering discipline.
Software Development
Over time, there have been many improvements to the process and tools needed to
develop software. High-level programming languages have evolved to where they
can greatly ease the burden of developing code. Design modeling techniques have matured, code generation tools have grown and expanded, and many other software development tools have appeared and been refined. Much effort has been spent, and continues to be spent, on making software developers more productive. This means developers can generate more code in less time with less effort. Unfortunately, it does not mean they are writing it any better. To the tester, more code
simply means more bugs.
Software Test
During this time of wide-ranging improvements in software development tools,
not nearly as much research or effort has been spent on similar improvements in
test tooling. Perhaps this is because it is a much more difficult problem to solve.
One can quickly grasp the requirements and approaches to improve a software developer’s ability to create software. On the other hand, it is not trivial to even understand what a tester needs, let alone create test tooling to facilitate it.
testers could not be successful without load/stress tools. However, none of these test
execution tools assist the tester in defining what tests to execute. Automated test ex¬
ecution is highly recommended, but not the total answer to the testing challenge.
Many attempts have been made to automate test creation. One approach requires a
formal design specification written in a compilable language. Some success with this
approach has been demonstrated, but its usefulness lasts only as long as design and
development agree to write formal specifications, and write them in the language required—not a common occurrence in most commercial software projects.
Another approach is to generate test cases from a tester-defined software behavior model known as a state transition diagram. Such a diagram identifies all of the software’s possible states, and the permitted transitions between those states. But in large software products, the total number of potential states is huge by any measure. Attempting to cover all potential states is not cost effective or even desirable since it would waste precious time by generating many uninteresting tests. As
a result, this approach doesn’t scale well. The challenge for the tester to define and
create the most appropriate tests is still there.
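To see why the approach struggles to scale, consider a small illustration (a Python sketch written for this discussion; the five states and their transitions are invented, not taken from any real product). Even this toy model yields a fast-growing pile of transition paths, most of which would make uninteresting tests:
# Sketch only: enumerate transition paths from a tiny, invented state model.
# Real products have hundreds of states, so exhaustive coverage explodes.
transitions = {
    "stopped":  ["starting"],
    "starting": ["running", "failed"],
    "running":  ["stopping", "failed"],
    "stopping": ["stopped"],
    "failed":   ["stopped"],
}

def walk(state, length):
    """Yield every path of the given length that starts in 'state'."""
    if length == 0:
        yield [state]
        return
    for nxt in transitions[state]:
        for path in walk(nxt, length - 1):
            yield [state] + path

if __name__ == "__main__":
    for length in range(1, 9):
        count = sum(1 for _ in walk("stopped", length))
        print(f"paths of {length} transitions: {count}")
A tester would instead pick the handful of paths that map to how customers actually drive the software.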
All software has bugs. It’s a fact of life. So the goal of finding and removing all defects in a software product is a losing proposition and a dangerous objective for a test team, because such a goal can divert the test team’s attention from what is really important. In the world of general-purpose, large-scale, business-critical software, it is not the goal of a test team to find all of the defects. That is a practical
impossibility. Rather, it is to ensure that among the defects found are all of those
that will disrupt real production environments; in other words, to find the defects
that matter.
ASTUB CSECT
BR 14 Branch back to the caller
END
is that the behavior of this program is not consistent. By assembler language convention, register 15 contains a return code. Upon exit from this program, the content of register 15 is unpredictable. It is common practice to examine the return code of programs to determine success or failure. A program with an unpredictable return code that gives no indication of success can cause other programs, such as automation routines, to behave incorrectly. So the failure of this program to set a return code is indeed a bug. Furthermore, the fact that this bug might cause other software using it to fail, and conflicts with common practices surrounding its intended use by customers, means that it could impact customer production environments. That makes it a defect that matters.
The fix:
ASTUB CSECT
SR 15,15 Set return code
BR 14 Branch back to the caller
END
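The kind of automation routine mentioned above might be as simple as the following sketch (Python is used here purely for illustration, and the program name and the action taken are hypothetical):
# Hypothetical automation step that trusts a predecessor's return code.
import subprocess
import sys

result = subprocess.run(["./astub"])   # assumes the program above was built as ./astub
if result.returncode == 0:
    print("ASTUB ended cleanly; continuing with the rest of the job stream")
else:
    # With the unrepaired ASTUB, register 15 holds leftover garbage, so this
    # failure path can be taken even though nothing actually went wrong.
    print(f"ASTUB reported return code {result.returncode}; halting")
    sys.exit(8)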
It is hard to imagine a more trivial example of a program, a bug, or a fix, but the test challenge it illustrates is indeed complex. Testers need to go beyond testing per requirements, testing for code coverage, and showing that abnormal termination or system crashes are not present. Finding what is missing from a product based on a thorough understanding of common practices surrounding its use, or its expected use, by customers is a nontrivial, but important challenge testers need to address. Unfortunately, there is no test tool that will automatically find this type of bug. It takes a good tester with the right mindset, proper preparation, and in-depth knowledge of the software and how it will be used to find and destroy these types of defects that matter.
WHAT IS A USER?
The requirement of understanding the expected customer use of the software under
test has already been mentioned. But, as with the nature of testing itself, this is not
as simple as it seems. In the world of large, mission-critical systems that keep major
companies in business, the term “user” does not necessarily mean “end user.” Large
systems that run on large servers support tens of thousands of end users (or even
more, in some cases) who rarely interface directly with the large-systems software.
Instead, they interface with an application or set of applications running on the
servers. The true users of large-systems software are a set of IT professionals whose
job is to support a production environment.
System administrator is a common term for an IT professional who supports a
server at a departmental level. However, large production environments need more
All of these users have different needs, requirements, and expectations. Systems
programmers are particularly interested in the Reliability, Availability, and Serviceability (RAS) characteristics of the software. In other words, systems programmers
expect software to recover from failure, be configurable to avoid any single points of
failure, provide ways to dynamically make changes to the systems, and provide diagnostic data when a problem occurs so the root cause can be determined and fixed.
Remember, when thinking about mission-critical software, the term reboot and
what it represents is not an option. Operators are interested in the software’s ability
to alert them accurately to situations they may need to respond to—and doing so in
real time, before it is too late. They are also interested in ease of automation of routine operations. Database administrators are interested in the transaction logging,
forward recovery, data integrity, and dynamic change capabilities of the software.
Other system support professionals have their own set of requirements and expectations. The entire system support staff makes use of the software to provide very
high levels of availability to their business’s mission-critical applications and the end
users of those applications.
The point is, testers must understand the expectations placed upon the software
they test. Testing to remove defects defined in the traditional sense (e.g., logic flaws,
abnormal termination) is just the beginning. Testing is also required to prove such
things as systems management capabilities, system longevity, and system recovery
from failures. Almost all existing test tools, research, and academic study is geared
toward removing the “traditional” defects. This is very important and a requirement
of any software development project, but it is not the total answer. The other testing
needs described are a challenge because they rely on the knowledge and skill of the
test professional. The demands upon this knowledge and skill increase as customer
production systems become more and more complex.
Good testers enjoy breaking things, especially developers’ software. They delight in
identifying likely vulnerabilities and devising attacks to expose them, including generalized categories of attacks that can be applied to a wide range of software products.
The testing personality is different from that of most other software professionals,
who view defects as annoyances to be avoided—“bumps in the road” toward their
goal of seeing things work. Having this “breaker mentality” is a crucial trait for those
testing software developed to support continuous operation [Loveland02].
works, while testers seek to prove the code doesn’t work. It’s human nature. Because the developer wrote the code, he believes it works. It is not a good idea for the
software developer to be its sole tester, except in the rare case when the only user of
the software is the developer himself.
Tester Traits
Testers are a special breed for a variety of reasons. Certain personality traits are required to be a successful, happy, long-term tester. The “anybody can test” myth is
exactly that—a myth. Let’s consider some common personality traits shared by
testers.
Curious
Good testers consistently demonstrate high degrees of curiosity. “Why” and “What
if” are common questions they contemplate. They want to know about the function they’re testing, the rationale and requirements that instigated the development of that functionality, and how the software will behave when they get a chance
to put it through its paces. Testers are not shy about asking questions.
Skeptical
Testers are natural skeptics, and relate easily to Missouri’s motto: “Show me.” They
ask questions and read specifications, but are not satisfied until they experience
things for themselves and draw their own conclusions.
Restless
Successful testers are rarely completely satisfied with their work. If a bug escapes
from test into production, a tester does not argue facts or become defensive—he
takes the problem, analyzes it in detail, and determines what can be enhanced or
added to the test to find the same kind of problem in the next test iteration. Good
testers always strive to improve, realizing that any problem found by a customer that
makes that customer unhappy is automatically considered a defect that matters.
Upbeat
Another trait exhibited by successful testers is the ability to stay positive in a negative environment. Testers do not consider a defect a negative as do all others involved in the software development process. Testers are always under time
constraints, frequently asked to take risks beyond their better judgment, and often
challenged to defend their view of the product. It is not easy being part of the end
phases of a long development process. Testers routinely get squeezed from both
ends—late delivery of code combined with an immovable delivery date. Good
Diplomatic
Diplomacy is another important trait possessed by good testers. Good developers
have strong egos—it’s often one of the things that makes a developer good. The
tester needs to be able to be the bearer of bad news and, at times, be prepared to tell
a developer that his “baby is ugly.” This must be handled in a way that elicits the
correct solution without causing contention. To be a successful tester, you need to
cultivate excellent relationships with developers, understanding that everyone
shares the same goal of delivering the highest-quality products to the marketplace
at the right time. For more on this topic, see Chapter 4, “The Test and Development
Divide.”
Insatiable
With an unquenchable thirst for knowledge, testers want to fully understand the
technology they work on, how it is used in real-world environments, and the real
concerns of their customers. They enjoy seeing the big picture, understanding complete systems, and working on different problems. A major challenge for testers is
keeping up with new and changing software technology and test techniques. The
life of a tester is never dull.
Generous
Testers are also educators at heart. No one person has the knowledge and ability to
understand all aspects of very large systems in detail, so testers frequently rely on
each other. They teach each other about different aspects of systems and share information about various software components. Also, since it is testers who get the
initial view of new software and systems technology, how it behaves, and what
should be avoided, they make very good customer educators. Testers see a broader
picture than developers do and can share that perspective and wealth of experience
with their customers.
Empathetic
Another invaluable trait is empathy. One of the most important roles a tester plays is
that of customer advocate. A good tester continually thinks in terms of his customers,
what is important to them, and what he needs to do to “protect” them. Without a
working knowledge of customers and their businesses, testers do not have much
chance at being successful at finding the defects that matter most. They should have
a complete understanding of how their customers use the software, how they deploy
that software, and the day-to-day activities of the software support staff. Good testers
jump at the chance to interact with their customers through discussions, working to
re-create problems they uncover, or spending time at their site. Testers are happiest
when their customers are happy and successful when their customers are successful.
Resilient
Resiliency is also important. The customer (either another test team or an external
customer) is bound to find a bug or two with any new software. These customers
are not privy to the many defects already removed. They only see the tester’s failures, not his successes. Many times (actually all times if the test was a good one) the ratio of problems found by the tester is disproportionate to those found by the customer. Testers must be resilient and thick-skinned when, after they remove hundreds of defects from a product, a customer installs it, finds one problem, and asks,
“Didn’t you test this?”
SUMMARY
There are no hard and fast answers when addressing many of the challenges faced
in the software test discipline. The testing of complex, critical software is not made
up of one thing, accomplished with one tool, or even conducted by one test team.
This book identifies and describes some proven approaches to addressing these
challenges.
In Chapter 2, “Industrial-Strength Software, It’s Not a Science Project,” we’ll
look at another side of the testing challenge, namely the monster that testers must
wrestle to the ground: large-scale, general-purpose software. We will identify its
characteristics, typical environments in which it is required to perform, and some
examples of the demands placed upon it.
2
Industrial-Strength Software, It’s Not a Science Project
In This Chapter
■ Characteristics of industrial-strength software
■ The nature of production environments
■ Examples of mission-critical software in action
■ A Case Study
INDUSTRIAL-STRENGTH SOFTWARE
PRODUCTION ENVIRONMENTS
Test-sensitive Characteristics
What is the nature of these environments? There is no single answer to that question.
That simple fact alone causes headaches for software testers, especially those working
on programs that will eventually be sold as commercial products, because it’s impossible to predict (or emulate) every environment in which the program will someday run. In fact, it’s not uncommon for a tester who reports a defect to hear a developer deem it a low priority because the tester found it by emulating an “unrealistic environment” in which no customer would ever run—only to have that very same problem become the first customer-reported product defect. Guessing what every customer will or will not do with a piece of software is about as easy as predicting the rise and fall of the stock market. However, there are some typical characteristics we
can examine.
Heterogeneity
First and foremost, production environments are heterogeneous. It is the rare enterprise that utilizes a single make and model of server across its business, let alone software from only one manufacturer. Unheard of might be a more accurate description. Rather, most large enterprises have a broad mix: servers ranging from small Intel® processor-based workstations to blade servers to midrange boxes to mainframes. Collectively, these processors run such things as Microsoft Windows, UNIX®, Linux®, IBM® z/OS®, and IBM z/VM® operating systems. The networking infrastructure likely supports multiple speeds of Ethernet and perhaps one or more less-common technologies, such as asynchronous transfer mode (ATM), Fiber Distributed Data Interface (FDDI), or even token ring. Disk, tape, and other peripherals span several brands.
A related aspect is the concept of a software stack. A stack typically starts with
an operating system. Layered on that are one or more middleware products, such as
a relational database, a Web application server, or a transaction monitor. On the
top are the applications that end users actually interact with.
Perhaps most important for this discussion, all of this software running both within a single server and across all servers in the production infrastructure probably comes from many different suppliers. At any one time, a large IT shop may be
running hundreds of software products from a dozen or more vendors. Try fitting
all that into a test variation matrix and you will have a new appreciation for the
word “headache.”
Size
A related characteristic of production environments is their sheer size: terabytes of
data, thousands of MIPS (Millions of Instructions per Second, a metric for proces¬
sor capacity—sometimes affectionately known as Meaningless Indicator of Proces¬
sor Speed), miles of interconnecting cable, acres of floor space, and megawatts of
electrical power. You get the idea. This kind of scale is probably impossible for the
tester to replicate and difficult to emulate, but it’s necessary to try.
on meeting them. As a tester, it’s not too much of a stretch to suggest that their jobs
are in your hands. SLAs often limit both planned and unplanned down time to
minutes per month. That in turn leads to an insistence on tight control over
changes and very narrow windows for rebooting systems to pick up software fixes,
make configuration updates, or perform other maintenance. Think about that the
next time you’re asked to recreate a software problem a dozen times to gather
enough diagnostic data to debug it.
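As a rough illustration of what such limits imply (the availability targets below are generic examples, not figures from any particular SLA), the arithmetic is straightforward:
# Illustrative only: convert an availability target into permitted downtime.
MINUTES_PER_MONTH = 30 * 24 * 60   # roughly 43,200 minutes

for target in (0.999, 0.9999, 0.99999):   # "three nines" through "five nines"
    allowed = MINUTES_PER_MONTH * (1 - target)
    print(f"{target:.3%} availability allows about {allowed:.1f} minutes of downtime per month")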
Continuous Availability
Globalization and the influence of the Internet have put additional pressure on corporate IT departments and their SLAs. For decades, companies have run their interactive applications during the day, then locked down their databases at night to perform consolidation and reconciliation activities, reconfigure the network, or do other disruptive tasks, often by running streams of specialized batch jobs. There is even a term for this down time: the batch window. Now, as companies expand operations across several continents and time zones, consolidate data centers, and
serve customers who use the Internet to place orders, access accounts, or perform
other transactions at any time of day, the concept of a batch window is often little
more than a memory.
Instead, companies must keep their systems up for weeks, months, or longer without an outage. System down time or unplanned outages, even of short duration, can cost millions of dollars in lost revenue. Enabling Technologies Group [Bothwell01] reports that surveyed IT users identified down-time costs of up to one million dollars a minute.
In such an environment, reboot is not a valid recovery option. Production
users don’t think about availability in terms of minimizing the length of an outage;
they think in terms of minimizing outages—period. The most important element
in keeping these systems humming is rock-solid, reliable software that refuses to
fail, regardless of how heavily it is pushed or how long it has been running. Proving such levels of reliability is a major challenge for testers, requiring special focus
in such areas as recovery and longevity testing.
backup and restore capability intended to ensure that a data center can pass the
“roaming elephant” test (i.e., if an elephant wanders into the data center and randomly picks a spot to sit, could it take down any critical applications?). Such plans
rely on tools and technology to achieve their desired ends, and so have their share of
bugs for testers to target.
Compatibility
Few production environments pop up overnight. They evolve and grow over time,
and so do their application suites. It is not unusual for a business to have many millions of dollars invested in critical applications that have been operating and evolving for years. Needless to say, businesses are not anxious to jettison these
investments and start over in order to use a new version of anything, no matter how
clever its features. They expect each new release of a given software product, be it
an application, middleware, or the operating system itself, to be compatible with its
own prior release or set of releases.
IT departments must be able to migrate smoothly to a new release in a staged
manner, upgrading the software product gradually, one system at a time, until the
entire complex is transitioned over. If a product required a “big bang” approach,
necessitating all copies across the data center to be upgraded at the same time, a
chorus of screams would erupt from IT departments and the new version of the
product would end up on a shelf gathering dust. The poor manufacturer who had
devised such a “migration plan” would be forced to appease CIOs around the globe
by explaining how it intended to solve the problem.
Naturally, both compatibility with exploiting or corequisite products and with one’s own prior releases must be verified by the tester.
Virtualization
Imagine a case where dozens, hundreds, or even thousands of programs or users
must operate at the same time on a single machine, but each must believe that it is
alone on the system, that it controls the entire machine’s resources and can do
with them as it wishes. To accomplish this, a single server would need to be split
into hundreds of images, each running its own instance of an operating system. To
each such operating system instance it would need to appear as if it has sole possession of a set of resources (e.g., CPU, memory, disk, networking), so that it in
turn could serve up those resources to programs requesting them.
To perform such magic requires something called a hypervisor, a piece of software that hosts multiple copies of other operating systems. This hypervisor pools server resources in a way that allows them to be shared efficiently. It then hands out virtualized instances of those shared resources to the operating systems it hosts (making them appear as though they are dedicated to each) and manages those resources to balance their usage effectively, all the while ensuring each virtual host is completely fenced off from the others. Long used in the mainframe world, this technology has begun permeating other server types as well, as a way to maximize the use of expensive hardware.
For the tester, virtualization doesn’t present a challenge, but rather offers very
useful opportunities for creating effective test environments.
MISSION-CRITICAL SOFTWARE
Some companies must do more than simply satisfy their customers—they must
also satisfy governments. For instance, the United States Securities and Exchange
Commission (SEC) dictates trade clearing, account balancing, and other daily
deadlines that financial services companies must meet, or they will face significant
fines. For such companies, any system that is key to their compliance with government regulations is mission-critical.
Most midsized and large companies provide a pension plan for their employees, be
it a traditional defined-benefit plan, a defined-contribution plan (such as a 401(k)),
a cash-value plan, or some other variation. Many companies don’t want to be bothered with managing the plan themselves. Instead, they outsource the plan’s management to one of several vendors who specialize in that area. Figure 2.1 illustrates
how such a vendor might configure its data center to manage this workload for
multiple client companies.
The data lives at the center of their operation. It includes the complete status of
pension accounts for each client’s employees, as well as detailed information about each client’s pension plan rules and parameters. Controlling this data is a set of mainframe servers, each running relational-database and transaction-monitor software. Within these servers run the vendor’s core benefits applications, which handle everything from processing employee and retiree inquiries and investment change requests, to processing applications for retirement, to completing death-benefit transactions. These servers also run a set of administrative applications that handle functions such as trade settlement and reconciliation, proxy voting, and governmental regulation compliance and reporting.
(PSTN) into the vendor’s Private Automatic Branch Exchange (PABX) switch. The switch interfaces with Computer Telephony Integration (CTI) servers running a set of communications applications that interact with callers through either touchtone or voice recognition technology, process the requested transaction, then relay responses to the caller through a text-to-speech application.
If the IVR system cannot satisfy a caller’s request (or, if the caller indicates up
front that they wish to speak to a person), then the CTI server can transfer the call
to the next available Customer Service Representative (CSR) in the Call Center.
When the CSR’s phone rings, a screen on his desktop computer simultaneously displays detailed information about the caller and the stage reached in the conversation with the IVR system, so the CSR can pick up where the IVR left off. Additional
servers provide the CSRs with direct access to client employee data.
The final direct presentation channel is Web Self-service. Here, employees can
access their account directly from their Web browser, perform queries, initiate transactions, or request account summary reports to be generated and e-mailed to them.
Administration
Finally, the vendor’s own system administrators access the back-office systems
through terminal emulator sessions in order to monitor system status and initiate administrative batch jobs. Also, the vendor mirrors its critical data at a remote disaster-recovery site.
Industrial-strength Characteristics
This example environment embodies the characteristics of production systems described earlier. It’s certainly large and heterogeneous. Self-service IVR and Web applications require 24 hours a day, 7 days a week availability. Contracts between the vendor and its clients would certainly specify response time and availability requirements, leading to SLAs for the vendor’s CIO and support staff. And disaster
recovery is integrated into the overall operation.
The environment is rife with what was described earlier as industrial-strength
software. The back-office systems alone are running large benefits applications that
manipulate huge amounts of critical data, and must interface with a variety of other
applications ranging from IVR, Call Center support, and Web Self-service to the
systems of mutual fund and other financial-service providers. These back-office
systems would likely be stressed constantly by hundreds to thousands of employees
from multiple client companies, requiring intricate serialization to keep all the parallel activities from interfering with each other. Robustness and the need to run at high levels of system stress for extended periods of time are a given. Of course, software supporting an environment this complex would need to evolve over time, requiring migration to new releases.
SUMMARY
3
The Development Process
In This Chapter
■ The challenging nature of software testing
■ Test phases and processes
■ Traditional software development models
■ Iterative and agile software development models
■ The pitfalls of skipping phases
There’s a popular phrase among software professionals faced with tight budgets and impossible deadlines: “Cheap, fast, good: pick any two.” Most testers instinctively focus on high quality, so if offered a vote they would likely be forced to choose between cost and speed.
Let’s consider cost. One determining factor of a defect’s price is where in the
development cycle it is uncovered. The earlier it is found, the less it costs the project. A big piece of this expense is the number of people who get involved in the discovery and removal of the bug. If a developer finds it through his own private
testing prior to delivering the code to others, he’s the only one affected. If that
same problem slips through to later in the development cycle, it might require a
tester to uncover it, a debugger to diagnose it, the developer to provide a fix, a
builder to integrate the repaired code into the development stream, and a tester
again to validate the fix. And that doesn’t even include the time spent by a project
manager to track its status.
Johanna Rothman studies these costs and suggests formulas to help clarify the
actual costs associated with late-cycle defect discoveries [Rothman02]. You can plug
in your own estimate of a defect’s price into these formulas and confirm that it is in¬
deed more and more costly to fix problems as the project proceeds. In a perfect
world, all bugs would be found and removed by developers before their code sees the
light of day—or before those bugs are inserted into the code in the first place. However, that’s not realistic, which is why the world needs professional testers.
Then there’s speed. Your customers probably want new solutions yesterday,
and if you’re in the business of selling software, then your management team is
probably in a hurry too. Various development models have been built to streamline
the code delivery process. They take different approaches to attempt to reconcile
speed and cost without sacrificing quality.
In this chapter, we’ll examine different test phases, their associated costs and limitations, and the classes of bugs each is best suited to extract. Then we’ll take a look at
several development models, with an emphasis on how each incorporates testing.
Different test phases target different types of software bugs, and no single phase is
adept at catching them all. That’s why we can’t squash every bug during the earliest and cheapest phases of testing—because they aren’t all visible yet. Finding bugs
during a development cycle is like driving for the first time along a foggy road at
night. You can only see ahead a few feet at a time, but the further you go, the more
you discover—and if you drive too quickly, you might end up in a ditch.
Each test phase has its own associated limitations and costs. Software that as¬
pires to be customer-ready must cross through several. Let’s examine them.
testing. Regardless of the definitions, it’s more important to keep sight of what you
do and accomplish, rather than what category of test it falls under.
Scope
During unit test, the developer of an individual module or object tests all new and
changed paths in his code. This includes verifying its inputs and outputs, branches,
loops, subroutine inputs and function outputs, simple program-level recovery,
and diagnostics and traces. That’s the technical scope of unit test. A large software
project will involve many such programs. So, multiple programmers will be working in parallel to write and unit test different modules that will likely be shipped
together.
In addition to single module testing, groups of developers can choose to work
together to integrate their unit-tested programs into logical components. This is
optional, but in a complex project it can serve as a useful precursor to the next test
phase. An integrated UT effort takes aim at uncovering simple module-to-module
integration failures. It verifies that when parameters are passed between routines,
the pieces communicate correctly and generate appropriate outputs. This allows the
development team an overall view of the combination of parts and whether they
will provide the promised function to the next test phase.
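As a simple illustration of the kind of path checking unit test performs, the following sketch (in Python, with an invented routine under test) verifies the success path and a branch boundary:
# Sketch of a unit test; the function and its thresholds are hypothetical.
import unittest

def classify_return_code(rc):
    """Map a numeric return code onto a coarse status."""
    if rc == 0:
        return "success"
    if rc <= 4:
        return "warning"
    return "error"

class ClassifyReturnCodeTest(unittest.TestCase):
    def test_success_path(self):
        self.assertEqual(classify_return_code(0), "success")

    def test_warning_error_boundary(self):
        # Exercise both sides of the branch between warning and error.
        self.assertEqual(classify_return_code(4), "warning")
        self.assertEqual(classify_return_code(5), "error")

if __name__ == "__main__":
    unittest.main()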
Environment
Unit testing is done primarily on a single native system. In some cases, new hardware may not be available to drive the unit test scenarios, or it is in short supply. This often leads to the use of virtualized or emulated environments during unit test. Such environments also have some unique advantages in terms of debugging capabilities that interest unit testers, whether or not they have a shortage of hardware.
These environments are discussed in detail in Chapter 16, “Testing with a Virtual
Computer.”
Limitations
A single major component of a complex software product usually consists of many
individual modules or objects, all working together to deliver that component’s
functions. Before those pieces come together, different developers will work on
them independently, and perhaps even on staggered schedules. The testing is also
largely isolated to a single module at a time. That isolation enables UT’s tight focus
on new and changed code paths, but at the same time is a major limitation. It pre¬
vents the tester from seeing how the code will behave in a realistic environment.
This weakness can be partially relieved through the integrated UT approach, but is
nevertheless a reality for most development teams.
This isolation leads to a related limitation of UT. Much of the component may
not be in place when the UT for any individual module is done. These holes in the
component create the need for scaffolding. Like the patchwork of temporary platforms that serve as a supporting framework around a building under construction, scaffolding used during UT is a collection of bits of code that surround the component to prop it up during early testing. These bits of code, or stub routines, do little more than receive invocations from programs within the component in question and respond to them according to an expected protocol. In essence, they fool the other program into believing it is actually interfacing with its peer. One real weakness of UT is that developers must rely on such program-to-program scaffolding
rather than being able to test their modules alongside their true counterparts within
the component.
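The idea behind a stub routine can be sketched as follows (in Python, with invented names); the stub honors the agreed protocol just well enough to let the module under test run before its real peer exists:
# UT scaffolding sketch: a stub stands in for a component that is not written yet.
class StorageManagerStub:
    """Pretends to be the real storage manager during unit test."""
    def allocate(self, size_bytes):
        # Always "succeeds" with a fake address; the real peer would actually
        # reserve memory and could fail.
        return {"rc": 0, "address": 0x1000}

def module_under_test(storage_manager, size_bytes):
    """The new code being unit tested; it sees only the agreed interface."""
    reply = storage_manager.allocate(size_bytes)
    return reply["rc"] == 0

if __name__ == "__main__":
    assert module_under_test(StorageManagerStub(), 4096)
    print("module behaves as expected against the stub")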
Integration Test phase, because it’s the first time all of the pieces of a function are
integrated. However, that usage suggests product-wide integration, which obscures
a simple reality: for large, complex software projects the pieces are usually integrated and tested first on a function-by-function basis, rather than across the entire
product. That’s why we’ll use the more specific term FVT.
Scope
The function verification test team focuses on validating the features of an entire
function or component such as a storage manager, work dispatcher, or input/output
subsystem. The team looks at the function as a whole and validates its features and
services. The testers go after its mainline functions, internal and external interfaces,
operational limits, messages, crash codes, and module- and component-level recovery. FVT is focused on a white-box approach.
Environment
The FVT can be performed on native hardware platforms or virtualized and simulated environments. For very large projects where native environments are either limited or very difficult to use in diagnosing problems, virtualized environments can really shine. In Chapter 16 we will review, in much greater detail, how these virtualized environments provide many benefits to software development teams.
FVT can exploit these virtual environments to allow each tester the capability to have his own virtual machine, without worrying too much about the cost of the real thing. This capability allows the team to wreak havoc on the software without impacting others. Cool concept!
Diagnosis of the kind of defects targeted by FVT can be challenging on native hardware platforms. Not all such platforms have appropriate tracing and breakpointing capabilities to allow for instruction stepping or storage alteration without
creating a totally artificial test environment. Some virtualized environments are
specifically designed with these capabilities in mind, making it easy for testers to
create intricate scenarios and track down those tough problems.
Limitations
FVT’s limitations lie in a few areas. First, by definition, its scope is limited to a single function or component of the overall software package, rather than on the package as a whole. This is by design. It enables the team to zoom in on lower-level
functions that later test stages can’t address, but it also can prevent the testers from
seeing the full-product picture.
FVT is also largely a single-user test. It doesn’t attempt to emulate hundreds or
thousands of simultaneous users, and therefore doesn’t find the kind of load-related
problems that alternative testing does.
If the team chooses to reap the benefits of virtualized environments, it must
also realize that such environments can, on rare occasions, mask certain types of
timing-related problems. For example, such environments can alter the order in
which hardware interrupts are surfaced to the software. The FVT team would then
have to rely on later test phases to catch such problems.
Scope
SVT tests the product as a whole, focusing on the software’s function, but at a
higher level than UT or FVT. It’s the first time testing moves beyond single-user
mode and into the more realistic realm of multiple, simultaneous users. Heavy
load and stress is introduced, with workloads and test drivers simulating thousands
of clients and requesters. SVT pushes the software to its limits and ensures that the
entire solution hangs together while exercising both old and new functions.
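A toy load driver gives the flavor of this (Python, with an invented request function standing in for whatever interface the product really exposes); an actual SVT workload generator would be far more elaborate:
# Sketch of a load/stress driver: many simulated clients issuing requests in parallel.
import concurrent.futures
import random
import time

def one_client(client_id, requests=100):
    """Placeholder client: replace the sleep with a real transaction or query."""
    failures = 0
    for _ in range(requests):
        time.sleep(random.uniform(0, 0.001))   # stand-in for one unit of work
        if random.random() < 0.0005:           # pretend an occasional failure
            failures += 1
    return failures

if __name__ == "__main__":
    with concurrent.futures.ThreadPoolExecutor(max_workers=200) as pool:
        results = list(pool.map(one_client, range(1000)))
    print("simulated clients:", len(results), "total failures:", sum(results))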
SVT views the software from the perspective of a customer seeing the whole
product for the first time. For example, the team validates that businesses will be
able to migrate from one version of a product to another smoothly and without disruption. It also determines if multiple versions of the software can coexist and operate across several loosely coupled or clustered systems, if appropriate. And, unfortunately, there are times when a customer’s software deployment does not go as planned. The SVT team ensures that customers can fall back to the prior version of the software. The testers also verify that the software can withstand harsh environmental conditions such as memory shortages and I/O saturation—and if a failure does strike, that the software handles it gracefully.
Environment
On some computing platforms, SVT is where the software meets the real, native
hardware for the first time. UT and FVT may have been performed under emulation or virtualization. In that case, SVT will help identify defects such as architectural disconnects that may have occurred. But even if the earlier tests were done on native
hardware, the systems used during SVT are generally significantly bigger and faster,
as they must accommodate the requirements of load and stress workloads.
In addition to processors, the SVT team will attach lots of peripherals. These might include large numbers of storage devices, networking routers, printers, scanners, and digital recording equipment.
SVT teams can sometimes exploit virtualized environments similar to those
used in FVT and UT when enough real hardware is not within the budget, or the
goals of the test can be achieved without it. However, the virtualized environment
in which SVT is performed needs to be robust enough to achieve its goals and
objectives.
Limitations
By design, SVT’s aim is limited to a single product. If that product will later partner with others to deliver a complete solution to customers, SVT’s focus usually isn’t broad enough to catch cross-product defects. At the other end of the scale, because of SVT’s full-product view, it is limited in its ability to catch problems with messages, commands, and other low-level interfaces that are the purview of FVT.
Because SVT is normally performed on native hardware, the tools available to
diagnose problems may be limited, as tools used on virtualized environments are
not available. This can also be considered a strength, however, since the SVT team
must then rely on the same debugging tools that their customers do, such as log or
trace files and memory dumps. As a result, they may discover defects or weaknesses
in the tools themselves.
In cases where SVT is performed in virtualized or simulated environments, the
amount of time the software package spends on real hardware before customer use
is limited. This is a risk that the project leaders must consider before deploying the
software to customer sites.
Scope
The PVT team’s objective is to identify performance strengths and weaknesses. The team designs measurements that target performance-sensitive areas identified during the software’s design stage or by customers executing within existing environments. Measurements are performed, results are documented, and a detailed analysis is conducted. Conclusions drawn from the analysis may suggest the need for further measurements, and the cycle repeats.
PVT may also focus on how the software compares to industry benchmarks.
This work also tends to iterate between analysis and measurement cycles, until all teams are satisfied.
The PVT team searches for bottlenecks that limit the software’s response time and throughput. These problems usually lie in excessive code path length of key functions. They may also be related to how the software interacts with a specific hardware platform. This is particularly true with some industry benchmarks. For example, the software may run well on a processor with a broad and shared memory cache, and poorly on one where the cache is deep but narrow, or vice versa.
Environment
Real performance studies are measured on native hardware. Any studies performed
on virtualized platforms are primarily targeted at studying distinct path lengths.
Limitations
One issue associated with true performance analysis and measurement work is that
the software and hardware need to be stable before formal performance studies can
be completed. If the slightest error pops up while a measurement is being taken, it
can force extra processing and thereby invalidate the results. So a close relationship
between the SVT and PVT teams is critical to the success of the performance test.
The stress driven by SVT creates the foundation for a successful PVT.
costly and throw a real monkey wrench into what you thought was a project that
was winding down.
But one efficiency offered by PVT is the use of simulated or virtualized environments to help review basic path length. Combined with skills in real-time monitoring and workload tweaking, performance measurement specialists can uncover major path-length problems early, giving development ample time to provide fixes.
Removing Bottlenecks
Functional bottlenecks are those unintended behaviors in the software that act as choke points for throughput. For example, all processing might get backed up behind a slow search algorithm. Note that functional bottlenecks are different from physical ones, such as memory or disk exhaustion. Heavy workloads help bring these functional bottlenecks to light. This is the first place where load and stress factors into the performance measurement and analysis portion of the process. The stress workload must be smooth and repeatable since, as with any scientific experiment, it’s through repeatability that the performance team gains confidence in the results.
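The repeatability point can be sketched in a few lines (Python, with a placeholder workload): run the same measurement several times and examine the spread before trusting any comparison built on it:
# Sketch: check that a measurement is repeatable before drawing conclusions.
import statistics
import time

def run_workload():
    """Placeholder: count units of work completed in a fixed interval."""
    end = time.time() + 0.1
    count = 0
    while time.time() < end:
        count += 1          # stand-in for one transaction
    return count

runs = [run_workload() for _ in range(5)]
spread = (max(runs) - min(runs)) / statistics.mean(runs)
print(f"work per run: {runs}, variation: {spread:.1%}")
# A large variation means the environment is not yet stable enough to compare results.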
Once the bottlenecks are removed, the performance test team sets out to determine the maximum value to place on the software’s speedometer. You’ll find many industry benchmarks for performance that help measure some widely available classes of computing such as Web serving or transaction processing. A variety of benchmarks are maintained by consortiums such as TPC (Transaction Processing Performance Council) [TPC04] and SPEC (Standard Performance Evaluation Corporation) [SPEC04]. In addition, computer manufacturers often establish their own performance measurement workloads to drive their individualized comparisons. These workloads and benchmarks are used to determine the maximum throughput for a specific hardware and software combination. As stated before, repeatability is important, but so are consistent environments. You have to make sure that you are comparing apples to apples and not apples to kumquats.
Integration Test
So far, the test teams have done a lot of work to stabilize the software. The new code has handled heavy workloads from thousands of users and performs like a streak of lightning. But if a customer can’t integrate it with their other software solutions, run it all day, all night, and all year, then the job isn’t done. That’s when it’s time for integration test.
Scope
While some software products lead isolated lives, most must work with others to provide a complete solution to customer problems, as we saw in Chapter 2, “Industrial-Strength Software, It’s Not a Science Project.” Integration test looks at that entire solution. It moves beyond the single-product domain of SVT and integrates the new software into a simulated customer environment. Integration test, also known as Acceptance Test, takes the big picture approach where the new software is merely one of many elements in the environment—just as if it had been thrown into live production.
How necessary this phase is depends on the software and the breadth of its expected interactions, dependencies, and exploiters. It’s not a test that every product undergoes, and in practice is rarely used for new, immature products. But experience has shown that the more mature a test organization becomes, the more likely it is to have learned (perhaps the hard way) the value of performing an integration test.
This test is often done in parallel with beta test (early customer testing). It sometimes even continues after general availability of the product. The goal there is to stay one step ahead of your customers in exercising extremely complex environments, improving usability, and exploring other “soft” aspects of the support.
Environment
Configurations for your integration test efforts are as varied as your customers. They are comprised of banks of servers, networks, and storage devices, all with the goal of driving scenarios and workload in a way similar to that of businesses and organizations.

To illustrate, IBM's mainframe platform supports a nonstop testing environment that provides the first real customer-like configuration into which the newest products are introduced. The testing here operates with many objectives, including achieving service level agreements, maintaining availability targets, migrating to new levels of software in a controlled fashion, and providing a complex configuration on which to execute heavy stress and inject failures against the platform. The software is then exercised with resource managers, networking products, and applications as part of an end-to-end solution to a business problem [Loveland02].
Limitations
Because integration test takes such a broad view, its success relies on earlier test phases extracting lower-level bugs and significant stability problems. If the integration test team spends all of its time fighting mainline functional problems, it will never get to the interoperability defects the team was put in place to find.

Also, while the integration test team attempts to achieve a customer-like environment, it can't possibly be all-inclusive. It must, however, aim to be representative. The team's effectiveness will be limited by the quantity and quality of customer information at its disposal. Talking directly with customers to better understand how their configurations are built and how they choose to use software packages to solve their business problems can help the integration test team build an environment that is an amalgamation of many customers. See Chapter 5, "Where to Start? Snooping for Information," for techniques on how to better understand your customers.
Service Test
Even after you perform a wide variety of tests against the software, inevitably and unfortunately, defects will escape and be discovered by customers. Fixes to software defects need the same attention as a new function. Once the defect is identified, a streamlined process similar to that of the product's original development should be used to design, develop, and test the fix. The same rigor is required because a customer's expectations don't relax. In fact, the customer expects more after something severe has occurred that has disrupted his business.
Scope
While new releases of software are being developed, the primary test activity that keeps watch over existing software is service test. Its fundamental objective is to test software fixes, both individually and bundled together. It should look not only at the fixes themselves, but also ensure those fixes don't have side effects that interfere with other areas of the software. It is required at the UT, FVT, and SVT levels. Some organizations may decide to omit one or more of the primary phases we have discussed here, but such a decision should only be made after careful consideration.

A typical flow goes like this: individual fixes are validated by a UT and/or FVT team. Then they are grouped into a fix pack or bundle of service. The bundle is fed into a comprehensive service test environment, which may or may not be the same physical environment as the product's SVT. Here the SVT team (or a separate service test team) will run all the test scenarios and workloads they can to ensure that no single fix or combination of fixes causes the software to regress. If a particular fix or fix pack requires targeted, system-level testing under heavy load and stress conditions, the development, FVT, and SVT teams will design a special test as the primary plan of attack. Service test is the customer's day-to-day ally in maintaining system stability.
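As a rough sketch of that flow, the fragment below models a fix pack in which each fix carries its targeted scenarios, then reruns the entire existing scenario library against the bundle so that no single fix or combination of fixes slips a regression through. The fix identifiers, scenario names, and run_scenario() helper are hypothetical placeholders.

    # Hypothetical fix pack: each fix maps to the scenarios that verify it.
    FIX_PACK = {
        "FIX-1017": ["login_timeout_scenario"],
        "FIX-1032": ["report_export_scenario", "report_print_scenario"],
    }

    # The full library of existing regression scenarios, not just the new ones.
    REGRESSION_LIBRARY = [
        "login_timeout_scenario",
        "report_export_scenario",
        "report_print_scenario",
        "nightly_batch_scenario",
        "failover_scenario",
    ]

    def run_scenario(name):
        """Placeholder: drive one test scenario and return True on success."""
        print("running", name)
        return True

    def verify_fix_pack():
        targeted = [s for scenarios in FIX_PACK.values() for s in scenarios]
        # Rerun everything, so a fix cannot silently regress an unrelated area.
        return [s for s in sorted(set(targeted) | set(REGRESSION_LIBRARY))
                if not run_scenario(s)]

    if __name__ == "__main__":
        failed = verify_fix_pack()
        if failed:
            print("regressions found:", failed)
        else:
            print("fix pack clean")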
Environment
The environment for service test can vary. It can exist on simulated or virtualized platforms, especially for UT and FVT, so they can more easily ensure a bug has been fixed. However, more comprehensive platform environments are needed to validate maintenance at the SVT or integration test level. As a result, you'll sometimes find service test teams operating with large, complex configurations that mimic those of their largest customers.
Limitations
The amount of testing that the service test team can perform is often limited by time constraints. This comes down to speed versus thoroughness. There's the speed required because your company wants to get a new software product out in the marketplace. There's an entirely different degree of speed needed when a customer's business is stopped dead and they demand a fix. That intense need for speed in turning around fixes often forces compromises in test coverage, particularly if the volume of fixes is high.
Beta Test
At some point, usually part way through SVT or integration test, project and test
management start to consider passing their software jewel along to real customers.
This is referred to as a beta test. It’s also known as a first customer ship or an early
support program. Whatever you call it, it’s essentially another test phase from
which you can benefit.
Scope
All of the testing discussed so far has had its unique contribution to validating the
software. So what is gained by giving a select set of customers the code early?
No matter how realistic you try to make your test environment, ultimately, it is still artificial. Beta customers deploy the software into real environments, with all of the associated variability and complexity. Because each customer is different, beta testing broadens the range of environments and experiences to which the software is subjected, leading it down new paths and revealing fresh bugs.

Your beta customers will also become essential, outside critics. Each customer's unique perspective on the software's impact to its business during migration, deployment, and use serves as valuable input to the development team in assessing its readiness for general availability.
Environment
Every beta environment will be unique—that’s the beauty of this test phase. Each
customer will generate different, new defects based on their configuration.
Limitations
An obvious limitation of the beta is that it can't cover every possible environment. An additional limitation may be the amount of time that you allow your beta customers to run with the code before you determine it is ready to ship to everyone. Some early support programs depend on customers installing and running the software in a production environment in addition to their test environments. If the time allotted for beta is too short, it can limit this possibility.
These test phases can be applied to a variety of development models. They can be your and your customers' ally in the quest for robust, reliable software. Let's examine these development models and discuss their pluses and minuses.

If you look across the industry, it seems there is always a new type of software development model arising. Whether it's a sophisticated software product implemented by multibillion dollar corporations or a summer project created by tenth-graders, some type of model will be used. Let's explore the landscape for models used today.
Waterfall Model
One of the classic software development models is the waterfall. It is a linear and sequential approach that depends on the prior phase completing before the next begins. The waterfall software development model offers a very rigorous and strict approach for creating software. Key to the effectiveness of the waterfall model is clear, concise documentation created from one phase to the next. Figure 3.1 gives a basic representation of this model.
FIGURE 3.1 A basic representation of the waterfall model, flowing from customer and business requirements through specification development, design, development, and testing phases to customer implementation and maintenance.
Each phase contributes key deliverables to the next. Checkpoint reviews are conducted throughout to assess progress against entry and exit criteria and determine readiness for the next phase.

If we expand the "testing phases" block of the waterfall model from Figure 3.1, you can see that it looks like a small echo of the waterfall (see Figure 3.2). Unit test, function verification test, system and performance verification test, and integration test flow in a similar fashion to the overall model. Service (or maintenance) test is appropriately tied to the Maintenance block of the larger model.
FIGURE 3.2 The testing phases of the waterfall model expanded: unit test, function verification test, system and performance verification test, integration test, and service test, flowing in a small echo of the overall model from customer and business requirements through maintenance.
The waterfall model requires that each phase be entirely completed before the next test phase begins. Its strength lies in its thoroughness and rigor, helping to produce extremely robust code. The most significant weakness of this model, however, is its slowness to market because of the dependency between test phases. Since the phases rarely run exactly as expected, it is reasonable to expect problems in finishing one phase so that the next test can move forward. This is one of the issues that waterfall model users encounter, and it's what makes the task of implementing a pure, classic waterfall model very difficult. What can we do to address this?
Waterwheel Model
Time-to-market pressures demand a different approach to developing software. Within the development stage of the classic waterfall model, the entire pool of code is delivered before the test phases begin. This is usually overkill. It delays the overall development cycle and squelches the opportunity for quick market penetration. Hence the need for what we dub the waterwheel model. This model addresses this weakness of the waterfall model while retaining its strengths. It keeps the upfront activities of gathering requirements, developing a specification, and designing the product. However, the development and testing phases are staged in the waterwheel model. By shipping code in logical groupings, testing can proceed with one chunk of code while the next chunk is being created by development. Another benefit of the model is that it fosters continuous feedback between the development and test teams, enabling ongoing adjustments to the software's design.
require more "filtration" up in the development stream before they are ready for the waterwheel to deliver them back into the testing pool. After the final delivery of code has been made to FVT, development will continue to deal with defects arriving from FVT and all the other test teams that will "stir up" the code throughout the product cycle. The waterwheel will spin until all of the defects have been resolved and the pool of code is clean and clear. Figure 3.4 provides a visual representation of the waterwheel model.
FIGURE 3.4 The waterwheel model, with staged code deliverables flowing from development into the testing pool while defect fixes and design changes cycle back up to development.
product—and that's hard to accomplish with only pieces of function. Also, with both the FVT and SVT teams playing in the same pool, they're likely to bump into each other frequently—in other words, they will waste time and effort tripping over some of the same bugs. However, in excessively large and complex projects, a staged SVT is usually needed to meet the time-to-market goal.

The initial drop of code into SVT would likely be run through comprehensive regression testing by a few members of SVT entering the pool. Or, this early testing might focus on system or application initialization, exercising parameters and settings. Additional deliverables to SVT would follow this initial one. If the software has several major functions with little or no operational intersection, they can be delivered at different times. Not optimal, but it's been done before and in some cases it can work pretty well. Consider this to be the rest of the SVT team diving into the water. Eventually, the last stragglers from the FVT team will tire of the water and climb out, leaving only SVT in the code pool.
Parallel development streams can also be maintained to avoid disruption to the test environment in the event that a major function is delivered to SVT late. In this case, this late-arriving function can be kept in a separate code stream until it has achieved stability, at which point the two streams can be merged and SVT performed on this final package. This is akin to two different streams of water feeding into two different waterwheels, each with its own set of testers playing in separate pools down below, with a dike between them. When the SVT leader gives the nod, the build team pokes a hole in the dike and the two pools can finally intermingle.
The classic waterfall software development model is a well-known approach. Morphed into the waterwheel method of staging testable entities, it gives the development and test teams more flexibility and can significantly reduce time to market. The waterwheel model is a proven method that has delivered strong results on many industrial-strength projects.
Common Elements
Both the waterfall and waterwheel models require careful project management
focus and a strong build process. Let’s briefly examine each.
between the various components and functions. Ultimately, development and test
need to agree on a schedule that both can live with.
When the development team delivers all of its planned code, the test teams still need a mechanism to receive fixes to defects. Because of this, the overall build plan must include these deliverables in the schedule. Fixes to problems that aren't too severe will generally be included in these drops. For severe bugs, some software development processes also give the teams a way to provide fixes quickly, without waiting for the next planned code deliverable.
The product build schedule is a key driver behind the software development
process in the models we have discussed thus far. As the schedule nears completion,
the install team will take all of the final deliverables and build an official software
product. A final regression test will be performed, and the package will be ready to
deliver to customers.
Industry's need to move in many different directions and meet time-to-market challenges forces the software community to make intelligent shortcuts to meet customer needs. Iterative software development models help that effort. Let's contrast these models with the waterwheel model and examine the role of testing in each.
Agile Processes
Agile software development processes drive for working code with a focus on rapid production [Turk02]. Their base selling point is to address the primary issue many professionals have with the traditional waterfall model—time to market. The objective of agile development processes is to deliver software early and often, making it an iterative development model. Some Agile "test-driven development" techniques consider development as the actual customer of testing. Unit tests are devised and created before the code that will actually implement the desired function is completed. The intent is to cause the program to fail and then fix it.
Waterfall and waterwheel software models depend on crisp technical requirements defined at the beginning of the project and the expectation that new, incoming requirements will be few and far between. That's not necessarily reality. The strength of Agile development processes is their ability to handle changing requirements throughout all development and test phases. Finding a development process that addresses the problem of constant and changing requirements is a treasure for any software team. So can this model work for all projects?
According to Turk and his associates, the Agile model works very well for small teams in close proximity. This belief is supported by Williams and Cockburn, citing Agile researchers who found that Agile is best suited for collocated teams of 50 people or fewer [Williams03]. This is a significant limitation of Agile models, since many software development teams are larger than this.
The Agile process is also dependent on close collaboration between developers
and eventual users of the technology. In fact, it is suggested they physically reside
near one another. However, it’s not unusual for large development and test teams
to span locations, and often they cannot possibly reside within the same location as
their users. In such situations, change control is a critical means for ensuring that
everyone stays in sync and all the pieces of code mesh smoothly. However, change
control appears to be less of a focus in the Agile models.
Since the Agile model is all about time to market, code is delivered with incredible speed. Coders are coding as fast and as efficiently as they can, which is a strength of Agile models. However, there is also a hidden weakness. When coding furiously, there will be times when you don't have time to stop and look around at the solutions your colleagues have come up with. It's likely they have written something that can be reused to help fix your problem. Don't let the speed of coding get in the way of reuse [Turk02]. In large projects where the waterfall or waterwheel model is used, you have more time to plan for reuse. That way, the software routine only has to be coded and tested once and then can be used many times. It also reduces the possibility of the same function being implemented multiple times, which could result in unusual symptoms from the same problem. So when following an Agile development model, be careful not to ignore opportunities for reuse. Now we'll take a look at one of the most popular Agile models.
ment and test across all phases fosters the feedback that is crucial to the success of
the XP model.
XP also makes extensive use of the test-driven development feature of Agile software development practices. Developers code or build their test cases for unit test before they begin coding the actual software that will be tested. This is intended to prepare the developer to cover all issues in the code up front. This approach certainly has merit, in that the developer will be constantly considering the tests his code will have to pass. The tests are accumulated and then each drop of code (from any programmer) must pass all of these tests. Progress in test-driven development must be carefully monitored. But who identifies the tests that the developer doesn't consider? The XP model seems to address this question through the customer feedback mechanism.
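As a simple illustration of this test-first style, a developer might write the unit test below before the routine exists, watch it fail, and then write just enough code to make it pass. The apply_discount() routine and its tests are hypothetical examples, not code from any project discussed in this book.

    import unittest

    # Written second: only enough code to satisfy the tests below.
    def apply_discount(price, percent):
        """Return price reduced by the given percentage."""
        if not (0 <= percent <= 100):
            raise ValueError("percent must be between 0 and 100")
        return round(price * (1 - percent / 100.0), 2)

    # Written first: the tests capture the desired behavior before any code exists.
    class TestApplyDiscount(unittest.TestCase):
        def test_basic_discount(self):
            self.assertEqual(apply_discount(200.0, 25), 150.0)

        def test_zero_discount(self):
            self.assertEqual(apply_discount(99.99, 0), 99.99)

        def test_invalid_percent_rejected(self):
            with self.assertRaises(ValueError):
                apply_discount(50.0, 150)

    if __name__ == "__main__":
        unittest.main()

Every later drop of code must keep such accumulated tests passing, which is how the practice guards against regression as the product grows.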
Traditional software models rely on there being completely different people to test and assess the quality of the code in FVT, SVT, and other phases objectively. XP and other Agile models instead depend on the developers of other pieces of the overall software function to become familiar with their colleague's code so that they can point out issues and concerns. This can only be achieved through cooperative code ownership. Certainly, this approach will provide some amount of objectivity in the product cycle. But does it offer the same amount as an independent FVT team? As a software project is being planned, its leaders need to consider these issues when examining the requirements.
Another difference between XP and waterfall models is the level of reliance on documentation. The traditional models must maintain requirements and design documentation for use in later phases, particularly testing phases. XP doesn't attempt to provide that level of documentation, because the focus is on providing production-ready software as quickly as possible. Unfortunately, the lack of documentation maintained during an XP development process has a potential downstream disadvantage after it has been completed and people have moved on to other work. If there is a problem and someone needs to revisit the original project, the speed at which new or "once familiar" programmers can get back on board is slowed. By maintaining design and implementation documentation, traditional models have an advantage in this kind of situation. In fact, testers can pick up this documentation and use it to help them devise new test scenarios. Part of that effort may include updating existing tools and test cases in order to be able to test the new function. Under the XP model, this would be difficult at best.
XP's continuous integration practice helps address one of the challenges of the classic waterfall model. In fact, the practice is comparable to that used in the waterwheel. In both models, the iterative code and defect fix updates provide a way to address problems quickly so that testing progress is not delayed.

Agile processes such as eXtreme Programming are best suited to straightforward software development projects. It is unclear whether these models address all
Spiral Model
The spiral development model, another type of iterative model, also attempts to address the shortcomings of the classic waterfall model. Even though this is not technically an Agile model, software engineering guru Barry Boehm's proposed development model has given the industry a different perspective on software development practices. The distinguishing characteristic of this model is the introduction and use of risk assessment and analysis throughout the development process. Figure 3.5 shows Boehm's Spiral Development Model.
FIGURE 3.5 Boehm's spiral development model, showing cumulative cost and commitment growing with each revolution, with review points and activities such as developing and verifying the next-level product.
Boehm's research asserts that the classic waterfall model does not lend itself to some classes of end-user application software solutions. Boehm claims that upfront requirements are extremely difficult to finalize for this type of software because the design is subjective in nature and users are not able to foresee the final product easily.

The spiral model's four primary processes, represented by quadrants in Figure 3.5, are objective definition, risk analysis, product development and engineering, and next phase planning. These four activities drive the model and are fundamental to its effectiveness. With each revolution of the model, the four processes are performed. Through prototyping, requirements definition, design, and implementation, each revolution independently examines the objectives, risks, implementation, and planning of the phase that follows.
Objectives
Objectives definition and development are an essential part of any software development phase. The spiral concept of regularly reviewing the objectives as the activity type changes provides a continuous focus on ensuring that the project is moving in the desired direction.

All of the activities in the spiral model and all of the phases of the waterfall model look at changing objectives as time moves on. No matter which model you choose, it's important to constantly ask yourself whether you are still in line with the project's objectives.
Risks
Boehm's approach is risk-driven rather than document- or code-driven [Boehm88]. This risk-driven analysis is intended to provide a decision point for determining whether the software should proceed to the next phase of work, stay in this phase and continue efforts, or completely terminate the project. The waterfall model lacks this kind of risk review and management, as the only place it can logically occur is between major phases. However, as discussed earlier, the waterwheel model does provide natural risk-assessment checkpoints—through iterative functional code and fix deliverables.
The spiral model supports prototype and risk assessment activities at each phase of the process to address problems such as funding, requirement updates, or hardware issues. The project team will evaluate the risks at each revolution of the spiral so that any improvements can be made to enhance the software or the project itself. The issues identified in the analysis can inspire the team to develop different ways to attack the problems. For example, maybe a prototype approach for development was being used but it was not providing the quality needed to make progress. To help address the quality problem, the team might seriously consider other, scaled-down development models that could be used against this piece of the software.

In addition, the risks identified could indicate a need for a focused testing effort in a particularly soft area. The team will identify the actions and tests (for example, parts of UT, FVT, or SVT) to be targeted against the problematic software. Through iterative analysis, the development and test teams will constantly assess if this focused testing is successful in meeting the quality objectives and the time-to-market challenges. This kind of scenario demonstrates the benefit of the spiral model.
Product Development
The function of the model's product development quadrant is to validate requirements, create deliverables, and verify the next level or iteration of the product. The spiral model is distinct because it assumes a relatively short amount of time within this iteration of product development. On the other end of the spectrum, the waterfall and waterwheel models assume that the engineering or product development phases tend to be quite long. In general, the duration of the phases is a function of the size of the software project itself. This supports Boehm's conclusion that the spiral model best supports the creation of short, end-user software projects [Boehm88]. The waterwheel model has proven itself to be successful for large and complex software projects.
It is in the product development phase of the spiral model where there is the greatest intersection with test activities. The spiral model's staged test phases are quite similar to those of traditional software development models. They include unit test, integration and test (comparable to FVT), acceptance test (comparable to beta or early-release testing), and implementation or product deployment.

The integration and test phase of the spiral model seems to include the process content of the waterfall model phases of FVT, SVT, and integration test. It appears that the entire concept behind the spiral model, evaluating risks along the way, could be applied specifically in the integration and test phase of the spiral. Boehm's work does not seem to discuss in detail how risk assessment and analysis is performed against only a portion of the revolution (i.e., integration and test).
As discussed earlier, the waterfall model has an overall planning stage to create the project's end-to-end schedule and define individual planning tasks for each phase as the prior phase concludes. In this planning phase, the team reviews readiness criteria (both technical and project-oriented) for the upcoming phase. In essence, the spiral model's planning phases are embedded in the traditional phases of the waterfall model.
Evolutionary Model
The Evolutionary Software Development model is a combinational model. It exploits the basics of the waterfall model by flowing from one phase to the next. It also makes use of a feedback loop to ensure that any required improvements to the product or process are made as the project proceeds.
May and Zimmer outline their use of the evolutionary development model (or, as they refer to it, EVO) as a set of smaller iterative development cycles. They point out that breaking the implementation phase of the software development process into smaller pieces allows for better risk analysis and mitigation [May96].

An often-cited shortcoming of the traditional waterfall model is the lack of a natural feedback loop. As May and Zimmer explain in their research, the waterfall's primary source of feedback is its large test phases—after development has completed. Not so with the evolutionary model, which has embedded feedback loops within the small waterfall cycles. In the EVO model, the smaller cycles tend to last two to four weeks. Feedback from the prior cycle is evaluated during the execution of the next cycle, and can be critical to the ultimate success of the project. The short cycles include all aspects of design, code, and initial testing of a new version of software.
In large, complex software projects where chunks of code take weeks to develop, cycles as short as these are not as applicable. However, the concept of having miniature "development cycles" within the overall development process is comparable to the waterwheel model discussed earlier: the waterwheel iteratively delivers functions in code drops, each of which goes through design, development, and test prior to delivery to beta customers, along with additional, post-delivery testing. The difference is that within EVO, interim versions of the product are developed and then provided to real customers as beta code for their feedback. The customers are heavily relied on as testers for the product in the EVO model. The waterwheel model stresses more internal deliveries of code to professional testers who look at the product from both a clear-box and black-box perspective. In other words, the EVO model's minidevelopment cycles are end-to-end in nature (quickly going through development, internal testing, customer delivery and test), whereas the waterwheel is a single development cycle, with an iterative, cumulative code delivery loop between the development and test teams.
Our discussion of feedback so far has mostly focused on defects and product improvements. But don't lose sight of how the development and test processes can improve from these iterations as well. As each code drop is iteratively developed, you should take into account all of the process issues that were encountered in the prior iteration. For example, assume that during the prior iteration, the team did not hold code inspections. As this iteration of software is developed, you can learn from that error and ensure that all appropriate parties are represented in a code review with the development team prior to the code's integration into the software bundle. These kinds of process improvements, made during the smaller development cycles of the EVO model, can help improve the overall quality of the final product.
As we’ve seen, the evolutionary software development model has many of the
same characteristics as the waterwheel model. Its scale is different, but its concepts
apply to many types of software development projects.
Experimentation Points
This method utilizes a white-box testing technique to ensure the system's key algorithms are operating properly. The approach is simple. It starts by defining a set of experimentation points. These points are places where the algorithms make significant decisions that drive the software behaviors. Next, those experiments are performed and data on the results is gathered. The results are analyzed, the algorithms are adjusted, and the experimentation cycle starts all over again. The continuous iterations allow the team to gradually hone the algorithms until they are optimized.

Many times, these algorithms are difficult to drive in the desired manner. Getting the correct mix of characteristics in the workload is part of the challenge, and knowing the software's algorithms is the key. This is the white-box aspect of this model. What is extracted from the white-box analysis helps define the workload. In some cases this requires brand-new workload development, though in other cases only adjustments to existing workloads are needed. The workload execution is what provides AVT with an environment upon which to tune the algorithms.
Data Analysis
The development team needs to determine how closely the algorithms are tracking to the desired result, so data analysis must be done before adjustments can be made to the code. Of course, in order for there to be data to analyze, the software being evaluated must provide information in a way that can be post-processed. Special tools may need to be purchased or developed in order to massage that data into a form that's useful for analysis. Once the data is analyzed and the development team modifies the code for the next iteration, the team performing the AVT experiments takes the software out for another spin. Chapter 15, "Test Execution," delves further into the process of these iterations.
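As a rough sketch of one such iteration, the fragment below instruments a hypothetical placement algorithm at a single experimentation point, logs the decisions it makes under a workload, and summarizes them so the team can judge how closely the algorithm tracks its target before development adjusts it again. The algorithm, the target ratio, and the log format are all illustrative assumptions, not part of any product described here.

    import csv
    import random

    TARGET_LOCAL_RATIO = 0.80   # assumed tuning goal: 80% of requests placed locally

    def choose_node(request_size, local_free):
        """Hypothetical algorithm; the experimentation point is this decision."""
        return "local" if request_size <= local_free else "remote"

    def run_experiment(samples, log_path="placement_log.csv"):
        with open(log_path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["request_size", "local_free", "decision"])
            for _ in range(samples):
                size = random.randint(1, 100)    # stand-in workload mix
                free = random.randint(1, 100)
                writer.writerow([size, free, choose_node(size, free)])

    def analyze(log_path="placement_log.csv"):
        with open(log_path) as f:
            rows = list(csv.DictReader(f))
        ratio = sum(1 for r in rows if r["decision"] == "local") / len(rows)
        print("local placement ratio: %.2f (target %.2f)"
              % (ratio, TARGET_LOCAL_RATIO))
        # If the ratio misses the target, development adjusts the algorithm
        # and the experiment is run again.

    if __name__ == "__main__":
        run_experiment(samples=1000)
        analyze()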
Reactionary Approach
Software development models tend to be very well planned from the start regardless of the technique chosen. Right from the beginning, you know what you want to do and how you want each phase to flow into the next. Unfortunately, in the business world, things don't always go as planned. So when things fall apart and the plan doesn't cut it, a reactionary iterative approach can be initiated.

Let's review. The project starts out nicely and is progressing rather normally. Slowly but surely, problems crop up and it is soon clear that the planned development model is starting to collapse. In seemingly no time at all, the impact of the problems with the software becomes so severe that no progress can be made in testing or, worse yet, the product doesn't work at all. The development team provides fixes to no avail. You get a little further, then quickly hit the same wall—test blocked, product broken.
As the project end date rapidly approaches, this continuous state of blocked
progress demands that you review what you are doing and how you are doing it.
Since extending the deadline isn’t an option, how can this problem be resolved?
A One-team Approach
Quickly, the developers and testers need to get together and outline the tasks down to a weekly and, in some instances, daily schedule. The team needs to get a handle on how to get the code back on track and working again. In some cases, it might not be a code issue. Instead, the test team may have fallen behind schedule and needs help to catch up.
Early daily meetings can help set priorities and goals for the day and help each group to address its current set of issues. When code is ready, the test team needs to install it promptly and run a comprehensive test suite to ensure that it's whole. It's fine for some things to fail, but the base functionality had better hold together. If not, all hands on deck . . . again. This time, developer sitting next to tester is the remedy. Test, identify problems, patch the code, and move on. This iterative approach, even though it's out of necessity, can accelerate your time to market. This means development will start coding immediately and, as code is ready, deliver it to FVT. FVT and SVT can work simultaneously—parallelize. While there are certainly risks with this method, sometimes a team does not have a choice.
This reaction to catastrophe is effective at getting the software back on track. But what happened to the code that the developers should have been writing when they were working so closely with the testers to get stability back? Those functions are still expected to be delivered on time. To handle that, we highly recommend reviewing the work remaining and developing a thorough, proven approach for the remaining development and test rather than reverting to a reactionary approach again. The reactionary approach is certainly not recommended as a model to use to kick off your project, but it could be just the thing you need to salvage a disaster.
As the testing process matures, software teams sometimes become confident that they can take some shortcuts to get the product out the door more quickly. They seem to lose sight of their prize—stable software that satisfies their customers. They consider skipping portions or entire phases of testing. When they actually follow through with the idea, a subset of the class of defects targeted by the skipped phase will no doubt escape to customers. Those that don't escape will impact the phases that follow, so much so that they may put the software's ship date in jeopardy.
It Isn't Hopscotch
When you were a kid and played hopscotch, you may have hopped around so much that you ended up with blisters. Well, if you skip test phases, you will end up with something much worse than blisters. You'll have dissatisfied customers who are discovering the problems you missed, or are still waiting for your product to arrive. Therefore, think before you jump!
SUMMARY
In This Chapter
■ Testing your own code
■ Establishing your credibility
First and foremost, experience suggests it's not a good idea for a person to be solely responsible for testing his own code. This is especially true when developing mission-critical software. The importance of the software demands that it is sanitized by multiple objective and unbiased test engineers, each with a different focus.

It certainly is normal for software developers to perform unit test (UT) of their own code. Verification of code at the unit test level benefits directly from the developer's familiarity with it. The program's design was based on a specific set of input, an expected set of output, and targeted functions. The developer's job is to ensure the program produces the desired results based on the input provided in a given environment. It is safe to assume that developers will attempt to address all of the use cases they envision. There is no other person within the development process who would have a better command of the program than the person who took the design and made it real. The software developer's responsibility is simply to obtain a clean compile of the program and make it work.
As previously discussed, UT is first an individual effort and then an integrated effort with other developers to validate the initial creation or modification of a component or function. Some software developers believe that a clean compile of the program completes UT. This approach falls short from a quality perspective. An explicit, individualized set of UT actions lets the developer confirm that the simplest mistakes are handled.
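For instance, a routine can compile cleanly and still mishandle the simplest inputs. The short, hypothetical set of unit test actions below exercises a small parsing routine against empty, malformed, and boundary input before the developer declares unit test complete; the routine and its behavior are illustrative only.

    # Hypothetical routine under unit test: parse "key=value" configuration lines.
    def parse_config_line(line):
        key, sep, value = line.strip().partition("=")
        if not sep or not key:
            raise ValueError("malformed line: %r" % line)
        return key.strip(), value.strip()

    # A clean compile says nothing about these simplest mistakes, so the
    # developer exercises them explicitly.
    def unit_test_parse_config_line():
        assert parse_config_line("port=8080") == ("port", "8080")
        assert parse_config_line("  name = server1 ") == ("name", "server1")
        assert parse_config_line("flag=") == ("flag", "")   # empty value allowed
        for bad in ("", "=value", "no_separator"):
            try:
                parse_config_line(bad)
            except ValueError:
                pass
            else:
                raise AssertionError("accepted malformed input: %r" % bad)
        print("all unit test actions passed")

    if __name__ == "__main__":
        unit_test_parse_config_line()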
The integrated UT effort described in Chapter 3, "The Development Process," is targeted at uncovering the basic module-to-module integration failures. The consolidated knowledge of the group of developers performing the integrated test helps bring the first degree of objectivity to the entire assembly of the code. The diverse skills and personalities of the people integrating the code provide an improved level of review of the whole function or component.
work is sometimes termed "happy path testing." At the conclusion of the happy path test, the developer declares the software ready to ship.

A seasoned tester may innocently (or not so innocently) try things the developer didn't anticipate. That's one reason why there must be a clear line dividing test and development at the beginning of the FVT phase.
skill and bug-busting passion. In addition, most large open source projects have
their share of dedicated beta testers.
It is clear then that the open source community thrives not only on people who
enjoy writing software, but also on those who enjoy poking holes in it—and for any
particular piece of software, a natural separation is maintained between those
groups.
"What! Are you kidding? That's not a big deal," snaps the developer, his face pinched into a scowl. "Don't bother me with these types of problems."

After this, the tester picks up his crushed ego, heads back to his office, and opens his first real FVT defect. It may not be an earth-shattering problem with significant customer impact, but the success nonetheless gives the tester the drive to continue finding defects and, more importantly, finding bugs that matter. The insensitive developer described showed little or no diplomacy or compassion, but did help the tester to thicken his skin.
As the tester discovers numerous and more complex defects, he becomes more confident in his ability to uncover the problems with the biggest impact on customers. Over time, testers develop a rapport with developers and, instead of a confrontation, they will engage in a discussion about a flaw and a proposed solution.

Even veteran testers get push back from development—as they should. It is this professional tension that fosters the discussion of real issues and comprehensive solutions to a wide range of code and design concerns. In the software development industry, managed conflict results in superior quality.

How do testers and developers build the rapport required to arrive at an amicable and comprehensive solution to problems? Testers can take a variety of steps toward that end. Perhaps the most effective way to build that rapport is by establishing your credibility.
Building Credibility
One of the things that testers must do to build a valuable and respectful relationship with development is to establish technical credibility. In many organizations, the development team is led by experienced designers and programmers who have built complex software solutions and received positive feedback from their customers. This feedback goes a long way toward building the developer's reputation, confidence, and attitude—the same traits the tester will battle. By enhancing their own technical credibility, testers will be on equal footing with developers and can build a mutually beneficial relationship.
The answer to how to grow your technical credibility is twofold. First, get a broad view of how the software you are involved with works, both externally and internally. For massive, mission-critical software packages, this is no small undertaking. Second, dig deeply into one component of the software to understand it thoroughly. Once you've achieved in-depth knowledge in one area, you'll find it becomes easier to branch out and extend your low-level knowledge in other areas of the software as well.
The combination of technical breadth and depth is a powerful one. An expansive understanding of the entire software package will give you perspective. It will enable you to draw inferences from system behaviors that a developer who is narrowly focused on a single component might miss. On the other hand, insight into one component will put you on par with its developers. You'll speak their language, understand their issues, and be able to contribute in meaningful ways to problems they are trying to solve. Any developer would consider such a tester to be a valued partner. Together, these two dimensions create a multiplier effect on your technical credibility.
Let’s take a look at some ways you can achieve that first step and get a broad
technical understanding of complex software. Then, in the next section of the book,
we’ll look at approaches for diving deeply into one area as a precursor to building
your test plan.
Study Internals
However, many software components are chiefly internal in nature, such that their entire function is provided without any external interfaces to the end user or systems programmer. In an operating system, such components might include storage managers, the input and output subsystem, and the system dispatcher. These components can be much more challenging to learn.

To familiarize yourself with internal components, turn to component notebooks and functional documentation. Mature software products will (hopefully) already have these documents, which provide a lower level of detail on internal interfaces, structures, and flow of control. This information can be your window into gaining a basic understanding of internally oriented pieces of the software.
Sometimes software components fall into both internal and external categories. In any case, your goal is to become familiar with the full suite of components that comprise the complex software product. By gaining such a wide view of the total package, you are well on the way toward establishing a solid base of technical credibility.
A Sweeping View
By mining these sources of information, you can begin to build a wide base of knowledge about the software you're working with. You may have to do much of this leg work on your own time, but if you're serious about testing as a career, it's well worth it.

Later in the book we'll explore ways to acquire an in-depth understanding of one or more components of the software as you build a test plan. But there's one aspect of the "deep dive" we'll look at now: debugging.
comes with such expertise will allow you to pursue even the most difficult and, potentially, controversial code defects and design flaws.
Organizational Challenges
We might need to borrow our favorite contractor's tripod and transit to determine the height of the barricade between development and test. Many ideas and techniques have been implemented to try to determine the correct height of this wall. One way is through organizational alignment. Large software development organizations are always looking for the right balance between the two groups so that neither one is too strong or dominant. So how many bricks should we place on each side of the scale?
Business alignment decisions are often cyclical in nature. The organization works in a certain mode for a few years and then goes back to the way it was before. This may happen because, over time, we see the advantage of change and how it can modify the business environment and employee behavior. In other words, change for change's sake. All organization models have their pros and cons, and it's important to thoroughly understand the cons and target them specifically. This phenomenon also applies when determining where to place the FVT team in the organization.
We established that when developing large, mission-critical software systems, a clear separation of development and test is required to provide objectivity and ensure that the important defects get removed. It is a generally accepted practice that SVT, and any other parts of the verification process that follow it, should be in its own organization and midmanagement chain. The real balancing act is between the code development and FVT teams.
Before exploring the various organizational models used to address these issues, let's take a moment to review organizational definitions. We'll define a first-line department as one that contains individuals who perform a specific function for the organization, such as develop a key software component, and who report to a single manager. A second-line department is comprised of a logical grouping of first-line departments led by a manager with broader responsibility and accountability for products. In turn, second-line departments are brought together into a third-line organization.
Model 1
Advantages
This model offers certain advantages:
Disadvantages
There are also some significant disadvantages to this approach:
Impedes Communication Flow: One downside of Model 1 is that the very high wall between development and test can reduce effective communication between the groups. This may also lead to an "us" versus "them" mentality.
Impacts Early Test Focus: Another potential drawback is how resources and
people are allocated within the second-line FVT organization. Since the group
provides FVT services to multiple second-line development organizations,
there is a possibility that priorities set at the third-line level will inhibit FVT’s
early involvement in design and development activities. If everyone is testing
the current release, no one will be looking ahead to the next one. Priority calls
made by the FVT second-line manager are critical in ensuring that the wall is
scalable and that the communication pipelines are not disrupted.
Increases Cycle Time: This model does not enable the organization to address time-to-market requirements and challenges with the same aggressiveness as other models do. The extra time spent in resolving cross-organizational issues can cause delays and hamper the ability of test to identify bugs and development to address them with the best technical solutions.
Advantages
This model has its own advantages:
Disadvantages
Of course, there are some disadvantages to Model 2 as well:
Advantages
The following benefits arise from this model:
Disadvantages
Naturally, Model 3 also has some disadvantages:
Advantages
Model 4’s advantages include:
Disadvantages
This model has all of the disadvantages of Model 3, plus:
Uses Strengths Poorly: The people who would prefer to be writing code
(because they are “makers”) are forced for a time to be testing other developers’
code, which they do not enjoy (because they’re not “breakers”). Frequently, the
job of being a tester is relegated to the newest, least-experienced team members
Is Organization Enough?
Although organizational structures can have a positive effect on maintaining the correct balance between FVT and development, is that the whole story? Models can become irrelevant when the organization has individuals with strong personalities and excellent leadership skills. Planting strong test leaders in departments throughout the organizational model will have a more immediate impact than almost anything else. These leaders and role models provide the technical insight and the interpersonal, negotiation, and communication skills needed to establish the best balance of development and test. These test leaders build respectful relationships with developers across multiple organizations and are able to help other testers build the same kind of connections. They can advise testers on the types of questions to ask, the best approach to inform developers of problems and issues, and recommendations for following up with development throughout the process. Having the right people with the right job responsibilities along with highly skilled test leaders will accentuate the positives and lessen the negatives of these organizational models.
By implementing a model supplemented with an infantry of strong leaders, the end result will likely be an organization that delivers higher-quality software to satisfied customers. Regardless of the organizational model selected or the strength of the leaders of the team, simply altering your test team's construction can have a huge impact. Analyzing the current set of issues with your team and identifying and carrying out specific action plans to address them can be effective as well.
SUMMARY
Employing best practices such as not allowing developers to be the sole testers of their own code, building technical credibility with development, and maximizing the success of organizational models through strong leadership can help foster the most effective relationship between developers and testers and deliver the highest-quality product. Customers of mission-critical software expect this level of commitment.
In Chapter 5, "Where to Start? Snooping for Information," we'll look at how to acquire detailed insight into a specific area you've been assigned to test. Such insight will serve as a basis both for building a strong test plan and for extending your technical prowess.
The Best-laid Plans
You have just been assigned to test a major new function in a piece of complex, mission-critical software. Congratulations! Now what?

You need a plan. It's easy to plan a lousy test, but harder to plan a good one. The next five chapters will show you how to do it right. The first order of business is learning everything you can about the new function and what problem it's trying to solve. It's also important to understand as much as possible about the ultimate customers of the software, so that your plan hits the same pressure points that they will. If the complete software package is especially complex, your entire team may need to work together to break that complexity into manageable chunks to ensure that subtle interactions are not missed.
Once you’ve finished your research, it’s time to put something down on paper.
Test plans for different test phases will have unique focus areas to tackle. The targets
will range from common ones such as interfaces and mainline function, to others
that are sometimes overlooked, such as recovery and data integrity. Unfortunately,
even the best laid plans can go awry when trouble hits, but there are techniques you
can use to anticipate risks and adjust your plan to account for them. You might even
find yourself proposing changes to the software itself to make it more testable.
Don’t underestimate the value of a test plan. A comprehensive one will chart
your course through the turbulent waters of test execution, while a weak one will
leave you floundering. The methods described in this section have proven their
worth in countless large-scale projects. Apply them to your own situation and reap
the benefits.
Where to Start?
Snooping for Information
In This Chapter
■ The importance of knowing what you test
■ Test preparation approaches
■ Where and how to find information
■ Obtaining knowledge about customers
■ Finding your first defects
It's okay to not know something; it is not okay to test something you do not know. Proper preparation for a test is as important as the test itself. In fact, many of the steps used to prepare for a test are themselves a form of testing. Defects can be identified through this investigation, making the tester's job easier during the test execution phase.
Good preparation for the test of complex software is well worth the investment.
It makes the test more effective and efficient and actually saves time. Also, if done
right, the in-depth knowledge you’ll gain in the specific area you’ve been assigned
to test will complement your broad product view which we discussed in Chapter 4,
“The Test and Development Divide,” and cement your technical reputation in the
organization. So before you write a test plan, do some “investigative reporting.” Ask
questions—lots of them.
Subtle Bugs
Understanding poker may not be necessary for you to recognize when the software crashes, but it is needed to recognize a defect that identifies a poker hand of two pair as a winning hand over one with three of a kind. Even if the design document states that a straight beats a flush, tests should not be thoughtlessly developed to show that requirement was met. Externally, there may be no indication of a problem, but if the software doesn't do what it is supposed to do, it's a defect that matters.
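A sketch of the kind of check that catches such a defect: rather than merely confirming that the program declares some winner, the test asserts the relative ranking itself. The rank_hand() evaluator and its numeric ranks below are hypothetical stand-ins for the poker program under test.

    # Hypothetical interface of the poker program under test: rank_hand()
    # returns a numeric category, higher meaning a stronger hand.
    HAND_RANKS = {"two_pair": 2, "three_of_a_kind": 3}

    def rank_hand(cards):
        """Stand-in evaluator: classify a five-card hand by card-value counts."""
        counts = {}
        for card in cards:
            value = card[:-1]                     # e.g. "9H" -> "9"
            counts[value] = counts.get(value, 0) + 1
        if 3 in counts.values():
            return HAND_RANKS["three_of_a_kind"]
        if list(counts.values()).count(2) == 2:
            return HAND_RANKS["two_pair"]
        return 0

    def test_three_of_a_kind_beats_two_pair():
        two_pair = ["9H", "9S", "KD", "KC", "2H"]
        trips = ["7H", "7S", "7D", "QC", "2S"]
        # The software runs fine either way; the defect that matters is the
        # weaker hand being declared the winner.
        assert rank_hand(trips) > rank_hand(two_pair), \
            "two pair ranked above three of a kind"

    if __name__ == "__main__":
        test_three_of_a_kind_beats_two_pair()
        print("ranking check passed")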
A Customer's Advocate
Of course, understanding the requirements of the software is the designer's and developer's responsibility, but it is also an important element of a tester's job. A tester needs to go even further to understand the targeted users of the software, and act as their advocate. The expression "testing to specifications" is common in the testing world and should be heeded. But a good tester goes further and prepares well enough to be able to recognize when something in the specifications might harm customers more than help them. A few examples are presented later in this chapter.
New Technology
More often than not, industrial-strength software will not be as simple as the poker example—it won't be the programmatic implementation of something that is well known. Even if it's entirely new technology, a tester still needs to thoroughly understand it before beginning the test.
Resources to Tap
Information must be gathered before developing the test plan. So, where do you
start?
Requirements Document
Most software isn't created or enhanced in a vacuum. Savvy development teams first gather requirements from potential or actual customers to identify what the product should do. Then they fold these requirements into a document, which becomes a key input into the design process. By reviewing this requirements document, testers gain insight into the software's objectives and can create tests aimed at ensuring the final product achieves its intended goals. It's a great place to start.
Specifications
Once the requirements are understood, they must be incorporated into a design
that can be implemented. This design is often documented in what’s called a speci¬
fication (“spec”). If the requirements are the what, then the spec is the how. It is one
of the best sources of information testers can use to begin understanding new func¬
tion. Also known as a design document, you can use the spec to review the same in¬
formation that the developer followed when creating the software.
Dependencies
You need to gain a clear understanding of any dependencies of the new software,
whether on other software or on hardware. Are they “hard” or “soft” dependencies?
Hard dependencies are ones that are absolutely required for the software to work correctly, whereas soft dependencies are required only for the software to work a certain way. For example, an application written to monitor the poker game described earlier has a hard dependency on that poker game. But if that monitor program provides the capability to store historical data in a flat file or a relational database, it has a soft dependency on a relational database manager.
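To make the distinction concrete, here is a small sketch, not taken from any real product, of how such a monitor might come up. The function and parameter names are invented; the point is simply that a missing hard dependency stops everything, while a missing soft dependency only changes behavior.

```python
# A minimal sketch, assuming a hypothetical poker-game monitor. All names are
# invented for illustration; only the hard/soft distinction matters here.
import sqlite3
from typing import Optional


def start_monitor(poker_game_available: bool, db_path: Optional[str]) -> str:
    """Bring up the monitor, honoring its hard and soft dependencies."""
    if not poker_game_available:
        # Hard dependency: without the poker game there is nothing to monitor.
        raise RuntimeError("cannot start: the poker game is required")

    if db_path is not None:
        try:
            conn = sqlite3.connect(db_path)
            conn.execute("CREATE TABLE IF NOT EXISTS hands (summary TEXT)")
            conn.close()
            return "started with relational history store"
        except sqlite3.Error:
            pass  # fall through to the degraded-but-working path

    # Soft dependency: the database only changes *how* history is kept.
    return "started with flat-file history store"


if __name__ == "__main__":
    print(start_monitor(poker_game_available=True, db_path=None))
    print(start_monitor(poker_game_available=True, db_path="history.db"))
```

A tester can read such a distinction directly into scenarios: one set of tests removes the hard dependency and expects a clean failure, another removes the soft dependency and expects the software to keep working in its alternate mode.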
Exploiters
Does the new software provide services or APIs for use by other software? If so,
make note of how those services can be invoked. Are the anticipated exploiters of
these services documented? If they are, make a note to find out more about that
software and the plans of this “real exploiter.” Timing is important here. The ex¬
ploiters’ development plans may not line up with the plans of the software you are
assigned to test. If they do line up, you may be able to use the exploiting software
to help test the services. If they don’t, you may not be able to use that software dur¬
ing your test, but you can still ask its development teams for details on how they
plan to apply the new service. This information can be good input for test cases you
need to create. If no mention of anticipated exploiters is documented in what you
are reviewing, make a note to find out later. Testing of APIs or software services
should not be done without an understanding of real or anticipated exploiters.
Book Drafts
Most software is accompanied by manuals, release notes, help panels, or some other
form of external documentation. You can gain a different perspective on the soft¬
ware by reviewing planned external documentation versus internal design docu¬
mentation.
Teaming
In addition, learn about past teaming issues. What kinds of interactions best helped
the team meet its objectives? What hurt? Which communication vehicles were most
effective? What was good about the makeup and organization of the team? What organizational models should be avoided? Did the team itself identify specific areas of improvement? If so, what were they? Do they apply to your test? Should you implement them?
What’s available to enhance? Were risks mitigated in novel ways? What can be
learned from the actions put in place during times of trouble?
Developers
Software developers will not be shy about discussing what they’re producing. People
generally like to talk about their work—they will be happy you have shown enough
interest to go to the trouble of scheduling time with them. Start by describing your
view of the functionality—you’ll want to show them you have done your homework.
Then ask them questions. Some sample questions might be:
You may be surprised where the answers lead you. Many developers will gladly
point out weak spots in their code. They will be happy to provide input to your test
preparation. However, they will not be happy if you haven’t done your homework.
On the other hand, if you are on a project for which the developers have not produced
any sort of specification to describe what it is they’re creating, then your meeting
with development will be quite different. In the absence of other information, you’ll
need education on what it is you’re about to test. Schedule time for a “chalk talk.”
This is a casual sit-down exchange with the developers. Ask them to take you
through the externals, then the internals of what they’re building. Take lots of
notes, and look for interfaces, pressure points, serialization, recovery capabilities,
dependencies, and all the other things you would watch for if you were scouring an
actual specification.
Once you’ve obtained this background information, then the meeting can
morph into the more test-oriented one described earlier. Discuss the types of test
scenarios that should be run against the code. Get the developer’s perspective on
potential areas of weakness in the new component or function. Ultimately, these
sessions can bring all the developers and testers together for an exchange of knowl¬
edge that leads to a better product.
Service Professionals
Testers can gain another perspective on the software by meeting with customer
support professionals, especially if they are servicing customer calls or problem re¬
ports in a similar functional area. They are very familiar with the types of problems
customers are reporting, and can share their thoughts on serviceability needs and
how hard or easy earlier software was to debug. You’ll gain insight into the types of
problems that escaped previous test efforts, and you can make note of additional
tests to include in your plan. Pay particular attention to any interesting combinations of events that are not self-evident but have occurred in production environments and subsequently exposed a defect. For example, support reps may describe a scenario where the software reacted incorrectly when system memory was constrained because of an activity spike from a completely unrelated software application.
Other Testers
Leveraging the skills and knowledge of fellow testers is always a good idea in every
part of the test, including the preparation phase. You’ll get the most from their ex¬
pertise by exploring several test concepts in depth. Meet with test team members as¬
signed to the same project as you are. Describe what you have learned so far about
the function you’ll be testing. Have your team members do the same. Discuss po¬
tential areas for cooperation and take notes about them. Then define how the dif¬
fering test activities can be combined to save time and effort and produce a better
test. A format for such discussions is described in Chapter 6, “Coping with Com¬
plexity through Teaming.”
Large software projects will have several test phases, each focused on specific
objectives, each attacking different aspects of the software. Meet individually with
testers from the other teams for a high-level discussion about what each of you has
learned so far. Go into detail on your thoughts about approach, coverage, and sce¬
narios. If there is a test phase in front of yours, learn where they are seeing the most
problems. You may want to emphasize these same soft spots in your own testing.
Review each others’ notes and look for things that would duplicate effort. Then de¬
termine whether it is appropriate for just one test team to address that area or if
there is unique value in multiple teams trying a similar plan. Many times, testers are
short on time. Ensuring a test activity is not repeating something already attempted
or that is part of someone else’s test plan will save time and effort. It will also help
you stay focused on the very things that provide the most value.
In addition to uncovering duplication of effort, you should also look at the test
plans from the opposite angle and identify any holes. Identify specific tests that are
needed but are not currently planned by either team. When meeting with other
testers, describe your assumptions on what they will cover. They should do the same,
and together you can discuss what appears to be missing. You may not find all the
holes right then, but at least each team will not rely on assumptions that haven’t been
validated.
Compare Notes
After you have met individually with the designers, developers, and other testers, a
formal get-together of all the players at one time can yield additional benefits when
preparing for a test. Each tester can describe his view of the most important things
to cover. The other meeting participants can provide a sanity check of the proposal
and work to reach agreement on the best approaches and focus areas for each
planned test effort. The testers should also present the previously identified holes
and redundancies, which may help the bigger team uncover additional ones. The
test review should satisfy everyone’s concerns about the total testing picture. After
that objective is met, each tester can feel confident that his test will be properly fo¬
cused, complete, and effective. It is an excellent way to validate the prior research.
CUSTOMER RECONNAISSANCE
There are as many definitions of software test as there are of the role of a tester.
Which one you subscribe to is not as important as it is to act as the customer’s ad¬
vocate. Of course, performing this role is difficult without understanding your
customers. Gaining their true viewpoint, understanding their issues and expecta¬
tions, and knowing what is most important to them takes a long time. The task of
getting to know customers is not something a tester starts to do after being assigned
a test. It is something that is built over time. One benefit of using professional
testers to test software is that they spend their careers learning about customers.
They also have a view of a much bigger picture than many developers have. Testers
get the opportunity to have hands-on experience in large, complex environments
where system test and integration test are performed. They get the chance to act like
a customer. Applying knowledge of the customer environment during test prepa¬
ration helps testers define customer-aligned approaches and focus areas and to de¬
velop a plan that will find the defects that matter.
Where to Snoop
Obtaining the mindset of the software’s users will not happen by itself. You should
take any available opportunity to interact with customers, since having this in-
depth understanding of their needs is of utmost importance for an effective test.
Let’s review a few ways you can obtain it over time.
Get Involved
You will need to seek out opportunities to interact with customers. Sitting by the
phone waiting for a call will not help, at least not initially. Start by learning about any
activities involving customers and volunteer to assist on those projects. For example,
if the customer service organization is nearby, let them know you are available to re¬
create or help debug any critical problems reported from the field. Afterward, in¬
vestigate and implement ways to keep that kind of problem from escaping your
testing in the future. Offer to present your findings and actions to the customer. The
resulting experience is one little nugget you can collect. Each time you add to the col¬
lection, you build a little more knowledge of your customers.
One caution: guard against the possibility of viewing one of their unique issues as a pervasive problem when it may not be. Each customer has its own little quirks.
A lot of software goes through a beta test, where the code is introduced to a subset
of customers prior to making it generally available to all. You can get involved in
these beta tests in a number of ways, from educating beta customers about the new
software to helping support some of them. Assisting in beta test efforts offers you
the chance to interact with the customers who are most interested in the new func¬
tionality, as they are the ones most likely to participate in a beta test. Beta tests also
offer the timeliest feedback, since they happen either in parallel with a test effort or
immediately following one. Things you learn during the beta test let you add an¬
other nugget or two to your customer-knowledge collection.
Build Relationships
You will find that some customers are very open to starting ongoing relationships.
These relationships are good for both sides. The testers win by having a sounding
board for ideas, and customers benefit by having influence over the types of tests
and scenarios executed. Opportunities may arise where an offer can be extended to
a customer to make a presentation to the test team. The customer can be asked to
speak about their environment, what technology they use and how it helps them,
what daily struggles they have, or any other topic. Testers will get the opportunity
to ask questions and add to their customer-knowledge base. Sometimes such rela¬
tionships even evolve to where customers spend a few weeks actually participating
in system or integration testing, side-by-side with the test team.
Go to Trade Shows
If you get the opportunity to attend trade shows or user group events, use the oc¬
casion to meet with customers. This gives you the opportunity to interact with the
product’s users in more informal settings. Just by talking to them, you can learn
what’s on customers’ minds. You can ask them questions about the technology and
software under discussion or the demonstrations at the show and encourage them
to share their thoughts on how they would implement the new technology in their
environment. By discussing any concerns they have about the design or function¬
ality of the software, you can gain additional insight into the customer world.
In the process of preparing for a test, you’ll learn about the software’s functions
from many different sources. Collecting the information is one thing, aggregating
it is another. Deep thinking is the next step. Many test tools are available, but none
will do this thinking for you. Here is where the simplest of tools can help—create a
simple checklist to use as you progress through your investigative process.
Figures 5.1 and 5.2 show some examples of parts of such a checklist for a sys¬
tem test. The questions are just a guide to assist in the investigation phase. There is
no magic, there is no high technology. It’s merely a way to capture what is found.
The checklist is really a tool for your own use, but it can be used for other pur¬
poses as well. For example, the checklist can be used as a consistent agenda when
educating other test team members. A test team can review groupings of multiple
checklists to find opportunities to merge or perform scenarios together. It can also
be used as a training aid for less experienced testers, helping them prepare for their
first test. But mostly, it offers a single place to gather all the notes taken during test
preparation for use as reference when creating the test plan.
FIGURE 5.1 A portion of a sample system test preparation checklist. Each question is paired with a column for notes:
■ Function name
■ Lead designer, developer, function tester, information developer
■ Brief description of the function
■ Why are we developing this functionality?
■ Value to our company
■ Value to our customers
■ What will we externally advertise?
■ Is this function part of a bigger strategy? If so, what is it?
■ What is the nature of this function: support for other functions or components? System constraint relief? Performance enhancements? Enhancements to reliability, availability, or serviceability? Standards compliance? Security enhancements? Other?
After you have exhausted your sources of information to learn about the code you
will test and you understand its customers, you are now positioned to provide your
first feedback on the software. Much of your activity during test preparation is
geared toward building your knowledge. But you can also begin giving back.
SUMMARY
Coping with Complexity through Teaming
In This Chapter
■ Complexity of large system software
■ Reducing complexity through teamwork
■ Leveraging everyone’s expertise to solve the complexity issue
Some software is complex enough that a concerted, team-wide effort is needed before a robust test plan can be completed. If you are involved with such
software, then this chapter will show you how the entire team can come together to
solve the complexity riddle.
FIGURE 6.1 Composition of the z/OS platform and base control program.
Passed across these intercomponent interfaces are data parameters that have a slew of their own combinations and permutations. Together these imply a volume of test scenarios that is difficult to fathom, and that's just the beginning.
Meanwhile, end users simply want to accomplish their basic tasks. As a result, the tester's challenge is to ensure all intricacies, including
those introduced through integration, are sufficiently masked or managed.
Software can have multiple levels of complexity. There may be core services, mid-level services, and end-user services, with many interfaces gluing them together. As a tester, your challenge is to figure out a way to contend with this complexity.
FVT and SVT teams often struggle to get a handle on the labyrinth of code they’re
responsible for covering. This is particularly true when an effective software test re¬
quires a great deal of interaction between so many components that it’s impossible
for any single person to study them all, looking for possible ways that one area
might influence another. An effective approach to building a comprehensive test
plan in such an environment is for individual testers to take ownership of different
components of the software, becoming a component spy.
detailed module control flow, structures, and so on, very much in line with func¬
tion verification testers.
Component Basics
The questions to be answered during the component assessment cover a broad
range of the component’s characteristics. Initially, the assessment focuses on the
basics.
Technical Description
The assessment begins with an overview of the component and what it tries to ac¬
complish.
Functions
All of the functions the component provides are described. This should include
both external and internal functions. Some components may be so large that they themselves are almost minisystems. In that situation, the discussion can include a breakdown of their major subfunctions as well.
Dependencies
Another key piece to solving the component puzzle is to understand its inter¬
dependencies with other components and products. There will be services that a
component will rely on to exploit the full capability of the system. These services
and functions are provided by other software components and by technologies
within the hardware platform or software infrastructure. The component under as¬
sessment can also be providing services to others. Explaining the dependencies in
both directions gives the other testers more perspective on where the component
fits into the picture and what it provides to end users.
Outage-causing Capability
The component spy also needs to clearly describe the component’s potential to
cause an outage. An outage is a slowdown or complete shutdown of the availability
of an application, subsystem, full system, or system cluster. The component assess¬
ment probes into how the component could potentially cause such outages. With
this knowledge, the team can devise focused tests to expose weaknesses in these and
any related areas.
Component Age
When existing software is being enhanced, it is helpful to discuss the age of the
component in question and/or the version of the software in which it was intro¬
duced. This information helps testers see how deep regression testing must be.
Recent Changes
With existing software, the modifications made to the component over the past few
releases should be reviewed during the component assessment, including the spe¬
cific release numbers and nature of the changes. This history lesson provides the
whole team with a valuable overview of the component’s function and role over
time.
Planned Enhancements
The spy is also asked to describe the enhancements planned for the component. By
communicating with designers and developers, the tester gains comprehensive
knowledge about what changes are coming along and the problems they are de¬
signed to solve, and shares this insight with the entire test team. This part of the as¬
sessment tends to lead to discussions about new testing approaches or modifications
to existing approaches.
Competition
When working with a commercial product, testers need to consider the competi¬
tion. The existence of competitive products is also a key piece of information that
the spy should bring to the component assessment. A deeper investigation of a
competitor’s product and its functions helps everyone see the many ways a techni¬
cal problem can be solved. This knowledge can result in a better test plan, especially
in cases where the testers are attempting to solve the same technical issues.
Test Coverage
As we’ve seen, the assessment basics provide the team with an excellent overview of
the component, how it fits into the system, its primary function within the system,
and its key dependencies. But, a spy must offer more than just this basic knowledge
of the component. Testing insight is also critical.
Functional Breakdown
Testers assess coverage by breaking down the component into its major pieces.
They answer key questions about each function to help identify the type of testing
techniques and methodologies that should be used against it, including:
■ Does the component get adequately tested by normal system operation and
natural exploitation?
■ Are there record/playback script workloads to drive various functions of the
component?
■ Are there batch workloads, containing many test programs, that can be used to
exercise the component?
■ Are there manual scenario-driven tests that should be executed as part of com¬
ponent coverage?
■ Are there any ported test cases that can be obtained from a prior test phase?
Functional Coverage
The tester is asked to describe which structural aspects of the component are cov¬
ered by the tests, workloads, and scripts that are to be used against each function
provided by the component. These might include initialization, mainline process¬
ing, recovery, and migration or coexistence.
Defect Analysis
That’s a lot of information already. But an overview of the component, its capabil¬
ities, and the planned test coverage is not quite enough to complete the component
assessment.
For existing software, an extremely important aspect of ensuring that the test
team is protecting its customers is through a defect escape analysis of the compo¬
nent under discussion. The scope of the review must focus on several key areas.
Defect Rate
How many defects have been found by customers in comparison to the size of the
component? This, or whatever defect metric the product team uses, provides a
means by which the component’s quality can be tracked and understood. From a
tester’s perspective, this could become a measuring stick to use over time for un¬
derstanding test effectiveness. Does the defect rate decline over time? Might it be
dropping because of improved test coverage?
Areas of Improvement
Finally, the tester analyzes the results of the assessment to identify potential areas of
improvement. By identifying critical escapes, and exploring the test coverage for the
component and its new and existing functions, the tester can articulate the current
and future work plans needed for improvement, thereby increasing and enhancing the component's test coverage. As a capstone, the testers are tempted with one last question: “If you had an open checkbook, what would you recommend be done to improve the testing of this component and its functions?” The answer to this final
question helps to steer the team’s discussion toward what can reasonably be ac¬
complished.
Team Benefits
Think of the component assessment information as a treasure map in the search for
the true pirate’s hoard—bugs. By using this divide-and-conquer strategy, the over¬
whelming task of trying to discover all the nuances of interaction between different
components suddenly becomes manageable. As we discussed up front, such a
process is probably overkill for many software projects. But for the true monsters
of complexity, it can be a life saver.
But the value of establishing component spies and engaging in a component as¬
sessment review doesn’t end with the creation of a great test plan. The component
assessment is also an excellent means of building educational and training re¬
sources. By gathering and documenting this information, the tester provides for the
future of the entire test team. When it's time for the tester to move on, results of the
prior component assessment offer a great way for a new tester to get his first expo¬
sure to what the component is all about.
Theme-based Testing
A thorough system or integration test plan for industrial-strength software must
provide end-to-end coverage that might go beyond one specific product to span
functions, subsystems, and platforms. For example, a Web application server test
could involve interactions between multiple Web servers, databases, transaction
monitors, firewalls, intrusion detection systems, user registries, authorization and
authentication mechanisms, workload routers, and operating system platforms
(singly and clustered). In addition, each of these could be involved in activities
such as daily operation, data backup, and system maintenance. How can the test
team create a set of realistic, complex scenarios for an environment like this?
It requires pooling of expertise. Specifically, a strategy team, made up of senior
testers and component spies, can be assembled that contains expertise in all the key
areas and has the objective of building cross-component and cross-system scenarios that emulate a customer environment.
When a complex failure occurs, the spies can then examine the symptoms from the perspective of the component in question
see if anything appears to be awry, or if they can point to another component based
on interface data that seems inconsistent. As the collective set of testers (i.e., com¬
ponent spies) begins to build a picture of what occurred during the execution of the
failing scenario, the isolation of the component in error becomes more manageable.
Complexity is again dissolved by a divide-and-conquer approach.
SUMMARY
For cases where the complexity of software is enormous, testers need to work to¬
gether and make use of everyone’s expertise to build effective and comprehensive
test plans. Since one tester cannot possibly know everything about such software,
leveraging the technical expertise of component spies and senior test leaders is a
great way to generate a result that will delight your customers.
Now that you have finished the investigation stage, you are ready to begin the
creation phase. Chapter 7, “Test Plan Focus Areas,” describes one such creation—
the test plan.
Test Plan Focus Areas
In This Chapter
■ Who test plans are really for, and how they’re structured
■ Unit Test focus areas
■ Function Verification Test focus areas
■ System Verification Test focus areas
■ Integration Test focus areas
■ Special considerations for multisystem testing
■ Test Cases versus Scenarios
■ Why every tester should love test plan reviews
Can you spend just a few minutes reading about new, upcoming software features and then immediately begin creating your formal plan to test them?
Perhaps, but the result will likely not be worthy of a professional tester. The
initial investigative work discussed in earlier chapters lays the essential ground¬
work for creating a solid testing strategy. Once you’ve researched the entity you’re
about to test, examined reports of past tests of similar function, met with the ex¬
perts, studied potential customer usage, and compared notes with other test teams,
then it’s finally time to put together a detailed test plan.
A test plan has several different consumers. Product developers use it to satisfy
themselves that their code will be properly tested. Other testers on your team, or
other test teams (either before or after you in the development cycle), use it to
eliminate overlap and ensure that you all share an understanding of hand-offs,
entry and exit criteria, and so on. New testers might use a test plan as an educational
tool. Product release managers track progress against it. Auditors use it to deter¬
mine if you know what you’re doing. If the ultimate consumer of what you’re test¬
ing has access to your test plan (for example, if you are all in the same company),
then they’ll use it to assess if deployment will go smoothly.
Structure
What does a test plan consist of? Figure 7.1 shows a sample SVT template. There’s
nothing magical about this template, and you may choose to delete sections or add
others. IEEE Standard 829-1998 for Software Test Documentation [IEEE829] defines
a more complex template. Some FVT plans are little more than lists of test scenar¬
ios or variations (unit test rarely has even that much). Ultimately, the test plan is
merely a document. It takes the output of the real work of test planning and pack¬
ages it. What’s important is for that package to be sufficiently complete and di¬
gestible enough to be useful to its primary consumers—yourself, other testers, and those reviewers with the skill, knowledge, and experience to suggest improvements that will help you find more bugs or add other value. Anything more than that is a waste of paper.
FIGURE 7.1 A sample SVT test plan template:
• Document Control
- Distribution
- Approvers/Reviewers
- Change History
• Overview
- Project Summary
- Overall Test Goals and Objectives
• Test Environment
- Hardware Configuration
- Software Configuration
• Tools and Workloads
- Test Tools
- Base Workloads
• Administration
- Test Assumptions and Dependencies
- Entrance and Exit Criteria
- Status Tracking Approach
- Problem Reporting and Tracking Approach
- Maintenance Strategy
- Deliverables
• Schedule
• Test Matrices and Scenarios
Front Matter
Everything in our example template except for the very last line is what we call front
matter. Most of it will be boilerplate information that can be reused from test to test
with only minor modifications. Some of it (such as information on configurations
and workloads) is useful to reviewers to give them some context for your plans.
Other parts help ensure different teams are in sync. Good entrance criteria are im¬
portant for ensuring the team doesn’t waste time by starting with software that isn’t
yet ready, and good exit criteria tell you when you’re done. Identifying dependen¬
cies on specific hardware assets or software products and dates when they are
needed will help line things up. The schedule section is certainly optional, since in
any commercial project there will be detailed schedules that are tracked by project
managers using their own charts and tools. But some folks like to include a few
high-level schedule milestones in their test plan as a reference point for themselves
and reviewers.
Other sources describe variations on this front matter in great detail, but we
won’t do so here. It typically covers the entire test team, and so is usually developed
and maintained by the test team leader. Other team members don’t spend much time
contemplating front matter. Their focus is entirely on that innocent looking last line,
for that is where the true core of the test plan lies: the test matrices and scenarios.
Matrices that summarize coverage of key tests across different hardware or software environments can further condense and highlight important information. If you use these techniques, always provide a pointer to the test management tool (or wherever your detailed scenario descriptions live) as well, so ambitious reviewers
can study the specifics. A cleverly designed test management tool would even allow
you to extract scenario summary descriptions so you could drop them directly into
the test plan document without having to retype them. In fact, some such tools
allow you to package your detailed scenario descriptions as simple HTML docu¬
ments, further easing reviewer access.
You may wonder how much information should go into the detailed scenario de¬
scriptions, wherever they live. There are two schools of thought. The first is to doc¬
ument each scenario in explicit detail. This cookbook scenario approach has the
virtue of completely capturing all relevant knowledge. It has the disadvantage of de¬
manding huge amounts of time—in some cases, the time spent generating the doc¬
umentation could be longer than what’s required to actually execute the test. This
approach is sometimes used by teams where there are a small number of highly
skilled testers (who become the plan writers) and a large number of inexperienced
testers (who become the plan executors).
The second approach is to jot down only enough high-level information to re¬
mind the tester of everything he wants to cover within that scenario. This framework
scenario technique captures the activity’s essence, but trusts the tester to work through
the specific steps required. It is frequently used when there is a broad level of skill
across the team, so that every tester writes his own scenario descriptions. It saves a
great deal of time that can then be applied to researching and executing the test, rather
than sitting at a keyboard. Figure 7.2 shows an example of a framework scenario.
We recommend the latter approach, even in cases where there are many inex¬
perienced testers on the team. Certainly the team leader and other experienced
testers will need to review what the newer folks come up with, but what better way
for the newbies to take ownership of their work and build skills than to do this test
planning themselves?
A first pass at creating the considerations list is done by looking at the software
from the outside, noting all of its inputs, outputs and actions, and adding them to
the list. Then the team pulls back the covers and examines the actual modules that
make up the software component. They look for key internals such as serialization
points, resource access, and recovery support, and add them as appropriate to the
considerations list. An example test considerations list is shown in Figure 7.3. This
list covers a particular product’s ability to process configuration parameters that
would be read during its initialization. A subset of those parameters could also be
dynamically reloaded later to change the product’s behavior on the fly.
Once the test considerations list has been outlined, a list of variations for each
test consideration will be developed. These variations represent the actual techni¬
cal plan of attack on the function and form the core of the test plan. They describe
in more detail the conditions that need to occur to drive a specific code path, as well
as the expected results. Figure 7.4 shows a sample variations list that was derived
from a portion of the earlier considerations list.
FIGURE 7.4 A sample variations list derived from the Reload portion of the considerations list:
Reload
■ Each reloadable keyword: not specified
■ Each reloadable keyword: valid
■ Each reloadable keyword: invalid
■ Nonreloadable keywords not reloaded
■ All keywords changed in one Reload
Once completed, the team would send the variations list out for review. Many
FVT teams in particular have had success using this considerations/variations tech¬
nique for creating their test plan.
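To illustrate, here is one way the Reload variations from Figure 7.4 might be captured as automated test cases, sketched in Python with pytest. The reload_config() interface, keyword names, and valid values are all hypothetical stand-ins for whatever the product under test actually provides.

```python
# A sketch of encoding the Figure 7.4 variations as parameterized test cases.
# reload_config() below is a trivial stand-in for the product's dynamic-reload
# interface; replace it with calls to the real externals.
import pytest

RELOADABLE = {"trace_level": {"off", "low", "high"},   # keyword -> valid values
              "max_sessions": {"10", "100", "500"}}


def reload_config(updates):
    """Stand-in: accept valid values for reloadable keywords, reject the rest."""
    accepted, rejected = {}, {}
    for key, value in updates.items():
        if key in RELOADABLE and value in RELOADABLE[key]:
            accepted[key] = value
        else:
            rejected[key] = value
    return {"accepted": accepted, "rejected": rejected}


@pytest.mark.parametrize("keyword", sorted(RELOADABLE))
def test_valid_value_is_reloaded(keyword):
    value = sorted(RELOADABLE[keyword])[0]
    assert keyword in reload_config({keyword: value})["accepted"]


@pytest.mark.parametrize("keyword", sorted(RELOADABLE))
def test_invalid_value_is_rejected(keyword):
    assert keyword in reload_config({keyword: "@@bogus@@"})["rejected"]


def test_unspecified_keywords_are_left_alone():
    assert reload_config({}) == {"accepted": {}, "rejected": {}}


def test_nonreloadable_keyword_is_not_reloaded():
    assert "port" in reload_config({"port": "8080"})["rejected"]


def test_all_reloadable_keywords_changed_in_one_reload():
    updates = {k: sorted(v)[0] for k, v in RELOADABLE.items()}
    assert set(reload_config(updates)["accepted"]) == set(RELOADABLE)
```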
Content
What should test scenarios or variations focus on? It depends. Different test phases
are most efficient and effective at finding different types of defects, so they should
focus their testing in those areas. This is true regardless of how a given development
model may have arranged those test phases. Let’s take a look at the key test plan
focus areas for four different test phases: unit test, function verification test, system
verification test, and integration test.
Unit Test
The unit test of a module is performed by the developer who wrote it prior to
merging that module into the overall development stream. The goal of unit test is
simply to ensure that all the obvious bugs are removed before anyone but the de¬
veloper sees the code. It is achieved by forcing the execution of every new and
changed line of code, taking all branches, driving all loops to conclusion, exercising
all object behaviors, and so forth. Typically, this is accomplished by the developer
stepping through the code, setting breakpoints, and forcing the necessary condi¬
tions. A documented plan isn’t necessary in order to achieve this goal, and frankly
adds little value.
That’s not to imply that unit test isn’t important—quite the contrary. Unit test
clears out the underbrush of obvious defects and thereby allows for a smooth inte¬
gration of the module into the development stream. It also enables the function
verification test team to focus on the next level of defects. Naturally, it’s in the de¬
veloper’s best interest to do as thorough a job of unit test as possible; after all, the
more defects he can extract during unit test, the less he will be bothered later by
testers who are proud to find another bug.
Function Verification Test
Once the developers of various modules have merged them into a common build
stream, it’s time to move beyond the single-module domain of unit test and into
the more complex area of Function Verification Test (FVT). The scope of the FVT
is that of a complete, yet containable functional area or component within the
overall software package. There are several focus areas typically targeted in an FVT.
Let’s examine them.
Mainline Function
A major part of FVT is to see if the program correctly does the big things it’s sup¬
posed to do. In other words, testing its mainline function. If it’s an online bookstore
application, you would define scenarios to browse categories, search for books by
title or author, put items in your shopping basket, check out, and so on. Concep¬
tually, what you need to cover in mainline testing is fairly obvious; though doing a
good job of it requires careful study of the program in question and a solid under¬
standing of likely customer usage patterns and environments in which it must sur¬
vive. Mainline testing overlaps with some of the other areas listed later, although
they are all special areas of focus in their own right.
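As a sketch of what one mainline scenario might look like for the bookstore example, consider the following Python test. The URLs, JSON fields, and the assumption of a locally running test instance are all invented for illustration; a real FVT would use the product's actual externals.

```python
# A sketch of one end-to-end mainline FVT scenario for a hypothetical online
# bookstore. The endpoints and response fields are assumptions, not a real API;
# the point is the shape of a mainline scenario, not the details.
import requests

BASE = "http://localhost:8080"   # assumed test instance of the bookstore


def test_browse_search_and_checkout():
    # Browse a category and confirm it lists at least one title.
    category = requests.get(f"{BASE}/categories/mystery", timeout=10)
    assert category.status_code == 200 and category.json()["books"]

    # Search by author and pick the first hit.
    hits = requests.get(f"{BASE}/search", params={"author": "Christie"},
                        timeout=10).json()["results"]
    assert hits, "search by author returned nothing"
    book_id = hits[0]["id"]

    # Put the item in the shopping basket and check out.
    cart = requests.post(f"{BASE}/cart", json={"book": book_id}, timeout=10)
    assert cart.status_code == 201
    order = requests.post(f"{BASE}/checkout",
                          json={"cart": cart.json()["cart_id"]}, timeout=10)
    assert order.status_code == 200 and order.json()["status"] == "confirmed"
```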
Security Support
Security support is part of mainline function, but it is so important (and sometimes
misunderstood), that it’s worth calling out separately. There are multiple security
features that may be incorporated into a piece of software:
Authentication: Confirming that a user is who he claims to be, through the use
of a user ID and password combination, token, biometric, or similar approach
Authorization: Limiting an authenticated user’s access to permitted areas only,
through an access control mechanism
Confidentiality: Hiding private information from public view, usually through
encryption
Integrity: Ensuring that information was not altered in transit, by using such
tools as Message Digest 5 (MD5) sums
Nonrepudiation: Preventing a sender of a message from later claiming he didn’t
send it, through digital signatures or similar techniques
The use of Secure Sockets Layer (SSL) encryption to protect the confidentiality
of data flowing across the wire from a user’s browser to a merchant’s Web site has
become pervasive—so much so that sometimes people believe SSL is all there is to
security. But what good is the SSL-protected transmission of a customer’s credit
card number to an online bookstore if it is stored there on a publicly accessible
database? Or, if any user can gain access to a customer’s account information?
During FVT, it’s important to thoroughly exercise all security mechanisms just
as you would other mainline functions. Check that users are properly authenticated
during sign-on, can only access the account data they are authorized for, that en¬
cryption is doing its job, and so forth. It’s also important to look at security from the
outside in, as an invader might. Any software running in production environments
may someday come under attack. This is not only true for external Web-facing applications exposed to hackers or terrorists—surveyed companies continually attribute a big chunk of security breaches to their own authorized users or employees
[Hulme03].
System security testing, also called penetration or vulnerability testing, is an art
unto itself. To discover a program’s vulnerabilities, you’ll need to learn to think like
an intruder. You must search for any unintended behaviors in the program that
allow you to bypass its armor and do things you shouldn’t be able to do. Generat¬
ing buffer overflows, forcing the program to use corrupt files, faking sources of
data, sabotaging the flow of a communication protocol, and dozens of other tricks
are the name of the game here.
A quick Internet search on “hacking tools” will find a bounty of programs used
by (or sometimes against) the bad guys. Some may be useful for your testing, or at
least get you thinking in the right direction. Entire books have been written on this
topic. For an excellent introduction to the ins and outs of security testing, see Whittaker and Thompson [Whittaker03].
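By way of illustration, the following sketch shows two simple authentication and authorization checks against the hypothetical bookstore used earlier. The endpoints, user IDs, and token scheme are assumptions, not a real API.

```python
# A sketch of two FVT security checks: authentication is enforced, and one
# authenticated user cannot read another user's account data. All names,
# endpoints, and credentials are invented for illustration.
import requests

BASE = "http://localhost:8080"


def login(user, password):
    r = requests.post(f"{BASE}/login",
                      json={"user": user, "password": password}, timeout=10)
    assert r.status_code == 200
    return r.json()["token"]


def test_unauthenticated_access_is_rejected():
    r = requests.get(f"{BASE}/account/alice", timeout=10)
    assert r.status_code in (401, 403)


def test_user_cannot_read_another_users_account():
    token = login("bob", "bobs-test-password")
    r = requests.get(f"{BASE}/account/alice",
                     headers={"Authorization": f"Bearer {token}"}, timeout=10)
    # Bob is authenticated but not authorized to see Alice's data.
    assert r.status_code == 403
    assert "credit_card" not in r.text
```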
Software Interfaces
There are several different classes of software interfaces to consider.
The more you think about the various pressures the component may find itself in, and the more devious
you are, the longer your list of API-related FVT scenarios will become.
Human Interfaces
In addition to communicating with other components, some functions will need to
communicate with humans as well. This communication can take several forms.
Let’s briefly look at several.
Line-mode Commands
In addition to GUIs, most software products have the capability of being manipu¬
lated through line-mode commands. While these commands are often handy for
the experienced user who has tired of wading through GUI screens, their primary
value is probably for automation. Line-mode commands can be included in au¬
tomation routines ranging from simple shell scripts to much more sophisticated
vendor tools.
As with GUIs, testing line-mode commands is conceptually simple but time
consuming. Every option and combination of options must be tested, including
those combinations that are invalid. Your goal isn’t simply to try all these options,
but to ensure you obtain the desired result—and only the desired result.
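One hedged sketch of this kind of coverage: drive a hypothetical command (here called bookstorectl) through its valid option combinations and a handful of invalid ones, checking both the exit code and the output. The command name, options, and expected behaviors are invented for illustration.

```python
# A sketch of exercising a hypothetical line-mode command across option
# combinations, including invalid ones. Replace the command and options with
# the real externals of the product under test.
import itertools
import subprocess

VALID_FORMATS = ["text", "xml"]
VALID_SCOPES = ["users", "orders", "all"]


def run(args):
    return subprocess.run(["bookstorectl", "report", *args],
                          capture_output=True, text=True)


def test_every_valid_option_combination():
    for fmt, scope in itertools.product(VALID_FORMATS, VALID_SCOPES):
        result = run([f"--format={fmt}", f"--scope={scope}"])
        assert result.returncode == 0, result.stderr
        assert scope in result.stdout          # the desired result, and only that


def test_invalid_and_conflicting_options_are_rejected():
    for args in (["--format=pdf"],                      # unsupported value
                 ["--scope=users", "--scope=orders"],   # conflicting repeats
                 ["--no-such-flag"]):                   # unknown option
        result = run(args)
        assert result.returncode != 0
        assert result.stderr, "an invalid invocation should explain itself"
```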
Messages
Most large software packages issue messages to indicate both success and failure.
You must define scenarios to force out every possible message. In some cases, this
will be easy to do. For others it may be quite tricky, especially for some of the fail¬
ure messages. Nevertheless, each message represents a path through the code, and
must be exercised. For particularly thorny cases, the preferred approach of using
external stimuli to create the necessary conditions may not be practical, so condi¬
tions may need to be rigged to force out the desired message. But this should be the
exception rather than the rule.
Naturally, it’s also important to validate the contents of each message. Are its
contents accurate? Is it useful in the context in which it’s displayed? Does it provide
the necessary information to allow the user to take appropriate action? Scrutinize
each message from the perspective of an unknowledgeable user to identify areas of
potential confusion. Also, if the messages are supposed to follow a predefined stan¬
dard, check to see if they have done so. Some automation and monitoring tools de¬
pend on standard message formats.
In one real-world case, a customer wanted to confirm that a product was using hardware cryptography. The product's support team instructed the customer to turn on a specific trace level,
and then restart the product. That action should have forced a message to the trace
log indicating whether or not the product was using hardware cryptography.
The customer complied, but was confused to find that not only did the trace
log not have an entry indicating that cryptography was being used, it had no mes¬
sage saying it wasn’t being used either! It turned out that the product was indeed
using the hardware cryptography correctly, but there was a bug that suppressed its
associated trace entry—something which had been overlooked during testing. A
simple bug, but its escape led to customer confusion and use of the support team’s
expensive time. Don’t fall victim to a similar omission in your FVT.
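A small sketch of how message coverage and format checking might be automated is shown below. The message-ID convention and log file name are assumptions patterned on common practice; substitute whatever standard your product follows.

```python
# A sketch of auditing a product's message log: flag lines that violate an
# assumed message-ID standard, and report expected messages that were never
# forced out during testing. The catalog and format are hypothetical.
import re

EXPECTED_MESSAGE_IDS = {"BKS0001I", "BKS0002W", "BKS0107E"}  # assumed catalog
MESSAGE_FORMAT = re.compile(r"^[A-Z]{3}\d{4}[IWE] .+")       # assumed standard


def audit_message_log(path="product_messages.log"):
    seen = set()
    bad_format = []
    with open(path, encoding="utf-8") as log:
        for line in log:
            line = line.rstrip("\n")
            if not MESSAGE_FORMAT.match(line):
                bad_format.append(line)
                continue
            seen.add(line.split()[0])
    never_issued = EXPECTED_MESSAGE_IDS - seen
    return bad_format, never_issued


if __name__ == "__main__":
    bad, missing = audit_message_log()
    print(f"{len(bad)} malformed lines; messages never forced out: {missing}")
```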
Limits
This one is less of a single focus area and more something that permeates many
areas. It’s not exclusive to FVT either, though it is critical to any good FVT. Limits
testing (also known as boundary condition testing) simply means testing a piece of
code to its defined limits, then a little bit more. In the movie This is Spinal Tap, a
character proudly explains that while normal guitar amplifiers can only be turned
up to “10,” his can be turned up to “11.” Limits testing is something like that. If an
input parameter has allowable values ranging from one to ten, test it with a value
of one, then with a value of ten, then with eleven (you would want to try it with zero
and probably negative values as well). It should accept values in the allowable range,
and reject values outside of that range.
It’s not practical, or even worthwhile, to test every possible input value. Test¬
ing just around the limits is both sufficient and efficient. The same approach applies
to many other situations, from loop exit triggers, to storage allocations, to the con¬
tent of variable output messages. Always test the limits.
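In code form, limits testing of the one-to-ten parameter above might look like the following sketch. The set_volume() function is a hypothetical stand-in for the interface under test.

```python
# A minimal boundary-testing sketch: probe just inside and just outside the
# documented limits of a hypothetical parameter that accepts 1 through 10.
import pytest


def set_volume(level: int) -> int:
    """Stand-in: accepts 1-10 inclusive, rejects everything else."""
    if not 1 <= level <= 10:
        raise ValueError(f"volume {level} out of range")
    return level


@pytest.mark.parametrize("level", [1, 2, 9, 10])
def test_values_inside_the_limits_are_accepted(level):
    assert set_volume(level) == level


@pytest.mark.parametrize("level", [-1, 0, 11])
def test_values_just_outside_the_limits_are_rejected(level):
    with pytest.raises(ValueError):
        set_volume(level)
```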
Recovery
Recovery testing during FVT explores any support the software has for addressing
error situations. Can it restart cleanly after a failure? Does it self-heal? Can it recover
successfully from error conditions it anticipates? What about those it doesn’t an¬
ticipate? Does it provide adequate diagnostic data?
Recovery testing is an area that is often either overlooked completely or given
inadequate attention. This is odd, because not only is recovery support usually rich
in bugs, it’s also fun to test. And given the inevitability of errors, recoverability may
be a software package’s most vital characteristic for ensuring high availability. The
next chapter is devoted entirely to this exciting area.
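As one illustration, the sketch below forces a failure partway through a unit of work, restarts the (hypothetical) component, and verifies that it resumes from its last checkpoint. The CheckpointedWorker class is invented purely to show the shape of such a scenario.

```python
# A sketch of a recovery scenario: interrupt the work, restart, and confirm the
# component self-heals from its checkpoint. The component is hypothetical.
import json
import os


class CheckpointedWorker:
    """Hypothetical component that checkpoints progress after every item."""

    def __init__(self, checkpoint):
        self.checkpoint = checkpoint
        self.done = 0
        if os.path.exists(checkpoint):              # self-healing restart path
            with open(checkpoint) as f:
                self.done = json.load(f)["done"]

    def process(self, items, crash_after=None):
        for i in range(self.done, len(items)):
            if crash_after is not None and i == crash_after:
                raise RuntimeError("injected failure")   # simulate an abend
            self.done = i + 1
            with open(self.checkpoint, "w") as f:
                json.dump({"done": self.done}, f)


def test_restart_resumes_from_checkpoint(tmp_path):
    ckpt = str(tmp_path / "worker.ckpt")
    work = list(range(100))

    first = CheckpointedWorker(ckpt)
    try:
        first.process(work, crash_after=40)          # force the error situation
    except RuntimeError:
        pass

    second = CheckpointedWorker(ckpt)                # restart after the failure
    second.process(work)
    assert second.done == len(work)                  # recovered and finished
```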
Internationalization
If the software you are testing is a commercial product, it most likely will undergo
translation for sale in multiple countries. The implications go beyond simply trans¬
lating text. A keyboard's keys vary from country to country. Sorting by ASCII val¬
ues may not make sense in some languages. Time and date formats differ. Some
languages require more space than others to describe the same thing, so menus, di¬
alog boxes, and other user interfaces may need to be adjusted. The list of consider¬
ations, and their corresponding scenarios, can grow quite long. Thorough testing of
these functional aspects of the internationalization support is usually done during
FVT, and frequently can be accomplished without the need for testers to under¬
stand the different languages being supported.
In addition to the functional aspects of the internationalization support, there’s
also the accuracy of the translated text to consider. The first time a product under¬
goes translation for a new language, thorough testing is usually in order. That re¬
quires testers fluent in the language at hand, and it often makes sense to outsource
that testing to specialists in the target country. Depending on what has changed,
subsequent releases may only require a brief review. See Kaner, et al. [Kaner99] for
more detail on testing for internationalization.
Accessibility
Testing for accessibility determines if the software will be usable by people with dis¬
abilities. There should be keyboard equivalents for all mouse functions to accom¬
modate blind people or those with limited hand use who cannot accurately position
a mouse. Visual cues should be available for all audio alerts to accommodate those
who are deaf or hard of hearing. Color should not be the sole means of conveying
information or indicating an action, to accommodate blind and colorblind people.
Every window, object, and control should be labeled so that screen-reading soft¬
ware can describe it. Documentation should be available in an accessible format.
If software is intended to be accessible, the development team will probably
have followed a checklist of accessibility techniques and requirements from either
industry or government, such as one available from the United States government
[USGOV1]. The same checklist can be useful for generating test scenarios. Testing
techniques include using the same tools as disabled users will, such as keyboards
and screen readers. Printing GUI screens in black and white is useful for detecting
if any information they communicate is not discernible without the use of color.
Commercial tools are also available to aid with accessibility testing. IBM has a de¬
tailed accessibility checklist available on the Web, along with rationale and testing
techniques for each item [IBM03].
System Verification Test
System Verification Test, or SVT, is the point where the entire package comes together for the first time, with all components working together to deliver the project's intended purpose. It's also the point where we move beyond the lower-level,
more granular tests of FVT, and into tests that take a more global view of the prod¬
uct or system. SVT is also the land of load and stress. When the code under test
eventually finds itself in a real production environment, heavy loads will be a way
of life. That means that with few exceptions, no SVT scenario should be considered
complete until it has been run successfully against a backdrop of load/stress.
The system tester is also likely the first person to interact with the product as a
customer will. Indeed, once the SVT team earns a strong reputation, the develop¬
ment team may begin to view them as their first customer. This broader view in¬
fluences the focus areas of the SVT plan. Let’s take a look at several of those areas.
Installation
Naturally, before you can test new software you must install it. But installation test¬
ing goes beyond a single install. You’ll need scenarios to experiment with multiple
options and flows across a variety of hardware environments and configurations. In
fact, this testing could be done during FVT rather than SVT if the FVT team has ac¬
cess to the right mixture of system configurations. Regardless of when the testing is
done, after each installation it’s important to exercise the software to see if it’s truly
whole and operable. Uninstall scenarios should be covered as well, as should the
ability to upgrade a prior release to the new one, if supported. Your customer’s first
impression of the software will come at install time—make sure it’s a good one.
Regression
The idea behind regression testing is simple: see if things that used to work still do.
Production users insist on this kind of continuity. Testing is required because
whenever new function is introduced into a product, it almost inevitably intersects
and interacts with existing code—and that means it tends to break things. This is
what Kenney and Vouk refer to as a new release of software stimulating the discov¬
ery of defects in old code [Kenney92].
Note that some in the testing field also describe a regression test as the rerun¬
ning of a test that previously found a bug in order to see if a supplied fix works.
However, we prefer the term fix test for that activity.
Regression testing is best accomplished through a collection of test cases whose
execution is automated through a tool. These test cases come from past tests, of
course. It’s hard to imagine something more effective at discovering if an old function
still works than the test cases that were used to expose that function in the first place.
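A bare-bones sketch of such a regression bucket runner appears below. It assumes each old test case is an executable script collected in one directory; real teams typically rely on a commercial or homegrown test management tool, but the idea is the same.

```python
# A sketch of a minimal regression "bucket" runner: execute every collected
# test case script and report anything that no longer works. The directory
# layout and script convention are assumptions.
import subprocess
import sys
from pathlib import Path

BUCKET = Path("regression_bucket")      # assumed: one script per old test case


def run_bucket():
    failures = []
    for case in sorted(BUCKET.glob("*.sh")):
        result = subprocess.run(["bash", str(case)], capture_output=True,
                                text=True, timeout=600)
        status = "PASS" if result.returncode == 0 else "FAIL"
        print(f"{status}  {case.name}")
        if result.returncode != 0:
            failures.append((case.name, result.stderr.strip()))
    return failures


if __name__ == "__main__":
    broken = run_bucket()
    for name, err in broken:
        print(f"regression in {name}: {err}", file=sys.stderr)
    sys.exit(1 if broken else 0)
```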
Evolving Roles
There are two roles regression testing can play in SVT. First, an automated collec¬
tion, or bucket, of old test cases makes an excellent first workload to run at the be¬
ginning of SVT. It can serve as an acceptance test to determine if the code meets a
predefined stability baseline. By definition, the regression workload won’t explicitly
exercise any of the new (and probably unstable) code. But, it may implicitly test up¬
dates to existing function. This is particularly true for functions that provide ser¬
vices or support for the rest of the software package (such as an operating system’s
lock manager). So, regression testing is a great tool for assessing fundamental sta¬
bility before you proceed to more complex and stressful testing.
But once this initial testing is complete, should the regression workload go into
hibernation until the beginning of the next test cycle? No! It can live a second life
as a source of background stress upon which other tests are executed. As test cases
for new functions are completed, they can be rolled into this existing regression
bucket, or into a new regression bucket that’s specific to this release. This ongoing
test case integration ensures that code added to the product later in the SVT cycle
(for additional features, fixes, etc.) doesn’t break those already-tested functions—
creating a living, ongoing regression test, at no additional cost to the team.
We strongly advocate integrating the regression bucket into daily SVT runs as
one means of driving background load/stress. Such a bucket, built up over several
releases, exercises a rich variety of paths, options, and features in the product. How¬
ever, it will likely be only one of several means available for generating background
stress. Indeed, some teams prefer to run the regression bucket on the side, outside
of their mainstream tests, to ensure it is run continuously. That approach is often
employed by FVT teams as well.
Migration/Coexistence
The intent of migration testing is to see if a customer will be able to transition
smoothly from a prior version of the software to the new one. Which prior release?
Ideally, all currently supported prior releases, though sometimes that isn’t practical.
If we call the new release under test “n”, then at a minimum you’ll need to test mi¬
gration from release n-1 to n. It can also be productive to test migration from re¬
leases n-2 and n-3 as well. This gets especially interesting if the user will be allowed
to migrate from, say, the n-3 release directly to the n release, without first moving
through the interim releases, since the potential for trouble in this case is magnified.
However, the longer n-1 has been available and the more pervasive its use, the less
important it becomes to test older versions.
What exactly does migration testing consist of? It depends on the nature of the
software being tested. For a database program, it includes making sure the new
code can correctly access and process data created by the prior version. For an op¬
erating system, it includes ensuring you can upgrade to the new version without
having to reinstall and reconfigure (or rewrite and recompile) existing applica¬
tions. For a Web application server, it includes checking that the new version can
coexist in a cluster with systems running the older version. In all cases, test scenar¬
ios should include ensuring that any documented steps for performing the upgrade
process are complete and correct.
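For the database example, a migration check might be sketched as follows. The schemas and the in-place upgrade step are invented; a real test would restore a backup created by the actual n-1 release and run the documented upgrade procedure against it.

```python
# A sketch of one migration check: data created by the prior release ("n-1")
# must remain readable and correct after upgrading to the release under test
# ("n"). Everything here is a stand-in for the real product's data and tooling.
import sqlite3


def create_n_minus_1_database(path):
    """Simulate a database left behind by the prior release."""
    with sqlite3.connect(path) as db:
        db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
        db.executemany("INSERT INTO orders VALUES (?, ?)",
                       [(1, 19.99), (2, 5.5)])


def upgrade_to_n(path):
    """Stand-in for the new release's documented upgrade step."""
    with sqlite3.connect(path) as db:
        db.execute("ALTER TABLE orders ADD COLUMN currency TEXT DEFAULT 'USD'")


def test_new_release_reads_prior_release_data(tmp_path):
    db_path = str(tmp_path / "store.db")
    create_n_minus_1_database(db_path)

    upgrade_to_n(db_path)                        # perform the upgrade in place

    with sqlite3.connect(db_path) as db:
        rows = db.execute("SELECT id, total, currency FROM orders").fetchall()
    assert rows == [(1, 19.99, "USD"), (2, 5.5, "USD")]   # old data intact
```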
Load/Stress
As has been mentioned already, load/stress is the foundation upon which virtually
all of SVT for multithreaded software is based, either as the primary focus or as a
backdrop against which other tests are executed. In this section, we look at
load/stress as a primary focus. This testing goes beyond any stress levels achieved
during initial regression testing, reaching to the ultimate levels targeted for the
project. There are two dimensions involved: deep and wide.
Deep
This dimension primarily aims for throughput-related targets. You seek timing and
serialization bugs (also called race conditions) that only extreme levels of stress will
expose. Why does stress have this effect? If you’ve ever tried running anything on a
busy server, you know that saturating the system with work slows everything down.
Tasks competing for precious resources must wait in line longer than normal. This
widens timing windows around multithreaded code that hasn’t been correctly se¬
rialized, allowing bugs to reveal themselves.
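The effect is easy to demonstrate in miniature. The sketch below (Python, illustrative only) contains a deliberately unserialized read-modify-write sequence; a light run frequently passes, while a heavily loaded run with many threads and iterations widens the timing window enough for updates to be lost.

import threading

class Counter:
    """A deliberately unserialized shared resource."""
    def __init__(self):
        self.value = 0

    def increment(self):
        current = self.value    # read
        current += 1            # modify
        self.value = current    # write -- no lock, so a timing window exists

def run(threads, iterations):
    counter = Counter()
    workers = [threading.Thread(
        target=lambda: [counter.increment() for _ in range(iterations)])
        for _ in range(threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(f"threads={threads} expected={threads * iterations} actual={counter.value}")

run(threads=2, iterations=1_000)       # light load often hides the bug
run(threads=32, iterations=100_000)    # heavy load widens the window and exposes it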
The throughput targets you choose are up to you. Examples include:
Simply achieving throughput targets using a trivial program loop misses the mark.
The point is to see how the software holds up under the most extreme real-world
conditions, and to do that your workloads must simulate those of actual produc¬
tion environments. You should include an appropriate mix of workload types, run¬
ning either concurrently, serially, or both.
For example, in the SVT of z/OS, workloads might include:
The first three cover wide swaths of the operating system support, while the
fourth narrowly focuses its stress on one key area. All could be run individually to
uncover the effects of a single, focused workload, or simultaneously to find prob¬
lems arising from the resulting interactions. In the latter case, thought should be
given to alternately raising or lowering the intensity (e.g., the number of users or
threads) of each workload to further vary the mix.
the least desirable option and so should be a last resort. It also suggests the need for
a retest of the foreground tests once the stress issues are resolved.
Wide
This dimension pushes the software toward its supported limits in size, volume, and
capacity. Examples include:
■ For an application, the number of files that can be open concurrently or the size
of an individual file
■ For a database program, the count of distinct database tables that can be joined
together, the size of each table, or the number of simultaneous users who can
issue queries or updates
■ For an online retail store, the ratio of browsers to buyers, the number of shop¬
ping carts that can be open at once, or the aggregate contents of individual carts
■ For a file system, the sum of individual disks that can back a single instance
■ For an operating system, the amount of real memory it can manage, simulta¬
neous users it can handle, CPUs it can exploit, or other systems it can cluster
with
Again, the objective in system test is not to push these limits through simplis¬
tic, artificial means, but rather by using realistic workloads that might be encoun¬
tered in production environments. Also note that there’s nothing wrong with
driving both the deep and wide dimensions of load/stress at once; indeed, it is an ef¬
fective approach. But it’s useful to keep these different dimensions in mind as you
build your test plan.
Mainline Function
This is similar to what is done in FVT, in that you are targeting new and changed
functionality. But rather than narrowly focusing on function at the component level,
your scope is expanded to view it end to end. Whereas in FVT you would exhaus¬
tively exercise every aspect of individual component interfaces, in SVT you should
devise scenarios to exhaustively exercise the entire software package’s supported
tasks from an end-user perspective. You wouldn’t try every combination of inputs to
a system command. Rather, you would use that command to perform an action or,
in concert with other commands, enact a chain of actions as a customer would.
You’ll want to check if complementary functions work together appropriately. Of
critical importance is performing these tests against a backdrop of heavy load/stress,
because functions that seem to work fine in a lightly loaded FVT environment often
fall apart in a high-stress SVT environment. We’ll explore the contrast between FVT
and SVT test case development in Chapter 11, “Developing Good Test Programs.”
Hardware Interaction
This is really a special case of mainline function testing, but because of its tie-in with
an expensive resource (hardware), it’s worthwhile to document separately. There
are really two aspects to this focus area. First, if the software under test has explicit
support for a new piece of hardware, as is often the case for operating systems, em¬
ulators, and networking tools, then scenarios must be included to test that support.
Second, if the software has any implicit dependencies or assumptions about
hardware that it will run on or interact with, then scenarios must be included to put
those factors to the test. What do we mean by implicit dependencies or assump¬
tions? For example:
Unfortunately, it’s not always apparent that software is making these implicit
assumptions until it meets an unexpected hardware environment and causes may¬
hem. It is often good practice to test software across a range of available hardware
environments to see how it reacts.
Recovery
Recovery testing is just as important in SVT as it is in FVT. However, in SVT the
scope broadens from a component view to a full product view. The need to restart
cleanly after a crash expands to consider system-wide failures. The impact of clus¬
tered systems and environmental failures begins to come into play. Recovery testing
during SVT will be discussed in detail in Chapter 8, “Testing for Recoverability.”
Serviceability
Errors are inevitable. Serviceability support responds to that fact by providing fea¬
tures such as logs, traces, and memory dumps to help debug errors when they arise.
Thoroughness counts here. Customers take a dim view of repeated requests to
recreate a problem so that it can be debugged, particularly if that problem causes an
outage. They’ve already suffered once from the defect; they want it fixed before they
suffer again. In particular, they expect software to have the capability for first fail¬
ure data capture (FFDC). This means that at the time of initial failure, the software’s
serviceability features are able to capture enough diagnostic data to allow the prob¬
lem to be debugged. For errors such as wild branches, program interrupts, and un¬
resolved page faults, FFDC is achievable. For others (e.g., memory overlays or data
corruption) it is quite difficult to accomplish without continuously running de¬
tailed traces and accepting their associated performance penalties. Nonetheless, for
production software, FFDC is what most customers seek.
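The following sketch shows the flavor of FFDC in a few lines of Python; the always-on trace ring and the dump format are invented for the example, and real products would capture far richer state. The point is that diagnostic data is written at the moment of the first failure, not after a recreate.

import collections
import datetime
import json
import traceback

TRACE_RING = collections.deque(maxlen=1000)   # lightweight, always-on trace buffer

def trace(event, **details):
    TRACE_RING.append({"ts": datetime.datetime.now().isoformat(),
                       "event": event, **details})

def with_ffdc(func):
    """Run func; on its first unexpected failure, capture diagnostics immediately."""
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception:
            dump_name = f"ffdc-{datetime.datetime.now():%Y%m%d-%H%M%S}.json"
            with open(dump_name, "w") as f:
                json.dump({"traceback": traceback.format_exc(),
                           "recent_trace": list(TRACE_RING),
                           "args": repr(args)}, f, indent=2)
            raise                             # still surface the error after capturing data
    return wrapper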
System-level testing should explore serviceability features’ ability to achieve
FFDC in a heavily loaded environment. Rather than defining specific serviceability
scenarios in the test plan for FFDC, the usual approach is “test by use.” This means
the testers use the serviceability features during the course of their work to debug
problems that arise on a loaded system, just as is done in production environments.
Any weaknesses or deficiencies in those features should then be reported as defects.
Another aspect to serviceability support is its practicality for use in a produc¬
tion environment. Detailed traces that seem perfectly reasonable in a lightly loaded
FVT environment might fill up an entire disk with data in less than a minute in a
heavily stressed SVT environment. If these traces are in place to capture diagnostic
data for a bug that takes hours to recreate, then they simply are not practical. The
test plan should include specific scenarios to explore the practical aspects of using
serviceability features on a heavily loaded system.
Security
Probing for security vulnerabilities during SVT is similar to what is done during FVT,
but with a system-level focus. For example, this would be the place to see how your
software holds up under a broad range of denial-of-service (DOS) attacks. Such at¬
tacks don’t try to gain unauthorized access to your system, but rather to monopolize
its resources so that there is no capacity remaining to service others. Some of these at¬
tacks have telltale signatures that enable firewalls and filters to protect against them,
but others are indistinguishable from legitimate traffic. At a minimum, your testing
should verify the software can withstand these onslaughts without crashing. In effect,
you can treat DOS attacks as a specialized form of load and stress. There are quite a
variety of well known attacks that exploit different vulnerabilities, and have exotic
names such as “mutilate” and “ping of death.” Once again, an Internet search will un¬
earth tools you can use to emulate hackers, crackers, and other bad guys.
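Treated as load/stress, a denial-of-service scenario can be as simple as the sketch below: flood a test instance that you own with concurrent requests, then verify it still answers a health check. The URLs, counts, and endpoints are placeholders; never aim such a tool at a system you are not authorized to test.

import concurrent.futures
import urllib.request

TARGET = "http://localhost:8080/"             # a test instance you control (placeholder)
HEALTH = "http://localhost:8080/health"       # its health-check endpoint (placeholder)

def hammer(n_requests=5000, workers=200):
    def one(_):
        try:
            urllib.request.urlopen(TARGET, timeout=2).read()
        except Exception:
            pass                               # individual failures are expected under flood
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(one, range(n_requests)))

def still_alive():
    try:
        return urllib.request.urlopen(HEALTH, timeout=5).status == 200
    except Exception:
        return False

hammer()
assert still_alive(), "server did not survive the flood"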
Data Integrity
Certain types of software, such as operating systems and databases, are responsible for
protecting user and system data from corruption. For such software, testing to ensure
that data integrity is maintained at all times, regardless of external events, is vital. It’s
also tricky: because most software assumes that the data it uses is safe, it often won’t even
notice corruption until long after the damage is done. Chapter 12, “Corruption,” is
devoted to discussing techniques and tools for this interesting yet challenging area.
Usability
In a perfect world, a program’s end user interfaces would be designed by human
factors and graphic arts specialists working hand-in-hand with the program’s de¬
velopers. They would go to great pains to ensure that input screens, messages, and
task flows are intuitive and easy to follow. In reality, interfaces are often assembled
on a tight schedule by a developer with no special training. The results are pre¬
dictable.
Thus, it may fall upon the test team to assess how good an interface is. There
are two approaches for usability testing: explicit and implicit.
Explicit Testing
This method involves observing naive users as they interact with the software, and
recording their experiences. Where do they get stuck? How many false paths do
they pursue? Can they understand and follow the messages? Such testing can be¬
come quite elaborate, even involving a dedicated lab with two-way mirrors. Ironi¬
cally, that much care would probably only be taken for projects that also had
specialists designing the interfaces—or, on the second release of a product whose
first release was deemed a usability disaster.
Implicit Testing
Testers of a software package are also its first users. Through the course of your work,
you will explore all of a program’s interfaces and task flows. You’ll probably become
frustrated at times—the source of your frustration will likely be a usability bug.
This implicit testing through use is how most usability testing is done. It doesn’t
require a separate test phase or even any specific scenarios in your test plan. It does re¬
quire you to pay close attention, notice when the program is frustrating you, and
write up a bug report. You must resist the natural urge to assume your struggles are
due to your own naivety. By the time you’ve gone through the trouble to learn about
the new software and create a test plan for it, you probably understand it much bet¬
ter than the average new user will. If you’re having trouble, so will they.
Reliability
Also known as longevity or long-haul testing, the focus of this area is to see if the sys¬
tem can continue running significant load/stress for an extended period. Do tiny
memory leaks occur over several days which eventually chew up so much memory
that the application can no longer operate? Do databases eventually age such that
they become fragmented and unusable? Are there any erroneous timing windows
that hit so infrequently that only a lengthy run can surface them?
The only way to answer some of these questions is to try it and see. Reliability
runs typically last for several days or weeks. Frequently there are multiple reliabil¬
ity scenarios, each based on a different workload or workload mix. Also, because a
piece of software must be quite stable to survive such prolonged activity, these sce¬
narios are usually planned for the end of the system testing cycle.
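A reliability run needs simple instrumentation watching for slow decay. The sketch below, which assumes a Linux system where /proc is available and uses invented thresholds, samples a process’s resident memory at intervals and flags steady growth; comparable checks can watch disk usage, fragmentation, or response times.

import time

def rss_kb(pid):
    """Read the resident set size of a process from /proc (Linux)."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])
    return 0

def watch_for_leak(pid, interval_secs=600, samples=1000, growth_limit=1.5):
    baseline = rss_kb(pid)
    for i in range(samples):
        time.sleep(interval_secs)
        current = rss_kb(pid)
        print(f"sample {i}: {current} kB resident")
        if current > baseline * growth_limit:
            print(f"possible leak: RSS grew from {baseline} kB to {current} kB")
            return False
    return True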
Performance
As noted in Chapter 3, a clear distinction should be made between system testing
and performance testing, since the goals and methodologies for each are quite dif¬
ferent. Explicit performance measurement is not normally part of SVT. However,
some obvious performance issues may arise during system test. If simple actions are
taking several minutes to complete, throughput is ridiculously slow, or the system
can’t attain decent stress levels, no scientific measurements are required to realize
there’s a problem—a few glances at your watch will do the trick. We refer to this as
wall clock performance measurement. An SVT plan should at least note that wall
clock performance issues will be watched for and addressed.
Artistic Testing
As discussed earlier, artistic testing is a freeform activity in which you use new tech¬
nical insights gained during the test to devise additional scenarios dynamically (see
Chapter 15 for more detail). It’s a good idea to define some artistic testing scenarios
in your plan. Initially, there won’t be any content in these scenarios; it will be added
later, as more experience with the software is gained. But documenting the scenar¬
ios up front allows time and resources to be scheduled for them. Listed here under
system test, artistic testing is applicable to function and integration testing as well.
SVT focuses on testing a software package as a single entity. We can move beyond
that to broaden the scope yet again, to view that entity as one element of an overall
system environment and see how well it works and plays with its neighbors. This is
what the mainframe world refers to as an integration test. This test emulates a com¬
plete production environment. Many software packages operate together across a
complex hardware configuration to accomplish multiple goals in parallel. It targets
problems related to the interaction of those different software products. Integration
test aims to see if it can provide very high levels of service to applications and sim¬
ulated end users by establishing a pseudoproduction environment.
The test plan focus areas are similar in name to those of an SVT, but the scope
is broader:
■ Regression testing ensures the updated product can still interact correctly with
other older products. This is also termed compatibility testing.
■ Migration testing may attempt to migrate the full environment to multiple
new or updated products in a recommended sequence.
■ New function is exercised against a richer, more complex environment with
more intertwined interactions through which to navigate.
■ Not just one, but multiple products must compete for system resources under
heavy load and stress.
Recovery testing can address the failure of not just a single product under test,
but all other products in the software food chain that it depends on. It can also look
at mean time to recovery—focusing as much on the ability to rapidly restore service
as to gather diagnostic data. Because software at this stage in its development life
cycle should be fairly stable, reliability testing is extended to provide a final proving
ground before it is sent into the unforgiving world of an actual production envi¬
ronment.
Multisystem testing can span the FVT, SVT, and IT disciplines. In many ways,
multisystem testing is similar to single-system testing. But a multisystem tester
must devise scenarios to address special considerations.
Sympathy Sickness
If one system in your cluster or grid gets “sick,” it should either shut down or limp
along as best it can, without causing a chain reaction in which other systems in turn
get sick in “sympathy.” The sickness might take on many forms, such as locking
hangs, memory leaks, or endless loops. The remaining systems must remain im¬
mune to side effects from their interaction with the sick system (e.g., deadlocks,
message passing hangs, database conflicts). The tester should define scenarios to
generate or emulate various forms of sickness, and explore how the remaining sys¬
tems react.
Resource Contention
In this case, no system is sick, but all are competing vigorously for common re¬
sources. These resources could be external, such as shared databases or LAN
switches, or internal. By internal resources, we mean programming constructs used
to serialize and control the flow between systems, such as locks and synchronous
messages. Is it possible for one system to be a resource hog, effectively locking all
others out? Can dynamic upgrade or system service actions create a spike in re¬
source contention that spirals out of control? Will resource deadlocks arise under
heavy load and stress? The tester with a highly evolved “breaker” mentality will cre¬
ate scenarios to find out.
Storm Drains
Clustered systems usually have a workload router or sprayer in front of them that
attempts to spread work evenly among all members. That router needs to follow an
algorithm to decide where to send the next piece of incoming work. Some products
will use a simple round-robin approach, in which the sprayer runs through its list
of systems and sends work to each in turn. This approach is easy to implement, but
doesn’t take into account the varying capacities of each system to accept new work.
If one system is overloaded while another is underutilized, it would be better to
send new work to the one with available cycles. So more sophisticated sprayers use
a workload management algorithm in which agents on each system periodically in¬
form the sprayer of their available capacity, and the sprayer in turn uses this infor¬
mation to make its routing decisions.
But what happens if one system in the cluster stumbles into an error condition
that causes it to flush work rather than complete it? If the system is flushing all new
work sent its way, to the outside world it might appear to simply be executing very
quickly. Whenever the workload router checks, it finds that system to have available
capacity. The router sends more work to the troubled system, which is immediately
flushed, so the router sends it more, and so on. Eventually, almost all new work
coming into the cluster gets channeled to and flushed down the failing system, like
water down a storm drain.
Well-designed software will include trip wires to prevent such an effect from
materializing. The clever tester will attempt to create a scenario along these lines to
see how the system reacts.
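The effect is easy to model. The small simulation below (Python, with invented capacities and arrival rates) routes each incoming request to the member reporting the most free capacity; because the "sick" member flushes its work instantly, it always looks free and ends up swallowing nearly all of the traffic.

import random

class Member:
    def __init__(self, name, capacity, flushing=False):
        self.name, self.capacity, self.flushing = name, capacity, flushing
        self.inflight, self.received = 0, 0

    def free_capacity(self):
        return self.capacity - self.inflight

    def accept(self):
        self.received += 1
        if not self.flushing:            # a flushing member throws work away at once,
            self.inflight += 1           # so it always reports plenty of free capacity

    def complete_some(self):
        self.inflight -= min(self.inflight, random.randint(1, 5))

members = [Member("sys1", 50), Member("sys2", 50), Member("sys3", 50, flushing=True)]

for tick in range(500):
    for _ in range(20):                  # twenty new requests arrive per tick
        max(members, key=Member.free_capacity).accept()
    for m in members:                    # each member finishes a little of its work
        m.complete_some()

for m in members:
    state = "flushing" if m.flushing else "healthy"
    print(f"{m.name} ({state}): routed {m.received} requests")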
The terms test case and scenario are sometimes used interchangeably, but they mean
different things. IEEE 610.12 defines a test case as: “A set of test inputs, execution
conditions, and expected results developed for a particular objective, such as to ex¬
ercise a particular program path or to verify compliance with a specific require¬
ment.” IEEE 610.12 doesn’t mention scenarios, but does define a test procedure as:
“Documentation specifying a sequence of actions for the execution of a test.1”
This definition of a test case actually maps nicely to what was discussed earlier
as a test variation. In practice, we find that the term “test case” is often used to de¬
scribe a test that is embodied within a test program. When the test requires a se¬
quence of actions to be performed, it’s usually called a scenario. With that in mind,
in this book we’ll use the following definitions:
Test Case: A software program that, when executed, will exercise one or more
facets of the software under test, and then self-verify its actual results against
what is expected.
Scenario: A series of discrete events, performed in a particular order, designed
to generate a specific result.
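To make the distinction concrete, here is a minimal sketch in Python; the file-append operation stands in for the software under test, and the "archive" step stands in for an operator action. Both names are invented for the example.

import os

def test_case_append(path="scratch.dat"):
    """Test case: a small program that exercises one facet of the software
    under test and self-verifies its result (here, a simple file append)."""
    before = os.path.getsize(path) if os.path.exists(path) else 0
    with open(path, "ab") as f:
        f.write(b"record\n")
    assert os.path.getsize(path) == before + len(b"record\n"), "append was lost"

def scenario_fill_then_archive(path="scratch.dat", copies=100):
    """Scenario: a series of discrete events performed in a particular order,
    built from existing test cases plus operator-style actions."""
    if os.path.exists(path):
        os.remove(path)                       # event 1: start from a clean state
    for _ in range(copies):                   # event 2: drive the test case repeatedly
        test_case_append(path)
    os.replace(path, path + ".archived")      # event 3: an operator-style action
    assert os.path.getsize(path + ".archived") == copies * len(b"record\n")

scenario_fill_then_archive()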
Test cases are often small in size, though complex ones can become quite large
and may encompass multiple test variations. A scenario is often intended to emu¬
late an expected customer activity or situation. It can consist entirely of issuing a
series of command-driven operator actions, or can include executing a series of
test cases in a particular order. Let’s look at an example that includes both concepts.
1 From IEEE Std 610.12-1990. Copyright 1990 IEEE. All rights reserved.
Multithreaded software is capable of running work for many users at the same
time, which often requires serialization techniques to coordinate events. Bugs in
such serialization will typically be exposed only when multiple events are being
processed.
Initial Testing
System testing this function involved creating a group of cloned test cases that
would read the same file, then enabling the caching feature for that file and running
the cloned test cases in parallel.
Here we see both the use of test cases designed to exercise the software under test,
and a scenario that includes multiple copies of those test cases running against a
particular file in parallel.
This simple technique was indeed effective in flushing out bugs in the code, but
eventually a point was reached where the group of applications could run without
error. The next step was to expand the scenario to run multiple groups such as this
in parallel, each against a different file. This also found problems, but again even¬
tually completed without error.
TIP: Start with the simplest scenarios to quickly flush out glaring bugs, but don’t
stop there.
In real customer use, however, the applications sharing such a file would likely run at
different speeds. Also, the applications would normally begin their processing at
staggered time intervals.
Design more robust scenarios based on your understanding of how the software will
be used. For more information on how to gain that understanding, see Chapter 5,
“Where to Start? Snooping for Information.”
Combining these two observations, the team defined another scenario in which
two groups of test cases were run against a file. The first group was altered to in¬
clude a small, artificial delay after each read. This “slow” group was started first.
Once it was partially through the file, the next group was started. This second group
had no such artificial delay and read through the file as fast as possible. Before long,
the “fast” group caught up to and overtook the “slow” group. At that moment the
“lead reader,” the actual test case instance whose reads were being cached for the
benefit of the others, changed.
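A rough sketch of that scenario’s shape, in Python, appears below. The shared input file, the per-read delay, and the thread counts are invented; in the real test the readers were cloned test cases driving the actual caching support, whereas here an ordinary file read merely marks where that support would sit.

import threading
import time

FILE = "big_input.dat"
EXPECTED_SIZE = 4096 * 1000
DELAY = 0.001                     # artificial per-read delay for the "slow" group

def reader(delay=0.0, chunk=4096):
    """Cloned test case: read the whole shared file and verify what was seen."""
    total = 0
    with open(FILE, "rb") as f:
        while block := f.read(chunk):          # the caching feature under test sits here
            total += len(block)
            if delay:
                time.sleep(delay)
    assert total == EXPECTED_SIZE, "reader saw a short or corrupted file"

with open(FILE, "wb") as f:                    # build the shared input file
    f.write(b"x" * EXPECTED_SIZE)

slow = [threading.Thread(target=reader, args=(DELAY,)) for _ in range(4)]
fast = [threading.Thread(target=reader) for _ in range(4)]

for t in slow:
    t.start()
time.sleep(0.5)                                # let the slow group get partway through
for t in fast:
    t.start()                                  # the fast group overtakes; the lead reader changes
for t in slow + fast:
    t.join()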
TIP: Simple test cases can be viewed as building blocks. They can be combined in
different ways to create fresh scenarios.
The team found that precisely when this “changing of the leader” occurred, a data
integrity bug hit. In fact, multiple such bugs hit, all caused by very narrow timing win¬
dows. Such windows will normally elude single-user tests or code coverage tools, as
they depend not on a single errant path through the code, but on multiple things oc¬
curring at the same instant across multiple processors on a tightly coupled multi¬
processor system. That is why it is so important to execute such scenarios under
heavy load/stress.
Here a scenario was defined to attack data integrity in an environment of high
concurrency by rearranging the execution sequence of existing test cases. The ap¬
proach worked, and led to the discovery of new and critical bugs. Importantly, the
sequence was not picked at random, but was based on the expected customer usage
of the function.
Earlier we mentioned that one of the key reasons for putting a test plan down on
paper is so it can be reviewed by others. Inexperienced testers often view a test plan
review with trepidation, almost as though they’re handing in a paper to be graded
by a teacher. They fear that others will find flaws in what they’ve come up with; and
of course flaws will be found.
The point of a test plan review is not to provide a venue for others to pat you
on the back for a job well done. The point is to improve your plan. That’s why you
should love test plan reviews. Because of their varied experiences, backgrounds,
and insights, reviewers will think of things you did not—things that will help you
find more bugs or avoid wasting your effort. A good review is one in which people
suggest intriguing new scenarios or identify those which you can combine or avoid
altogether.
Good programmers are critical by nature. They are detail-oriented people who,
if they think hard enough, can find a flaw in just about anything. If you send your
test plan out for review and never hear anything back, it doesn’t mean people were
impressed by your thoroughness or stunned by your cleverness. It probably means
they never read it.
Reviews can be conducted by either sending the plan out and asking reviewers
to send you comments by a particular date, or by calling a meeting with key re¬
viewers and crawling through the plan one page at a time. The latter approach will
almost always deliver a better review, both because it forces everyone to actually
look at the plan, and also because the verbal interchange often triggers additional
ideas. However, in a busy organization, getting all participants together in the same
room may not be feasible, and you’ll have to resort to the “send it out and hope”
method. In this case, a few phone calls to your most important reviewers to let them
know how much you value their comments can go a long way toward achieving de¬
cent results.
Regardless of the approach used, reviews usually proceed in two phases, inter¬
nal and external. Let’s take a brief look at each.
Internal Reviews
An internal review is the first opportunity you have for others to comment on your
plan. By internal, we mean the review is confined to the tester’s own team. This is
the chance for the entire team to help each other out and synchronize their efforts.
Experienced testers are able to suggest improvements to the plans of their peers, as
well as offer guidance to those with less experience. New testers are able to study the
approaches used by others—and their fresh perspective may generate naive ques¬
tions which can in fact lead to interesting new scenarios. Finally, testers working on
related functions are able to identify and eliminate gaps or overlaps with each
other’s plans and discover areas ripe for collaboration.
External Reviews
Once the internal review is complete, the plan is made available to other groups for
the external review. The software’s designers and developers should be included, as
should any other test teams before or after yours in the process. If practical, the
software’s ultimate end users should be given a chance to offer their insights as well.
Information developers who will be writing the software’s documentation will cer¬
tainly have a different perspective on the software than testers or developers, and so
may generate additional scenario ideas. Simply put, your goal is to get your plan in
front of anyone who might be able to make it better.
Depending on your organization’s process, you may also need to send your
plan to one or more approvers to gain final sign off. However, once internal and ex¬
ternal reviews have occurred and you have incorporated the resulting comments
into your plan, this approval cycle is normally a mere formality.
SUMMARY
Test plans serve several important functions for testers. The simple act of creating
one forces the tester to step back, evaluate the project at hand, and develop a set of
strategies spread across a range of focus areas. Once created, the plan becomes a
road map for the tester to follow and a yardstick to measure progress against. It also
provides a vehicle for others to review the tester’s ideas and offer additional sug¬
gestions. But always keep in mind that a test plan is simply a tool to help you find
bugs, not the finish line in a race to see how much paper you can generate.
In Chapter 8 we’ll take a closer look at one of the most interesting, yet often over¬
looked approaches for unearthing bugs in complex software: recoverability testing.
Testing for Recoverability
In This Chapter
B Attacking a program’s recovery capabilities during FVT
■ Focusing on the entire product view in SVT
■ Expanding the scope in Integration Test
■ An example of clustered server recovery testing
One of the authors once found himself working closely with a highly re¬
spected developer on an accelerated project. The developer sat in on the
testing effort so as to be ready to turn around fixes to any defects found on
a moment’s notice. For several weeks, he watched as tests broke his code and forced
it to generate one system dump after another. Finally, one day he sat back, and with
a sheepish grin commented that next time he would write a program’s recovery
code first, because that seemed to be the first thing to get executed.
Software’s ability to gracefully handle or recover from failure is an important
contributor to its robustness. If you accept the premise that all software has bugs,
and that testing will never remove them all, then only by creating self-healing soft¬
ware that anticipates and recovers from errors can any program achieve maximum
reliability.
Recovery can also be one of the most interesting test focus areas. How much re¬
covery testing is needed largely depends upon the nature of the target program, as
well as the operating system environment it will operate within.
In the z/OS operating system, virtually every “kernel,” or base control program
(BCP), module is protected by one or more unique, customized recovery routines.
These routines are automatically given control, in hierarchical sequence, if an un¬
expected failure occurs. They then capture program status at the time of failure,
generate system dumps, restore initial state data prior to a retry attempt, and per¬
form other cleanup activities to allow the module to resume operating normally. As
a result, testing any portion of the operating system involves a great deal of error
generation to see if this rich recovery environment is doing its job correctly. The
same is true of testing any supervisor-state application that runs on z/OS and, to a
lesser extent, any problem-state (i.e., user-mode) application.
At the other end of the spectrum are environments in which an application that
fails is simply expected to crash and the entire operating system will need to be re¬
booted before the application can be restarted cleanly. Most software lies some¬
where in between.
Various forms of recovery testing span the FVT, SVT, and integration test dis¬
ciplines. Let’s explore each.
Depending on the situation, there are several different ways in which you can attack
a program’s recovery capabilities during FVT. We’ll look at each of them. But be¬
fore you can check how well a program recovers from an error, you need a way to
generate that error in the first place. Let’s review some options.
Stub Routines
One approach is to borrow a page from the unit tester’s handbook. If you need to
force another module or component to pass bad input into your target software, re¬
place that module with a small stub routine. The stub routine will do little more
than accept incoming requests, then turn around and reply to them in a reasonable
way. However, it will purposely corrupt the one parameter you’re interested in. Al¬
ternatively, rather than replacing a module with a stub you can tamper with the
module itself, altering it to pass back bad data when called by your target software.
These approaches will only work if the module you intend to “stub out” is
called infrequently under conditions which you can externally generate. Ideally, it
would only be called by the module under test. You don’t want to insert a bogus
stub routine that will be invoked millions of times per second for routine tasks by
many other modules in the component. If you do, its identity as an impostor will
quickly be revealed and the software will surely stumble. This stubbing approach
obviously creates an artificial environment, so it’s probably the least desirable
method listed here. But under the right circumstances, it can be useful.
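In a language with dynamic binding the idea looks roughly like the sketch below; the routine name, its reply format, and the corrupted field are all hypothetical stand-ins for whatever collaborator your target software actually calls.

# Hypothetical collaborator: the target code calls storage.read_block(block_id)
# and expects a dict containing 'data' and 'length' fields.

def make_stub(real_read_block):
    """Return a stub that answers requests reasonably but corrupts one parameter."""
    def stub_read_block(block_id):
        reply = real_read_block(block_id)       # behave like the real routine...
        reply["length"] = -reply["length"]      # ...except for the field under scrutiny
        return reply
    return stub_read_block

# In the FVT driver, swap the stub in before invoking the target component:
#   import storage
#   storage.read_block = make_stub(storage.read_block)
#   target_component.process(block_id=42)   # should detect the bad length and recover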
Zapping Tools
Some systems have tools that allow you to find exactly where a particular module
is loaded in memory on a running system, display its memory, and change bytes of
that memory on the fly. This dynamic alteration of memory is called a zap. If you
can’t find such a tool for the system you’re testing on, consider writing your own.
You’ll probably find that creating a crude zapping tool is not a major undertaking.
A zapping tool gives you an easy means to selectively corrupt data. You can also
use it to overlay an instruction within a module with carefully constructed garbage,
so when that instruction is executed it will fail. As with the stub routine case, care
must be used not to meddle in an area that is frequently executed on the running
system, or the volume of errors you’ll generate will be overwhelming. However,
zapping is not nearly as artificial a technique as stub routines. In the right situations
it can be very effective.
counter variable, doubles it, and then exits. The next time the target module tries
to traverse the full queue, it’s in for a surprise.
This is a simple example, but you can imagine other cases where your error in¬
jection program corrupts the contents of a control structure shared by multiple
modules within a component, or performs other nasty deeds. In essence, this is
nothing more than automating the function of a manual zapping tool. But because
the seek-and-destroy program is operating at computer speeds, it can be much
more nimble and precise in its attacks.
Restartability
The most basic recovery option is enabling a program to restart cleanly after a
crash. In FVT, the focus is placed on failures within individual components of the
overall product. You’ll often need to trick a component into crashing. You can do
this in a virtualized environment by setting a breakpoint at some specific location
in its code. When the breakpoint hits you can insert carefully corrupted data, set the
system’s next instruction pointer to the address of an invalid instruction, or zap the
component’s code itself to overlay a valid instruction with some sort of garbage
that’s not executable. You then resume the program after the breakpoint, watch it
fail, and ensure it generates the appropriate failure messages, log entries, dump
codes, etc. If it has robust recovery support, it may be able to resume processing as
if nothing had happened. If not, it may force the entire product to terminate.
If the program terminates, you can then restart it and determine if it restarts
successfully and is able to process new work (or resume old work, depending on its
nature). If you resorted to zapping the component’s code with garbage to force it
to crash, and that code remains resident in memory, then you’ll need to repair the
overlay prior to restarting the program (or it will just fail again).
The objective here is similar to FVT, namely to wreak controlled havoc and see how
the software responds. But in SVT, the focus shifts from a narrow, component-level
view to an entire product view. It also folds load/stress into the picture. This is crit¬
ical, because it’s common for recovery processing to work perfectly on an unloaded
system, only to collapse when the system is under heavy stress.
Restartability
In SVT, there are two aspects to restartability: program crash and system crash. For
program crash, because you’re operating at an end-user level in which techniques
such as setting breakpoints are not applicable, there must be an external way to
cause the program to fail. Such external means could include bad input, memory
shortages, or a system operator command designed to force the program to termi¬
nate fast and hard. Alternatively, input from testers during the software’s design
might have led to the inclusion of special testability features that can aid with error
injection. The next chapter explores this idea in more detail.
An advantage to using external means to crash the program is that you are able
to send normal work to the program so it is busy doing something at the time you
force the crash. Programs that die with many active, in-flight tasks tend to have
more problems cleanly restarting than idle ones do, so you’re more likely to find a
bug this way.
The system crash case is similar, except that any recovery code intended to
clean up files or other resources before the program terminates won’t have a chance
to execute. The approach here should be to get the program busy processing some
work, and then kill the entire system. The simplest way to kill the system is simply
to power it off. Some operating systems, such as z/OS, provide a debugging aid that
allows a user to request that a particular action be taken when some event occurs on
a live system. That event could be the crash of a given program, the invocation of a
particular module, or even the execution of a specific line of code. The action could
be to force a memory dump, write a record to a log, or even freeze the entire sys¬
tem. In z/OS, this is called setting a trap for the software. If such support is available,
then another way to kill the system is to set a trap for the invocation of a common
operating system function (such as the dispatcher), which when sprung will take
the action of stopping the system immediately so you can reboot it from there.
After the system reboot, restart the application and check for anomalies that
may indicate a recovery problem by watching for messages it issues, log entries it
creates, or any other information it generates as it comes back up. Then send some
work to the program and ensure it executes it properly and any data it manipulates
is still intact. Restartability is the most basic of recovery tests but, if carefully done,
will often unearth a surprising number of defects.
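The flow of such a test can be scripted. The sketch below assumes a hypothetical server command, workload driver, and consistency checker; each would be replaced by whatever starts, drives, and verifies the actual product. The essential sequence is: get the program busy, kill it hard, restart it, and prove both that it accepts new work and that its data survived.

import signal
import subprocess
import time

SERVER_CMD = ["./my_server", "--data-dir", "testdata"]        # placeholder for the product

def start_server():
    proc = subprocess.Popen(SERVER_CMD)
    time.sleep(5)             # crude readiness wait; poll a real health check in practice
    return proc

def drive_work():
    subprocess.run(["./workload_driver", "--minutes", "2"], check=True)   # placeholder

def crash_hard(proc):
    proc.send_signal(signal.SIGKILL)    # no chance for graceful cleanup
    proc.wait()

def verify_after_restart():
    subprocess.run(["./consistency_checker", "testdata"], check=True)     # placeholder

old = start_server()
drive_work()                  # make sure work is in flight at the moment of the crash
crash_hard(old)
new = start_server()          # restart, then confirm it accepts new work
drive_work()
verify_after_restart()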
Environmental Failures
Depending on the nature of the software under test, it may need to cope with fail¬
ures in the underlying environment. In the case of operating systems, this usually
means failure of hardware components (e.g., disk drives, network adapters, pe¬
ripherals). For middleware and applications, it usually means the failure of services
that the operating system provides based on those hardware components. What
happens if a file system the application is using fills up, or the disk fails? What if a
Natural Failures
During the course of normal load/stress or longevity runs, the software being tested
will almost surely fail on its own, with no help from the tester. Rather than cursing
these spontaneous, natural errors, take advantage of them. Don’t look only at the
failure itself; also examine how the program dealt with it. Monitor recovery pro¬
cessing to see how the software responds to noncontrived failures.
INTEGRATION TEST
Because integration test introduces the new software into a fully populated envi¬
ronment, it expands the scope of possible recovery testing to include interactions
with other software.
customer accounts left hanging, only half completed? What does the end user see
while this is happening? What happens when the TM restarts? Is the banking ap¬
plication able to pick up where it left off, or must it be restarted as well in order to
reconnect with the TM? Now, what if the TM stays up, but the database that it is
updating crashes? Or what if the TCP/IP link between the application and the TM
goes down?
Integration testers should examine the entire software food chain within which
their application operates, and create scenarios to crash, hang, or otherwise disrupt
every single element in that chain. This kind of testing presents a number of inter¬
esting opportunities—it also happens to be a lot of fun. But most importantly,
when done well it yields the kind of robust and reliable solution that software users
in real-world production environments demand.
Environmental Failure
This is similar to what was described for SVT, but broader. In the SVT case, the im¬
pact of environmental failures is narrowly focused on the specific software under
test. But during integration test, the focus broadens to include an entire software
ecosystem operating and sharing resources. A failure in any one of those resources
will force multiple programs to deal with that loss simultaneously. Think of the
chaos on the floor of the New York Stock Exchange moments after a company has
announced it is under investigation for fraud; traders shout to be heard as they all
clamber to unload shares. It’s the same sort of chaos when multiple programs
scramble to recover at once—so controlled chaos is the tester’s friend.
Multiple IBM mainframes can be clustered together into a loosely coupled Parallel
Sysplex® that employs a shared storage medium known as a coupling facility (CF).
In addition to one or more CFs, a Parallel Sysplex employs a common time source,
software support from the z/OS operating system, and exploiting subsystems and
middleware. Unique technology enables all server nodes to concurrently read and
write directly to shared disks with complete data integrity, without the need for data
mirroring or similar techniques—and to do so with near linear scalability [Nick97].
It also enables the configuration of an environment that eliminates all single points
of failure. A view of these elements is depicted in Figure 8.1.
You can think of a CF as being similar to a shared disk, but one that can be ac¬
cessed at near memory speeds, i.e., much faster than at I/O speeds.
The initial release of Parallel Sysplex included the convergence of new function
in many elements at once, including hardware, the operating system, a data base
manager, a distributed lock manager, a transaction monitor, a telecommunications
access manager, a security manager, and more. It was a classic example of the need
for an integration test, and so one was arranged. Let’s take a look at one aspect of
that test, namely recovery testing involving coupling facility structure rebuild.
TIP: Execute recovery testing on busy systems to challenge the software’s survival
capabilities under production-like conditions.
TIP: Failures can often be simulated by simply powering off or disconnecting hardware.
By understanding how a hardware failure is surfaced to the software, you can often
find a simple, nondestructive way to simulate it.
As you might imagine, this recovery processing was complex, and trying to per¬
form it amid a flood of new incoming work challenged all components involved.
Several bugs were found in both the operating system services and the middleware
that used them, along with an occasional bug in the underlying hardware. Once
these bugs were fixed and that first exploiter could complete the rebuild process
successfully, the scenario was repeated for the next one. That scenario unearthed
new bugs, they were fixed, the next exploiter was tried, and so on. Finally, a point
was reached where each exploiter could consistently recover from the loss of a CF
on a fully loaded cluster.
TIP: When testing software recovery in complex systems, it helps to start simply by
first isolating failures to individual components or functions.
Combined Recovery
It wasn’t time to celebrate yet. Having tried each end-to-end scenario individually,
they were now smashed together. The structures for all previously tested software
products were allocated together on the same CF. A workload was again started up
and allowed to stabilize. Then, that CF was killed. This forced all products to un¬
dergo their rebuild processing simultaneously.
Even though individually each of these products could reliably rebuild under
load/stress, the controlled chaos of this big smash scenario unearthed a whole new
set of timing and serialization bugs. This environmental failure scenario is a great
example of the value of an integration test that hits a combination of relevant soft¬
ware elements, rather than just looking at each in a vacuum.
SUMMARY
In This Chapter
■ How much can we squeeze the schedule?
■ Leveraging entry and exit criteria
■ Using testability to make validating software easier
■ A Case Study: overcoming a seemingly impossible testing situation
As German Field Marshal Helmuth von Moltke once said, no plan survives
first contact with the enemy. When the initial test plans are developed,
everyone has the best intentions to identify all issues, problems, risks, ac¬
tions, dependencies, and assumptions. But, somehow “Murphy” usually shows up
to toss in the proverbial monkey wrench.
Significant software development projects will not progress from beginning to
end without issues. In fact, some won’t even have started before something goes
wrong. By their very nature, testers also cause trouble. That’s their job. The combi¬
nation of the inevitable problems that will occur over time and trouble caused by
testers must be considered during planning. Following are suggestions to help the
test team plan for all of these unknowns.
SCHEDULING
Software development teams often miss their deadlines, and their code rolls into
test later than expected. Unfortunately, testers face an immovable end date, General
Availability (GA), when the product becomes generally available to the buying pub¬
lic. Sometimes there is even pressure to pull in the GA date to address competitive
“time-to-market” needs. Testers get squeezed on both ends.
Fighting these real pressures to get the product to market while still ensuring
high quality is the goal of the test team. A tester’s responsibility is to be the cus¬
tomer’s advocate. Will a customer want a product if it doesn’t work, no matter how
early they can get their hands on it? A balanced approach is needed. Yet, the squeeze
on the schedule is reality. Your test plan must address this accordingly.
Contingency
The dictionary defines contingency as “a possible or chance event.” With all of the
potential for trouble, the likelihood of such events occurring during the test phase
is extremely high. The components of a testing schedule include building the test
approach and strategy, defining the scenarios that will form the foundation of the
test, setting up and preparing, executing the test scenarios, identifying problems,
and resolving those problems. So, how can you construct a timeline that accounts
for the many contingencies inherent in these schedule components?
A contingency factor of 50% is often acceptable for simple and well-understood projects. This means that after
sizing the effort, add one-half of the schedule length to that to arrive at the total
schedule length. In complex projects, a contingency factor of 100% is common. De¬
termine your effort and schedule, and double it. That’s likely to tell you when the
test will complete and what time it will require.
Techniques
Sizing a large test as a single entity can be overwhelming. It’s both easier and more
precise to use a divide-and-conquer strategy. Let’s look at a couple of different
techniques.
Not all of the problems will necessarily result in actual code defects. Some
may turn into enhancements to the software’s serviceability, availability, or usabil¬
ity; in other words, design changes. You must allow time for problem discovery,
problem recreation, and fix verification. How much time depends on the problem
history of the changed areas. When testing new software, its size and complexity are
clues to the likely problem volume. Combine this appraisal with a realistic assess¬
ment of development’s turnaround time for fixing defects, and you can estimate the
expected delays and an appropriate contingency factor.
Incidentally, the reality of such delays is as true for “quick” tests as for major
ones. When challenged on a test schedule or sizing, you may be asked how long it
would take to do if you found no problems. The answer is easy: zero days—because
if your schedule is only valid if you find no problems, then why even bother doing
the test? A good test execution contingency factor anticipates these and other un¬
foreseen problems and folds them into your schedule.
Sizing by Methodology
Methodology contingency can be defined as a cushion of time or resource added to the
schedule based on the test phase or the execution environment. We can further sub¬
divide the sizing of the test phase by breaking down the effort into self-defined, man¬
ageable chunks based on their focus area or methodology. For example, the test can
be classified by looking at regression, migration, mainline, and recovery testing. In ad¬
dition, the test team might choose to classify the effort based on environmental re¬
quirements such as complex software and special hardware with which the software
must interoperate. In either case, the formula is simple. Size each of the chunks, add
contingency to each one, and sum them up. The initial sizing of the chunks assumes
things go fairly smoothly, and the contingency factor accounts for reality.
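In arithmetic terms the formula is short enough to fit in a few lines. The chunks and percentages below are purely illustrative; the contingency factor for each chunk should come from its own history and complexity.

# (weeks of effort, contingency factor) per self-defined chunk -- illustrative values
chunks = {
    "regression":       (2.0, 0.50),   # well understood, 50% contingency
    "migration":        (1.5, 0.50),
    "mainline":         (4.0, 1.00),   # new and complex, 100% contingency
    "recovery":         (3.0, 1.00),
    "special hardware": (2.0, 0.75),
}

total = 0.0
for name, (weeks, contingency) in chunks.items():
    padded = weeks * (1 + contingency)
    total += padded
    print(f"{name:18s} {weeks:4.1f} -> {padded:4.1f} weeks")
print(f"total schedule:    {total:.1f} weeks")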
Picking appropriate contingency values relies on the same factors mentioned
earlier for the test execution stage. Problem history, complexity, and any existing
quality issues also apply.
Scheduling Snafus
Ultimately, contingency factors are an educated guess. They are a simple but effective
way of factoring time into your schedule for the snafus that typically occur. Resist the
urge to reduce or ignore contingency values when putting together a schedule for a
project with tight deadlines. Later, no one will remember that you were a hero for
coming up with an aggressive schedule; they’ll only remember if you met it or not.
Contingency factors alone are not enough to completely protect your ability to
deliver high-quality code on time to the marketplace. You also need to know when
it is time to start and when your test is complete.
ENTRY AND EXIT CRITERIA
On the surface, establishing entry and exit criteria is both an easy concept to grasp
and a very well-known practice in software engineering. However, without proper
attention to the creation, measurement, and response to these criteria, they can slip
from being a powerful tool for testers into a weak set of paragraphs in a test plan.
There are good ways to use entry and exit criteria and there are less-effective
ways. Let’s take a look at strong approaches and also identify some common mistakes.
■ Ensure the test can execute with efficiency and effectiveness, and proceed with
a manageable amount of risk
■ Properly focus attention on the most important activities at the correct time
■ Guide decisions on the next set of actions
■ Influence what actions others should take
■ Determine when a test effort is complete
It is a good idea to consider all these things regardless of the development process
in use. Serial test phases are not required for effectiveness. In fact, it can be argued that
the establishment of criteria is more important with parallel or overlapped efforts,
since its use can help the teams avoid running into duplicate problems.
Meaningful
Every criterion documented in a plan should be placed there for a specific reason.
The effect of not meeting each one should be obvious. Such criteria will help make
doing a risk assessment easier for both a technical team leader and a project or
product manager.
Measurable
A criterion that cannot easily be measured is not useful. Neither an entry nor an
exit criterion should be subjective; it should be calculable using empirical data.
Any discussion surrounding the status of a missed criterion should be based on
risks and action plans, rather than mired in debates over whether or not it has been
met. The current status of a good criterion is indisputable by its definition.
Achievable
Exit and entry criteria are useful only if they are actually achievable. This may seem
obvious, but documenting a criterion that is never met in test is a fairly common
mistake, and only dilutes the effectiveness of the entire set of criteria.
Discrete
Each criterion should be defined at a level that will encourage appropriate actions
by the people or teams that can respond if the criterion is not met. Combining too
many items into one criterion makes it more difficult to determine what is best to
do and by whom.
Mutual Agreement
Entry criteria should be the result of discussions among the teams involved in meet¬
ing it, measuring it, and depending on it. When a criterion is in jeopardy of not being
met on time, it should not be a surprise to the responsible teams. It is a good practice
for a supplier to include meeting their customer’s criterion in their own list of mile¬
stones. Exit criteria should be reviewed by the managers of the entire project and they
must be in agreement that they signal the completion of the test.
Avoid Ambiguity
So would a good entry criterion for SVT be “the FVT of widget1, widget2, and
widget3 is 75% complete”? This criterion does have the attributes of being measurable
and achievable (though whether it is discrete or meaningful to the SVT team can be
debated). However, it’s ambiguous. Does each of the individual function tests need
to be 75% complete or does the aggregate of all three need to be 75% complete?
Since the criterion is ambiguous, assumptions will be made. Each of the FVT
teams might assume that they can rely on the other FVT teams to be more than
75% complete—each assuming their unfinished work can be the bulk of the other
25%. Finding this out at the start of the SVT is too late and can seriously jeopardize
the successful beginnings of the SVT.
Document Specifics
Also, what is included in the 75% complete? What is acceptable if it is not finished?
The criterion can be measured and met, yet still not satisfy the real requirement.
There is no correlation between the specific plans of the SVT team at their start and
the appropriate completed FVT items. What are the initial activities of the SVT and
how do they map to the FVT plans? These are the criteria that should be investi¬
gated, agreed upon, and documented by the SVT team. Good criteria cannot be de¬
veloped by staring at a product development timeline; they are created through
detailed discussions among all involved teams. In this example, the SVT team
should sit down with the three FVT teams and agree on which items are most im¬
portant to be completed prior to the start of system test.
In this example, a good approach to defining SVT entry criteria would be to
identify exactly which pieces of the FVT test plans address items whose stability is
critical to enabling SVT to begin. Then state that the requirement is for those pieces
to be 100% complete. It is also important to state these criteria in terms that are un¬
derstood and tracked by the FVT teams. For example, “Mainline variations num¬
bered widget1.100 to widget1.399, inclusive; and widget1.422 for line item widget1
must be 100% exposed and successful.” The FVT team for line item widget1 is now
specifically aware of what is being asked of them. The ambiguity is gone, it is mea¬
surable, achievable, and, assuming the SVT plan is really dependent on this crite¬
rion being met, meaningful.
In practice, not every entry criterion will be fully met at the start of a large test
effort, so the test can proceed with some risk even if a criterion or two are not met.
However, understanding the impacts of multiple, disparate criteria on large work
efforts can be difficult.
One approach to addressing this difficulty is to assign a risk assessment value to
each entry criterion. Write down everything you would require to be ready at the be¬
ginning of your test that would allow you to proceed with no risk. Then assign a
value, from one to five, to each criterion based on your understanding of the risk it
would add to your test if not met. The value “1” is assigned to criteria that would add
minimal risk to test progress if not achieved on time. An example would be a small
new function that you could easily avoid early in your test but could reasonably fin¬
ish on time even if test exposure was delayed. A value of “5” is assigned to criteria
that add very significant risk, such as a critical, mainline, initialization function that
is required to perform all or almost all of your tests. Assign the other values to crite¬
ria somewhere in the middle based on the level of risk deemed appropriate. The total
risk is the sum of all assessment values. Figure 9.1 shows a sample table.
Criterion A 1
Criterion B 1
Criterion C 1
Criterion D 1
Criterion E 2
Criterion F 2
Criterion G 3
Criterion H 3
Criterion I 3
Criterion J 4
Criterion K 5
Total 26
FIGURE 9.1 Risk assessment table.
In the sample table, the total risk assessment is 26. If criteria A-K are all satis¬
fied, your risk is 0 and you can certainly start your testing. But this will rarely hap¬
pen in the real world, so you’ll want to be more aggressive than that. You may wish
to begin your test even if the risk assessment value is not 0, but not if it is 26, mean¬
ing completely blocked from making any progress. In this example, assume you are
willing to start as long as the risk assessment is under 18. Prior to your start date,
add up the risk assessment values of the criteria not yet met. If the sum is 18 or
above, hold off on starting the test and instead use your time, effort, and resources
to help meet the criteria with the highest risk assessments first. This helps ensure the
team is working on the most important activities.
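For teams that want to automate this bookkeeping, the arithmetic is trivial to script. The following is a minimal sketch in Python; the criterion values and the threshold of 18 come from the example above, while the particular set of "met" criteria and the helper logic are assumptions made purely for illustration.

    # Illustrative risk-assessment bookkeeping; values mirror Figure 9.1.
    criteria = {
        "Criterion A": 1, "Criterion B": 1, "Criterion C": 1, "Criterion D": 1,
        "Criterion E": 2, "Criterion F": 2, "Criterion G": 3, "Criterion H": 3,
        "Criterion I": 3, "Criterion J": 4, "Criterion K": 5,
    }

    # Hypothetical status reported shortly before the planned SVT start date.
    met = {"Criterion A", "Criterion B", "Criterion E", "Criterion G", "Criterion K"}

    outstanding_risk = sum(v for name, v in criteria.items() if name not in met)
    print(f"Total possible risk: {sum(criteria.values())}")   # 26 in this example
    print(f"Outstanding risk:    {outstanding_risk}")

    START_THRESHOLD = 18   # start only if outstanding risk is below this value
    if outstanding_risk < START_THRESHOLD:
        print("OK to start SVT; keep tracking the remaining criteria.")
    else:
        # Work the unmet criteria with the highest assessment values first.
        worst = sorted((n for n in criteria if n not in met),
                       key=lambda n: criteria[n], reverse=True)
        print("Hold the start; focus first on:", ", ".join(worst[:3]))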
For successful products that have multiple releases, the values can be honed
over time so the total risk assessment is based on historical data rather than
guesswork. This simple approach can easily show the cumulative impact of multiple
criteria not being met, and is especially helpful to an SVT team responsible for
testing a large product made up of many components and functions. Each developer
or FVT team may not understand why their “little piece left to do” can have
such a negative impact or cause risk to be added to the SVT plan. By using risk
assessment values, it becomes clear where the little piece fits into the big picture. It
can also greatly aid in determining which recovery actions would have the biggest
impact on reducing risk.
In the real world, tests will sometimes begin before their entry criteria are fully
met or end prior to achieving their exit criteria. This, in and of itself, is neither good
nor bad—it is how the situation is managed that matters most.
Proper Focus
In the real world, entry and exit criteria can be used as guides for deciding where
valuable resources, human and otherwise, should be placed. Good criteria are a tool
to ensure proper focus and attention is applied to the most important activities at
the correct time. Criteria and their measured status are not a line in the sand that
halts the project’s progress when they are not met. Rather, they become a starting
point for discussions on what is best to do, by whom, and when.
Getting Attention
Meeting the objectives of a test can be partly dependent on the actions of others. An
example is the turnaround of fixes to problems opened by the test team. Even
though the business may dictate that a product must ship on schedule even if the
test is not complete, a good set of exit criteria can assist in getting development to
address, with a sense of urgency, key open problems. This is why it is a good idea to
tie exit criteria to something that many people care about. For an SVT, it may be
helpful to associate meeting exit criteria with the product ship decision. Having
such an association will ease the task of obtaining outside help on the most
important items.
We have reviewed areas that derail the test team's success; now let's look at a
technique that can keep it on track. Testability can be injected into the software to
assist test in validating the product. Gupta and Sinha define this type of testability as
controllability measures. These techniques are used to enable the software to attain
states required for execution of tests that are difficult to achieve through its normal
user interface [Gupta94].
In environments where creating errors and complex conditions is difficult,
these testability measures should be explored. “How easy will it be for our testers to
verify and validate this software solution?” is a question that developers and
designers should be asking themselves [Loveland02]. This question should be part of
any software development checklist.
Testability hooks can be defined as those functions integrated in the software
that can be invoked through primarily undocumented interfaces to drive specific
processing which would otherwise be difficult to exercise.
The FVT team can drive these recovery scenarios by setting breakpoints and
manipulating data, but with minimum amounts of system stress. Within the SVT
environment, there is no external means for forcing the error condition that would
trigger this recovery action. In fact, the SVT team could only hope these recovery
situations occur through normal execution. This combination is certainly not
providing sufficient test coverage.
We can use a testability hook to create the looped queue condition. The
targeted component also has a set of external commands. The test team suggests the
introduction of undocumented keywords in the syntax of a command to invoke a
new testability hook's processing. To drive a looped queue condition, the tester
specifies the undocumented keyword on the command, telling the software to
create a looped queue. When the component processes the command, the component
injects the failure condition. “BOOM”—recovery in control!
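The mechanics of such a hook can be surprisingly small. The sketch below is a simplified, hypothetical illustration of the idea in Python; the command text, the INJECT_LOOP keyword, and the queue representation are all invented stand-ins for the real component's externals, not the actual syntax.

    # Hypothetical testability hook driven by an undocumented command keyword.
    class Component:
        def __init__(self):
            self.queue = ["req1", "req2", "req3"]
            self._inject_looped_queue = False

        def command(self, text):
            tokens = text.split()
            if "INJECT_LOOP" in tokens:        # undocumented, test-only keyword
                self._inject_looped_queue = True
                return "hook armed"
            return self._display_queue()

        def _display_queue(self):
            if self._inject_looped_queue:
                # Deliberately create the looped-queue condition so that
                # recovery processing gets driven -- "BOOM," recovery in control.
                self._inject_looped_queue = False
                raise RuntimeError("looped queue detected")  # stands in for the real error path
            return " ".join(self.queue)

    comp = Component()
    comp.command("DISPLAYQ INJECT_LOOP")   # tester supplies the hidden keyword
    try:
        comp.command("DISPLAYQ")           # next mainline operation hits the injected failure
    except RuntimeError as err:
        print("recovery path entered:", err)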
Test’s early involvement in the design and implementation phases of a project
can foster the integration and use of testability function. This technique can be a
key risk-mitigation action for the test team. Most testability hooks are not
documented to external customers. However, at least one software testability hook was
so successful it was made available to IBM customers so that they could also
benefit from its exceptional capability. Let's see how.
We’ve noted earlier that multiple IBM mainframes can be clustered together into a
loosely coupled Parallel Sysplex that employs a shared storage medium known as a
coupling facility (CF). In the event of a hardware failure in the CF, software
components using it can work together with the operating system to rebuild its structures
on the fly to an alternate CF. In Chapter 8, “Testing for Recoverability,” we looked
at an approach for simulating the failure of an entire CF by powering it down. That’s
a good scenario, but let’s take it one step further. The software provides support for
reacting to the failure of any individual CF structure by rebuilding it while other
structures on that CF continue operating normally. Shutting off the CF would not
generate this failure scenario. The question the test team faced was how could they
simulate the failure of just one structure within a CF?
setting trace stops, modifying actual object code, etc.). The CF error injection tool is
the epitome of a software testability hook.
If faced with a seemingly impossible testing situation, adding a testability hook to
the software itself could save the day. If included in the final product, the hook
could be used for future tests as well. By getting involved early in the development
cycle, you can suggest possible testability approaches.
A failed structure can be thought of as a corrupted file that exists but cannot be
accessed.
For example, a system tester could start a heavy transactional load/stress run
that pounded on a relational database product across multiple discrete systems, all
sharing a database on the same physical disk. The database software used coupling
facility structures to coordinate this multisystem activity. Once the load had
stabilized and the coupling facility was quite busy, the tester struck. He invoked the test
hook, targeting the specific relational database's structure, and requested that a
failure be injected against that structure. On receipt of the request, the coupling
facility support code stored this request aside to handle on the next logical operation
to the targeted coupling facility structure. When the next operation against the
structure arrived, the coupling facility noticed the pending failure injection from
the prior request and returned a structure failure indication [Loveland02]. The
relational database software was notified of the failure and invoked its recovery
processing, which rebuilt the “failed” structure on another CF. Refer to Figure 9.3 for
a flow of this processing.
The tester could create a script to initiate the error-injection tool. The main
parameter of the tool is the specification of the target CF structure name. In addition,
an optional parameter was available to specify whether an old or new instance of
the structure should be targeted for the failure; this parameter was used in CF
structure recovery cases.
With built-in testability functions, using a simple script can often be enough to
easily force complex situations that would otherwise be difficult to generate.
Identifying and implementing testability hooks early in the design is a team effort.
Be sure to exploit everyone’s expertise.
This program has become an important part of the testing of the z/OS platform,
and has even been released for use by customers and other vendors to help create
recovery scenarios to test their applications and product solutions. Each time new
software that makes use of the coupling technology enters test, the use of the CF
error-injection tool is considered. Feedback received from customers and users has
been very positive and they have found it critical to the verification and validation of
their data sharing environments on IBM zSeries® mainframes [Loveland02].
The CF error-injection tool's simplicity and effectiveness have allowed the z/OS
test team to ensure the continuous availability and recoverability of the zSeries
coupling technology. With its externalization to customers and other vendors, IBM
has provided not only itself, but also its user community with a crafty tool for
validating critical components of the operating system and critical applications.
SUMMARY
You now understand the testing phases, where each fits into your organization's
development model, and what sort of defects they should be targeting.
You have researched the software you have been asked to test, broken its complexity
down into manageable chunks, and perhaps even recommended modifications
to make its testing more efficient. You have in hand a thorough, well-reviewed
plan, a schedule with adequate contingency, and crisp entry/exit criteria. Are you
ready to dive into the test? Not so fast.
A chef can't assemble a culinary masterpiece until the meat is trimmed, the
vegetables chopped, and the seasonings measured. A biologist can't make a
life-extending discovery unless his specimen slides are ready and the microscope is
working. An army commando can’t leap into a gunfight without weapons that fire.
There’s a common principle that underlies most professions, and it’s certainly true
for software testing: preparation is key.
Before you can begin testing, you'll need to prepare your implements of
destruction: test cases and test tools. The next four chapters will show you how it's
done. You will see how to use old testing tools and programs to find new bugs, and
how a change in context can open the door for taking test cases from one test phase
and reusing them in another. You will learn techniques for developing new test
programs, and how the rules for such programs vary from FVT to SVT. You will
discover why normal testing will often miss data corruption problems, and how to
create specialized data integrity monitors to fill the gap. A broad range of tools
available to software testers will be surveyed, along with pointers on when it's
better to buy a tool or build one yourself. Finally, the pros and cons of porting real
customer applications and environments into the test lab will be discussed, including
tips on how to do it right.
So sharpen your knives, focus your microscope, and oil your weapons. Let’s get
ready to test.
The Magic of Reuse
In This Chapter
■ The reuse of test cases between test phases
■ Test case reuse opportunities and techniques
■ A Case Study: testing real memory management
The tester stands beside a whirring projector. A field of eyes studies him from
around the stuffy conference room. He scratches his forearm, and then clears
his throat. “But, uh, we have hundreds of test cases to code. It takes time. We
need—”
“I don’t have more people to give you. But meeting that accelerated delivery
date is critical to our business. You’ll have to find another way.”
Testers never feel they have enough time to do all they would like. At the same
time, project managers under pressure to pull in their delivery dates are always
eyeing that big chunk of time allocated to test and dreaming of trimming it some
more. Unfortunately, if the developers’ schedule for delivering code slides out, the
end date usually doesn’t; it’s the testers who get squeezed. But testers themselves
have a lot of code to write.
Sure, your buddies in development might like to brag about how much code they
pounded out over the weekend. But who really writes more code, developers or
testers? In some organizations, testers write up to three times more. For example, if
a developer writes one module that can be invoked through a macro with a dozen
options, the tester might write several dozen test cases against it: all macro options
must be tried individually and in multiple combinations, erroneous invocations
and failure situations must be attempted, and devious manipulations of the macro
aimed at bypassing its security must be hacked up.
That’s a lot of test cases. They won’t be trivial in size, either, since to use the
macro in a realistic way, each test case must go to the trouble of establishing a valid
environment for invoking and using the macro's services. “Environment” in this
context might include running in a specially authorized mode, obtaining required
serialization (e.g., locks, mutexes), establishing an appropriate recovery context, or
even gaining addressability to other processes or address spaces. Test cases
probably won't be written with the same rigor as a developer's module (no one will be
trying to hack the test cases for security holes, after all), and many will be near
duplicates of one another, perhaps taking advantage of object-oriented techniques to
minimize redundant coding. But the point isn’t to debate whether developers or
testers are more productive; it’s rather to highlight the often overlooked reality that
testers have a lot of coding to do. That reality cannot be eliminated, but its impact
can be reduced through the magic of reuse.
Both function testers and system testers write test cases, although function testers
probably write a great deal more due to the detailed nature of their testing. Both write
test tools as well. All are candidates for reuse, though the approach for each may vary.
realm of SVT. It could be used as the basis for an acceptance test at the beginning
of SVT, or as a source of background stress against which other tests are executed.
But what if the software under test is brand new?
For new software, the SVT team won't have any old SVT test cases to use for
acceptance testing, and they'll be challenged to write new ones fast enough to create
a meaningful load/stress bucket in time to meet the team’s needs. But there is a way
the SVT team can jump start the creation of a good regression bucket.
That's not to say reusing FVT test cases is sufficient for a thorough SVT. But doing
so is an excellent starting point and, if done properly, is almost free.
For example, if a test case updates a record, it must later reset that record to its
value prior to the update—and do so within the same unit of recovery context. If the
test case deletes a record, it must re-add it before ending. In this way, the data does
not age; all records of a particular type do not eventually get deleted, nor does the
number of records within the database grow. Yet, the required functions of the
database system have been exercised.
Creating test cases along these lines is not difficult, as long as you’re aware of
the rules when you begin. So if the SVT team wishes to reuse test cases from FVT,
it’s important that they share these guidelines with the FVT team before FVT starts.
Not all test cases will be able to conform to these standards, but a big chunk of them
will. The ones that do can potentially live forever in automated regression buckets,
earning their keep through reuse day after day.
Tool Reuse
Don’t forget about test tools when you’re searching for reuse opportunities. By
their nature, tools are intended to be used over and over again, so you may not view
simply running them as a form of reuse. However, using an old tool to meet a new
need certainly qualifies.
Homegrown Tools
As noted in Chapter 13, testers often build many of their own tools. Some of these
can be quite elaborate. Due to the perennially tight schedules that testers face, there
is often pressure to develop such tools as quickly as possible, following a design that
only addresses the current function being tested. Such an approach is shortsighted.
By focusing on modular design techniques from the start, large tools can easily be
extended in the future to meet new needs. Such reuse will not only increase the re¬
turn on investment for the tool over time, but will shorten the time required for fu¬
ture test preparation.
In addition to elaborate tools, testers also create tiny, niche tools to meet very
specific needs. These are often created by a single tester for his immediate task, and
others on the team may not even be aware of them. They are often written as
“throwaway” code, with no thought toward future extension, which is fine for this sort of
tool. But it’s surprising how often such tools can actually be extended, rewritten, or
used as a base for a future tool. The memory thrasher described in Chapter 12,
“Data Corruption,” is an example of just such a tool.
Niche tool reuse can have a tremendous impact on the test team’s efficiency
and effectiveness, but there’s one trick: everyone has to know that such tools exist
before they can consider reusing them. If all testers are in a single department, this
might be easy to do. If they are spread out across a development organization, it can
be more difficult. Maintaining an informal, online list of little tools people have
written is one way to share knowledge, although the challenge with such lists is
keeping them current. Linking the upkeep of a tools list to some regularly occurring
activity, such as a test postmortem (see Chapter 20, “The Testing End Game”), can
help. But whatever you do, don't create an environment in which tracking the
development of niche tools is so onerous that it discourages people from writing them
in the first place.
One of the major tasks of an operating system is to manage memory. Most handle
this by viewing memory from two different perspectives: real and virtual. A
computer's addressing range may be larger than the amount of physical memory it has
installed. For example, a 32-bit system can address four gigabytes of memory,
although it may only have one gigabyte installed. Yet the operating system, working
in conjunction with the processor, can make it appear to all applications as though
they really have four gigabytes available. This is done by creating a four-gigabyte
virtual memory space, and mapping the portions of that space that are actually in
use to the computer’s installed, or real, memory. When real memory becomes
scarce, chunks of it are freed up by moving them out to a page or swap file on a hard
disk. The next time a program tries to access one of those chunks, they are copied
back into real memory from disk. All of this is managed by the operating system,
making it transparent to the application program.
The z/OS operating system has a component called the Real Storage Manager
(RSM) that keeps track of all of these chunks of real storage, known as pages. The
RSM uses an extensive system of queues to track pages that are in various states of
usage, and a sophisticated locking hierarchy to serialize the movement of pages
between those queues across multiple CPUs without getting confused. It's a complex
component at the very core of the operating system.
At one point, IBM mainframe architecture evolved to, among other things,
provide additional virtual storage addressing. It enabled programs to access
multiple virtual memory spaces all at once with a full range of processor instructions
under hardware-enforced access control [Scalzi89]. The RSM underwent
significant changes to support these enhancements.
Whenever an existing component is updated, you have the opportunity to use both
old and new test cases to broaden your testing coverage efficiently.
The FVT for this support included creating a huge number of test cases, which
inevitably uncovered some bugs. Since a large percentage of those test cases met the
standards for being streamable, the SVT team was able to reuse them. They did so
in multiple ways.
Acceptance Testing
First, the SVT team performed an acceptance test using its existing regression test
case bucket, without the new RSM test cases included. This bucket included
hundreds of test cases from past tests, covering virtually every component of the
operating system. Given the core nature of the RSM to overall system operation,
changes there had the potential to impact base system stability, so it was important
to establish a stability baseline. This testing actually went remarkably smoothly, so
the focus quickly shifted to testing the new functions.
When testing enhancements to existing software, start by reusing old test cases to
see if prior function has been broken.
Take advantage of existing test cases and tools to exercise new functions. Updating
old test cases can be quicker and easier than generating new ones.
Another way to reuse test cases for multithreaded software is to selectively combine
them into a high load/stress workload that is intensely focused on the new functions
under test.
Epilogue
This SVT team reused old test cases, reused new test cases from a prior phase,
reused existing test tools against a new function, and extended those tools to tackle
the updated software. Plus, the team mixed things in different combinations. Taken
together, these approaches exposed multiple bugs that would have eluded detection
had only a single technique been used, and the extensive reuse kept preparation
costs to a minimum.
SUMMARY
Testers write a great deal of code, sometimes significantly more than the code
they're testing. Finding ways to reuse that code makes a lot of sense. Test cases
written to be streamable can be reused within and between test phases. Old test tools
can be used to address new problems, or adapted to changing needs. When starting
a new test, the savvy test team always looks for opportunities to reuse the output
of prior tests to save themselves time and effort—and help them find more bugs
in the process.
But while reuse can jump start a test team’s efforts, it’s rarely enough by itself.
New test cases usually need to be developed. In Chapter 11, “Developing Good Test
Programs,” we’ll look at the characteristics of a good test case.
Developing Good
Test Programs
In This Chapter
When it comes to fulfilling your test program needs, reuse is great but it
will only take you so far. Eventually, you'll have to write some code.
Unfortunately, testers are often given more guidance on how to track status
for test programs than on how to create them in the first place.
In Chapter 7, “Test Plan Focus Areas,” we noted that for large, complex
projects, we equate a test case with a test program, so we'll use the two terms
interchangeably. Why is it that while testers sometimes write far more code than
developers, they are rarely trained to write effective test cases? Perhaps it's assumed
that standard software engineering practices for good code development are
applicable to writing test software as well. To some extent that's true. But effective test
programs also have unique requirements which must be addressed, and those
requirements vary somewhat across test phases. Let's examine the characteristics of
good FVT and SVT test cases.
In the previous chapter, we saw six criteria that FVT test cases must meet in order to
be folded into an automated test stream: self-checking, debuggable, well-behaved,
autonomous, restartable, and self-cleaning. Regardless of whether a test case will
ever be a candidate for streaming, there are more fundamental things to consider.
Coding good FVT test cases requires that you adhere to practices that are
recommended for writing any software, but with test-specific twists. There are also coding
practices specific to the test world. Let’s look at both types.
Standard Practices
Testers sometimes treat the test cases they write as throwaway code that's more
important to do quickly than well. When a test case will only be used a single time, this
might be appropriate. But for those that can be reused or adapted for current or
future tests, taking a quick-and-dirty approach is shortsighted. Spending a little more
time up front to create robust, well-written test programs converts each one from
a necessary evil into an asset for the entire test team. All that is needed is to follow
some common programming guidelines.
The first time you execute a new test case in FVT it will likely be the only thing
running on the system. It might be tempting to take advantage of that fact and not
bother with all of this coordination. However, that would be a mistake. Later when
you want to include the test case in a regression suite alongside hundreds of others,
you’ll regret it.
Test-oriented Practices
Most programs are written to accomplish a task, such as producing useful output or
manipulating data. Test cases, on the other hand, are intended to determine if another
program’s task can be successfully accomplished. Kaner describes this as a test case
being the question you ask of a program [Kaner03]. The most straightforward
question to ask is whether the target program correctly does what it's supposed to do. It
can also be interesting to ask the opposite question: are there ways the program can
fail? Brainstorm a list of areas at risk for possible failure and then write test cases that
generate circumstances aimed at forcing those failures. This approach falls into a
category known as risk-based testing. Bach suggests framing the risk question around
vulnerabilities the program might have, threats (inputs or other triggers) that might
exploit those weaknesses, and potential victims of the resulting failures [Bach99].
Another way to interrogate the target component is to consider its possible states
and then write test cases that force different changes to those states. In practice,
however, for even moderately complex software, the number of states will quickly
explode. Simplifications or abstractions will be needed. Weyuker offers a real-world
example of this phenomenon, in which the number of distinct states for a telephone
call processing application swiftly grew to over 167 million [Weyuker98].
No matter what approach you use for devising test cases, there are specific
practices to follow in coding them effectively. While there may be some overlap
with other types of programs, these techniques are fairly specific to the realm of test.
Let’s review them.
The test case should target specific functions of the software. It should correctly test
those target areas with a crisp definition of what constitutes success or failure. The
test program should not be a vaguely defined tour through the target software’s
functions with no specific purpose or objective. If it has a clear purpose, then
success or failure will be easier to determine.
Ensure Correctness
This seems obvious. Of course a test case should be correct, as should any program.
This is sometimes trickier than it seems. In the wake of an escaped defect, testers
have been known to insist that they had a test case that exercised that code. But after
further review, they unfortunately discover that their test case didn’t do what they
thought. Some minor parameter they hadn't explicitly set defaulted to an
unexpected value. That default value in turn caused a slight deviation in the path taken
through the new code, bypassing the defect. The test case executed cleanly, but its
implementation was incorrect.
Protecting your test cases from such mistakes is a twofold process. First, ensure
your coding isn't making any implicit assumptions about default actions or code
paths that will be exercised. Be explicit about everything. Second, make a thorough
list of all possible indicators of the test case's success or failure, and then
programmatically check them all. For example, in our earlier example about testing a service
that creates a data table, don’t declare its success just by checking a return code.
Also verify that the data table was actually created, and that it was created with the
correct attributes (e.g., size, format, extendibility). In this example, it would be
better to go even further by attempting to use the new table. Store data in the table,
modify the data, verify it, and remove it. As noted earlier, this could be done
through a self-serializing sequence of dependent test programs, each a test case in
its own right. Sometimes you won’t discover a creation service has gone awry until
you take some actions against what it created.
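A sketch of what that layered checking might look like, assuming a hypothetical table-creation API: create_table, describe_table, and the insert/update/read/delete calls below are placeholders for whatever service is actually under test, not a real interface.

    # Hypothetical correctness check that goes beyond the return code.
    def verify_table_creation(api):
        rc = api.create_table("T1", size=1024, fmt="fixed", extendable=True)
        assert rc == 0, f"create_table failed, rc={rc}"

        # Don't trust the return code alone: confirm the table really exists
        # and was created with the requested attributes.
        info = api.describe_table("T1")
        assert info is not None, "table T1 was not actually created"
        assert info.size == 1024 and info.fmt == "fixed" and info.extendable, \
            f"table created with wrong attributes: {info}"

        # Go further: exercise the new table to prove it is usable.
        api.insert("T1", key="k", value="v1")
        api.update("T1", key="k", value="v2")
        assert api.read("T1", key="k") == "v2", "stored data did not round-trip"
        api.delete("T1", key="k")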
Stay Unique
You have limited time to test any given function. Don't waste it by creating
redundant test cases. This is not to say there won't be overlap. For example, if you are
testing a callable service with many possible parameters, you might create multiple
test cases to invoke the service in slightly different ways. Some could test limits,
others would force error cases, and still others might probe for security holes. These
test cases are not redundant. However, much of the code that's related to
establishing the proper environment to invoke the service and subsequently report
results could be the same.
Include Self-reporting
Test programs can become quite intricate, setting up complex environments and
then checking for subtle conditions. Good ones don’t stop there. They strip away that
complexity to make it obvious whether the test was successful or not. The best test
cases are programmatically self-checking, as described in Chapter 10, “The Magic of
Reuse.” These test cases summarize success or failure through a mechanism another
program can easily check, such as a return code.
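A minimal sketch of that pattern follows, assuming a hypothetical run_variations() helper that drives the individual variations; the only contract that matters is the one-line verdict and the process return code the automation harness will check.

    import sys

    def run_variations():
        # Stub: in a real test case each entry would drive the function under test.
        return [("variation_1", True), ("variation_2", True)]

    def main():
        failures = [name for name, passed in run_variations() if not passed]
        if failures:
            print("FAIL:", ", ".join(failures))
            return 8          # nonzero return code tells the harness the test failed
        print("PASS: all variations successful")
        return 0

    if __name__ == "__main__":
        sys.exit(main())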
Decision Points
Self-checking isn't always practical, however. For example, you might want to build
a test case that forks at several key points, with each path examining a different
condition.
Null Reporting
Another type of test program might pressure the target software while watching for
the occurrence of intermittent or unpredictable errors. In this situation, when the
program detects a failure it would report it to the tester, perhaps by deliberately
crashing to force a memory dump. If it never detects an error, it could run forever.
Here, the absence of a reported error indicates success. The data integrity monitors
described in Chapter 12, “Data Corruption,” are examples of this kind of program.
Self-tracing
What about a test case that “reports” its status by feeding huge amounts of data into
a log or trace file? This is indeed a form of reporting, but not a good one. Rather
than making success or failure obvious, it forces you to spend considerable time
poring over a trace file to determine the outcome. Too many tests like this and
you’ll miss your end date, your total test coverage will be thin due to lack of time,
or, in haste, you won’t notice bugs that the test case actually surfaced.
On the other hand, for debugging you may find it valuable to create test cases
that can optionally trace their own internal flow: branches taken, return codes
received from system services, passes through a loop, etc. If you trace inside loops,
you'll probably also want to include the option to dynamically and programmatically
limit what's captured to avoid consuming too much memory (i.e., only trace
the first two passes through the loop, then suppress the rest). This kind of internal
tracing can often prove quite valuable for debugging a complex problem. But it
isn’t the most efficient way to report initial success or failure.
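One simple way to get that behavior is a trace helper that is off by default and self-limits inside loops. The names, the pass limit, and the exercise_service() routine below are illustrative assumptions, not from the original text.

    # Optional, self-limiting internal tracing for a test case.
    TRACE_ENABLED = False
    MAX_LOOP_TRACES = 2
    trace_buffer = []

    def trace(event, loop_pass=None):
        if not TRACE_ENABLED:
            return
        if loop_pass is not None and loop_pass > MAX_LOOP_TRACES:
            return                      # suppress tracing after the first few passes
        trace_buffer.append(event)

    def exercise_service(service, items):
        for i, item in enumerate(items, start=1):
            rc = service(item)
            trace(f"pass {i}: rc={rc}", loop_pass=i)
            if rc != 0:
                return False
        return True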
Developers love creating elegant software, in which a few lines of code accomplish
a great deal. Testers are often tempted to do the same, creating a single “big-bang”
test program that, if successful, will complete many test variations in one fell swoop.
A very simple example would be a limits test in which a single invocation of a
callable service is made with every input parameter set to its upper limit.
Unfortunately, when multiple variations are intertwined and one fails, the others
will be blocked until a fix is available. During FVT, the software is still fairly
immature, so such failures are quite likely. Remember, your goal is to find bugs, which
means that unlike developers, you want your programs to “fail.” By creating a big-
bang test program, you are in effect devising a test that, if it meets its objective, will
by definition put other tests behind schedule. By trying to save time through an
elegant test case, you've actually cost yourself time by unnecessarily blocking progress.
That’s not to say you shouldn’t include several related variations into a single
test program. For instance, you might repeatedly invoke a targeted service in
sequence, with different inputs set to their limits each time. Often that's a sensible
way to minimize redundant coding while still establishing the right programming
environment around the tests. Just keep each variation in the program independent
of those that precede and follow it. That way, if a bug in one variation is found, it
won’t prevent the others from being attempted.
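A rough sketch of that structure, with the service name and parameter limits invented for illustration; the point is simply that each variation records its own result and a failure in one does not stop the rest.

    def invoke_service(length, count):
        # Stub for the callable service under test; a real test would call the product API.
        return 0

    variations = [
        {"name": "max_length", "length": 65535, "count": 1},
        {"name": "max_count",  "length": 1,     "count": 4096},
        {"name": "both_max",   "length": 65535, "count": 4096},
    ]

    results = {}
    for var in variations:
        try:
            rc = invoke_service(length=var["length"], count=var["count"])
            results[var["name"]] = "pass" if rc == 0 else f"fail rc={rc}"
        except Exception as exc:          # one failing variation must not block the others
            results[var["name"]] = f"fail {exc}"

    print(results)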
Callable Services
A better approach is to identify these common functions at the start of the test and
then spend a little time up front encapsulating them into callable services (e.g.,
macros or objects). Some categories of such test case functions include:
they don’t match, or parsing log files for anticipated error messages)
Data-driven
You might also create a single test case that accepts multiple input parameters to
control its execution. By providing lots of external knobs to twist, you will be able
to reuse a single test program to execute several different test variations. The
amount of actual test case code could be quite small, relying on input data to
provide the variability that drives different paths.
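For example, a data-driven driver might look something like the following sketch, where the parameter names and the drive_target() routine are hypothetical placeholders for the real target interface.

    import argparse

    def drive_target(records, size, mode):
        # Stub standing in for the real invocation of the software under test.
        return 0

    def main():
        parser = argparse.ArgumentParser(description="data-driven test driver")
        parser.add_argument("--records", type=int, default=100)
        parser.add_argument("--record-size", type=int, default=200)
        parser.add_argument("--mode", choices=["sequential", "random"], default="sequential")
        parser.add_argument("--expect-error", action="store_true")
        args = parser.parse_args()

        rc = drive_target(args.records, args.record_size, args.mode)
        ok = (rc != 0) if args.expect_error else (rc == 0)
        print("PASS" if ok else f"FAIL rc={rc}")
        return 0 if ok else 8

    if __name__ == "__main__":
        raise SystemExit(main())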
In the previous chapter we saw how FVT test cases can be reused in a different
context during SVT, both as a regression test suite and as a collection of miniapplications
exploiting new functions. The merging of such test cases into creative, system-level
scenarios can create a powerful base for your SVT. In fact, SVT is often more focused
on the use of a broad variety of system attack scenarios than on low-level test cases.
But sometimes that isn’t enough. SVT is aimed at extracting a different class of bugs
than FVT; discovering those bugs may require a different class of test cases.
Some of the prime targets of an SVT aimed at multithreaded software are
timing and serialization issues. A key approach for flushing out such problems is to put
the software under heavy stress. Sometimes this is as straightforward as cranking up
a workload or mix of workloads that push the new software functions to their
limits. Other times, you must develop test cases specifically designed to uncover
potential serialization problems. The key here is to look for functions in the target
software that you either know or suspect are protected by serialization (e.g., locks,
mutexes), and then create test cases that put that serialization support on the hot
seat. Force the simultaneous execution of the function from many threads at once.
Then stand back and watch the sparks fly.
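In a language with threads, the shape of such a test case is straightforward. The following sketch is an illustrative analogue rather than a mainframe example: fifty threads hammer one lock-protected update, and the test then verifies that no updates were lost. In a real SVT it would run on an already-busy system.

    import threading

    lock = threading.Lock()
    counter = 0

    def serialized_update(iterations):
        global counter
        for _ in range(iterations):
            with lock:                    # the serialization being put on the hot seat
                counter += 1

    threads = [threading.Thread(target=serialized_update, args=(10_000,)) for _ in range(50)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    expected = 50 * 10_000
    assert counter == expected, f"serialization hole: {counter} != {expected}"
    print("no serialization errors detected")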
Before the advent of computer hardware and operating systems that supported 64-bit
addressing, applications were sometimes constrained by limitations on the amount of
virtual memory available to them. This was particularly true for data-intensive
applications that needed to cache or manipulate large volumes of data within memory. A
predecessor to the z/OS operating system tackled this problem by creating the
concept of a data space.
A data space was intended to augment an address space. A z/OS address space
is similar to a UNIX process. It owns the virtual memory into which a program is
loaded and executes. A data space was similar to an address space, except it could
hold only data; no code could be executed within it. Also, a data space could not
exist in isolation; it had to be owned by a task, which is similar to a UNIX thread in
that it is the smallest element that can perform work.
The test program created a data space with a maximum size of one megabyte.
There was a bit of gray-box testing going on here—under the covers the operating
system grouped data spaces into three size categories, and created slightly different
control structures for each group. As a result, different tests were required for each
size grouping.
Learn about software's internal operation to help target a test program at otherwise
invisible limits.
Once the data space was created, the test case cycled through each 4096-byte
page and “touched” it by storing a few bytes of data. This forced the system to fully
populate the control structures it was using to represent the data space. Once this
was finished, the test case simply ended, without having deleted the data space. As
the operating system went through termination processing for the task, it should
have noticed that the task still owned a data space. The operating system should
then have done the task’s cleanup work and deleted the data space.
That FVT test case was run, and much to the disappointment of the function
tester, it worked fine. The termination processing performed as designed, the data
space was successfully deleted, and system control structures were properly cleaned up.
FVT test cases typically target verifying a single execution of a given function.
The SVT test program was similar to what was created for FVT in every way but
one: instead of creating and populating a single data space, it worked with 100 of
them. The idea was that when the operating system went through termination
processing for the task, it would be forced to clean up 100 data spaces simultaneously.
This placed much more intense stress on the delete processing’s serialization than
did the unsynchronized termination of 100 individual FVT programs.
Create SVT test cases that force multiple invocations of a multithreaded function.
Execute them concurrently on a heavily loaded system in order to explore aspects of
the software specifically intended to support and coordinate simultaneous execution.
The test case was run on a system that was already busy processing other work.
It immediately uncovered a serious defect in the termination support. In fact, by
running various combinations of such test cases under differing load conditions, a
whole series of such defects were found and fixed.
Interestingly, prior to running the test case on a fully loaded system, the tester had
tried it out on an unloaded, single-user, virtualized environment under z/VM (see
Chapter 16, “Testing with a Virtual Computer,” for more on z/VM and virtualiza¬
tion). In that environment, the test case ran cleanly without revealing any bugs. It
wasn’t until it was tried on native hardware on a busy system that the fireworks began.
Epilogue
This is a classic example of the difference between the kinds of bugs targeted by FVT
and SVT. It also demonstrates how seemingly similar test cases can be adjusted to
seek out and find very different defects.
SUMMARY
Writing test programs is a lot like writing other kinds of software. Techniques such as
thorough commenting and avoiding hard-coded resource requirements are useful in
many contexts. But good test cases also have some unique characteristics you should
strive to implement. Test cases that are targeted, correct, unique, self-reporting,
discrete, and efficient will be more effective at achieving their desired ends in the least
amount of time. Also, if you are involved in system testing, watch for opportunities to
devise test cases aimed at discovering the bugs that test phase is best equipped to find.
Chapter 12 delves into another class of defect that is critical and yet so tricky to
find that it warrants special treatment. This special type of problem is the data
integrity bug.
Data Corruption
In This Chapter
■ What is data integrity?
■ How to protect against data corruption
■ Why do special testing?
■ Data integrity checkers
■ A Case Study: memory and file thrashers
Data corruption occurs when data is incorrectly modified by some outside source
and we are not notified that it happened. This can occur in any of the many places
where data is found: in main memory, flowing into and out of main memory, to
and from I/O devices, or stored on an I/O device. For example, when a record that
has been read from a file on a disk is resident in an I/O buffer in an application
program, and this record gets incorrectly modified by some outside source—it is data
corruption. This modification or corruption could be caused by software or
hardware. Data integrity is ensuring that data does not get changed without notification.
If the tables used to manage paging are incorrectly built or modified, they can cause an incorrect and unexpected
page to be read into real memory. Should this happen, it will appear to the program
as if the whole page of memory has been corrupted. There are several flavors of this
problem.
One is the case in which the tables that keep track of the used page slots on the
disk get incorrectly modified, allowing the same slot to be used for two different
pages at the same time. Another is when the tables that track where a page has been
stored on the disk get incorrectly modified after the page has been written out. In
either of these cases, when the page is read in it will be the wrong page.
Endless Possibilities
These are just a few examples of the kinds of corruption that can occur. Database
software has its own tables and pointers for tracking data, and each has the potential
to generate corruption. So can message queuing software that moves data from one
system to another. Even user-mode applications that manipulate large volumes of
data can fall victim to damaged pointers or wild stores that lead to scrambled data.
Testers should be dubious of any program that controls the flow of data. Software
that provides data management services to other programs has an even greater
potential for harm, and so should be viewed with extreme suspicion.
In cases where one or more pages of memory get corrupted, it often causes
enough damage for the program to terminate, but it doesn’t always. The wild store
kind of corruption is more insidious, because usually a smaller amount of memory
gets damaged and, as a result, may not be detected until long after the corruption
occurred.
One of the basic design assumptions of computer systems is that every component
of the system will adequately protect the data that either flows through it or is
stored in it. This means that hardware components like main memory, I/O
channels, disk drives, tape drives, and telecommunication links must each protect the
data that it touches. Software has a role to play as well. Let’s review some examples.
Hardware Protection
Magnetic storage media is not 100% reliable. As a result, when data is written on a
disk drive, extra check bits must be written with each record. That way, when the
record is read back, the disk drive can determine if any of the data has been corrupted.
In this case, the corruption would be caused by electrical or mechanical problems,
such as a defect in the magnetic media. Similarly, each of the other computer system
components needs to use check bits to monitor its data for corruption and return an
error indication when the corruption is detected (if it can’t be corrected).
Check Bits
The ratio of check bits to data bits needed is determined by the number of bits in error
that we need to detect, and whether or not we want to be able to correct those bits.
The simplest kind of checking is through parity bits. One example is to use a parity bit
for each eight-bit data byte. For odd parity checking the ninth bit, the parity bit, is set
so that there is an odd number of one bits in the nine bits. When the byte is checked,
if there is not an odd number of one bits, then the byte has been corrupted and at least
one bit has the wrong value. Using this one parity bit for each eight data bits ratio, we
can detect all single-bit corruption and some multibit corruption of a data byte. With
more check bits and more complex error detection and correction schemes, a larger
number of corrupted bits can be detected and corrected.
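A worked example of the odd-parity scheme just described, in Python; the helper names are ours, but the arithmetic is exactly the one-parity-bit-per-byte check from the text.

    def odd_parity_bit(byte):
        ones = bin(byte & 0xFF).count("1")
        return 0 if ones % 2 == 1 else 1       # add a 1 only if needed to make the count odd

    def check_odd_parity(byte, parity_bit):
        ones = bin(byte & 0xFF).count("1") + parity_bit
        return ones % 2 == 1                   # False means at least one bit was corrupted

    data = 0b10110100                          # four one-bits, so the parity bit must be 1
    p = odd_parity_bit(data)                   # p == 1
    assert check_odd_parity(data, p)
    assert not check_odd_parity(data ^ 0b00000001, p)   # a single-bit flip is detected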
Software Protection
The operating system, just like all of the other components of the computer system,
has to adequately protect data. Just as we do not expect normal programs to check
that the add function in the CPU is working correctly when it adds two numbers,
we should not expect programs to check for corruption in the data they are relying
on the operating system to manage. The same is true of databases and other
critical system software components. System software depends on careful serialization
and other techniques to try to protect it from creating corruption problems.
However, unlike hardware components, system software does not usually have
real-time mechanisms in place for detecting data corruption if it occurs. MD5 sums and
other postprocessing techniques can be used to identify corruption long after it has
occurred, but are not typically used for real-time detection. They also do nothing
to prevent corruption in the first place. That’s where testing comes in.
Data integrity should be a priority focus area when testing any vital, real-world
software that is involved with managing data, whether the data is in memory or on
disk. The SVT phase, in particular, should be adept at catching serialization holes or
timing windows that can lead to corruption problems.
Why is a special focus needed? Isn’t normal load/stress testing sufficient?
Maybe, but probably not. Data integrity problems are among the most difficult to
detect with traditional testing techniques. Since normal test applications (or
customer applications) assume data integrity is being maintained, they likely will not
immediately notice when something has gone wrong. Instead, they simply continue
processing the corrupted data. Eventually, they may be affected in some way (such
as dividing by zero) that will crash the application and bring it to the attention of
the tester. Or, the program may end normally—with invalid results. In any case, by
the time the error is detected, the system has long since covered its tracks, the
timing window has closed, and prospects for debugging it are slim [Loveland02].
Validating data integrity requires testers to think differently from other
programmers. Specialized attacks are required. Test cases must be written that do not
assume that when a value is written to memory, that same value will later be
retrieved. In fact, the test cases must assume the opposite—and be structured in a way
that will facilitate debugging data integrity problems when they strike.
Because normal programs do not check for data corruption, we need to devise test
programs that do. It is better to develop individual, small monitor programs for
each kind of data corruption than to create a large, complex monitor program that
looks for many kinds. This keeps the processing of the test program to a minimum,
so the software under test, rather than the test program, becomes the bottleneck.
Let’s outline a few monitor programs.
real memory. If an incorrect page is detected, the test program will capture
diagnostic data and terminate with an error indication.
■ Repeat Steps 3 and 4 until either an error is detected or it is manually
terminated.
■ The virtual page was written to the paging disk and then correctly read back
into memory. This means that the tables that map the external paging disk for
this page were not corrupted while the page was on the paging disk.
■ The virtual-to-real translation tables for this page were valid when it was
checked.
■ While the page was on the paging disk, it was not overlaid with another page.
You would typically run multiple copies of this test program at the same time,
each with a different bit pattern. For example, one bit pattern could be all bits off,
which would be looking for any wild store that turns on a bit. Another bit pattern
could be all bits on, looking for any wild store that is just turning off a bit. Now
we’ll review a real data integrity test program used in testing z/OS.
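The real monitors described above run against operating-system paging internals, but the pattern translates to a much simpler user-space analogue. The sketch below is such an analogue, with the pattern value, buffer size, and exit code chosen arbitrarily for illustration: fill a buffer with a known bit pattern, then recheck it forever, trusting nothing.

    import sys, time

    PATTERN = 0x00              # run one copy with 0x00, another with 0xFF, and so on
    SIZE = 16 * 1024 * 1024     # 16 MB of pattern data to watch

    buffer = bytearray([PATTERN]) * SIZE   # known, predictable contents

    passes = 0
    while True:                 # run until corruption is found or the job is canceled
        passes += 1
        bad = next((i for i, v in enumerate(buffer) if v != PATTERN), None)
        if bad is not None:
            print(f"corruption at offset {bad}: found {buffer[bad]:#04x}, "
                  f"expected {PATTERN:#04x}, pass {passes}", file=sys.stderr)
            sys.exit(8)         # the real monitor would force a dump to capture state data
        time.sleep(1)           # let the pattern "brew" between verification passes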
In the 1980s, the z/OS SVT team's first thrasher was written in response to a data
integrity bug in a prerelease model of the IBM 3090™ mainframe processor. The
problem turned out to be related to how the machine was handling a special type of
memory used for fast paging and swapping, called Expanded Storage. At the time, no
one knew where the problem lay; only that data was being corrupted. After weeks of
analysis, the culprit had proved elusive. Then a thrasher was devised and written. It
almost immediately caught the problem, which was quickly debugged. Since then,
variations on this short, deceptively simple program have been used during the test
of many products and features related to storage management, whether on disk or
in memory [Loveland02]. In fact, the test cases used in the example in Chapter 7,
“Test Plan Focus Areas,” for testing a data in memory cache were variations on the
thrasher idea. In essence, they were file thrashers. Let’s look at both that original
memory thrasher, and then how the concept was reused for a file thrasher.
When debugging a problem that is intermittent, devise a test focused on exercising
the areas of the software under suspicion. Attempt to turn an intermittent occurrence
into one that regularly occurs. This allows you to home in on the problem and
speeds up the debugging process.
Memory Thrasher
Few people outside of the development lab for an operating system probably think
much about the possibility of errors in virtual or real memory management. But the
concepts contained in a memory thrasher are both simple and powerful.
Understanding them may prompt you to think about adaptations you can make for your
own testing.
Implementation Details
There are a few basic rules to follow in order to create a good thrasher. First, the
processing of the thrasher code should be kept to the absolute minimum to ensure
system code, not the thrasher code, becomes the bottleneck. Second, the thrasher
should be designed in such a way that multiple copies can be run in parallel as
separate address spaces or processes. In z/OS, virtual memory is managed on an
address-space basis, and one possible bug occurs when pages from one address space
are exchanged with those of another. Running multiple thrashers in parallel is the
way to catch such problems. Finally, the golden rule is: trust nothing.
When writing software to test other software, keep your eye on the main objective.
Ensure the software under test is being exercised heavily, not your test software.
Keep it simple.
Stressing a tightly focused area of your target software is a great way to flush
out data integrity defects. Consider running “cloned” copies of a single test
program as one technique for generating such stress.
GET MEMORY ADDRESS(MEMPTR) LENGTH(PAGES*PAGESIZE) BOUNDARY(PAGE);  /* Get memory table to be thrashed through */
PAGEPTR=MEMPTR;
DO PAGENUM=1 TO PAGES;                  /* Initialize memory table */
  PADDR=PAGEPTR;
  PASID=MyASID;                         /* MyASID obtained from system control structure */
  STCK(PTIME);                          /* Store value obtained from current clock */
  PCOUNT=0;                             /* Page not thrashed through yet */
  PJOBNAME=MyJob;                       /* MyJob obtained from system control structure */
  PAGEPTR=PAGEPTR+PAGESIZE;             /* Go to next page in memory table */
END;
REFNUM=1;
DO FOREVER;                             /* Repeat until job is canceled */
  PAGEPTR=MEMPTR;
  DO PAGENUM=1 TO PAGES;                /* Make a pass through the memory table */
    IF PADDR<>PAGEPTR OR PASID<>MyASID THEN   /* Data integrity error detected? */
      Force the program to abend;       /* Force a memory dump to capture state data */
    ELSE
      DO;
        PADDR=PAGEPTR;                  /* Else, update page again... */
        STCK(PTIME);                    /* Store current clock value in this page */
        PCOUNT=REFNUM;                  /* Update reference count for this page */
        PAGEPTR=PAGEPTR+PAGESIZE;       /* Go to next page */
        IF WAIT1<>0 THEN                /* Delay between reading pages */
          Wait for WAIT1 Seconds;
      END;
  END;
  REFNUM=REFNUM+1;
  IF WAIT2<>0 THEN                      /* Delay between passes through table */
    Wait for WAIT2 Seconds;
END;
When creating test programs to stress software, consider including the ability to
input various settings that will direct their behavior. This approach provides a
powerful means for reusing a program by altering the testing it performs. When
probing for scrambled data, each test must be able to distinguish one bit of data
from the next. Look for simple ways for your test case to create unique, predictable
data fields that can be easily validated later.
Next, the thrasher dynamically obtains the memory table and initializes it. Note
that the size of each entry in the table is related to how the underlying operating
system and hardware manage virtual memory. z/OS manages memory on a
4096-byte-page basis, so each entry is 4096 bytes long.
Finally, the thrasher goes into an infinite loop and begins working its way
through the table. For each page, it first checks for corruption. If any is detected, it
immediately forces an ABEND (abnormal end). Typically, the tester would issue an
operator command to set a system trap for this particular ABEND which, upon
detection, would immediately freeze the entire system so memory could be dumped
and the failure analyzed. If no corruption is detected, then the program updates the
table entry so the page is changed. It performs any delays requested by the user, and
then proceeds to the next page. Note that with this flow, after a page has been
updated, it is allowed to “brew” awhile before it is rechecked for corruption. This is
necessary in order to give errant timing windows sufficient opportunity to arise, but
also means that there will be a slight, unavoidable delay between the time an error
occurs and when it is detected.
Test cases that programmatically determine when an error is encountered and
assist in gathering diagnostic data are very powerful. Referred to as self-checking test
cases, they are more effective than test cases that just exercise a particular function.
Execution
Executing a memory thrasher could not be easier. The program is simply started,
and it runs until it is canceled or detects corruption. As stated earlier, multiple
instances of a given thrasher are normally run concurrently. Similarly, streams of
unrelated thrashers are often run in parallel. In fact, a secondary benefit of thrashers
is that they provide an easy method for generating high load/stress with minimal
setup requirements, and can provide good background noise while other tests are
run in the foreground.
File Thrasher
The data in memory cache support described in Chapter 7 also involved
manipulating data, but this time the data resided in files on disk. Nonetheless, whenever
data is being manipulated, the potential for corruption exists. For that reason, data
integrity monitoring capability was a crucial aspect of the test. So when the team
began developing its test cases, they turned to the thrasher concept and looked for
a way to adapt it.
Implementation Details
The support being tested involved caching data read by programs as they
worked their way through a file. Writing of data by a user while it was being read
was not permitted, so the test cases only needed to concern themselves with
performing reads. However, something had to initially create a file and prime it with
data so the readers would have something to work with. With that in mind, the file
thrasher was broken into two pieces: one ran at the start of the test to initialize the
file, and a second then read that file back, watching for corruption. Pseudocode for
the writer and reader are shown in Figures 12.2 and 12.3, respectively.
[Figures 12.2 and 12.3 survive here only as fragments; each program simply opens TESTFILE, does its writing or reading, and closes the file.]
These programs were even simpler than the memory thrasher. The writer filled
the file with records 200 bytes in size, with the number of each record stored within
itself. That number was the predictable data that allowed the readers to later detect
corruption. It was stored at the beginning, middle, and end of the record in order
to be able to later detect partial overlays that didn't corrupt the entire record. A design decision could have been to store that record number over and over again throughout the record, and then later check them all. But this was deemed overkill and would have increased the overhead of the thrasher code, thus breaking the rule of keeping thrasher processing to a minimum.
The reader simply worked sequentially through the file and checked to see if
each record correctly contained its own number. If not, it forced the program to
crash in a way that could be caught by diagnostic system tools, similar to the mem¬
ory thrasher. If all was well, then the reader checked to see if it was being asked to
simulate a “slow” reader. If so, it waited a bit before proceeding with the next read.
When the reader reached the end of the file, it exited.
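Because Figures 12.2 and 12.3 survive here only as fragments, the following C sketch reconstructs the general idea rather than the team's actual code. The 200-byte record length comes from the description above; the file name, record count, and the abort() call standing in for a forced, diagnosable crash are assumptions.

/* filethrash.c - writer/reader sketch for a file thrasher (illustrative)    */
/* usage: filethrash write        primes the file                            */
/*        filethrash read [usec]  reads it back, optionally as a slow reader */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define RECLEN   200
#define NUMRECS  10000
#define TESTFILE "testfile.dat"

/* Writer: prime the file with records whose own number is stored at the  */
/* beginning, middle, and end so partial overlays can be detected later.  */
static void write_file(void)
{
    FILE *f = fopen(TESTFILE, "wb");
    char rec[RECLEN];
    if (f == NULL) { perror(TESTFILE); exit(1); }
    for (long n = 0; n < NUMRECS; n++) {
        memset(rec, ' ', RECLEN);
        memcpy(rec, &n, sizeof n);                      /* beginning */
        memcpy(rec + RECLEN / 2, &n, sizeof n);         /* middle    */
        memcpy(rec + RECLEN - sizeof n, &n, sizeof n);  /* end       */
        fwrite(rec, RECLEN, 1, f);
    }
    fclose(f);
}

/* Reader: work sequentially through the file, verifying that each record */
/* contains its own number; a nonzero delay simulates a "slow" reader.    */
static void read_file(unsigned delay_us)
{
    FILE *f = fopen(TESTFILE, "rb");
    char rec[RECLEN];
    long expect = 0, a, b, c;
    if (f == NULL) { perror(TESTFILE); exit(1); }
    while (fread(rec, RECLEN, 1, f) == 1) {
        memcpy(&a, rec, sizeof a);
        memcpy(&b, rec + RECLEN / 2, sizeof b);
        memcpy(&c, rec + RECLEN - sizeof c, sizeof c);
        if (a != expect || b != expect || c != expect)
            abort();                  /* stand-in for a diagnosable crash */
        expect++;
        if (delay_us) usleep(delay_us);
    }
    fclose(f);
}

int main(int argc, char **argv)
{
    if (argc > 1 && strcmp(argv[1], "write") == 0)
        write_file();
    else
        read_file(argc > 2 ? (unsigned)atoi(argv[2]) : 0);
    return 0;
}

Starting one writer and then several groups of readers with different delays recreates the fast-reader-overtakes-slow-reader scenario described next.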
Execution
Execution was quite simple. The writer was started and allowed to complete. Then
multiple copies of the reader were started, each reading the same file, all watching
for cases where the caching support under test had corrupted a record. Several
groups of readers could be started, each group reading at a different speed so that "fast" readers could overtake "slow" readers—a technique that caused problems
for the software under test.
Consider an individual thrasher or similar test case as a building block. Combine different variations of them to create workloads.
Epilogue
The basic thrasher design has been reused and adapted over the years for verifying
data integrity in many additional technologies as they have come along, including
such z/OS features as Dataspace, Hiperspace™, Coupling Facility, BatchPipes®,
Hiperbatch™, and the Unix Systems Services hierarchical file system. Easy to cre¬
ate and run, thrashers have proven themselves to be very powerful and valuable
tools in the system tester’s arsenal.
SUMMARY
We have discussed just a few of the many ways that data can get corrupted, exam¬
ined why it is so important to do special testing for data integrity problems, and re¬
viewed some specialized tools that can help with that activity. There are many more
software test tools you can leverage to provide further stepping stones for a suc¬
cessful software test. Chapter 13, “Tools—You Can’t Build a House without
Them,” digs into guidelines for buying or building effective test tools.
Tools-You Can't Build a
House without Them
In This Chapter
You are about to build your first house and decide to act as your own general
contractor. You’ll hire all the necessary subcontractors to perform the jobs
needed to get the house built. One very important step is choosing a carpen¬
ter. You research the available carpenters in your area and find yourself asking,
“What makes a good carpenter?” Does a carpenter’s selection of tools play a big role
in your decision? Probably not. You assume a good carpenter is competent enough
to arm himself with the tools he’ll need to get the job done right.
A good carpenter is someone who’s skilled at using the tools of his trade. He is
dependable and flexible, able to quickly react and adjust to house design changes on
the fly. He has some practical experience building houses. A good carpenter recog¬
nizes practical application problems in the design prior to building the house, and
provides you with valuable suggestions after reviewing the design. He has demon¬
strated an ability to uncover and fix problems should any occur during construction
of the house. He has current knowledge of local building codes and any other laws
or government regulations that may apply. And of course, he must have good refer¬
ences.
It’s interesting that while the tools themselves are not really a significant part of
defining what makes a good carpenter, no carpenter can perform his job without
them. The same can be said for a tester. No book on testing would be complete
without a discussion on tools, since they are essential for a tester to perform his
work. But, as with carpentry, the art of testing relies on much more than tools—
they merely support its approaches and practices.
Just as the carpenter will have a truck full of implements to perform his tasks, the
tester requires multiple gadgets to get his job done. Because different phases of test are
designed to find different types of defects, each tool is geared toward uncovering a
certain class of problems or supporting a specific test practice. Regardless of any state¬
ments made in the marketing material of commercially available test tools, testers will
need a mixture of them. There is no single tool that fulfills all of a tester’s needs.
Cheaper, better, faster: any tool you decide to use should meet at least two of these criteria. The effects on the third should be considered as well. Let's go back to the
carpentry analogy. Say you are a carpenter and are looking for a better hammer.
You find an electric nail gun for sale. It's the latest technology in nailing. It very effectively and efficiently drives nails into wood. It appears to help perform nailing
tasks better and faster. But it costs more than a traditional hammer. Do the bene¬
fits outweigh the additional cost? This is not necessarily an easy question to answer
and requires further investigation. After all, ultimately a wood-framed wall built
using a nail gun would not be much different than one built by a skilled carpenter
using a traditional hammer.
Start by comparing the base cost of the tools themselves. Then consider the ad¬
ditional costs. There are training expenses and additional running costs. The nail
gun requires electricity—maybe not a big expense, but one that should be consid¬
ered. Then there is the potential problem of needing the services of the nail gun in
a place that does not have electricity. If this problem will be frequently encountered,
maybe buying a generator will fix that issue. But this adds yet more costs for both
the generator itself and its ongoing fuel requirements. What appeared to be an easy
decision seems to get more and more difficult.
Now reevaluate what your needs are and what the new tool will do for you.
Were you looking for a new hammer to outfit a new employee you just hired? Or
was your old hammer wearing out from years of use or broken from misuse? What
type of carpenter are you? One that frames houses and generally spends the better
part of the day driving nails, or one who specializes in making attached decks on
houses and mostly uses galvanized deck screws? The first carpenter may be much
better off investing in the nail gun, while the other may actually harm his business
with an unnecessary upgrade.
Similarly for software testing tools, there will be associated training and running
expenses. When new users need to learn about a testing tool, or existing users require
education on its enhancements, a training cost is incurred that must be factored into
the spending budget. Running costs will also include maintenance of the tool, as well
as resources that its operations require, such as accompanying hardware, software, or
personnel. Don’t forget about those costs—they can break the bank later.
TOOL CATEGORIES
There are many ways to logically group tools that support the software test disci¬
pline. They can be organized by test phases, by the test activities they support, or
other ways. The specific grouping presented here has no particular significance; it’s
just one way to tie the many tools together. It is also not all inclusive, nor is it in¬
tended to be. It simply offers a general overview of some testing tools commonly
used in the software development industry.
Support Tools
There are tools available that will assist in the test process. These tools may not be
directly involved in test execution but instead help testers and test teams manage
their activities.
Testers quickly discover management's keen interest in their current status. A test management tool, also called a test tracking tool, gives a test team an easy way to deliver a status update. Many times, this type of tool will show how current testing progress maps
against what was planned. In other words, it tells you whether you are ahead of or
behind schedule. Some test management tools also keep track of the individual
testing tasks planned and can identify which ones have been attempted. They can
also track whether those attempted tasks succeeded or failed. Some will even iden¬
tify the defect tracking tool problem number(s) associated with a failed test and au¬
tomatically monitor the status of that problem via an interface with the defect
tracking tool. Of course, no test tracking tool is complete without the ability to au¬
tomatically generate fancy charts and graphs that make managers drool. Read
ahead and you’ll see an example of using this type of tool in Chapter 18, “Manag¬
ing the Test.”
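As a rough illustration of the kind of data such a tool keeps, the C sketch below models planned tasks, their current status, and any associated defect number, and reduces the fancy chart to a single progress line. The task names and defect number are invented.

/* tracker.c - minimal sketch of the data a test tracking tool maintains */
#include <stdio.h>

enum status { PLANNED, ATTEMPTED, SUCCEEDED, FAILED };

struct test_item {
    const char *name;     /* planned testing task                     */
    enum status state;    /* where it stands today                    */
    int         defect;   /* defect tracking number if FAILED, else 0 */
};

int main(void)
{
    struct test_item plan[] = {
        { "Regression suite",             SUCCEEDED, 0    },
        { "Load/stress, 8-hour run",      ATTEMPTED, 0    },
        { "Recovery from node loss",      FAILED,    4711 },
        { "Migration from prior release", PLANNED,   0    },
    };
    int total = sizeof plan / sizeof plan[0], done = 0;

    for (int i = 0; i < total; i++)
        if (plan[i].state == SUCCEEDED)
            done++;

    /* Progress against plan, i.e., ahead of or behind schedule. */
    printf("%d of %d planned tasks complete (%.0f%%)\n",
           done, total, 100.0 * done / total);
    return 0;
}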
Tools that assist with test planning come in many forms. They range from text editing software used for test plan development to fancy reviewing tools that assist in the review process. These reviewing tools typically let reviewers append comments directly into a formatted test plan document, automatically notify the test plan owner, and support a disposition of the comment (e.g., accepted, rejected, update completed). In many cases, they will assist in complying with quality certification standards such as ISO 9001.
Other tools to assist in test planning can be built or purchased. A very easy one
to build is described in Chapter 5, “Where to Start? Snooping for Information,”: a
simple checklist to guide a tester through investigation, planning, and execution ac¬
tivities. When challenged with the potential of having to test many combinations of
hardware environments or other variables, there are optimization tools to help you
identify the minimum number of configurations required to achieve maximum
coverage. These are just a couple of examples out of the many tools available.
Source-code-aware Tools
These tools are typically geared for the developer and are most appropriately used
during the unit test phase. They are mentioned here because testers should at least
be aware of them and advocate their use by development. Also, there are some
cases where a tool best suited for unit test can also be used for some purposes in
other test phases.
Static code analysis tools scan source code looking for suspect coding practices and potential errors such as uninitialized variables, memory leaks, and out-of-bounds array access. In some cases, the source code scan is accompanied by an analysis service, in which you contract out the analysis to a vendor. Some types of errors identified by static code analysis tools can also be identified by specifying certain options on some compilers. Again, it's normally the development team's responsibility to have this analysis done, but you can gain some valuable insights by reviewing the resulting reports. For example, you may be able to identify parts of the software that tend to be more error prone than others. Based on this knowledge, you may wish to zero in on these parts during your test.
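As a concrete illustration, the deliberately buggy C fragment below contains the three kinds of defects just mentioned. Most static analysis tools, and many compilers with their warning options enabled, would flag them without ever running the code; the fragment itself is invented for illustration.

/* buggy.c - the kinds of defects static code analysis typically flags */
#include <stdlib.h>

int sum_first_ten(void)
{
    int values[10];
    int total;                     /* (1) used below without initialization */

    for (int i = 0; i <= 10; i++)  /* (2) off-by-one: writes values[10]     */
        values[i] = i;

    for (int i = 0; i < 10; i++)
        total += values[i];

    char *buf = malloc(64);        /* (3) memory leak: never freed          */
    (void)buf;

    return total;
}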
End-user Simulators
This is a generalized way of describing many tools. From GUI automation to large
systems load and stress, these tools provide a way to automate entering work into the
system in ways similar to those of the software’s real users. Considered the “bread
and butter” of many system test teams, they can supply a way to simulate many
thousands of end users simultaneously entering transactions. Such tools can drive
very robust applications through quite simple and natural means. For instance, a
tool might do nothing more than simulate the keystrokes or mouse clicks of an end
user on a Web page. Those keystrokes or mouse clicks invoke “real” transactions
that call “real” applications that require access to “real” data. By simulating hun¬
dreds to thousands of such users, the tool can push significant loads against the tar¬
geted software stack.
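A full end-user simulator is a product in its own right, but its core shape is simple. In the C sketch below, submit_transaction() is an invented placeholder for whatever actually replays a keystroke, mouse click, or network transaction; the surrounding framework just spawns many simulated users, each looping through transactions with think time in between.

/* loaddrv.c - skeleton of an end-user simulator (illustrative only) */
/* build: cc loaddrv.c -lpthread                                     */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

#define USERS 500                 /* simulated concurrent users */

/* Placeholder: a real tool would replay a script of keystrokes,     */
/* HTTP requests, or transactions against the system under test.     */
static void submit_transaction(int user, long seq)
{
    (void)user; (void)seq;
    usleep(2000);                 /* pretend the transaction took 2 ms */
}

static void *simulated_user(void *arg)
{
    int id = (int)(long)arg;
    for (long seq = 0; ; seq++) {
        submit_transaction(id, seq);
        usleep(100000);           /* "think time" between transactions */
    }
    return NULL;
}

int main(void)
{
    pthread_t users[USERS];
    for (long i = 0; i < USERS; i++)
        pthread_create(&users[i], NULL, simulated_user, (void *)i);
    pause();                      /* run until canceled, like a thrasher */
    return 0;
}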
Frameworks
Many very powerful tools do nothing. Of course, this is an unfair characterization
of frameworks, but it illustrates their nature. Frameworks are tools that provide ser¬
vices or encapsulated, invokable functions for use by test cases or test scripts. They
eliminate busywork so the tester can focus on creating only the code needed to ex¬
ercise the items he wishes to test. Frameworks provide the foundation for those
tests, or offer services that can reduce the complexity of establishing certain envi¬
ronments. Many frameworks are easily extended by allowing the user to plug in
new callable services. This is one reason why frameworks are so powerful and po¬
tentially long-lived. They allow the tool to be enhanced by its users so it can evolve
over time and keep current with new technology. Later in this chapter we’ll describe
a sample framework in detail.
Just a Sampling
This is just a very small subset of the many software test tools available. It would be
impossible to describe all the tools that exist. If you know exactly what you want to
achieve, you can likely find a tool to help you get there. By fully understanding your
test responsibilities and adhering to best practices, the specific tools you’ll need will
be apparent. Locating them should not be difficult.
In most cases, an end-user simulator can act as a test harness for your scripts and does not need to
understand the underlying function of the software. Of course, you can come up
with a scenario where even this is not the case. If the end-user simulator enters
work into the system over a limited number of network protocols and the software
function you are testing is the implementation of a new network protocol, then
you’ll need to find other ways to drive the software. If you obtain a thorough un¬
derstanding of what it is you will be testing and you know the capabilities of the
tools you use, you won’t be caught by surprise in a scenario like this and you can
put plans in place to address it.
The Component Test Tool (CTT) was developed to provide a framework for testers
of the IBM mainframe’s operating system. CTT’s origin lies in a creative approach
to testing a component deep in that operating system. A significant software rewrite
planned for the Recovery Termination Manager (RTM) was going to require a
huge investment in test case coding. An obvious approach would have been to code
a common test case model, then for each new test case copy that model and mod¬
ify it as needed. But by doing this, there is the risk of propagating common errors
and adding complexity when establishing environments for each test case.
The situation motivated an experienced test leader and his team to search for a
more efficient way to validate the RTM. The team needed a means to provide sim¬
plification, extendibility, workload distribution, exploitation of system services as
part of normal tool processing, and multisystem capability. The result was the CTT.
CTT allowed the tester to easily create test cases in extremely complex environments.
And with a single invocation of a verb, CTT performed all of the messy tasks needed
to establish these environments [Loveland02].
When faced with the need for many similar test cases to establish complex envi¬
ronments, consider creating a framework that consolidates that complexity in a
single place, so development of the individual test cases becomes simpler.
For example, most operating systems set priorities for different types of work. On
the mainframe, something called a service request block (SRB) represents a unit of
work that carries a higher-than-normal priority. Unfortunately, establishing and ex¬
ecuting the SRB operating mode can be an involved process with many steps. CTT
simplified that process by hiding it behind a single verb. This freed testers to focus on
the services being tested in SRB mode, rather than on this tedious setup work.
A framework can provide a small set of high-level commands that perform tasks
that are routine but tedious and error prone.
Figures 13.1 and 13.2 outline a traditional programming approach to estab¬
lishing an SRB mode environment and a CTT SRB mode implementation, respec¬
tively. The code in Figure 13.1 implements the scheduling of the SRB using a
high-level programming language. The details aren’t important. Just note the vol¬
ume of code and complexity involved—and the opportunity for error.
The code in Figure 13.2 makes use of the CTT test case infrastructure. The in¬
teresting part is the function named SCHEDULE. This single verb performs the en¬
tire set of tasks outlined in Figure 13.1 without requiring the programmer to
understand all of the details. Clearly, CTT made it easier to develop test cases and
eliminated worry about the complexities of the underlying environment. Consoli¬
dation of the complexity of system or application services into a single interface is
a characteristic of leading software testing tools.
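The real z/OS code behind the SCHEDULE verb is far too involved for a short example, but the design principle is language-independent. In the hypothetical C sketch below, run_in_special_mode() plays the role of such a verb: the fussy, error-prone setup and teardown live in one place, and each test case supplies only the logic it wants executed in that environment.

/* verbdemo.c - consolidating complex environment setup behind one call */
#include <stdio.h>

/* Imagine each of these hiding a dozen fussy, error-prone steps.       */
static int  enter_special_mode(void) { puts("  ...setup steps...");    return 0; }
static void leave_special_mode(void) { puts("  ...teardown steps..."); }

/* The "verb": establish the environment, run the test body, clean up.  */
static int run_in_special_mode(int (*testbody)(void *), void *arg)
{
    int rc;
    if (enter_special_mode() != 0)
        return -1;
    rc = testbody(arg);           /* only this part differs per test case */
    leave_special_mode();
    return rc;
}

/* A test case now contains nothing but what it actually wants to test. */
static int my_test(void *arg)
{
    printf("  testing %s in the special environment\n", (const char *)arg);
    return 0;
}

int main(void)
{
    return run_in_special_mode(my_test, "service XYZ");
}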
Extendability
Constantly depending on the test tool support team to provide new function to all
interested test teams quickly results in work prioritization. Software test tools must
be self-extendable by their users in order to lower the reliance on the tool support
team. CTT was structured with plug-in capability to make it extendable by its user
community. Let’s examine how the technique was implemented.
Find ways to make a framework easy for users themselves to extend. An interface
for pluggable, user-written modules is one effective technique.
Implementation Details
The design and implementation of CTT allowed for quick and convenient extendibility. CTT consists of two functions, parsing and processing. During the parse phase, the input stream of statements is deciphered and interpreted. CTT
parses the entire input stream and, if there are errors, reports on all of them and stops execution. If the parse completes successfully, CTT builds control structures representing each individual input statement. These control structures are linked into a double-threaded structure that is an ordered representation of the input statements.
[Figure 13.2 (excerpt): a sample CTT deck to schedule an SRB and wait for the SRB to end its processing; it begins with INIT TESTCASE=SCHEDULE,ASID=MAINSID,MAINID=MAINTASK; /* Identify the testcase */]
A high-level command input language can offer a rich user interface for a frame¬
work, but it implies that a parser will need to be reused or developed.
After parsing, CTT begins processing. The CTT uses a simple dispatching al¬
gorithm that traverses the queue and examines the control structures, each repre¬
senting a particular input statement. One of the data items within the control
structure element is the address of a program to be called to perform the functions
associated with the element. The control structure element contains all of the pa¬
rameters and options specified on the input statement. The process program, which
is called by the dispatching algorithm, interprets the data passed via the control
structure element and performs the actual function requested. Figure 13.3 shows
the CTT dispatching approach.
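The control-structure-and-dispatcher design just described is a classic pattern. The C sketch below uses invented names to show its skeleton: each parsed statement becomes an element on a double-threaded (doubly linked) queue, each element carries the address of its process routine, and the dispatcher simply walks the queue calling those routines. CTT's real structures are, of course, much richer.

/* dispatch.c - sketch of a parse-queue-and-dispatch framework core */
#include <stdio.h>

struct element {                        /* one parsed input statement     */
    struct element *next, *prev;        /* double-threaded queue links    */
    int (*process)(struct element *);   /* routine that performs the verb */
    char parms[64];                     /* options from the statement     */
};

static int do_init(struct element *e) { printf("INIT %s\n", e->parms); return 0; }
static int do_wait(struct element *e) { printf("WAIT %s\n", e->parms); return 0; }

int main(void)
{
    /* Normally the parse phase would build this queue from an input deck. */
    struct element stmts[3] = {
        { NULL, NULL, do_init, "TESTCASE=SAMPLE" },
        { NULL, NULL, do_wait, "SECONDS=5"       },
        { NULL, NULL, do_init, "TESTCASE=NEXT"   },
    };
    for (int i = 0; i < 3; i++) {       /* link elements in input order   */
        stmts[i].next = (i < 2) ? &stmts[i + 1] : NULL;
        stmts[i].prev = (i > 0) ? &stmts[i - 1] : NULL;
    }

    /* The dispatcher: traverse the queue, calling each process routine.  */
    for (struct element *e = &stmts[0]; e != NULL; e = e->next)
        if (e->process(e) != 0)
            return 1;                   /* surface a test case failure    */
    return 0;
}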
To perform actions and coordinate work, a framework will typically require at least
a crude dispatcher.
This implementation is key to the extendibility of CTT. Both the parse program
and process program that are called by CTT are simple plug-ins. Testers wishing to
implement new or improved CTT verbs or functions can simply specify an input
pointer to a set of verb definition files. These files specify the name of the parse and
process routines that CTT is to load and utilize for the specific verb. At any time,
testers can develop their own verbs or functions for use in the CTT framework. As
these plug-ins mature, they are then integrated into the base tool framework for the
benefit of all users.
By creating a user-extendable framework you can turn the entire test community
into part of the tool’s development team, greatly speeding the availability of new
features.
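On the mainframe, CTT loads the parse and process routines named in its verb definition files as system modules. A Unix-flavored approximation of the same plug-in idea, using a shared library and invented file and symbol names, is sketched below; the point is only that the framework looks routines up by name at run time, so users can add verbs without changing the framework itself.

/* plugin.c - loading a user-supplied verb implementation at run time */
/* build: cc plugin.c -ldl   (POSIX dlopen interface)                 */
#include <dlfcn.h>
#include <stdio.h>

int main(void)
{
    /* In CTT terms, a verb definition file would supply these names. */
    const char *library = "./libmyverb.so";    /* user-written plug-in */
    const char *symbol  = "myverb_process";    /* its process routine  */

    void *handle = dlopen(library, RTLD_NOW);
    if (handle == NULL) {
        fprintf(stderr, "cannot load %s: %s\n", library, dlerror());
        return 1;
    }

    int (*process)(const char *) =
        (int (*)(const char *))dlsym(handle, symbol);
    if (process == NULL) {
        fprintf(stderr, "cannot find %s: %s\n", symbol, dlerror());
        return 1;
    }

    int rc = process("PARMS=EXAMPLE");  /* framework invokes the plug-in */
    printf("plug-in returned %d\n", rc);
    dlclose(handle);
    return 0;
}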
CTT also provides its users with a call interface to other programs. The impor¬
tant aspect of this function is a precise set of data parameters between CTT and the
“called” program. The data parameters contain pointers to return and reason code
indicators so that the target program can inform CTT of success or failure.
Ensure a framework can always report on the success or failure of test cases it man¬
ages.
Another benefit of this capability is that the tester can take advantage of CTT’s
support for establishing complex execution environments and then call a target pro¬
gram of its own. This gives the tester a way to simplify his target program and let
CTT perform all of the complex tasks of establishing a particular operating mode.
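A minimal sketch of such a call interface, assuming a C-style parameter block with invented field names, might look like the following. The essential point is simply that pointers to return and reason codes travel with the call, so the framework can judge the success or failure of whatever it invoked.

/* callif.c - sketch of a framework-to-test-program call interface */
#include <stdio.h>

struct call_parms {
    const char *input;        /* data the framework passes to the program */
    int        *return_code;  /* 0 = success, nonzero = failure           */
    int        *reason_code;  /* qualifies the return code                */
};

/* A "called" target program: does its work, then reports back. */
static void target_program(struct call_parms *p)
{
    printf("target received: %s\n", p->input);
    *p->return_code = 8;      /* pretend something went wrong */
    *p->reason_code = 0x10;   /* ...and say why               */
}

int main(void)
{
    int rc = 0, rsn = 0;
    struct call_parms parms = { "ENV=SRBMODE", &rc, &rsn };

    target_program(&parms);   /* the framework's call to the target */
    if (rc != 0)
        printf("test case failed: rc=%d reason=0x%x\n", rc, rsn);
    return rc != 0;
}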
[Figure (excerpt): a sample multisystem CTT deck. The job CTTDECK supplies a //CTTIN DD * input stream containing an INIT TESTCASE=CTTDECK,SYSTEMID=SYSn statement for each of systems SYS1, SYS2, and SYS3, each followed by the CTT functions (verbs) to be performed on that system.]
A framework can contribute to the testing of new software features by making use of them itself.
[Figure: CTT multisystem operation. The CTT job on System 1 carves up the system-specific test case portions and passes each portion, through the local CTTX process, to the CTTX process on its target system. The CTTX processes pass data to the CTTX processes on the other systems, and each one initiates the test cases it receives on its own system. After the test cases execute, their output is passed back through CTTX process communication to the CTT job on System 1, which writes the output for all test cases.]
Epilogue
The Component Test Tool was originally designed as a framework to ease the bur¬
den of coding a large volume of test cases when preparing to test a new software re¬
lease. CTT had to be written in-house, because the area it addressed was too
specialized for a tool vendor. Because it was written in a flexible, extendable way, it
has been able to evolve over the years to meet new testing challenges. To this day,
it still continues to thrive and help mainframe testers spend less time on tedious,
repetitive coding and more time on finding bugs.
BUY VERSUS BUILD
The title of this section could very well be "Buy versus Buy." Whether you spend resources to build a tool or to buy one, you are paying someone. The payee may be different, but it boils down to money spent. There is no right or wrong answer about which option to choose. It is a matter of what makes the most business sense in each case. However, there are some general guidelines you may wish to follow.
Tools to Buy
Unless you are in the software test tool business, your first instinct should lean to¬
ward obtaining testing tools from external sources rather than spending a lot of
time and effort building them. You can either purchase the test tool from an out¬
side vendor or find an existing, supported tool that is widely used throughout your
company.
Tools that support tasks and practices that are common to most test teams, or
that are in support of activities not directly related to the software being tested,
should be obtained externally. A good example is tracking status. Every test team
has to do it. Suggesting that your test tracking requirements are so unique that you
need your own homegrown, customized tool for tracking is a difficult argument to
win. Your time is much better spent testing than writing and supporting a test
tracking tool. It’s this line of thinking that is needed when you are faced with the
“buy versus build” decision.
In addition, tools that provide complex functions and can support your test re¬
quirements are best purchased. End-user simulation tools are a good example. It
would be very difficult and most likely not a good use of your time to try to build
them. These complex, powerful tools are best left to test-tool experts and full-time
developers.
Tools to Build
Whether you explicitly decide to build tools or not, you will be building them. It's just a natural byproduct of testing. By its nature, software testing requires testers to create various scripts and other test tools. It's unavoidable. You'll also find that
nontest tools will be built as part of the overall test effort. For example, testers will
automate mundane tasks—you can pretty much count on it. These efforts should
be embraced and rewarded.
The other tools you cannot avoid building are ones that are required to help
you test new technology that cannot be efficiently tested with existing external
tools. The CTT just described is a good example. Focused test drivers like the
thrashers mentioned earlier should also be built.
Tools are so essential to software testing that it’s very easy to get entangled in the
tool decision process. In their quest for perfection, it is not unusual for testers to
continually question whether there are better tools available.
SUMMARY
Just like any carpenter, a tester needs a toolbox full of tools to get the job done. There
are both tool requirements and many solutions available for all aspects of software
testing. But like a good carpenter, a good tester is not defined by his tools, but by his
skill at using them. Quality tools are not free, whether you build them yourself or
buy them, and you’ll need to make some difficult business decisions about how to
best address your tool needs. Sometimes building your own tools is unavoidable,
and some tools will naturally be developed as you execute the test process.
We’ve seen that some test tools provide a framework for driving test work
through the system. But what is the content of those tests? Can they align with real
customer activities? In Chapter 14, “Emulating Customers to Find the Bugs That
Matter,” we’ll look at emulating customer workloads to expose the bugs that mat¬
ter most.
Emulating Customers to
Find the Bugs That Matter
In This Chapter
■ Approaches for porting customer applications
■ Customer-oriented goals for tests
■ Managing the test lab as a production environment
As noted in earlier chapters, each test phase is geared toward finding different
types of defects, and should be focused on the right set of objectives. The notion of
emulating customers applies to all test phases. It's as much a matter of having the customer mindset as anything else. However, in many cases the test phases best suited to having specific, documented customer emulation objectives are SVT and
Integration Test (if one exists). Techniques for emulating customers during testing
can be divided into three categories. The first is defining what you run in your test,
the second is how you run it, and the third is managing the test environments.
Customers’ production software rarely runs in isolation. It must fit into a complex,
intertwined environment. It must inter-operate with many other individual pieces
of technology that are brought together to create a complete IT infrastructure.
Testers ignore this reality at their peril.
In some cases, taking an application the customer runs in its production envi¬
ronment and porting it to your test lab is a good way to find problems not exposed
by other testing. And since it’s a real application, any problems it finds automatically
fall into the category of defects that matter. On the other hand, there are times when creating artificial applications or workloads modeled after real applications is cheaper and more effective. This is particularly true in the realm of general-
purpose software, where the range of customers spans many industries. In such
cases, it is far too large an undertaking to mimic every customer’s applications for
testing. On the other hand, creating a composite workload patterned after a variety
of actual environments can be quite effective. Let’s examine both the porting and
modeling approaches.
What seems like a straightforward course can sometimes prove to be just the opposite on the test floor. The likely suc¬
cess or failure of porting a customer workload into a test lab is dependent on a va¬
riety of factors. Many of these may not be obvious at the outset of the project. Let’s
take a look.
Capturing Data
An application won’t trip over bugs until it is invoked. Simply copying the appli¬
cation to your test lab is not enough. You’ll also need to drive realistic activity
against it. This implies you may need to capture both the initial state of critical
pieces required for the application to run and the actual runtime activity of the ap¬
plication. For example, the state of the data in a database is key for application
transactions to execute successfully. The initial state of the database needs to be syn¬
chronized with the beginning of the transactions. Depending on the situation, you
might also need to collect information on the flow of activity to the application in
order to replicate that same flow in automated test scripts. This is fairly trivial to do
for a GUI application on a personal computer. It becomes more difficult to achieve
for a large enterprise application that receives a continuous stream of input from
local users on channel-attached terminals, remote users coming in over the net¬
work, remote devices such as ATMs or point-of-sale kiosks, other applications, and
simultaneous input and activity from batch work.
Additional Complexities
In addition, large, real-life applications have other components that must be cap¬
tured. The state of the security environment, the security repositories, and user in¬
formation and permissions are needed. The real time and date of the activity might
be important to the application. The application may require pieces of information
from the database of another application running in a separate environment. It
may pass and receive data to and from that other application. These types of appli¬
cations are typical in large IT environments and they routinely support hundreds to
thousands of concurrent users. Any piece of the puzzle that is not captured can significantly impact the later execution of the workload in the test environment. Un¬
fortunately, this type of issue is often not discovered until after significant time and
effort has been spent on the porting activity.
Confidentiality Concerns
There are also confidentiality issues to address. The contents of the data to port may
contain very sensitive information. For instance, the personal information of the
business’s customers might be included—names, addresses, phone numbers, credit
card numbers, or Social Security numbers. Very strict data protection procedures
may need to be created, reviewed, and approved. Many details like the physical lo¬
cation of the storage drives; backup capabilities and security requirements on the
backup media; and rules on storage sharing, test system access, and tester access need
to be documented. Any of these issues could become an insurmountable hurdle to
proceeding with the port.
What is Missing?
We’ve looked at what you get after all the data is collected, but what about what you
don’t get?
One thing you won’t get is an entire view of the application. The activity in large IT
environments similar to the one described in Chapter 2, “Industrial-Strength Soft¬
ware, It’s Not a Science Project,” is very dynamic. What happens on Monday isn’t
exactly the same as Tuesday. The activity that occurs the first week of a month dif¬
fers from that of the last week of the month. The same is true about end-of-quarter
and end-of-year processing. The type and level of activity you end up with depends
on the days you collected the data.
Updated Code
It’s also important to note that application programmers are just that, program¬
mers of applications. This means they create, enhance, and maintain their applica¬
tions. They continuously update them. So, it is safe to assume that very soon after
an application is ported to a test environment, it no longer is the same as the ap¬
plication running in the customer’s environment. A tester would be fooling himself
if he thought what he had a month after collecting the data was a current view of the
application. Be aware that after you have your copy of the application, you will only
be gaining the benefit of what it does up to that point. Anything new that the cus¬
tomer’s programmers do after that point is no longer part of the benefit you see.
Randomness
Another element important to test workloads that you won’t get with customer ap¬
plication ports is the capability for randomness. The activity from the ported ap¬
plication often needs to be replayed in the same sequence as it was when initially
gathered. The speed of the transactional input can be varied, but only on a global
scale that keeps the sequence of events the same. So every replay is basically redun¬
dant, resulting in the workload’s gradually losing its effectiveness in finding defects.
Software can get “grooved” to a static workload, which is not the result you want.
Maintainability
In a moment of crisis, you may be tempted to dump a customer’s entire application
environment blindly to portable media and restore it in your test lab. Certainly this
is one way to ensure that you capture not only the application, but all of its associ¬
ated data and configuration information. It also creates a starting point at which the
application and its data are well-synchronized. However, the result may prove dif¬
ficult to maintain. A lack of documentation on the database structure and the ap¬
plication’s dependencies on it, for example, may make the database difficult to
prune or upgrade. If you don’t understand the internals of the application, it might
be impossible to adapt it later when its supporting infrastructure is upgraded and
features it relies on are deprecated. Be wary of the lure of a raw dump and restore
of a customer environment. In some situations it might provide a quick fix, but
don’t expect it to deliver long-term value to the test team after the crisis is over.
Return on Investment
After investing all the time and effort involved in porting a large application, you
want a good return on that investment. Snapshots of an application should be able
to be replayed for a substantial period of time, right? Not necessarily. An experience
from a real application port will demonstrate this potential pitfall.
Systematic Approach
What data you review and what activity you strive to generate depends heavily on
what it is you are testing. But the general approach can be the same. You’ll want to
take a systematic approach using empirical data. Identify key characteristics of the
way the software is driven—pressure points and rates of activities that are measur¬
able and externalized in some way, either by the software under test or related tools.
Then define corresponding activity goals for your existing workloads. Make tar¬
geted changes to the workloads, and then measure them. Iteratively modify and re¬
measure. Keep a close eye on the activity your workloads generate and enhance
them until they converge with your customer sampling.
Activity Goals
Most operating systems have many system activity measurements that can be mon¬
itored via performance reports, operator commands, and system monitoring tools.
These measurements allow for easy ways to gather data from customers and define
test goals of system activity based on that data. The testers can also use the same
performance reports and operator commands to measure their workloads against
those goals and make modifications based on the results. We’ll describe a few ex¬
amples for a system test to trigger ideas on what is available for your own situation.
Input/Output Rates
I/O activity can be viewed in many ways. Goals for I/O activity can be defined at dif¬
ferent levels of granularity, such as the total number of I/O requests per second
across a single system. Another example is the total number of I/O requests per sec¬
ond across all the systems participating in the same cluster. You can get more spe¬
cific and define goals for I/O activity to a particular storage drive, again from both
a single system and from multiple systems that share the same drive (when such
sharing is supported.) Getting even more granular, you can define and measure
goals for I/O activity against a particular file on a storage device. And for network
or SAN-attached storage drives, you can define I/O utilization goals across the at¬
tachment medium.
You’ll also want to ensure you have a good mix of different I/O types. Your mix
would include attributes like read/write ratios against a database, so create work¬
loads that perform X number of writes of data for every Y number of reads. Since
caching is an important part of the technology, you’ll want to have goals for that
also. You can look at cache hit percentages and cache misses. This applies to both
hardware and software caches, if any. Linux systems, for example, buffer file I/O in
otherwise unused memory.
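As one example of turning such a goal into a workload, the C sketch below issues a fixed ratio of writes to reads against a scratch file, here one write for every four reads, so the mix can be tuned to match what the customer data shows. The file name, block size, and ratio are arbitrary choices for illustration.

/* iomix.c - drive a fixed write:read ratio against a file (sketch) */
#include <stdio.h>
#include <stdlib.h>

#define WRITES_PER_CYCLE 1        /* X writes...          */
#define READS_PER_CYCLE  4        /* ...for every Y reads */
#define BLOCK  4096
#define BLOCKS 1024

int main(void)
{
    FILE *f = fopen("iomix.dat", "w+b");
    char buf[BLOCK] = {0};
    if (f == NULL) { perror("iomix.dat"); return 1; }

    /* Prime the file so the reads have something to hit. */
    for (int i = 0; i < BLOCKS; i++)
        fwrite(buf, BLOCK, 1, f);

    for (;;) {                                    /* run until canceled */
        for (int w = 0; w < WRITES_PER_CYCLE; w++) {
            fseek(f, (long)(rand() % BLOCKS) * BLOCK, SEEK_SET);
            fwrite(buf, BLOCK, 1, f);
        }
        for (int r = 0; r < READS_PER_CYCLE; r++) {
            fseek(f, (long)(rand() % BLOCKS) * BLOCK, SEEK_SET);
            if (fread(buf, BLOCK, 1, f) != 1) { perror("read"); return 1; }
        }
        fflush(f);
    }
}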
For all of these examples of I/O activity goals, it's important to correlate activ¬
ity numbers with response time goals. You’ll want to ensure that the activity num¬
bers created from a review of the customer data are matched with a test hardware
configuration that can support the desired activity requested and performs like
customer environments.
Memory Usage
Most operating systems allow you to monitor different forms of memory usage.
Simple console commands usually reveal how much memory is currently used,
free, shared, or being used for caching or buffering. More sophisticated tools can
often track the ebb and flow of memory usage over time. Likewise, some middle¬
ware programs can monitor how their exploiting applications are using memory.
Web application servers, for example, can usually display changes in Java heap size
as the load on an application increases. These kinds of memory usage statistics are
important to include in your analysis. You can set test activity goals based on them.
You might set a goal of driving the Java heap size to a particular threshold X times
an hour, for instance, or target a specific system paging or swapping rate.
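The same figures those console commands reveal can also be sampled programmatically during a long run. The Linux-specific C sketch below reads /proc/meminfo once a minute and echoes the free and cached values, raw material for checking memory-usage goals over time; other platforms provide equivalent interfaces.

/* memsample.c - periodically sample Linux memory usage (illustrative) */
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    for (;;) {
        FILE *f = fopen("/proc/meminfo", "r");
        char line[128];
        if (f == NULL) { perror("/proc/meminfo"); return 1; }
        while (fgets(line, sizeof line, f) != NULL) {
            if (strncmp(line, "MemFree:", 8) == 0 ||
                strncmp(line, "Cached:", 7) == 0)
                fputs(line, stdout);   /* e.g. "MemFree:  123456 kB" */
        }
        fclose(f);
        sleep(60);                     /* one sample per minute */
    }
}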
Shared Memory
Some software takes advantage of shared memory segments, which allow multiple
processes to access common control structures and data. Examples of such mem¬
ory areas include IPC shared memory in Unix and Linux, and the Common Stor¬
age Area and System Queue Area in z/OS. Goals on the use of these shared memory
segments can be set and monitored.
Buffer Utilization
In addition to system-wide memory utilization goals, you can set goals for specific
functions or subsystems. Many functions have their own cache or in-memory
buffers for fast access to information and the ability to avoid I/O. For example, a dis¬
tributed secure sign-on solution might maintain all user authentication and autho¬
rization information in a central server’s repository, but cache portions of it as
needed on each local client system to improve response times. Utilization goals for
such caches can be created and measured. Some functions even support the ability
to monitor buffer management activity, e.g., the number of times data was cast out
of the buffer to make room for a more recently referenced item and the number of
times the entire buffer filled up.
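When the function under test externalizes this kind of buffer-management activity, a small monitoring test case can poll the counters and compare them with the goals. The C sketch below assumes hypothetical hit, miss, cast-out, and buffer-full counters; a real product would expose its equivalents through its own commands or APIs.

/* bufgoal.c - check buffer-management activity against test goals (sketch) */
#include <stdio.h>

struct buffer_stats {     /* counters a cache or buffer manager might expose */
    unsigned long hits, misses, castouts, buffer_full;
};

static int meets_goals(const struct buffer_stats *s,
                       double min_hit_ratio, unsigned long min_castouts)
{
    unsigned long refs = s->hits + s->misses;
    double hit_ratio = refs ? (double)s->hits / refs : 0.0;

    printf("hit ratio %.1f%%, castouts %lu, buffer-full events %lu\n",
           100.0 * hit_ratio, s->castouts, s->buffer_full);
    return hit_ratio >= min_hit_ratio && s->castouts >= min_castouts;
}

int main(void)
{
    /* In a real test these numbers would be polled from the product.  */
    struct buffer_stats now = { 90000, 10000, 1500, 3 };

    /* Goal: at least an 85% hit ratio, plus enough pressure to force  */
    /* 1000 or more cast-outs so the replacement paths get exercised.  */
    return meets_goals(&now, 0.85, 1000) ? 0 : 1;
}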
Resource Contention
In large clustered systems, where each system runs on an SMP, serialization is es¬
sential to successful operations. Varied resources on the system will be serialized for
defined periods of time to ensure integrity. Some resources are “popular,” so con¬
tention is anticipated and acceptable. The number of resources in contention, how
long the contention lasts, and the number of waiters for the resource are moni¬
tored. Goals can be set for the amount of contention occurring as well as the aver¬
age number of waiters. Average wait time is another reported item that can be
modeled by a set of SVT workloads.
Some system activity goals can only be set at a general level, but they still provide value in guiding test objectives (e.g., the total number of active processes or address spaces on a system). Other examples include the number of batch jobs that start and stop in a given period, the number of concurrent interactive users, and average CPU utilization.
The primary type of software discussed in this book is code that is made up of many
components and developed by many programmers. The software under development
may consist of multiple new functions. Typically, the testing of the new functions is
parceled out to individual testers. This is appropriate, particularly early in the devel¬
opment cycle before all the pieces have had a chance to mature and stabilize. But once
that stabilization has occurred and the pieces have come together, the way final tests
are run can be slanted toward emulating customers.
Systems Programmers
The systems programmers can perform all the setup and customization tasks using
the documentation that comes with the software. They might also act as the sup¬
port staff for the other team members, creating and documenting the procedures
for managing the systems over the long haul. They can develop and perform main¬
tenance procedures, provide hardware configuration management, and offer sup¬
port for local customizations. The systems programmers may also be responsible
for the data backup and archiving procedures, log management, change control,
software installation, problem determination, and availability management, and
also own their automation. While performing these tasks, they will obtain a good
understanding of how the software performs from a real-life perspective.
Networking Specialists
The networking specialists can own and manage the network. They should pay
particular attention to any networking functions provided by the software under
test and ensure all of the latest features are enabled. Another role for them is to look
at the integration of the software’s networking capabilities with many other pieces
outside of the specific test bed. The network specialists might, for example, create
a portal to enable the rest of the team to remotely monitor and control the target
software, or link it to a shared patch download repository.
Security Specialists
The security specialists control all aspects of security for the entire IT environment.
They are responsible for the security of the network and manage the firewalls and all
other protected resources, such as files and databases. They work with all of the other
IT team members to provide the level of security required or mandated. These spe¬
cialists also might take the approach of integrating the software’s authentication and
authorization mechanisms with an external application managing single sign-on
across the environment.
Database Administrators
The DBAs own the data and define and manage the attributes of the databases.
They need to understand the database requirements of the software under test as
well as the applications built to drive the new functions. They can then work with
the systems programmers to define the backup and archiving requirements of the
pseudoproduction environment. The DBAs also provide the care and feeding of the
databases, all the while noting the database server software’s behavior and how it
interacts with other software in the environment.
Performance Analysts
Every customer wants to squeeze the maximum amount of work out of their com¬
puting resources. That’s where the performance analysts come in. These experts
capture data on resource consumption, tune systems to achieve peak performance,
and project future capacity needs based on recorded performance trends. Most in¬
dustrial-strength software product teams have testers responsible for measuring per¬
formance in search of possible bottlenecks. However, such experimentation is
normally done in unconstrained environments under carefully controlled con¬
ditions. Ask those same testers to perform the role of performance analyst in a
customer-like, pseudoproduction environment where the conditions are anything
but well controlled. That experience may open their eyes to weaknesses in the soft¬
ware’s performance-data-gathering capabilities and lead to improvements that cus¬
tomers will greatly appreciate.
Application Programmers
The application programmers have a key role in this environment as well. They
provide the much-needed programming skills to build the infrastructure and
framework within which the business will operate. Without them, you aren't going
to accomplish much. Similarly, the test team will be coding test cases, workloads,
and tools that establish the testing fabric within which the test will be conducted.
Testers as Educators
A valuable byproduct of taking an approach similar to the one just described is the
hands-on, real-life experiences obtained by the test team. This experiential knowl¬
edge gives the testers an excellent foundation. They are now better able to relate to
the software's users and observe software behavior from a different perspective. In fact, the test team can become an acknowledged competency center of hands-on
knowledge that can be of great value to the development, service, and customer
communities. Testers also become a resource for educating customers, using lan¬
guage and experiences the customers can relate to, and offering hints and tips for
things to do and to avoid. This can be done via documentation the test team cre¬
ates or direct discussions with customers. A huge side benefit is the common bond
that develops between the testers and customers. This elevates the perception of the
contributions of test and adds to the team’s credibility.
Continuous Execution
Customers run their production environments around the clock, and a pseudoproduction test environment should be no different. Keeping it up and running continuously can expose problems such as slow resource leaks that surface at the most inopportune time, or other problems that only occur after extended execution.
Failures can also be introduced just as a customer might experience them: a cable is pulled or a critical service killed. The tester operating the environment now can ex¬
perience firsthand the software’s capabilities to support customer needs. Observe the
effectiveness of the error messages presented—did the operator take the most appro¬
priate recovery actions? Was the problem easily recognized and isolated? If not, why
not? What were the effects on the workloads and test cases running? How long did it
take the operator to accomplish complete restoration of service?
Another opportunity you may have to more closely emulate customers is to ap¬
proach the overall management of your test environments in similar fashion to cus¬
tomer management of IT environments. Many of the tasks you need to perform to
create and manage test environments are the same set of tasks customers deal with
on a daily basis. Although test environments tend to be much more dynamic and
ever-changing than typical production environments, there are still some things
you can do to emulate customers and gain their perspective.
Centralizing Support
There are not many software development companies or organizations that develop
only one piece of software. Many times there are multiple test teams that test differ¬
ent software in the same lab. When this is the case, an opportunity arises to emulate
customers by centralizing many of the management activities of that test lab.
Asset Management
The physical assets required by the test teams can be managed centrally. Although
the test lab may support multiple teams, you can view it as a single entity. Manag¬
ing all the hardware requirements, ordering the hardware, installing the hardware,
and keeping track of inventory are some responsibilities that can be assigned to the
asset management team. Another set of responsibilities can be controlling the
power and cooling requirements of the test lab and scheduling electrical work by
outside vendors.
How does asset management help a software test team emulate customers? It
provides the capability to take more of an end-to-end view of a realistic environ¬
ment. One challenge all companies face today is how to control costs. Centrally
managing hardware assets can do just that. It opens up the possibility to most effi¬
ciently use the hardware in the lab. Specifically, it lets the sharing of hardware re¬
sources to occur more easily. Instead of each test team obtaining its own hardware,
all of the requirements can be pooled, and fewer pieces of hardware may be needed.
This is how many customers manage their hardware requirements. It then allows
the test team to operate under customer-like constraints and see how their software
behaves in a heavily shared environment. Feedback should be given to the devel¬
opment teams in the event that the software does not fit seamlessly into the shared
environment.
Testers can gain more experience with realistic environments, instilling a customer-like mindset.
SUMMARY
Testers strive to find as many defects as they can, and customers want to avoid de¬
fects at all costs. These diametrically opposed objectives create challenging issues
for testers, yet are not contradictory with the idea of emulating customers. Porting
real applications from customers to use as a tool for driving new software can be a
good means of reaching some test goals but there are lots of cases where that is not
the total answer. Other approaches can prove just as, if not more, valuable at find¬
ing the defects that matter. In an SVT, very specific software activity goals can be set
from in-depth review of customer data. Some test time should be spent looking at
production aspects of software such as continuous execution and restoration of ser¬
vice after a failure. Viewing the test lab as a production environment also instills
more of a customer mindset in testers and exposes them to additional demands.
We have discussed enough about preparation, planning, and thinking about
testing. At some point, you’ll actually have to perform the tests. It’s time for the
main event. Chapter 15, “Test Execution,” will describe some critical issues to con¬
sider when executing tests.
Part V: Executing the Test
Bugs can be elusive little critters. They hide in the dark recesses of sophisti¬
cated software and resist detection. If they do surface, it may be at inoppor¬
tune moments when your attention is directed elsewhere. A good test will
roust bugs from their hideouts, snare them in a finely-crafted net, and cart them off
to oblivion. A sloppy test, like a torn net with gaping holes, will let bugs escape.
You'll want your net to be tight and unforgiving.
Even with strong planning and preparation, executing a test isn’t easy. It’s a
complicated, chaotic affair that demands care and attention. The next four chapters
will guide you through the process. We’ll look at structuring activities around dif¬
ferent test execution sequences, from traditional to iterative to artistic. You’ll see
how to detect when a bug is tangled in your net, and approaches for diagnosing it.
If your testing gets bogged down due to insufficient hardware, we’ll reveal several
ways to expand your options through the magic of virtualization. Or, if the hard¬
ware you require is so new that it’s still under development when you’re ready to
begin, you’ll see how simulators and prototypes may be able to save the day. You’ll
also learn techniques for controlling the inevitable chaos associated with testing in
order to keep your execution on track. Finally, we’ll discuss the tester’s weapons for
getting bugs fixed, and what you can do if the end date arrives but your testing isn’t
finished. Now toss out your net, it’s time for a bug hunt.
15 Test Execution
In This Chapter
■ Test execution flow
a Artistic testing
■ Iterative testing
■ Detecting problems
■ Diagnosing problems
■ Testing the documentation
You stride down a narrow, shadowy corridor. The silence is eerie, broken only
by the soft padding of your own footsteps on the beige, industrial carpet.
Ahead, a lone steel door blocks your path. Your fingers squeeze its cold,
sculpted handle, then yank. The door swings open, and out rushes the mechanical
hum of machines. You step into the computer room and see that ahead, a crowd has
gathered. Developers, planners, and managers shuffle about, their movements tenta¬
tive, uncertain. You approach. Suddenly, a finger points your way. The others look,
and stiffen. The throng slowly parts; all conversation hushes as you pass through. On
the other side, you reach your destination: a swivel chair, resting before a gummy key¬
board and dusty monitor. You sit. The crowd edges closer. Your gnarled, wizened fin¬
gers rise to the keyboard, and hover. The time for testing has come.
The beginning of your actual test may not be quite this dramatic, although after
all the studying, digging, planning, reviewing, preparing, and scheduling, it might
seem that way. Executing a test is the best part, the point where all your advance work pays off. But there are several variants and techniques to consider. A fairly mainstream sequence of events usually occurs, although its flow can vary. There are also some more unusual testing approaches which are very useful for certain situations, but their execution models are quite unique. No matter what your test execution looks like, you'll need to be able to spot problems when they happen, and then figure out what to do with them. And be careful not to overlook the documentation: bad books can cause a customer just as much trouble as bad code. Let's look at each of these eclectic areas, all of which may eventually touch you during the execution of a test.
Different tests and test phases have their own ebb and flow. No single sequence of events is right for every case, but there are some common characteristics we can examine. Let's pick the SVT phase as an example; Figure 15.1 shows a typical flow.
[Figure 15.1: A typical SVT flow: installation, regression, load/stress, mainline function, hardware interaction, migration, data integrity, recovery, security, serviceability, and artistic testing, with reliability runs and a final regression pass toward the end.]
Establishing a Baseline
You can’t do much with software until you install it, so at least a portion of your in¬
stallation testing must get done immediately. After that, the first order of business is
to validate that entry criteria have been met and to establish a stability baseline. Run¬
ning an automated regression test case suite at moderate levels of stress is a solid
means for achieving both goals, and so it’s the first test in the sequence. Once most of
the regression tests run without error, a natural next step is to increase the regression
workload’s intensity. This signifies the onset of load/stress testing and morphs the re¬
gression suite into the first stress workload. More workloads will follow as the target
software is beaten and battered over the course of the entire test, with load/stress act¬
ing as either its own focus area or as a backdrop against which other testing occurs.
Mainline Activities
Once the software proves it can handle a heavy load, that load can begin to serve as
background noise for attacks on new or changed mainline function. Other tests can
begin at this point as well. If support for new hardware is part of the test, you may
want to start that work now to allow for the possibility of long turnaround times for
fixes to defects you find in the hardware itself. It’s also logical to begin migra¬
tion/coexistence testing as early as possible, since you may be able to cover much of
this testing as you migrate your own test systems from the software’s prior release.
Security and serviceability testing can also commence at this point, or wait until
later as shown in Figure 15.1.
If data integrity is a concern for the software, it’s prudent to allow considerable
time to explore it. Problems in this area are notoriously tricky to solve, so they can
significantly elongate the test process. However, since new mainline functions are
often the focus of data integrity worries, it is frequently best to wait until those
mainline functions have proven they can survive some initial, basic tests before pro¬
gressing to the more sophisticated integrity attacks.
over a power cord. Allow time to attempt this test several times before achieving suc¬
cess. Also, if regression test suites have not been included in the daily stress runs, or
if the defined regression tests include some manual activities, then it's a good idea
to re-execute them near the end of the test. This ensures previously-tested support
hasn’t suffered collateral damage from a steady stream of fixes.
Staying Flexible
There’s no magic to this sample flow. You might choose to do all migration testing
last, or before starting your initial regression tests. If a main feature of the new soft¬
ware is its ability to recover from failures, then it makes sense to begin testing re¬
covery at the same time as other mainline functions. Maybe your window for
artistic testing should be wider than we show here. Or perhaps you won't have any
predefined window for artistic testing, but will jump into artistic mode whenever
the results of your planned tests suggest you should. The point is not to pay homage
to some specific sequence of activities, but rather to think about what makes sense
for your test, and choose a flow that is not only effective, but efficient.
ARTISTIC TESTING
The renowned British economist, John Maynard Keynes, was once accused by a
critic of changing his stance on a subject. He reportedly responded, “When I get new
information, I change my opinions. What, sir, do you do with new information?”
[Schoch00] Keynes could just as easily have been talking about artistic testing.
We first touched on artistic testing, also known as exploratory testing, in Chap¬
ter 7, “Test Plan Focus Areas.” It recognizes that you’ll always know more about a
piece of software at the end of a test than you do at the beginning. Rather than ig¬
noring that reality, artistic testing exploits it. It encourages you to take new infor¬
mation and insight that you acquire during the test, and use it to spontaneously
devise fresh scenarios that were not considered when the test plan was originally
conceived. It’s a creative process that Bach defines as simultaneous learning, test de¬
sign, and test execution [Bach03].
Artistic testing is a test focus area whose content isn’t actually devised until the
execution phase is well underway. There are several different paths you can follow
and you may not know which is most appropriate until the moment for artistic
testing actually arrives. Let’s briefly look at each.
Structured
One approach is to pause for a moment, step back, and reassess the testing under¬
way. Is it becoming apparent that the existing test plan misses an important area of
the software? Or are there areas that are covered, but insufficiently? Are there envi¬
ronmental or configuration enhancements needed to fill gaps in the testing? Are the
test workload suites sufficient, or should they be tweaked or different combinations
tried? Brainstorm the possibilities, and jot down your ideas. These notes then be¬
come your artistic testing road map. This structured approach is almost like infor¬
mally creating a new, minitest plan, but without all the formal documentation,
reviews, and signoffs. As you pursue your new road map, additional quirks in the
software may become apparent. Follow them, and see where they lead.
Investigative
It’s likely your test plan will provide fairly broad, even coverage of the target soft¬
ware’s functions and subfunctions. Once the test is underway, you’ll typically find
that some functions are working quite well, and others are more troublesome.
These problem-prone areas represent soft spots, portions of the software that for
one reason or another are weak.
The second approach to artistic testing is to probe the soft spots. Investigate
those very specific areas. Poke and prod some more. Dig deeper. Use your intuition.
Be creative. Go where the software and its problems lead you. The theory here is
that bugs in a section of code are like cockroaches in a house—the more you find,
the more likely there are others lurking in the dark. The simple fact that you have
found an unusual number of defects in one area suggests there are more to find.
The investigative approach encourages you to react to conditions the software pre¬
sents to you, trying wild combinations of things on the spur of the moment that
would never have occurred to you during the formal test plan development process.
Customer-like Chaos
Artistic can be defined as “showing imagination and skill.” These are just two of the
many traits of an effective tester. We can take that imagination and skill and com¬
bine it with spontaneity. What we get is organized chaos.
With this approach, the entire test team simultaneously uses the software for its
intended purpose, each in his own way, making up tasks as they go. Focus areas
might be divided among the team, but otherwise there is little coordination of ac¬
tions. This mimics the kind of random usage that the software may see in the pro¬
duction environment. Such chaos is likely to force the execution of code paths that
more structured testing may miss. Better yet, if that chaos can be oriented around
the kinds of activities likely to occur in a customer environment, the defects it un¬
covers will also align with those that might occur in production. This approach fits
well at the end of a test, as a final check of the software after all planned tests have
been completed.
Executing to a test plan is important, but so is keeping a close eye on the experiences
gained during its execution. When those experiences suggest more testing is needed,
be flexible. Develop and perform additional test activities based on the new insight.
TIP
The FVT test team members pondered how they could stabilize the environ¬
ment. They decided to take an artistic approach and define some organized chaos.
The primary technique was to drive the entire package of components with random
use and operational tests. These tests included exercising system services in all com¬
ponents, performing command-driven operations, injecting program errors, and
removing systems from the cluster, all while other testers were acting as users. Each
day, testers were assigned a different area on which to focus, infusing variability and
randomness to the activities. These spontaneous and pseudorandom operational
tests drove a set of unique paths that had not been driven in the isolated FVT envi¬
ronments. In fact, the operational tests drove paths similar to those that customers
were expected to use.
Consider aligning artistic test scenarios with likely customer activity. It's an excellent
way to extract the most critical defects.
TIP
When faced with the need to quickly augment the planned test, work closely with
other testers and help each other, if possible. Most importantly, be creative and try
new approaches.
TIP
winds down, successful completion dwindles while scenarios await the last batch of
fixes, long-running reliability tests wrap up, and so forth. Software project man¬
agers around the world are intimately familiar with this curve, and are quite com¬
fortable projecting results from it. For an AVT, however, it’s useless.
In an AVT, each experiment unveils a new behavior that requires adjustment to
the algorithm. With each adjustment, the results from prior tests become suspect
and must be revalidated. These retests may bring to light more new wrinkles to be
accounted for—and force subsequent retests. This iterative loop suggests that no
experiment can be considered truly complete until all are successful, or very nearly
so. As a result, a plot of the test progress shows it appearing to limp along for quite
some time, until finally everything comes together and it spikes up. This flow is de¬
picted in Figure 15.3.
If you are involved in an AVT, one of the challenges you’ll face is educating
people that they can’t superimpose their classic test project management template
on your activity. Status tracking must be approached differently. Although you will
have crisply defined experiments, you can’t project when they will succeed. This
concept may blow some managers’ minds. But because in this model the test and
design are so intertwined, testing can’t complete (or even show significant progress,
in classic terms) until the designers and developers close in on the correct design!
In effect, attempting to project the end of an algorithm verification test is the same
as attempting to project a moment of invention.
In reality, an AVT could continue almost indefinitely, because it’s always pos¬
sible to do a little more tweaking to further optimize a complex algorithm. But
eventually a point of diminishing return will be reached. The trick is recognizing
when you’ve hit that point. You have to learn to see when something is operating
acceptably, rather than perfectly. This is not a job for purists. The best approach for
knowing when you’re done is to have one or more experts on the algorithms spec¬
ify detailed criteria up front about how good is “good enough,” and then stick to it.
Status tracking in this environment is tricky, and it’s unlikely you’ll get far with
an approach that just says, “trust me.” A useful technique is to try to chart progress
against defined goals, but with soft and movable target dates. Break the definition
of the test into logical chunks. For each chunk, you can track whether code is avail¬
able, testing is underway, and if a particular algorithm has met some predefined exit
criteria. Unfortunately, while this data will show a lot of movement early in the test,
soon it will tend to stagnate as all code is available and tests are underway, but few
algorithms are meeting criteria. You can solve this problem by providing backup
data, to reassure people that progress is indeed being made. List activities per¬
formed since the last round of status, accomplishments or progress, and any cur¬
rent obstacles. This kind of information will keep everyone informed, so your test
will seem less like a black box.
A New Component
In response, a new component was devised that would allow the system to self tune.
This new component was called the Workload Manager (WLM). It only asked the
system programmer to tell it, in simple human terms, the performance goals to be
achieved and a business importance for each goal. In fact, these goals were ex¬
pressed in terms similar to those found in a standard service level agreement (de¬
scribed in Chapter 2, “Industrial-Strength Software—It’s Not a Science Project”),
such as the desired response times and throughput rates (or velocities) for different
types of work. WLM would then feed that input into its sophisticated algorithms to
dynamically tune the system, make tradeoffs, and adapt ongoing processing to meet
the specified performance goals.
WLM is a part of the operating system that acts as the “traffic cop.” It makes decisions
about which vehicles are allowed on the highway and how fast they can go. When the
highway is not crowded, its job is easy. When there is a traffic jam, it ensures
ambulances are not delayed getting to the hospital by moving other vehicles off the
highway or routing the ambulance through streets that are not congested.
As you might imagine, replacing a set of arcane and sometimes conflicting op¬
erating system performance controls with a completely different set was hard. But
the task of devising algorithms that would self tune the system to meet requested
goals under many conditions was more than hard—it was mind-numbing. A series
of hypotheses was encoded into algorithms, but no one could be sure those hy¬
potheses were correct. The only way to validate them was to exercise the code as
early as possible. Unlike a traditional test, the intent was not to validate that the
code matched the specification, but to validate that the specification itself was cor¬
rect. It was like prototyping, but done on what would eventually be the real prod¬
uct code. It was a classic case of iterative development.
The testing effort described here was not focused on finding traditional bugs in the
software. It was focused on helping the design and development teams learn about
the decisions of the traffic cop. Detailed reviews of the actions of the traffic cop and
the resulting traffic flow were performed and, if needed, the traffic cop would be
retrained to make better decisions.
A standard FVT and SVT were performed against WLM to exercise externals and
other basic function. But because those tests could not expose the subtleties of the
algorithms, highly specialized tests were needed. These were incorporated into an
AVT, which started about the same time as the FVT and finished along with SVT.
The AVT team carefully constructed a series of experiments. Each was designed
to present a different situation for the algorithms to handle in order to observe how
they reacted. But it wasn’t enough to study the results from the outside. The team
needed to look at the actual internal values that fed into the algorithms to ensure
Special testing efforts may need special tools. Define the needs and requirements of
specialized tooling when planning, and include the cost of the tool development when
determining what the test will require.
Of equal importance to executing tests is the analysis of their results. Plan time to
review what happened during test execution. Also begin each test activity with crisply
defined objectives of what you expect to accomplish. Have a goal defined for each
individual testing exercise.
TIP
Result analysis was a group activity, bringing together the designers, develop¬
ers, and testers to discuss what they were seeing. If actual results did not match ex¬
pectations, new theories were proposed. Once a single flaw was noticed in any
sequence of events, all subsequent results in that experiment were considered sus¬
pect. When a fix to that flaw became available, the change it introduced would po¬
tentially invalidate experiments that had previously succeeded, so everything
needed to be rerun with the new fix.
A clear and formal pipeline for turning around fixes was critical. The team de¬
pended on a continuous build process to fold fixes into the experiment engine. All
known fixes and design changes had to be incorporated before the next experiment,
or there was no point in running it.
Creating extremely close working relationships among testers, developers, and de¬
signers is a powerful approach for quickly and effectively validating and improving
complex software. Be an ambassador for this approach and work to make it a nor¬
mal part of the development process.
Prior to beginning an AVT, prepare project managers for the unusual test progress
curve they will likely see unfold, so they won’t be surprised.
Epilogue
Throughout the course of the AVT, the entire WLM team’s rallying cry was remi¬
niscent of the Hippocratic Oath: “First, do no harm to the system.” They worked
together to devise targeted experiments, create custom tools, analyze results, and it¬
eratively move forward one fix at a time. As a result, the team was able to stay
within the bounds of their motto while delivering complex, yet high quality, new
function. When WLM was delivered to customers it proved to be quite robust, and
served as a strong base upon which the mainframe’s ability to self tune has been ex¬
panding ever since.
CATCHING PROBLEMS
Regardless of the type of testing being performed, the whole point of doing it is to
find problems. Therefore, it is critical that testers take care to detect when a prob¬
lem has actually occurred. How can you be sure you aren’t missing any?
You will probably want to purchase or develop a tool to automate the moni¬
toring of the success or failure of test cases. If a test case runs without detecting an
error, the tool can delete all of its output and just add it to the “success” tally. This
automation will be most useful when you are doing automated runs of hundreds or
thousands of test cases. You'll want to know when a test case fails, but won’t want
to spend a lot of time reviewing the output of those that didn’t. By filtering out all
unnecessary output, this tool will be a great time saver and ensure that the failures
get noticed.
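As a simple illustration, the sketch below shows (in C) one way such a monitor might
work. It assumes a hypothetical convention in which each test case is a stand-alone
executable that exits with zero on success and nonzero on failure; the output-file
naming and the use of system() are illustrative only, not a prescription.

/* run_tests.c -- illustrative sketch of a test-run monitor.  Assumes each
 * test case named on the command line is an executable that exits with
 * zero on success and nonzero on failure (a hypothetical convention). */
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    int passed = 0, failed = 0;

    for (int i = 1; i < argc; i++) {
        char cmd[512], outfile[512];

        /* Capture everything the test case writes to its own output file. */
        snprintf(outfile, sizeof(outfile), "%s.out", argv[i]);
        snprintf(cmd, sizeof(cmd), "./%s > %s 2>&1", argv[i], outfile);

        if (system(cmd) == 0) {
            remove(outfile);   /* success: discard the output, just count it */
            passed++;
        } else {
            failed++;          /* failure: keep the output for later review */
            printf("FAILED: %s (output kept in %s)\n", argv[i], outfile);
        }
    }
    printf("%d passed, %d failed\n", passed, failed);
    return failed ? 1 : 0;
}

Run against hundreds of test cases (for example, with hypothetical names such as
./run_tests tc001 tc002 tc003), a filter of this kind leaves behind only the output
that actually needs a tester's attention.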
Abnormal Termination
For certain types of failures, the test case will not finish normally but instead will
abnormally terminate or crash. The operating system or software component will
terminate the current unit of work before it has an opportunity to provide an error
return code or issue an error message. Such crashes occur for different reasons, and
each typically has a corresponding failure code. These codes are normally docu¬
mented in a system codes manual, which includes a description of the type of fail¬
ure each termination code depicts.
When a crash occurs, a memory dump is typically generated. The dump is a
snapshot of memory at the time of the error and includes many operating system
control structures, registers, and memory areas related to the failing unit of work.
On some systems, there will be messages in the console log or system log alert¬
ing the operator that a dump has been taken, but on others, the dump may be qui¬
etly stashed away somewhere without any notification. Make sure you know where
dumps will be stored, and watch for them.
Wait States
The most obvious indication that something has gone drastically wrong is when the
operating system goes into a permanent wait state. This occurs with catastrophic er¬
rors when the operating system has determined it can not safely continue. The var¬
ious kinds of wait states are usually defined by wait state codes documented in a
system codes manual. A wait state code will identify what the failure was, but is un¬
likely to explain what caused it. In order to debug such a failure, you will probably
need to use a standalone utility to capture a complete memory dump prior to re¬
booting the operating system.
Error Logs
In some cases, it’s possible an error occurs but the software determines a dump is
unnecessary, so it simply recovers from the failure and continues on. This variety of
failure is usually written to an error log, which may look different depending on the
systems and products you use. Testers must review these logs often to see if any
“silent” failures are occurring.
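When the error log happens to be a plain text file, even a small filter can help between
manual reviews. The sketch below is only a rough illustration; the default log file name
and the keywords it scans for are hypothetical and would need to match whatever your
systems and products actually record.

/* scan_log.c -- illustrative error-log filter.  The file name and the
 * keyword list below are hypothetical placeholders. */
#include <stdio.h>
#include <string.h>

int main(int argc, char *argv[])
{
    const char *keywords[] = { "ERROR", "ABEND", "TIMEOUT", "RETRY" };
    char line[1024];
    long lineno = 0, hits = 0;
    FILE *log = fopen(argc > 1 ? argv[1] : "error.log", "r");

    if (log == NULL) {
        perror("open log");
        return 1;
    }
    while (fgets(line, sizeof(line), log)) {
        lineno++;
        /* Flag any line containing one of the suspicious keywords. */
        for (size_t i = 0; i < sizeof(keywords) / sizeof(keywords[0]); i++) {
            if (strstr(line, keywords[i]) != NULL) {
                printf("line %ld: %s", lineno, line);
                hits++;
                break;
            }
        }
    }
    fclose(log);
    printf("%ld suspicious line(s) found\n", hits);
    return 0;
}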
Messages
System messages often reveal a lurking defect to the alert tester. When you execute
a scenario or run a load/stress job stream, monitor the system log for any message
that seems strange. Don’t assume that because you don’t understand the message,
it isn’t important. Quite the contrary is true. Look up the message to make sure it
is not a clue that something unexpected has happened. Some testers focus nar¬
rowly on the immediate task at hand, and miss bugs by ignoring “extraneous” mes¬
sages that they didn’t think concerned them. A good tester doesn’t stick his head in
the sand; he questions everything.
Another type of defect to look for is incorrect information in the messages
themselves. Always be on the lookout for these kinds of problems. They are most no¬
ticeable in messages issued in response to commands you enter. Is all of the infor¬
mation in the message correct? Is it complete, or is something important missing?
Carefully examine all error messages. Do they lead you to the correct under¬
standing of the problem? Do they supply enough information? If not, you might
have found a valid usability or serviceability bug.
PROBLEM DIAGNOSIS
■ If an error message was issued from the system, look it up in the system mes¬
sages manual. Try to discover more detailed information about what it means
and (hopefully) determine the module that detected the problem.
■ In the event of a crash, look up the termination code in the system codes man¬
ual to learn what it signifies. From the resulting memory dump or related mes¬
sages, try to determine in which module the failure occurred and the location
of the failure within that module.
■ After a test case detects an error, look at the test case to understand which
function was being tested and what the surfaced error really indicates.
■ When there is a return code from a function that signaled an error, look it up
to find out what it denotes.
■ Insight into the architecture and instruction set of the computer the software is
running on
Once these questions are answered, you need to start backtracking from the point
of failure to find out where the incorrect information originated. As you are doing
this, keep in mind that the defect may be in the section of code that surfaced the fail¬
ure, or that code may be a victim of bad information generated somewhere else.
The debugger will typically begin by obtaining a program listing for the software
in question. This listing shows the low-level assembly and machine language in¬
structions that make up the compiled program, along with the offset of each such
instruction from the program’s beginning. Ideally, it will also show the high-level
programming language statements that the assembly language instructions were
derived from by the compiler. Detailed program listings such as this allow you to
see what the computer was actually executing as it worked through the program,
and can normally be generated by using particular options at compile time. For ex¬
ample, the GCC compiler found on Linux systems [GCC] can be invoked as follows
to generate a complete program listing in file “listing” for program “foo.c”:
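gcc -c -g -Wa,-adhls=listing foo.c

The exact flags shown here are only one possibility and can vary with the GCC and
assembler levels installed: -g requests the debugging information needed to interleave
the high-level source statements, while -Wa passes the listing options through to the
assembler and names "listing" as its output file.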
Alternatively, if you are debugging a module for which a listing wasn’t gener¬
ated at compile time, you may be able to generate one from the program’s object
code. For example, the following will disassemble an object code file on Linux sys¬
tems to stdout, and if the program was originally compiled with the -g option, the
resulting listing will even include the corresponding high-level language source
statements:
objdump -S foo.o
Once a debugger has the program listing, he will review it to understand the
flow through the program leading up to the failure, while also looking at the mem¬
ory dump to see what was contained in registers and memory locations when the
failure struck. This process often involves “playing computer” in reverse, which
means working backwards from the point of failure, doing what the computer did for
each instruction. For example, when the instruction is to load a value into a regis¬
ter, you actually calculate the memory location and look at the value in that loca¬
tion in the memory dump. As you are doing this backtracking, you are developing
a theory that matches the evidence that you have gathered so far. Continue back¬
tracking while either validating or disproving the theory. You will normally reach a
point where you can prove a theory.
Some problems are relatively simple to solve. For example, a program may suf¬
fer an interruption because it tried to reference an invalid memory location. This
condition might have been caused by the caller of the program initializing a para¬
meter list incorrectly. But some problems are far more difficult to solve and may re¬
quire the tester to set traps and re-create the failure multiple times.
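As a purely hypothetical illustration of that simpler case, consider the fragment below.
The invalid memory reference surfaces inside format_record, but backtracking through
the dump would show that the bad address originated with the caller, which never
initialized part of the parameter list. The structure and routine names are invented
for the example.

/* Illustrative only: the failing instruction is in format_record(), but
 * the defect is in the caller, which builds the parameter list badly. */
#include <string.h>

struct parmlist {
    const char *name;
    char       *output_area;   /* the caller is supposed to set this */
};

static void format_record(struct parmlist *p)
{
    /* Crashes here with an invalid memory reference because
     * p->output_area was never initialized by the caller. */
    strcpy(p->output_area, p->name);
}

int main(void)
{
    struct parmlist parms;

    parms.name = "sample record";
    /* Bug: parms.output_area is left uninitialized. */
    format_record(&parms);
    return 0;
}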
TESTING THE DOCUMENTATION
Test execution is not just about the software itself. Bad documentation is as likely
to cause problems for your customers as is bad code. If a crucial step is left out of
installation instructions, the install will fail. Even worse, the install may appear to
succeed, but key features of the software will be inoperable. Or, if the software pro¬
vides a great new feature but the documentation doesn’t tell the user how to set it
up and invoke it, the feature might as well not even exist.
Software users rely on documentation. The documentation could consist of
manuals, release notes, or help panels. Whatever the form, the content had better
be correct. Sitting around a table and reviewing documentation is important, but
just as in a code walk through, errors will slip past. By far the best way to discover
these errors or gaps is through testing. However, this doesn’t necessarily mean you
need any special activity. If you put some thought into it, your test execution can
cover the code and the text that describes it, all at the same time.
■ Instructions
■ Examples
■ Messages
■ Samples
If specific instructions are provided for certain activities, it's likely you will have
test scenarios defined for those same activities (if not, you probably should consider
expanding your scenarios). When executing those scenarios, follow the docu¬
mented instructions exactly, looking for errors or omissions.
Similarly, step-by-step examples may be provided to explain GUI screen in¬
puts, clarify syntax for commands or other interfaces, show expected outputs, or il¬
lustrate other important points. If so, then it’s very likely you will need to exercise
those same areas when you test the software. Mirror those examples in your tests.
Watch for inconsistencies that might confuse the user, as well as outright errors.
Keep in mind that any user following these examples closely will be trying to un¬
derstand how something that’s new to them works. Any inconsistency could prove
perplexing, and therefore is important to fix.
When you hit a problem, for example, an error message, you should verify that
the documentation for the message is correct. The meaning of the error message
may be obvious to you, because by now you are very familiar with the software. But
a typical user most likely will not be as knowledgeable. So check to see if the error
message documentation adequately explains the error and what should be done to
address it.
Finally, samples are sometimes documented for such things as initialization or
tuning parameter input files. When these are supplied soft copy via a companion CD-
ROM or Web download, many customers will try to plug them in as is. If they are
only documented in hard copy text, some users will implement only excerpts, and
others will go so far as to type them into their system verbatim. In either case, you’ll
need to try these samples during your test execution to make sure they are valid.
The approach for managing documentation defects you’ll find should be de¬
cided up front. Some organizations treat defects in manuals the same as those in
code, using common defect reporting and tracking tools for each. Others use more
informal means for managing documentation defects. Either way will work. The
important thing is to acknowledge that these defects will exist, testing can find
them, and that a feedback loop for reporting them must be in place.
SUMMARY
Test execution is where the real fun begins, and there are many ways in which it can
unfold. A classically sequenced model can be organized in multiple ways, depending
on what is most efficient for your project. An algorithm verification test has its own
unique flow that is intimately linked with the design and development processes.
Artistic testing encourages you to design new tests during the execution phase, fun-
neling fresh experience with the software into new scenarios created on the fly.
No matter which execution model you follow, you won’t be successful unless
you know how to detect when you’ve actually found a problem. Strong testers don’t
just blindly report suspected problems to their developers—they spend at least
enough time doing level-one debugging to correctly identify the failing module
and check for duplicates. And no test is complete unless it examines the documen¬
tation as well as the code.
No matter what the nature of your test, in order to execute it you’ll need access
to computers. But what if your team doesn’t have enough hardware available to do
the job? Chapter 16, “Testing with a Virtual Computer,” shows you a way out of
this dilemma through the magic of virtualization.
Testing with a Virtual
Computer
In This Chapter
■ Subdividing a real computer for testing
■ Using a virtual computer for testing
■ Combining partitioning and virtualization
■ Leveraging virtualization to automate complex test scenarios
■ A Case Study: a closer look at z/VM
What is one thing that all software testers require before they can attempt
a single test scenario? They must have a computer system on which to
run their software. Indeed, for large, complex software projects they usu¬
ally need more than one.
Unfortunately, as wonderful as computers are, they cost money, and that’s
something test teams often have in short supply. But that doesn’t reduce the need
to create a variety of suitable environments for performing a thorough test. Mad¬
deningly, computers are often woefully underutilized during testing, particularly unit
and function verification testing. Wouldn’t it be amazing if there were a way to har¬
ness that unused capacity to create several test environments from a single com¬
puter? If you think so, then you’re not alone. Let’s examine how it’s done.
PARTITIONING
One technique for creating multiple test environments from a single computer is to
partition it into multiple, distinct entities. There are different ways to go about this.
Physical Partitioning
From the early 1970s until the late 1990s a number of high-end IBM mainframes
had the capability to be electrically split into two separate computers. This allowed
the customer to either run the computer with all of the CPUs, main memory, and
I/O paths as one big system image or split it into two smaller separate computers.
The split was done on an electrical power boundary so that either side could be
powered off without affecting the other. The split was typically down the middle.
When operating in split mode, the two sides were completely independent and dif¬
ferent operating systems could be loaded on each, as shown in Figure 16.1. Similar
approaches are now available on high-end Reduced Instruction Set Computers
(RISC) and Intel systems. This subdividing is called physical partitioning.
FIGURE 16.1 Physical partitioning: two independent operating system images on a single machine.
hardware, while others only support a single, specific operating system. All offer a
very coarse granularity for the division of resources, which cannot be shared be¬
tween partitions and must be allocated in whole units. For example, a single CPU
must be assigned to one partition; it can’t be split or shared among them. The par¬
titions can only communicate with each other through external connections (e.g.,
networking), as if they were physically discrete machines.
Logical Partitioning
A refinement of this concept, known as logical partitions (LPARs), was developed in
the 1980s on mainframes. LPAR capability is now available on other classes of com¬
puters as well, typically at the higher end. They provide a way for the customer to di¬
vide the real computer into a number of smaller, logical computers, as shown in
Figure 16.2. The LPAR images don't all need to be the same size. The customer
configures how many CPUs, how much main memory, and how many I/O paths are
connected to each LPAR image. Both the CPUs and the I/O paths can either be
shared by some or all of the LPAR images or dedicated to a single LPAR image. The
maximum number of LPAR images that can be defined within one computer is typ¬
ically 32 or fewer. Each LPAR image is isolated from every other LPAR image, cre¬
ating what is almost like a series of independent, smaller computers.
FIGURE 16.2 Logical partitioning: multiple LPAR images running above LPAR management on a single physical machine.
Software Partitioning
Finally, there is software partitioning. This approach employs control software that
works in concert with a specific operating system to subdivide its resources. See Fig¬
ure 16.3. These software-defined partitions are capable of running distinct in¬
stances of a specific operating system. User Mode Linux (UML) is a popular
example. UML implements the platform-dependent portion of the Linux kernel
(the arch layer) to run on the Linux software “platform,” rather than on a specific
hardware platform. Essentially, it ports Linux to run on top of Linux. This linkage
means that each UML partition is limited: it can never run anything but Linux. On
the other hand, like Linux itself, UML isn’t tied to a specific hardware architecture,
FIGURE 16.3 Software partitioning: software partitions coexisting with host operating system applications.
VIRTUALIZATION
So far, we have looked at some techniques for subdividing a computer into a num¬
ber of smaller images. Now we are going to look at another solution to supplying
the testing environment, which has some additional capabilities. This solution gives
each tester one or more virtual computers to test on. The virtual computer would
appear to the software running inside of it to have all of the resources that are avail¬
able on a real computer. For example, it would have a certain amount of main
memory, virtual I/O devices, and one or more virtual CPUs, each of which would
have all of the hardware facilities of a real CPU. The intent is to provide an envi¬
ronment that appears to the operating system, middleware, and applications to be
a real, physical computer so they can run unchanged. Not only that, but the virtual
computer may have the ability to emulate hardware that might not actually exist on
the underlying physical machine.
Emulation
One way to build this virtual computer environment is to write an application that
emulates the hardware platform on which the software under test will run. This
emulator needs to decode each instruction and then do exactly what the instruction
is designed to do. It runs as an application on a host operating system, so from that
perspective it seems similar to the software partitioning approach shown earlier in
Figure 16.3. But, unlike the software partitioning approach, the emulated environ¬
ment is not tied to a particular operating system. Assuming it faithfully emulates
the targeted hardware architecture, it can support any operating system enabled for
that architecture.
Typically, an emulator does not run on the same hardware platform as the one
being emulated. The intent is usually to provide a machine of one architecture on
top of a machine of a different architecture. For example, the FLEX-ES emulator
from Fundamental Software, Inc. emulates z/Architecture™, Enterprise Systems
Architecture/390®, and System/370™ on an Intel Pentium® processor. Likewise,
the open source Bochs emulator can run on a RISC-based system and emulate the
Intel x86 architecture.
Multiple instances of an emulator can often run on a single physical computer,
each supporting a single emulated computer. Each emulated computer can run its
Hypervisors
Another approach to building a virtualized environment is to run the virtual com¬
puter natively, suspend it any time it executes an operation that would affect the
state of the real CPU, and emulate that operation before returning control to the
virtual computer. The software that does this is called a hypervisor, or virtual ma¬
chine monitor. A hypervisor gives each user his own virtual computer, also called a
virtual machine or guest. The virtual machine that each user gets appears to have a
certain amount of main memory, a defined number of CPUs, and a list of I/O de¬
vices at various device addresses. The hypervisor handles the mapping of physical
resources to emulated logical resources that are assigned to these virtual machines.
It provides very fine-grained control over resources. Also, there doesn’t need to be
a one-to-one correspondence between physical and logical resources, although the
closer that mapping is, the better the performance will be. For example, a virtual
machine could be defined with more CPUs than actually exist on the physical com¬
puter, or with only 10% of a single CPU.
Sugerman et al. note that unlike an emulator, a hypervisor gets out of the way
whenever possible to let the virtual machine execute directly on the hardware, so
virtual machines can achieve near-native performance [Sugerman01]. However,
that means a hypervisor running on a machine of a particular architecture can only
support virtual machines of that same architecture. A single hypervisor can also
support multiple virtual machines concurrently, with the maximum number rang¬
ing from dozens to hundreds depending on the underlying hardware platform.
Each guest can run a different operating system, or different versions of the same
operating system, if desired. Also, virtual machines on the same physical computer
can often communicate with each other at memory speeds, through virtual net-
works managed by the hypervisor. Operating systems typically don’t require any
special modifications to run in a virtual machine—they think they’re running on a
real computer.
Virtualization through hypervisors is not a new concept. It originally grew out
of work done on IBM mainframes in the mid 1960s [Creasy81]. Early IBM an¬
nouncement materials for the technology drew a comparison between virtualization
and light streaming through a prism. It suggested that just as a prism could break a
single beam of white light into multiple beams of different colors, so could virtual¬
ization break a single mainframe into many virtual computers [Varian97]. Similar
technology has now penetrated other server classes as well, and its availability often
reaches from the high end to the very low end of a given server line. There are a cou¬
ple of different approaches for implementing virtualization via a hypervisor. Let’s
briefly look at each.
Native Hypervisor
A native hypervisor runs directly on the server hardware, as shown in Figure 16.4.
It’s essentially an operating system with hypervisor capability. This closeness to the
hardware maximizes its control over system resources while minimizing perfor¬
mance overhead. However, it puts the burden for supporting devices on the hy¬
pervisor, adding to the hypervisor’s complexity and cost. This technology is
typically quite robust, performs well, and is suitable for production use. Examples
include the IBM z/VM mainframe hypervisor and the VMware™ ESX Server™ hy¬
pervisor for Intel servers.
Hosted Hypervisor
The complexity involved in supporting a full suite of hardware devices can be avoided
by offloading this chore to a host operating system. That’s the technique used by a
hosted hypervisor, which coexists with a host operating system on the same machine.
Sugerman et al. describe an architecture for this, in which support is split between a
hypervisor component that virtualizes the CPU, and an application component that
runs on the host operating system and handles I/O for the guest [Sugerman01]. A
driver loaded into the operating system connects the two. See Figure 16.5.
FIGURE 16.5 Hosted hypervisor: the guest OS and an I/O emulation application run alongside other host OS applications on the physical machine.
There is a price to be paid for the convenience of leveraging the host operating
system for device support. The hosted hypervisor gives up a degree of control over
system resources. It also incurs additional performance overhead due to the heavy¬
weight context switching between the host operating system and hypervisor worlds,
particularly for I/O-intensive workloads. Examples of this technology include the
VMware Workstation and GSX Server™ hypervisors, which both run on top of either
Windows or Linux host operating systems on Intel hardware.
COMBINING PARTITIONING AND VIRTUALIZATION
It's worth noting that partitioning and virtualization technologies are not mutually
exclusive. They can be combined in various ways to further maximize the utiliza¬
tion of a physical computer. For example, a single server could be divided into a
dozen logical partitions. Several of those LPARs could each be running a native
software hypervisor, which in turn subdivides each LPAR into dozens of virtual
machines. This approach is depicted in Figure 16.6.
FIGURE 16.6 Virtualization within logical partitions: native hypervisors running inside LPARs, each hosting multiple guest operating systems on one physical machine.
WHY GO VIRTUAL?
Now that we have seen several ways that a virtual computer can be built, let’s ex¬
amine some of the additional capabilities that are often available in these environ¬
ments and can aid testing.
Multinode Testing
What about tests that require multiple systems working in concert, such as
client/server or Web services networks? Multiply the number of required network
nodes by the variety of supported operating systems and by the size of the test
team, and the machine requirements could fly off the scale. Most teams would not
test every possible combination in such a case, but would instead pick and choose
some meaningful subsets. Nevertheless, the number of machines required could be
large and, combined with the necessary networking switches and routers, could eas¬
ily exceed the team’s hardware budget.
Again, virtualization can come to the rescue. A collection of virtual machines
could be interconnected through a virtual network, all within a single, physical
computer. The hypervisor could emulate Ethernet connections between the guest
images using memory-to-memory data transfers, creating a mininetwork operating
entirely inside that computer (without requiring any additional external network¬
ing hardware). On z/VM these virtual networks are referred to as guest LANs; other
hypervisors use different names for the same concept.
Debugging
Some hypervisors offer robust capabilities for setting breakpoints and traps for use
in testing and debugging. For example, a breakpoint could be set so that when an
instruction at a specific location is executed, a value at another memory location is
displayed on the console, and the execution of the guest is resumed. By managing
breakpoints at the hypervisor level, the entire system image (including the guest op¬
erating system) can be frozen when a breakpoint hits, offering the ultimate in con¬
trol over the debugging environment. Also, the fact that breakpoints are being set
and hit is invisible to the software stack running in that guest—even the operating
system doesn’t realize it’s being temporarily stopped and resumed by the hypervisor.
This is a very important difference when compared to the artificial environment typ¬
ically created by an interactive debugging tool. Also, the interactive debugging func¬
tion is ordinarily going to use services of the guest operating system. That means, for
example, the interactive debugger cannot be used to set breakpoints in those parts of
the operating system because it would cause unexpected recursion.
Automation
Hypervisors often provide scripting capability, which allows the user some level of
programmatic control over the virtual machine. This support offers enticing op¬
portunities to automate tests that otherwise could only be performed manually.
A CASE STUDY: A CLOSER LOOK AT z/VM
To get a better feel for how a hypervisor operates, it might help to take a closer look
at one. For example, let’s examine the granddaddy of them all, z/VM, and how it is
used in testing z/OS.
Hardware Architecture
To understand this discussion, it will be helpful first to understand a little about the
underlying hardware infrastructure upon which z/VM is deployed. The zSeries ar¬
chitecture has two classes of CPU instructions and two states that the CPU can op¬
erate in: problem state or supervisor state. It is a basic design principle that there
will always be an operating system running in supervisor state and it, in turn, will
run application programs in problem state.
Privileged Instructions
Privileged instructions are a carefully grouped subset of the CPU instructions that
can only be executed when the CPU is in supervisor state. The remainder can be ex-
ecuted when the CPU is either in problem or supervisor state. This grouping allows
the operating system to isolate and protect itself from problem state application
programs and similarly each problem state application program is isolated and
protected from all others. An example of a privileged instruction is the one that sets
the time-of-day (TOD) clock. There is one TOD clock per CPU, which the hard¬
ware continuously updates with the current time. Any program can view the con¬
tents of the TOD clock, but we would not want to allow problem state application
programs to update it. In the latter instance, every other program that viewed it
would see the altered and possibly wrong value. Privileged instructions such as this
are reserved for the operating system’s use.
Program Interruption
If the execution of a privileged instruction is attempted while the CPU is in prob¬
lem state, it will be suppressed, and a privileged operation exception program inter¬
ruption will occur which the operating system must handle. Building on this
function, z/VM was developed as a hypervisor operating system capable of running
other operating systems in a virtual machine in problem state. Because the virtual
machine operates in problem mode, z/VM can recognize any time the guest at¬
tempts to alter the CPU state.
z/VM Structure
z/VM has two main components—the control program (CP) and the Conversational
Monitor System (CMS). CP is the resource manager of z/VM. It creates the virtual
machines in which users can run either CMS or other guest operating systems.
In early versions of z/VM, CP ran the virtual machines in problem state directly
on the real machine until an interrupt caused CP to regain control. One such inter¬
rupt was a program exception. In this case, CP had to determine the reason for the ex¬
ception generated by the virtual machine. Some program exceptions would be passed
to the guest operating system; others, which could be caused by the guest operating
system executing a privileged instruction, required further checking. If the guest op¬
erating system was in virtual supervisor state, then the instruction needed to be sim¬
ulated by CP. If not, the program interrupt was passed to the guest operating system.
To further improve the performance of virtualization, in the early 1980s a
processor-assist feature was added to the mainframe architecture. One function of
the assist was enabling a guest to execute both problem and supervisor state in¬
structions, with the latter behaving differently when required to preserve the in¬
tegrity of the guest environment. The processor assist eliminated the need for z/VM
to emulate privileged instructions, and so reduced the performance overhead of
running guests.
CP Commands
A little more background on CP is helpful before we dive into an example of how
z/VM has been used by testers. There are a number of CP commands that can be
used when operating the virtual machine to simulate the external controls on a real
mainframe processor. Some of them are:
Debugging
The CP trace command allows you to set breakpoints to stop execution of the guest
at key trigger points, and display and alter main memory and any register available
to the guest operating system. You can then either resume the execution where the
virtual machine stopped or at another memory location. There is also conditional
logic within the breakpoint function to allow programmatic control over its flow.
The setting of breakpoints under z/VM can be combined with scripting capabilities
to create a powerful tool for test automation.
When choosing your virtualization solution, be sure to look for functionality that
will allow you to establish breakpoints to stop and then restart program execution.
This capability gives you flexibility for debugging all types of problems.
Modifying Memory
You can also display or modify memory of the guest to learn whether processing is
occurring correctly or to inject errors. Modifying guest operating system memory
is primarily done during unit verification test, where the focus on a single pro¬
gram’s processing is important. Function verification testers generally don’t mod¬
ify guest memory since their primary goal is to create all situations using external
means. However, there are times when such modification is necessary, particularly
in recovery testing.
Testing in a virtualized environment can benefit from the ability to modify guest
memory. Look for that capability to enhance your functional testing.
TIP
The purpose of the trap was simply to freeze the system at a particular point in
the targeted program’s execution in preparation for phase II. Each of the steps was
performed manually. When the trap sprang, phase II began. The tester would inject
a programming failure through a variety of techniques. This was mostly done by
overlaying the next execution address in the operating system's program counter with
zero. It just so happens that for z/OS, the value at location zero is never a valid in¬
struction. After the guest operating system was resumed, it immediately tripped
over this invalid program counter value and an operation exception program inter¬
ruption occurred.
At this point, the recovery routines cleaned up the execution environment,
gathered diagnostic information for problem resolution and, in most cases, retried
the failed processing. The tester’s responsibility in this manual approach was to en¬
sure that the operation was retried (if appropriate), see that the proper dump data
and trace records were saved for later analysis, and verify that the system and this
component or function could continue to service requests.
This ability to freeze the entire operating system, poke through its memory, in¬
ject errors, and then resume was quite powerful. In some sense, it was like a scene
from a science fiction movie, in which the lead character can stop time. Everything
around him is frozen, but he is still able to walk around, snoop in people’s houses,
and cause mischief. That’s the power of virtualization. But the FVT team wanted to
go one step further and leverage this power to automate the otherwise purely man¬
ual scenario.
of understanding and acting across the two environments conceivable. Still, any tool
that hoped to control the entire scenario needed to possess two qualities:
■ Visibility into the state of the operating system running within the virtual
machine
■ Ability to invoke hypervisor actions
As mentioned earlier, the z/VM virtualization support includes a means for the
software running within the virtual machine to tell the hypervisor to stop that vir¬
tual machine and invoke a command or series of commands. The z/OS FVT test
team created a custom tool that exploited this feature.
Look for virtualization support that will allow you to stop the guest system to perform
a command or series of commands.
TIP
But how did the automated test case find the targeted program’s address in
order to set the trace trap correctly? Because the tool ran on the instance of z/OS
under test, it could search system memory and control structures to locate the pro¬
gram and place its address into a variable. Knowing this address, the embedded CP
command could set a trace trap at a desired offset past the targeted program’s start¬
ing address, then inject a program failure (as described earlier), and finally resume
the guest operating system. The tool had now created a trace trap within a specific
routine in the operating system without any manual intervention by the tester.
Combine the capability of virtualization with tooling that can use information from
the guest machine to provide powerful automation.
TIP
The New Phase II: Injecting the Error and Monitoring Recovery
Now, a test case needed to somehow drive the relevant code path through the tar¬
geted program. If the test case did this successfully, the trap sprang, the guest oper¬
ating system was stopped, and CP executed the action identified on the trace trap
command. In this case, that entailed placing a value of zero as the next address to
execute and then resuming the guest operating system.
After the guest was resumed, an operation exception program interrupt was
immediately generated. This appeared as a true failure condition inside the targeted
program from which the system had to recover. In turn this drove the RTM pro¬
cessing to give control to recovery routines that were in place to protect the targeted
program. The recovery routines then funneled diagnostic information into proces¬
sor memory dumps, trace files, or log files. After the debugging information was
gathered, the recovery routine indicated, in most cases, to retry the processing that
originally failed.
Without the capability of virtualization and breakpoints, this type of recovery sce¬
nario would have been nearly impossible to create on the native zSeries hardware
platform. But virtualization provided the infrastructure and tools not only to create
such a scenario, but to automate it. This capability proved to be a tremendous help in
finding the problems that matter to customers in support of continuous availability.
FIGURE 16.8 Process flow for integrating z/VM commands into z/OS test cases.
Not only can you automate complex scenarios but you can automate the validation
of their results. Extend the automation to include self-verification techniques.
TIP
SUMMARY
The ability to subdivide a single physical computer into many smaller test environ¬
ments is very powerful. It can significantly expand the number of images available
for testing across operating systems or multinode configurations. It offers oppor¬
tunities for rapid system provisioning and saving snapshots of test machines. It also
presents a wide variety of possibilities for debug and test automation.
Even if you do take advantage of virtualization, there are situations when it’s
hard to begin testing because some portion of the underlying infrastructure you
need is not available. In Chapter 17, “Simulators and Prototypes,” we’ll see how you
can begin testing before all the pieces exist through the use of simulation and proto¬
types.
Simulators and Prototypes
In This Chapter
■ Simulators and prototypes
■ What to do when the hardware isn't ready
■ Tools for dealing with schedule mismatches
■ The perils of simulation
■ A prototype case study
Let’s define a couple of things before we start, the first being a simulator. A simu¬
lator is a piece of software or hardware that mimics some other piece of software or
hardware. In order to do that, the simulator’s external interfaces must look as much
like the real thing as possible. But a simulator might be mostly an empty shell that
does only a subset of what the real thing will do.
The second item is a prototype. Across a variety of engineering disciplines, a
designer will often create a prototype when he wants to explore various design op¬
tions. A prototype is a crude, stripped-down version of something to be delivered
later. It’s used to explore certain characteristics of the real thing. Software engineers
use prototypes to evaluate the merits of different algorithms, for example, or to see
how different elements will interact when they come together. Unlike a simulator,
a prototype’s main mission is to do real work. It’s not expected to be high quality
or necessarily even perform well, unless exploring performance is its main objec¬
tive. A prototype is a quick-and-dirty way to see how aspects of the ultimate deliv¬
erable should eventually run. It’s similar to how hardware engineers will build a
prototype or model of a new piece of hardware to use for testing before the real
product goes into manufacturing.
Let’s assume you get into a situation where you have software to test but the neces¬
sary hardware is not in sight. This is most likely due to time constraints. This is
common when time-to-market is a key driving force. To shorten the elapsed time
of a project, you need to do as many things as possible in parallel—and that can
mean that hardware and software are developed simultaneously. Hence we experi¬
ence the problem of “nowhere to test” whenever the hardware development takes
longer than the software development or the hardware is not stable when it is first
married with the new software.
Nowhere to Test
You’re ready to start your UT and FVT, but the prototype machine isn’t ready yet.
If you have the luxury of waiting, you can delay any testing until the first prototype
is ready. But since that would elongate the overall project schedule, it is most likely not a
viable option. Instead, you’ll probably be under pressure to do some amount
of testing before you get the working hardware. Now it’s time to be creative.
SIMULATION
When you can’t wait for the new hardware, consider simulation. This technique al¬
lows you to construct a tool that mimics the real hardware while doing a minimal
amount of work. The simulator absorbs input from the software and responds as
defined in the specifications, so from the software’s perspective it appears as though
it is interacting with real hardware. Sometimes a simulator will need to do some
limited amount of real work in order to react properly to the input it receives. For
example, if it is simulating an I/O device that has data storage capability, then it
may be necessary for the simulator to also store data. But that real work is merely a
byproduct of its simulation activities, not its main objective.
A powerful capability of simulators is having a command or parameter inter¬
face that can be used to modify how the simulator responds to input. For example,
an external switch could cause the simulator to return an error condition on the
next command it receives. Or, a parameter to adjust the I/O response time could be
used to simulate a hardware device timeout. Both of these capabilities can be ex¬
tremely useful in your UT and FVT.
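For illustration, here is a minimal sketch in Python of a device simulator with two such knobs: a switch that forces an error on the next command, and an adjustable response delay. The command names, error code, and timeout threshold are invented for this example and are not drawn from any real device.

import time

class DeviceSimulator:
    """Stand-in for a real hardware device, with externally adjustable behavior."""

    def __init__(self):
        self.force_error_on_next = False   # the "external switch"
        self.response_delay_secs = 0.0     # tunable I/O response time

    def send(self, command):
        """Accept a command and answer the way the real device is expected to."""
        time.sleep(self.response_delay_secs)          # mimic device timing
        if self.force_error_on_next:
            self.force_error_on_next = False
            return {"status": "ERROR", "code": 0x0F}  # injected failure (made-up code)
        return {"status": "OK", "echo": command}

# Flip the switch to drive the software's error-recovery path, or stretch the
# delay far enough to look like a device timeout.
sim = DeviceSimulator()
sim.force_error_on_next = True
print(sim.send("READ BLOCK 7"))    # the software under test sees an error
sim.response_delay_secs = 30.0     # assumed to exceed the driver's timeout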
The advantages of using simulators are not confined to the hardware realm.
They can also be quite valuable when the development schedules for two different
software packages that interoperate with each other don’t align. If one package
needs the other in order to begin testing, simulation may break the logjam. In some
sense, even simple stub routines and scaffolding (discussed in Chapter 3, “The De¬
velopment Process”) are forms of simulation. Both receive input passed to them
and respond in a manner within the bounds of what the caller expects. They are, in
effect, simulating a software module that hasn’t yet been created.
For the purposes of our discussion, we’ll focus on the more specialized case of
software simulation of new hardware. You’ll find that many of the concepts apply
equally well to simulation of new software.
will perform, and what will pass back and forth on the interfaces between the soft¬
ware and hardware. Your tool should mimic this interface traffic. Treating the new
hardware as a black box, the tool doesn’t need to do everything the new hardware
will, it just needs to react the same way on the defined interfaces.
Attributes
Whatever tool you develop must do the following (a brief sketch in code follows this list):
■ Accept all valid commands that can be sent to the display. The tool should do a reasonable amount of validity checking of the data that is sent with the commands.
■ Reject all invalid commands with the correct error codes. The simulator should also be designed to record these errors in an error log. This is recommended because you’ll more than likely need to investigate why something is attempting to send these invalid commands to the display.
■ Respond to commands with the same kind of timing that the real display will have. This probably means you will need to insert some timing delays in your simulator.
■ Comply with any rules demanding that certain commands only be performed in a particular order.
■ Send joystick input to the I/O driver in the same way the real hardware will. If collisions can occur between commands sent to the display and input sent from the joystick, your simulator needs to be able to create that scenario as well.
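As a rough sketch only, the following Python fragment shows how a few of those attributes might look in code: validity checking of commands, rejection of invalid ones with an error code, an error log, and one ordering rule. The command names and error codes here are hypothetical rather than taken from the actual display specification.

import logging

logging.basicConfig(filename="display_sim_errors.log", level=logging.ERROR)

VALID_COMMANDS = {"INIT", "DRAW", "CLEAR", "STATUS"}   # hypothetical command set
ERR_INVALID_COMMAND = 0x21                             # hypothetical error codes
ERR_OUT_OF_ORDER = 0x22

class DisplaySimulator:
    def __init__(self):
        self.initialized = False

    def send(self, command, data=b""):
        name = command.strip().upper()
        if name not in VALID_COMMANDS:
            logging.error("invalid command %r (data=%r)", command, data)
            return ERR_INVALID_COMMAND
        if name == "DRAW" and not self.initialized:
            logging.error("DRAW received before INIT")   # ordering rule violated
            return ERR_OUT_OF_ORDER
        if name == "INIT":
            self.initialized = True
        # The real display's work (rendering the data) is simply discarded here.
        return 0   # success

sim = DisplaySimulator()
assert sim.send("DRAW") == ERR_OUT_OF_ORDER      # ordering rule enforced
assert sim.send("INIT") == 0
assert sim.send("FROB") == ERR_INVALID_COMMAND   # rejected and logged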
Degree of Simulation
You have a choice to make with your simulator tool when establishing the extent of
simulation you need to do. One option is to discard the display data because your
tool only needs to respond on the interface like the real graphical display will. An¬
other possible option is to convert the three-dimensional data into two-dimensional
data and display that on a normal graphical terminal. The latter option requires
more work, but it can help you surface specific kinds of defects earlier or ensure the
integrity of information targeted for the display (e.g. the display data may be correct
except the image is upside down).
How well the software and the real display will work when they finally meet depends on several factors:
■ The degree of testing performed on the I/O driver using the simulator
■ How well the simulator matched the real display
■ The hardware team’s thoroughness in testing the new display
■ Whether each team’s implementation of the specification matches
Simulation can be a very valuable tool for the test team when schedules don’t line
up. But there are dangers that await a development organization that relies too
heavily on a simulator if it deviates from the way the real hardware works. Let’s take
a look at a classic case of such overzealous use of simulation.
This internal cache was not accessible to the operating system through normal
instructions. Instead, new instructions were added to copy pages back and forth between
normal memory and expanded storage. This hardware support was first delivered
in a new family of mainframe processors.
Testing Dilemma
Naturally, software support was required in the operating system to exploit this
new hardware feature. The new code was developed at the same time that the
processor hardware itself was being built, which was scheduled to be ready at ap¬
proximately the same time as the start of SVT. This timeline put UT and FVT in a
bind. Without the hardware, they had no way to test the new code. Yet due to tight
schedules, the system test team was counting on those other tests to get the code
reasonably stable prior to the onset of load/stress testing.
Solution: Simulation
Faced with this dilemma, the team decided to create a simulator that would be based
on the use of a virtualized environment. Additions were made to VM, the mainframe
hypervisor discussed in Chapter 16, “Testing with a Virtual Computer,” to imitate the
new instructions for copying pages to and from expanded storage. The development
effort was not too large, and because the requirement was foreseen well in advance,
the simulator was completed before the operating system code was even written.
When faced with a major schedule mismatch between codependent products, consider imitating one of them with a simulator.
TIP
The development team took advantage of the simulator’s early availability. De¬
velopers checked their new code against it every step of the way. In effect, they used
the simulator as an architecture-compliance tool, making sure that the code they
wrote adhered to the new expanded-storage architecture by running it against the
simulator. This approach kept development moving briskly, and before long the
code was ready for FVT. The FVT team ran a battery of tests against the expanded
storage support, using the simulator in lieu of actual hardware. Several bugs were
found and fixed, and eventually the code was deemed ready for SVT.
That explained why the error escaped. But why wasn’t it detected when the
simulator was first used during UT? A sheepish developer admitted that he hadn’t
bothered to look at the actual hardware specifications. Instead, he’d simply written
his operating system support to match the simulator.
Each of the three teams (hardware developers, software developers, and simulator developers) should have built their respective components to match the design specifications, rather than matching how one of the other teams built their component.
TIP
The problem was easily fixed and testing quickly resumed. Overall, the simulator
was quite useful for its intended purpose. It enabled the team to complete early
testing and to find and fix several bugs. Only when it stepped outside of its true role,
by being used as a substitute for the product specification rather than as a test tool,
did it lead to trouble.
PROTOTYPES
Prototypes are not only useful for software and hardware designers. There are oc¬
casions when they can be helpful to testers as well. When hardware and software
schedules don’t line up, and a simple simulator isn’t sufficient to fill the gap, a pro¬
totype can sometimes save the day. It can provide the tester with temporary hard¬
ware upon which to test until the real hardware is available.
A hardware prototype is not something the test team can usually create entirely
on its own; help from other teams is needed. Also, the situations in which such a
tool plays a role are fairly specialized. Most testers will never have a need to test on
a hardware prototype. Nevertheless, it’s an interesting concept; one that may give
you perspective on how testers can react to a seemingly impossible situation by ex¬
panding their vision of what is possible and then making miracles happen.
In this next example, we’ll look at using a prototype to emulate a new proces¬
sor architecture. By necessity, the case study requires a certain amount of technical
discussion about the architecture being emulated. If this is beyond your need or in¬
terest, you may want to skip ahead. But if you’d like a glimpse into how an effective
prototype can be built and exploited for testing, or you are simply curious about
how this kind of thing is done, then read on.
guts of the system, affecting such things as memory managers and interrupt han¬
dlers. The actual composition of the hardware instructions that make up a compiled
computer program has to change to account for the increased address size.
Even the CPU’s general purpose registers (GPRs), often used to calculate memory
addresses, must increase in size to hold all 64 bits. Doing all of this in a compatible
way that allows older, 24-bit and 31-bit applications to work correctly together
with the newer 64-bit ones on the same machine makes the job even more difficult.
In the face of such dramatic change, thorough testing is crucial. However, the
software team rarely has time to wait for the real hardware to be available before it
begins. Let’s see how z/OS development solved this problem.
The hardware team planned to do extensive testing of all the new 64-bit support
using its own customized tools. However, experience had shown that no matter
how much testing was done on a new processor using architecture verification
tools, it was never enough. Additional problems always surfaced once the operat¬
ing system was run on the new processor itself. In effect, the z/OS operating system
made a great hardware test tool.
In order to hold the hardware delivery date to customers, it was critical that the
hardware test team had a stable level of z/OS already running in 64-bit mode when
they were ready to verify the new processor. That meant that the z/OS team needed
to find a way to test the 64-bit support before the new machine was functional.
Option One
One possible solution was based entirely on virtualization. The idea was to modify
VM to emulate the new architecture completely. However, because VM itself would
still be running on a 31-bit machine, the hardware would not understand the for¬
mat of the new 64-bit DAT tables. That would force VM to simulate the use of 64-
bit DAT tables and do all of the virtual-to-real address translation processing for
each instruction. It was determined that the performance of the guest would be so
slow that it would not meet the testing needs.
Option Two
The team also considered modifying an existing machine to use DAT tables in the
new format. This option wouldn’t have been practical on a machine where hardware
managed the DAT function, because it would require the engineers to redesign and
build new processor chips. But in this particular case, one of the earlier-generation
machines used firmware for the DAT function. This opened up the option of mod¬
ifying the firmware so that it could handle DAT tables in both the old and new for¬
mats. It would also be possible to update the firmware with some of the new
architecture instructions.
When selecting a base upon which to construct a prototype, review a wide range of
possibilities. You may be surprised by a choice that can make the development job
much easier.
TIP
The remaining new instructions, those that could not be added to the firmware,
needed to be simulated. The logical approach was to modify VM so that it would
simulate the changes to the GPRs, the new instructions, and the like. With this
combination of changes to VM and to the firmware, some simulation could be per¬
formed by VM, and the DAT work would be handled by the hardware prototype.
Consider the use of simulation with your prototype. Combining the two techniques
can result in a very efficient approach.
TIP
The Choice
Given these benefits, the obvious choice was option two. The 64-bit prototype was
built as described and was operational for more than 12 months before the first real
machine was scheduled to be powered on. The z/OS developers made extensive use
of the 64-bit prototype and were able to complete UT, FVT, and some limited SVT
of z/OS. The UT and FVT team members had all of the VM capabilities that they
were accustomed to. While the performance of the guest z/OS was not great, it was
acceptable on the 64-bit prototype. It certainly allowed for the functional verifica¬
tion of the operating system.
Success or Failure?
This combination of a special hardware prototype and VM doing some simulation
was judged a great success. When the real machine was declared stable enough to
try, the team was able to boot z/OS in 64-bit mode on it within two days. By that
second day, they were able to run batch jobs and log users on.
Much more testing continued after this first boot. The hardware testers now
had a stable 64-bit operating system that they could use as a tool for pressuring the
new machine. And z/OS testers continued their SVT, now on the real hardware.
SUMMARY
Simulators and prototypes are not needed every day. But when the need does arise,
they can be lifesavers. Either one can help you shorten the elapsed time of a project
by imitating new hardware before it is built, and allowing you to overlap software
testing with hardware development. For a successful outcome, it is very important
that a simulator mirrors the responses of the new hardware as closely as possible
and that it is only used for its intended purpose. No matter what hardware envi¬
ronment you rely upon, you’ll face chaotic situations during the testing cycle. In
Chapter 18, “Managing the Test,” we’ll explore ways the tester can gain control of
the chaos and be the master of his testing domain.
Managing the Test
In This Chapter
■ Testing the correct code
■ Reining in testing chaos through run logs
■ Avoiding oversights with an end-of-day checklist
■ Techniques for tracking problems and scoring defects
■ Measuring success as a weapon for getting bugs fixed
■ Practical solutions when a test fails to meet its end date
Chaos, disorder, confusion: sounds like a typical day of software testing. From
a distance (say, in a magazine article—or a presentation to your executive
management), testing may appear to be a sleek killing machine, methodi¬
cally working its way through one scenario after another, smashing bugs with ruth¬
less precision. But look more closely and the glossy veneer begins to fade.
■ A step is omitted on page 13 of the user’s guide you’re following for configuring a networking application. But it’s not until hours later, when you reach the critical point in a test and the application won’t connect to a partner system, that you begin to suspect something is wrong.
■ While working through a well-planned recovery scenario, suddenly more things than expected begin to fail. The system console lights up in a meteor shower of error messages.
■ The first in a series of 50 tests uncovers a severe bug that blocks all the others. Instantly you’re behind schedule before you’ve really begun.
■ Your end date is a week away, testing is behind, and the list of open bugs keeps growing. Management insists that delaying the software’s roll out would be catastrophic, but is eager to hear what other suggestions you might have.
The notion that the act of testing software really isn’t a neat and tidy activity
should not be surprising. Chaos theory teaches us that the world is always more
complex up close than it seems from a distance. Benoit Mandelbrot, the father of a
branch of chaos theory known as fractal geometry, once made this point by ob¬
serving that, for example, clouds are not spheres, mountains are not cones, and
bark is not smooth [Mandelbrot83]. Why should we expect software testing to be
any different?
Once you accept that the testing process is messy, you can take steps to rein in
the chaos. There are certain problems and difficult situations that most testers are
likely to face. There are also techniques that can be used in the face of these troubles
to maintain some measure of control. With some forethought and care, you can
even harness the chaos to work for you, rather than against you. Or, in the words of
Shakespeare’s Polonius, “Though this be madness, yet there is method in’t.”
A new tester was once handed what seemed like a good introductory assignment.
An existing piece of software was being updated, not with new functions, but with
new algorithms to improve its performance and efficiency. To test this update, he
could make use of test cases and scenarios from prior tests, minimizing his upfront
preparation needs while also teaching him how to find and access that archived ma¬
terial. But the required testing wasn’t trivial. Indeed, it was fairly time-consuming,
and would give him a good introduction to the test environment as well as his
team’s processes and techniques.
Once the new code became available, he grabbed it and dutifully began his
work. Two weeks into the test, everything was going smoothly. While all those
around him struggled through problem after problem on other new functions of
the software, his testing sailed along without hitting any such roadblocks. This test¬
ing thing is easy, he thought. After a month of testing he finished, a full two weeks
ahead of schedule. Unfortunately, he hadn’t found a single bug, but he shrugged
that off. After all, it wasn’t his fault if the software happened to be rock solid.
Then it happened. Another tester on his team accidentally stumbled over what
turned out to be a major bug in his assigned code. The defect was a glaring error.
Our young tester was embarrassed and dumbfounded that he hadn’t caught it. His
test plan had been thorough, and his execution flawless. What had gone wrong? A
tickle of worry began to gnaw at his gut. Quietly, he did a little investigation. He
dumped one of his target modules on his test system. Then he did the same on the
test system of a teammate. The module had an eye catcher at the top that was
human-readable in a hexadecimal memory dump. The eye catcher contained the
module’s name and the date it was last compiled. He compared the two dates. In an
instant his stomach began to churn, his hair follicles stood on end, and a bead of
sweat started trickling down between his shoulder blades. The dates didn’t match.
In fact, the date from his system was over a year old. He’d just spent a month test¬
ing the software’s prior release.
Protecting Yourself
Though this story may seem like an extreme case, smaller-scale instances of testing
the wrong code happen to testers every day. In this example, a complete rebuild of
the new software was done every six weeks, with individual fixes for problems
found by a tester applied daily as needed. But many software projects follow weekly,
or even daily, build cycles. With that much churn, it’s not only possible, but prob¬
able that you’ll find yourself in a similar situation unless you take great care. If the
software you’re testing has line mode commands or GUI actions to show the exact
version of software on your test system, use them every time you pick up a new
build. Or, dump a sample of the actual executable code like our tester did, but do
so before your test, rather than after. Protect yourself from the embarrassment of
being told the spiffy “new” bug you’re so proud of was fixed two weeks ago and
your system is down-level. Always confirm that you are really testing what you
think you are.
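Here is one way such a check might look, sketched in Python. The eye-catcher format, module path, and expected build date are all assumptions made up for the example; substitute whatever your product actually embeds.

import re
import sys
from datetime import date

# Assumed eye-catcher layout: module name, a space, then an ISO-style build date.
EYE_CATCHER = re.compile(rb"([A-Z0-9]{4,8}) (\d{4}-\d{2}-\d{2})")

def module_build_date(path):
    with open(path, "rb") as f:
        match = EYE_CATCHER.search(f.read(4096))   # eye catcher sits near the top
    if not match:
        sys.exit(f"{path}: no eye catcher found; is this the right module?")
    return date.fromisoformat(match.group(2).decode())

expected = date(2004, 6, 1)                             # date of the build you installed
actual = module_build_date("/test/lib/target_module")   # hypothetical path
if actual < expected:
    sys.exit(f"Down-level module: built {actual}, expected {expected} or later")
print(f"OK: module built {actual}")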
A day of good testing can be both exhilarating and chaotic. It’s rewarding to watch
as an intricate scenario you devised reveals a devastating bug, and fun to follow a
trail of error messages and unexpected results to the hidden lair of a new defect. But
it’s also often muddled and confusing. Despite having a strong plan in place, test¬
ing is often not a linear activity. There may be some serendipity involved. To un¬
derstand the twists and turns that testing can take, it may help to consider a few
situations in which you might someday find yourself.
with a story about his run-in with a grizzly on a recent backpacking trip to Alaska.
Finally you return to your test system and redo that last action you undid, but for¬
get to first redo the prior one. BANG! A bug hits. Excited, you collect documenta¬
tion on the error and file a bug report. A week later, the developer (or you, if you
are doing your own debugging) realizes more information is needed to debug the
problem. First, of course, you need to re-create it. How exactly did you do that?
Testing lore suggests that over half of the bugs found in any large-scale test are
not uncovered by exactly following a detailed plan step by step, but rather through
these kinds of subtle and unplanned deviations. Whether this is really true is un¬
known, but at times it certainly seems that way. In the case of artistic testing, it’s
that way by design.
pulling fresh copies of the software under test from the same build server. Sadly,
shared resources are just as likely to fail as dedicated ones, but the impact is more
widespread.
When a failure hits, someone has to do something about it. If it’s a hardware
failure, this may involve gathering some diagnostic information and contacting the
appropriate support team. If the build has a problem, you might follow a similar
approach to notify the build team. But what if you spend an hour of precious test
time collecting failure documentation and tracking down the right person to whom
you should report the problem, only to find out one of your teammates beat you to
it—three hours earlier? If only you all had a common place to post problems, actions,
and resolutions for such things, so time wouldn’t be wasted.
problem number in the run log too—that will help you quickly find the entry later
if needed, either for further debugging insight into the circumstances surrounding
the error, or for re-creating the bug to gather more diagnostic information.
Similarly, if you trip over a system or application problem that takes you a
while to resolve, list that in the run log. Describe the symptom clearly, so that oth¬
ers experiencing the same problem will find it and your resolution when they
search. Also, if you make configuration, tuning, or other changes to any shared
servers or other common resources, highlight them in your log too—you might
even want to indicate changes in bold or in color so they catch the eye. In the event
that your changes have unexpected downstream effects that impact others on your
team, your log entry can help them unravel the mystery of “what changed?”
summary section from the first hit on your search doesn’t show RCV0041 as fully
completed, just keep searching until you find the entry that does. The full history of
that scenario’s execution is right there at your fingertips.
We’ve discussed how the input to a run log is more important than the tool you
use to create it. Still, it is worthwhile to consider recording your run logs with a tool
that also allows a direct connection to your status tracking database. If marking the
success or failure of a scenario within your run log also directly updates your team’s
status, you’ve added convenience and saved time.
AN END-OF-DAY CHECKLIST
At the end of a vigorous day’s testing, you might be inclined to wrap up quickly and
head out the door. But what if you come in the next morning refreshed and ready
to dig into a problem you had hit, only to discover that you forgot to save a critical
memory dump or system log? No problem, you can just go back to your test ma¬
chine and retrieve that data—unless someone else has reused that machine for an¬
other test and wiped everything out.
Just as a grocery list helps you remember what food to buy, a simple end-of-day
testing checklist can go a long way toward helping you avoid an oversight that you
will later regret. Formality is not important here. The checklist need not be any¬
thing more than a few reminders scratched on the front page of a notebook. It can
include anything that you or your team deems important. For example, a checklist
might include:
■ Were any memory dumps taken? If so, have they been archived?
■ Have system logs been scanned for potential bug symptoms? If any anomalies
were found, have those logs been archived?
■ Have appropriate problem records been opened for problems found?
■ Has a run log been completed?
■ Does the summary section include all attempted scenarios, with an indication
of whether they were completed or not?
■ If any problem records were opened, have they been noted in the log?
■ Were any problems encountered with servers or other physical resources
shared by the team? If so, have those problems been reported to the proper
people for resolution?
■ Does a new build appear to have problems? Have they been reported?
This isn’t rocket science. The trick, however, is to actually create and follow
such a list, rather than just assuming you will remember everything. Doing so pro¬
vides another weapon in your arsenal for maintaining control in the face of chaos.
Problems are the tester’s pot of gold. You hope that all your planning, preparing, and
testing will result in a nice big cache of potential defects. But a good tester doesn’t
just report a problem and forget about it. He must take ownership.
Staking a Claim
Ensure a problem is reported to the proper people. If someone other than you is re¬
sponsible for debugging it, check with them periodically to assess their progress to
see where your problem is in their queue. Be a squeaky wheel. However, it’s wise to
use discretion. How frequently you follow-up should depend on the problem’s
severity and its impact to further test progress. Constantly nagging others over low-
severity problems will quickly make you a pariah in the organization. But hot prob¬
lems demand fast turnaround, and you have every right to push for a quick
resolution if your problem deserves one.
Once the problem has been debugged, a fix will need to be generated. Monitor
the status of fixes for all of your defects. Once a fix is available, it’s time to do your
part. Be prompt about applying it and verifying it does indeed resolve the problem.
When you’re satisfied that the fix works, don’t delay in marking the problem as
resolved.
At a minimum, a problem record should capture the following (a brief sketch in code follows the list):
■ The person who reported the problem and will be responsible for verifying the fix
■ The date it occurred
■ The software’s build version
■ Severity and impact
■ Who is assigned to debug it
■ Its current state (e.g., unassigned, opened/assigned, in debug, re-create requested, fix being created, fix available, fix under test, closed)
■ Description of the problem, the circumstances surrounding its detection, and its consequences
■ Diagnostic data (e.g., memory dump, system logs, traces) attached or referenced
■ Whether or not a bypass is available
■ Fix designation or number
■ Resolution description
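As a sketch of how such a record might be represented, assuming hypothetical field names and no particular tracking tool:

from dataclasses import dataclass, field
from datetime import date
from enum import Enum

class State(Enum):
    UNASSIGNED = 1
    OPENED = 2
    IN_DEBUG = 3
    RECREATE_REQUESTED = 4
    FIX_BEING_CREATED = 5
    FIX_AVAILABLE = 6
    FIX_UNDER_TEST = 7
    CLOSED = 8

@dataclass
class ProblemRecord:
    reporter: str                   # also responsible for verifying the fix
    date_occurred: date
    build_version: str
    severity: int                   # 1 (inoperable) through 4 (minor)
    impact: int                     # how much test progress it is blocking
    assigned_debugger: str
    state: State
    description: str                # circumstances of detection and consequences
    diagnostic_data: list = field(default_factory=list)   # dumps, logs, traces
    bypass_available: bool = False
    fix_number: str = ""
    resolution: str = ""

bug = ProblemRecord("S. Tester", date(2004, 3, 15), "build 17", severity=3,
                    impact=8, assigned_debugger="J. Developer", state=State.OPENED,
                    description="RCV0041 scenario hangs after recovery retry")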
Problem Scoring
The issue of problem severity versus impact is interesting. Many teams simply score
a problem’s severity on some predefined scale, such as one to four. The score of a de¬
fect depends on how its symptoms would affect a customer’s operation. A score of
one might mean the product is essentially inoperable, is corrupting data, or is experiencing
some other critical symptom. Two and three would be progressively less severe scores,
with a four indicating a minor issue such as incorrect message wording.
This kind of scoring is both common and quite valuable for prioritizing debug
and fix-creation activities. Unfortunately, by focusing purely on symptoms, infor¬
mation is lost. While the problem’s symptoms might only rate it as a severity three,
it may touch an area that is central to a new function and therefore block the exe¬
cution of many test scenarios. For that reason, some teams augment problem sever¬
ity with the notion of problem impact. The problem’s impact score depends not on
its external symptoms, but rather on the degree to which it is blocking test progress.
This extra information enables prioritization decisions to be based on a more com¬
plete picture of the damage the problem is causing.
Database Access
The problem tracking database should be accessible to both the test and develop¬
ment teams—or should it? If the development team is responsible for debugging all
reported problems, then they’ll certainly need access to the problem records. But
what if the test team takes ownership for debugging the problems they uncover, as
suggested elsewhere in this book? In this case, it might make sense to create a two-
tiered problem reporting structure. Tier One would be reserved solely for tester ac¬
cess. Any suspected problem would be recorded here. Once the problem has been
debugged, if it turns out to be a valid, unique defect, then it would be promoted to
Tier Two. The problem description in this second tier would be concise and to the
point. It would focus on the actual lines of code in error, along with an explanation
of contributing factors and resulting symptoms.
Benefits
There are a couple of benefits to the two-tiered approach. First, by isolating the first
tier to the test team, it encourages testers to record any anomaly they feel is a po¬
tential problem. They don’t need to know at that point if it’s a defect, only that it’s
something that requires further investigation. Since the audience for these records
is limited to the test team, testers are not inhibited. They can create as many entries
as they like, without worrying about flooding the development team with lots of re¬
ports that may ultimately go nowhere. Because everything is captured, subtle errors
won’t be lost or forgotten.
The second benefit to the two-tiered approach is that the second tier will con¬
tain almost nothing but valid, unique bugs. It won’t be cluttered with user errors,
test case problems, nonreproducibles, and so on. It’s not unusual for some of these
“nonproblems” to make up over half of what a prolific test team reports. With the
two-tiered approach, the development team doesn’t have to waste time weeding
through this chaff, and can instead focus on the wheat—real defects.
Let’s assume your testing is so successful that you uncover an ever-growing moun¬
tain of bugs. The biggest problem then becomes getting them fixed. Without fixes,
a good tester eventually becomes a victim of his own success. As the heap of bugs
mounts, so does the list of test cases for which progress is blocked. Eventually all
testing avenues become choked with defects, and there’s little for the tester to do
but watch the clock as the end date nears and testing stagnates.
You need a weapon to solve this problem. For many teams, the weapon of
choice is a set of well-honed measurements. Good metrics don’t necessarily serve
up convenient answers so much as lead people to ask the right questions. Unfortu¬
nately, when used improperly, measurements can become a source of endless busy-
work, paper shuffling, and fancy charts with a value far below their cost. But when
used correctly, a good measurement can succinctly communicate to those running
the project which defects are putting the end date in jeopardy, so the key ones are
fixed promptly.
Measuring Success
If you find yourself fascinated by the world of measurement techniques for test proj¬
ects, you’re in luck. There are countless books devoted to software testing processes,
each touting a variety of software metrics, reliability models, and predictive analysis
techniques (for an excellent example, see Kan [Kan02]). Such an in-depth discussion
is beyond the scope of this book. We’ll stick with a simple yet effective approach that
captures effort expended, effort remaining, and effort blocked, while also highlight¬
ing the impact of defects on testing progress.
A Bare-bones Approach
Your test plan will have a list of variations or scenarios to execute. Perhaps the sim¬
plest technique for measuring your progress is to count how many variations are
successful each week and plot them on a graph over time. However, this method
omits important information. It does not account for variations that have been at¬
tempted but are not yet successful, so it does not present a complete picture of the
testing effort that’s been expended so far nor the effort that remains. It also does not
draw attention to tests that are blocked or the defects that are blocking them.
A Multistate Approach
A better approach is to view each variation as being in one of five states: not yet attempted, attempted, successful, failed, or blocked.
Other states are possible. For example, you might include an “unblocked” state
for variations that were previously blocked but now are available to execute. You
could also include a “fixtest” state for variations that were previously marked as
failed but now have fixes available. You can probably think of other useful states,
but these initial five make a good core set.
This finer-grained classification offers a more accurate picture of each varia¬
tion’s status, and clearly identifies those that are impacted by defects. But it still has
some weaknesses. It doesn’t distinguish between a variation that takes three minutes
to execute and one that takes three days. All are treated equally, so the true effort lev¬
els aren’t reflected. Also, it’s not obvious how to translate these classifications into a
simple chart that will clearly convey the team’s status trends. For example, a pie chart
could show a snapshot of the current spread of variation states, but a pie chart can’t
give a view of progress over time. A series of graphs could be plotted week to week
for each possible variation state, but multiple graphs could muddy the team’s main
message.
Weighting
These weaknesses can be addressed through a further refinement: assign a weight to
each variation corresponding to the estimated amount of time it will take to perform.
For example, you might assign weights on a scale of 1 to 10, with the higher weights
associated with the more complex scenarios. Or you could define a weight of one for
any scenario that will require some baseline amount of time, then assign a two to any
scenario that will take twice that amount of time, and so on. If you choose this latter
approach, don’t worry about getting too granular. Many teams have been successful
by simply assigning a value of one to any scenario that will require up to a half day, a
value of two for a scenario expected to need between a half and a full day, and so on.
As long as the same baseline is used for scoring all scenarios, the actual value of that
baseline doesn’t matter. What’s important is that you capture the relative differences
in effort required for the scenarios, not any absolute value.
Combining these weights, or “points,” with target dates for the completion of
each variation allows you to create a fairly accurate graph of expected test progress
over time. Plot total cumulative points on the Y axis, and time on the X axis. As
variations are completed, their points are added to the current test total. Partial
points can also be claimed on a sliding scale for variations in “attempted” status.
Every week (or whatever reporting interval you choose), you plot the new grand
total on your graph. The result is a nice, clear chart showing testing progress over
time. When scenarios are in a failed or blocked state, you can sum up the points for
each of those scenarios and quickly convey the total amount of testing effort that is
being delayed by the offending defects or late code deliveries. An example of this is
shown in Figure 18.1.
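A minimal sketch of this bookkeeping, using invented scenario names, weights, and dates, might look like the following in Python:

from datetime import date

# (name, points, planned completion date, state per the classification above)
scenarios = [
    ("RCV0041", 4, date(2004, 5, 7),  "successful"),
    ("RCV0042", 2, date(2004, 5, 14), "blocked"),
    ("NET0010", 1, date(2004, 5, 14), "attempted"),
    ("NET0011", 3, date(2004, 5, 21), "failed"),
]

as_of = date(2004, 5, 14)
planned = sum(pts for _, pts, due, _ in scenarios if due <= as_of)
actual = sum(pts for _, pts, _, state in scenarios if state == "successful")
actual += sum(pts * 0.5 for _, pts, _, state in scenarios if state == "attempted")
blocked = sum(pts for _, pts, _, state in scenarios if state in ("blocked", "failed"))

print(f"{as_of}: planned {planned} points, actual {actual} points, "
      f"{blocked} points held up by defects or late code")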
With these relatively simple techniques you can do a reasonable job of convey¬
ing the testing effort that's been expended, the effort remaining, and the impact of
defects on testing progress. Undoubtedly, this combination does not deliver a mea¬
surement nirvana. But it does the job and requires fairly minimal effort, so more
time can be spent on testing than tracking. For many testers, that’s close enough.
Measurements as a Weapon
Testers are sometimes annoyed by the need to track their progress and link bugs to
variations, even when the effort required to do so is minimal. They shouldn’t be. In
fact, testers should love tracking this information, because it’s a weapon they can
use to ensure that bugs are fixed.
When testing is behind schedule, the test team leader becomes a natural target
at project status review meetings. All eyes are on him as he defends his team. He can
panic. Or, he can display a simple graph that clearly shows actual versus projected
test progress, with the difference representing a realistic, effort-based gap. On the
same chart he can include an indicator showing what percentage of that gap is due
to unresolved defects. Next, he can show a chart listing all outstanding defects, or¬
dered by the amount of testing progress each is blocking. At this point, all attention
will shift from the test team leader to the development leads. The burden suddenly
falls on them to explain why development is behind in fixing defects, and what they
are going to do to catch up. The result is that the most critical bugs get the prompt
attention they deserve, and maybe testing has a good chance to finish on time.
It’s the rare test project that finishes with time to spare. Instead, the final stages of
most tests seem more like a mad dash for the finish line. As the end date nears and
defect reports continue to mount, nervous project managers start to look for ways
to, well, manage. Hopefully yours won’t go so far as to order you to stop finding
bugs so the code can ship.
It’s not necessary to spend long in test before you come face to face with a sim¬
ple reality: code ships before it’s perfect. That fact is clear from the number of fix
packs that supersede almost all complex commercial software. Testers don’t like this
reality too much. They’d rather see the code they’re testing be defect-free before it
goes out the door, but that isn’t realistic. As we’ve noted elsewhere, all complex soft¬
ware has bugs. Combine that with another popular testing axiom, namely that you
can’t test quality into a product, and the conclusion is inescapable: testers will never
be happy with what ships. They’ll always wish another bug could have been re¬
moved. The trick is to get beyond the desire for perfection to be able to see the dif¬
ference between software that is ready to go, and software that is truly in trouble.
Good exit criteria can help here. But even when it’s clear the software isn’t ready, the
challenge is figuring out what to do and convincing decision makers to do it.
Risks
First, identify the risks. Are there known defects that have not been fixed? The an¬
swer to this is frequently “yes.” How severe are they, both singly and cumulatively?
For example, a single bug that causes the corruption of critical data is devastating.
Other bugs may not be overly troubling when looked at individually, but when
viewed as a group clustered in a particular function they may be crippling.
Are there major functional areas that have not been fully tested, perhaps because
scenarios are blocked by open defects? Are those areas optional, or are they central
to the software’s operation? Try to project worst-case and probable-case failure sce¬
narios for those areas, based on defect trends the testing has revealed so far.
Impacts
Once the risks are identified, assess their potential impact on the software’s next
consumer. If another test phase is due to receive the code, how likely are they to be
affected by the risk? Is it even worth their while to start in light of known deficien¬
cies, or will those problems cause so much redundant error discovery and debug
that starting now will actually lengthen the project’s overall timeline? Are areas that
have yet to be tested critical to the next test phase’s planned activities? Can their
entry criteria be met?
If the software is due to ship to customers, how will they react to the identified
risks? Will they be warned about them? If so, will they refuse to install it? If not,
what happens when they put the code into production and those risks manifest
themselves? Will their business be seriously impacted? Will their confidence in the
product be permanently shaken? Will their satisfaction level drop off a cliff? Will
they denounce your product at user group conferences or in Web chat rooms?
By coolly articulating both the potential risks and likely impacts of prematurely
ending a test, you can remove the hysteria surrounding an approaching end date,
both from testers who want a perfect product and project managers under pressure
to release it. The team can then focus on devising realistic solutions.
The End Date Arrives and Testing Isn't Finished: Now What?
Despite Herculean efforts by the test and development teams, you’ve reached the
end of the line. The end date has come, and the test team has not achieved its exit
criteria. Major bugs remain unresolved, significant chunks of testing are not com¬
plete, or both. You recognize the code may never be perfect, but as a tester and the
software’s first real user, you can’t sign off on releasing it as is. It’s decision time.
First, a reality check: as a tester, the choice of whether or not to ship a product
isn’t yours to make. It’s a business decision to be made by senior managers who will
take into account many factors, only one of which is test status. However, it’s likely
the test team will have an opinion on what that decision should be. What solutions
might you recommend? How can you convey your opinion in a way that’s con¬
vincing? There are several possibilities.
Delay Shipment
This one is every tester’s favorite, and sometimes it’s the best answer. If problems
in the software are serious and pervasive or testing is far behind in major functional
areas, there may be little choice but to extend the end date. That choice is rarely
made lightly, however. If the test team feels strongly that a delay is required, they
must present their rationale to the decision makers in a precise and powerful way.
In such situations, data talks. Your problem tracking and test measurement
charts should provide a solid base, but may not be enough. Look for ways to strip
out extraneous information and present your core message in a compelling way.
For example, if you want to convey that many areas of test are blocked, that story
may not come through clearly in a standard “S” curve showing overall test progress.
Consider creating a bar chart, with each bar representing a major functional area of
the software and showing its planned versus actual accumulation of testing points.
If many bars register a low score, the true state of the situation becomes obvious.
If a troubled function is deeply intertwined with the rest of the offering, pulling it may not be realistic. In that case,
it is sometimes possible to fence off the function, leaving it in place but disabling it.
In fact, if testers are involved during the code design phase, they can sometimes
sway the development team to create code that is easily disabled in case it becomes
necessary.
Either way, well-presented data can help you make your case. The bar chart for
test progress by function can work here as well, but this time it will show only one
or two areas as dramatically behind the others. Pair that with a similar bar chart
showing open defects by function, backed up with a detailed list of defects, and oth¬
ers should quickly see your point.
Restrictions List
If the broken areas can be bypassed by the user, then it may be enough to simply
provide a warning. Create a restrictions list describing functions to avoid until fur¬
ther notice. This may be particularly useful if the next consumer of the software is
another test team. If that team is able to stage its testing in such a way as to avoid
the problem areas initially, that will give your team and development a chance to
clean them up. This approach may also work with beta customers, particularly if
there are only a handful of them and they are well-known and can be robustly sup¬
ported. Even in that case, be aware that users have a tendency to ignore restrictions
lists, and then complain loudly when functions don’t work.
A restrictions list is usually an easy sell, sometimes too easy. It’s often tempting
to try to document your way out of a tough spot that really requires more direct ac¬
tion. If the restrictions list is longer than the working functions list, or the broken
areas are too embedded in mainline function for a user to avoid them, it’s proba¬
bly best to consider an alternative.
SUMMARY
The day your software ships to its first customer can be very satisfying, but also
a little worrisome. You’ve done your best, toiling away for hours on end to
detect and destroy all the bugs that matter. But was it enough? Will the users’
experiences be smooth and satisfying, or troubled and disappointing? Is there any¬
thing you could have done better?
The concluding two chapters look at the final aspects of testing. We’ll start
with what customers tend to do with new software, and what you should expect to
gain from their experience. We’ll examine the kinds of defects that early customer
testing (often called beta testing) is expected to extract, and why it complements
your own efforts. But defect removal isn’t the whole story—you’ll see other bene¬
fits that beta testing can offer as well. You’ll also learn why customers do their own
testing of most new software before putting it into production, and why there’s a
whole class of defects that are beyond the scope of what you can find.
The experiences of your early customers may suggest areas where your testing
can be strengthened. But there are other ways to probe for opportunities to enhance
your work. We’ll review a variety of avenues a test team can pursue on the road to
continuous improvement—because as rewarding as it can be to wring the bugs out
of a complex package of software, there’s always another test on the horizon.
The Customer's Role
in Testing
In This Chapter
■ The value of a beta test
■ Why customers should test generally available software
Information on the installation itself is not all you can glean from this activity.
You’ll also want to learn about the experiences of the initial setup and customiza¬
tion that occurs as a final step of the install process. Also, some of the software doc¬
umentation will get a good workout during both the install and set up activities.
Customer feedback may indicate updates to the documentation are warranted. You
may also be able to determine if any additional test scenarios should be executed
prior to the general software release.
Regression
It’s almost a certainty that the new software will be introduced into an IT environ¬
ment that is already supporting the customer’s business needs. One objective for a
beta test is to ensure that the introduction of new software doesn’t adversely affect
existing applications and systems. You’ll want to identify any unforeseen, disrup¬
tive intersections with other software components or applications. This may lead to
adjustments to the software to remove or reduce the need for customers to make
changes in their own applications in order to deploy it.
During the development cycle for a predecessor of z/OS, a handful of these in¬
ternal beta test accounts had been selected and were anxiously awaiting the new
code. Schedules were put in place, as were specific milestones for the SVT team to
meet before shipping the new release to these internal customers. The SVT team
put the new software through a battery of tests to ensure it was stable enough for
their colleagues to trust in production. Finally, the milestones were achieved. The
SVT team celebrated, and with a flourish the code was sent off to IBM sites around
the country.
Trouble in Paradise
The celebration didn’t last long. When one of the sites attempted to boot the new re¬
lease, the system froze, dropping into what’s called a disabled wait state. This was un¬
acceptable. The SVT team was shocked. The software was stable. It had been booted
over and over again during SVT without any problems. It had survived a series of
load/stress and other tests. None of the other beta sites were experiencing this prob¬
lem. What had happened? The site experiencing the failure captured a full system
memory dump on tape and rushed it to the development laboratory. Eagerly, the
SVT team began to examine it. Quickly, the problem became apparent.
The Culprit
Installed at the IBM location in question was something called an IBM 3850 Mass
Storage System (MSS). Akin to a modern tape robot, the MSS was a monster. Con¬
sisting of 8 cabinets, it was 3 feet wide and 20 feet long with a series of honey¬
combed chambers inside that held some 2000 data cartridges. A mechanical hand
fetched cartridges and brought them to a central read/write unit as needed for pro¬
cessing, then returned them when finished.
The bug was simple. During boot processing, a change in the operating system
unrelated to the MSS was conflicting with the support for that device. This conflict
forced the operating system into a position where it could not safely continue, and
so it triggered the disabled wait state.
Some software bugs, even ones with devastating consequences, will only surface
under specific environmental conditions.
More
Any system with an MSS installed would have seen this failure during boot pro¬
cessing. However, at the time of this incident the MSS was nearing its end of life.
While technically still supported by the operating system, it was no longer widely
used. None of the other beta test sites had one. Neither, unfortunately, did the SVT
team. That’s how the defect escaped.
There was nothing wrong with how the SVT team had done its testing. The flaw
was with what had been missing when that testing was done. This example illustrates
how even the most thorough set of test scenarios can be foiled by environmental gaps.
It also shows the testing value to be achieved by exposing a new software offering to
a variety of different environments to attain a level of diversity that no single test team
could hope to match.
If you can choose your beta test environments, pick ones that offer the greatest exposure
to diverse hardware and other software to increase the odds of finding subtle defects.
TIP
PREPRODUCTION TESTING
Most businesses depend on information technology. Some have a very critical de¬
pendence on IT, even to the point where they are literally out of business without
it. Because of this dependence, they continually need to squeeze more and more out
of their computer investments and implement solutions that give them advantages.
However, any change made to the IT environment brings with it a certain amount
of risk. The risks need to be mitigated. One mitigation action customers can take is
to test.
FIGURE: A typical customer production environment, layering z/OS, Unix System Services, clustering technology, thousands of extended devices, runtime application enablement services, system monitors, off-the-shelf applications, and the customer’s own unique applications.
A Shift in Focus
The target of customer testing is the integration of all the disparate pieces that
make up the production environment. The customer can examine the effects of the
proposed changes on the total environment, with particular attention paid to their
unique components.
It is not unusual for software to provide various ways for users to alter the default
behavior of a subset of its functions. One approach that has existed for many years
is the notion of exit points. At very strategically placed spots in the software, the
ability to pass control to an exit is provided that optionally allows the customer to
alter how the software works. This exit is a program written by the user. The exit
may have to follow some strict rules, but it offers users a degree of freedom for cus¬
tomizing the software’s processing to meet their needs. However, whenever the
software containing the exit points is upgraded, any exits tied to those exit points
need to be tested to ensure they still work without any undesired side effects.
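To make the idea concrete, here is a small Python sketch of an exit point. The hook name and the naming-convention exit are invented for illustration; real products define their own exit interfaces and rules.

from typing import Callable, Optional

# An exit returns a replacement job name, or None to accept the default.
JobNameExit = Optional[Callable[[str], Optional[str]]]

class Scheduler:
    def __init__(self, job_name_exit: JobNameExit = None):
        self.job_name_exit = job_name_exit       # installed by the customer

    def submit(self, job_name):
        if self.job_name_exit is not None:       # exit point: pass control to user code
            override = self.job_name_exit(job_name)
            if override is not None:
                job_name = override
        return f"submitted {job_name}"

# A customer-written exit enforcing a site naming convention.
def prefix_exit(name):
    return None if name.startswith("ACME") else "ACME" + name

print(Scheduler(prefix_exit).submit("PAYROLL"))  # -> submitted ACMEPAYROLL
# After the scheduler is upgraded, the customer reruns checks like this one to
# confirm the exit still works without undesired side effects.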
Of course, there’s more to customization than just exit points. Most software
offers many other ways for users to alter default behavior. Many knobs and switches
may be provided. A large number of adjustments by any customer may create a
combination that forces the execution of unique code paths. Experimenting with
that combination of adjustments first in a test environment is another step cus¬
tomers can take to reduce the risk of change.
Operational Automation
Computer technology has improved and developed tremendously over the years.
The speed and volume of processing that current technology allows makes it very
difficult to manage and operate within human reaction times. To address this chal¬
lenge, customers often automate many operational tasks. Several vendors offer soft¬
ware products that allow a user to specify system events to watch for, and actions to
take in response to each event. The automation hovers in the background, moni¬
toring system activity. When it detects a targeted event, such as a log file filling up, it
responds with an appropriate action, eliminating the need for human intervention.
This technique is very effective, but it quickly fosters a dependence on correct
operation of the automation, since automation miscues in the production envi¬
ronment can be very disruptive and difficult to work around. As a result, any pro¬
posed changes that may affect the automation processing, including adding or
upgrading software, should be validated. For example, many automation packages
are triggered by messages that the software presents. So software upgrades that
might have changed message content or format demand special attention.
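A tiny Python sketch, with invented message IDs and actions, shows why: the automation’s triggers are tied to exact message text, so a reworded message after an upgrade can silently stop matching.

import re

# Each rule maps a message pattern to an operator action.
RULES = [
    (re.compile(r"^LOG001I .*log file (\S+) is 9[0-9]% full"), "switch_log"),
    (re.compile(r"^NET044E .*link (\S+) down"), "restart_link"),
]

def react(message):
    for pattern, action in RULES:
        if pattern.search(message):
            return action        # real automation would execute the action here
    return "no_action"

# The old message format still triggers the rule...
assert react("LOG001I system log file SYS1.LOG is 95% full") == "switch_log"
# ...but a reworded message after an upgrade silently stops matching.
assert react("LOG001I system log SYS1.LOG is at 95% capacity") == "no_action"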
System Monitors
In addition to automation, programmatic monitoring of systems is typically a key part of production. The monitors become the eyes and ears of the system administrators grappling with massive amounts of processing. Many times, it is a monitor's alerts that give the system's human operators their first indication that there is an issue requiring action. As with automation, reliance on the correct operation of monitors is widespread. Where do the monitors get their information and data? Normally, it's from other components of the system. Some monitors are even dependent on the internal control structures of other software, structures which may of course change with a new release. Customers will want to demonstrate correct operation of monitors after changes are applied to the system, but before introducing those changes into production. In other words, they'll need to test.
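A tiny sketch shows why. The fixed-column status record below is hypothetical, standing in for whatever internal structure or formatted output a real monitor reads from another component.

    def parse_status_record(record):
        """Pick fields out of a fixed-layout status record produced by another
        component (assumed columns: name in 0-7, cpu% in 8-11, queue in 12-16)."""
        return {
            "name": record[0:8].strip(),
            "cpu": int(record[8:12]),
            "queue": int(record[12:17]),
        }

    def check(record, cpu_limit=90, queue_limit=1000):
        status = parse_status_record(record)
        if status["cpu"] > cpu_limit or status["queue"] > queue_limit:
            print("ALERT:", status["name"], "needs attention")

    check("DBPROD    95 1500")

If a new release of the component widens a field or shifts a column, the monitor either fails outright or, worse, reports plausible but wrong numbers. Either way, it needs to be exercised against the new release before it is trusted in production.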
Applications
Although lots of attention is paid to a production environment's infrastructure, it's the applications that provide the most visible value to the business. However, most applications rely heavily on the services of the infrastructure. Also, production environments do not include just one application, but many. And it's typical for them to have dependencies on, interface with, or pass data among each other. The combination of all these applications is most likely unique to the enterprise that runs them. It cannot be assumed that a software supplier or vendor has experimented with this exact combination. Changes to any of these parts should be validated before they are put in a position where they may hurt the enterprise.
End Users
Servicing the end users of production environments is almost always the main reason those environments exist. End users may range from the CEO of the company, to the warehouse worker filling orders, to the busy mom sitting in her kitchen browsing the Web. A business cannot afford to let its end users be thwarted by changes introduced to production. If the warehouse must delay filling orders because the inventory system is down, or the busy mom can't place an order and so hops to a different Web site, it's a direct hit to the bottom line. Validating changes in a test environment from the view of an end user is an important step in testing. One way to accomplish this is to have the application development and system support communities act as the end users. As a group they can simultaneously invoke the most important or most often used areas of the applications. This can be done on a robust test environment if one exists, or during the change window in production before the bulk of the real users start using the system. It's a relatively cheap way to do some functional validation, but it may not drive the same amount of activity that will occur during peak times in production. Customers concerned about this can invest in the same load-driving, user-simulation test tools used by software development organizations.
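The core of such a tool can be sketched in a few lines. The URL, the user count, and the success criterion below are placeholders; a real load driver adds think times, varied data, and response validation.

    import concurrent.futures
    import urllib.request

    TARGET = "http://test-env.example.com/order/submit"  # placeholder URL
    SIMULATED_USERS = 50                               # placeholder population

    def one_user(user_id):
        """Simulate one end user driving a key transaction once."""
        try:
            with urllib.request.urlopen(TARGET, timeout=10) as resp:
                return resp.status == 200
        except OSError:
            return False

    with concurrent.futures.ThreadPoolExecutor(max_workers=SIMULATED_USERS) as pool:
        results = list(pool.map(one_user, range(SIMULATED_USERS)))

    print(sum(results), "of", len(results), "simulated users succeeded")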
the user community. But it won't take long for problems to be noticed. System support can celebrate among themselves when no one else is even aware that a change occurred.
SUMMARY
In This Chapter
■ Continuous improvement
■ The big picture
You've maneuvered your way through the test of an entire software product. You may have escaped with only cuts and bruises, but it's not time to go home just yet. A few tasks remain. As we've noted before, successful software is rarely produced once and then forgotten. There's always another version on the way. How can you prevent any mishaps encountered during this test from occurring on the next? Are there ways to improve the technical approach and processes used? Let's examine several techniques.
CONTINUOUS IMPROVEMENT
Continuous improvement means solving problems that bit you in the past so you can avoid them in the future. As we have previously discussed, activities such as component assessments help identify potential process improvements. It's important not to ignore or lose sight of any shortcomings. How do you address them head on?
Comprehensive Postmortems
At the end of a complex project, the test team needs to look back at where they’ve
been. Invite everyone on the team to a meeting. Start by discussing things that
worked well and should be retained for future tests. Then brainstorm on areas for
improvement. There are several questions to ask and topics to consider:
Test Strategy: Was there anything good or bad about the overall strategy used?
Tools: How well did the chosen tools work? Should you have selected a different set? If new niche tools were invented, did they work out well? Is everyone on the team aware of them as possible candidates for reuse in future tests? Have they been added to the team's tool list?
Test Plans: Could the checklists, consideration lists, and test plans have been
improved?
Workloads: Were workloads representative of those used by customers, or
stressful and complex enough to squeeze out defects?
Environment: Were enough resources available or did the configurations allow
problems to escape?
Test Flow: Could communication with the testing phases before or after yours
have been improved?
Education: Was the team technically prepared to perform the tasks of the test?
Did everyone understand the items they were assigned?
Clarity: Were test scenarios documented well enough for others to execute
them in the future?
Problem Data: Did you find the types of problems you expected? Were there
trends in the data? Problematic components or functions?
The above set of questions and topics is definitely not complete, but it offers a starting roadmap. Keep in mind that the objective of the initial postmortem meeting is only to identify areas of strength and weakness, not to devise actions. Once you have a list of possibilities, various participants can later investigate possible next steps and report back to the team.
Iterative Postmortems
Just as there are iterative development techniques, there can also be iterative postmortem reviews. The fact that the entire project is not yet completed doesn't mean a review of what's happened so far shouldn't be considered. In fact, interim reviews will capture ideas while they're fresh in your mind. Consider carving out time during a busy test to step back and review.
Postplanning Reviews
After your test plan has been developed, test cases have been written, and execution has begun, you may wish to pull the team together for a review of the planning and preparation activities. This will help outline what to pay attention to the next time through the planning cycle. It might also identify short-term actions the team can take to enhance the current test. When a group of testers gets together, they're bound to generate new ideas on how to tackle test planning and preparation.
A review of the current schedule and whether modifications are needed is often a good area to explore. The end date may be beyond the test team's control, but internal checkpoints owned by the team can often be adjusted. Perhaps now that test execution is underway, some assumptions made during planning are dubious and should be revisited. By stepping back to take a fresh look, you might see an adjustment to testing order or overlap that can help the overall efficiency of the test. A few such tweaks might actually improve the end date, or at least make it achievable.
their approach. If the need to shift strategies becomes clear, the test team must be able to dynamically change the execution plan and have management's support in doing so. To enact a dynamic plan change, you may need to have a rapid review and approval process in place. Alternatively, you can anticipate the need for dynamic adjustments up front and make provisions for them by including time for artistic testing in your plan. The important thing is to find a way to keep your testing flexible. Move quickly and be nimble—change your plan to find those bugs.
Postmortems
Is all this reviewing worthwhile? Well, to tell you the truth, yes! If you don't recognize the problem, it's tough to fix it. Whether done at the end of a test or at logical checkpoints along the way, some simple reviews can really pay off. They may lead to testing modifications that will increase your efficiency and effectiveness at hunting down bugs—and what could be more worthwhile than that?
Escape Analysis
No matter how comprehensive a test plan is, there will inevitably be problems that escape from one test phase to the next. Unfortunately, some problems will slip out to customers as well. Escapes are likely in any environment, but steps can be taken to limit them. One important technique is for testers to perform an analysis of the ones that got away.
Postproject and in-process escape analysis activities are critical means for driving test improvements. It's important to review the problems with some specific goals in mind. How can you attack that analysis? First and foremost, look at the function that's in error. Also, examine the type of defect. For example, is the error in a simple API, a complex serialization method, or a user interface?
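Even a simple tally by component and defect type can reveal where the test missed. The records below are invented to show the shape of the analysis, not actual data.

    from collections import Counter

    # Hypothetical escaped-defect records gathered from problem reports.
    escapes = [
        {"component": "logging", "defect_type": "serialization"},
        {"component": "logging", "defect_type": "serialization"},
        {"component": "install", "defect_type": "user interface"},
        {"component": "logging", "defect_type": "boundary condition"},
    ]

    by_component = Counter(e["component"] for e in escapes)
    by_type = Counter(e["defect_type"] for e in escapes)

    print("escapes by component:", by_component.most_common())
    print("escapes by defect type:", by_type.most_common())
    # A cluster such as logging/serialization would argue for adding
    # multisystem stress scenarios against that component next time.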
Sources of Escapes
The escape trends you just identified can now become important feedback for the
test team. The team can map these into their test cases, tools, environments,
processes, and methods to see what can be done differently the next time to prevent
not only the defects that did escape, but others in the same class.
Yes, this may be a painful exercise. On the other hand, if the analysis identifies bugs that escaped because the team was missing some key hardware, that can create a powerful argument for additional test equipment the next time around. Take advantage of the findings.
Customer Involvement
An excellent way to understand the shortcomings of your test efforts is to share
them directly with actual customers. Customers are often amenable to discussions
of improvements in the vendor’s testing.
If the problems encountered by customers are significant, they will expect action by the software vendor to address the apparent lack of test coverage. But experience shows that customers also tend to be helpful in identifying specific gaps and helping to create solutions. In fact, a close working relationship with a particular customer can help not only them, but also the industry of which they are a member, or even the entire customer community. Analysis of a customer's environment and their exploitation of the software will help both to identify what exposed the problem, and to formulate an action to take in response.
As we discussed in Chapter 2, "Industrial-Strength Software—It's Not a Science Project," customer environments have a wide combination of system software, middleware, applications, different hardware devices, and end-user clients. These integrated environments challenge the test team to keep up with state-of-the-art testing techniques to simulate them. Understanding customers is a critical first step.
Conference calls, executive briefings, and on-site meetings with customers can help pave the way for a relationship that can continue to improve the effectiveness of both your test team and the customer. Being able to meet directly with their leading information technology specialists allows you to hear directly from the folks who feel the satisfaction or pain. This environment encourages building more comprehensive test scenarios and programs at both the vendor and the customer.
Communication between Test Teams
of possible changes. You may be surprised at the number of great ideas that appear
once someone expects them.
We have covered much information. What message should you walk away with?
Conquering Complexity
Industrial-strength software is complex. A test team must disassemble that complexity through a number of techniques. Once the complexity is conquered, the team can identify the most strategic test approach.

Complexity can be conquered through technical assessments of the software's components—which has the byproduct of building the testers' expertise. It can also be handled by pulling together knowledgeable testers with various types of expertise to construct a comprehensive test approach that will address all areas of the product.
Reuse is wonderful, but it often isn't enough. You'll need to develop new test cases to attack new features and functions in the target software. Make sure that whichever test phase they are targeted for, the test cases are built efficiently and with all of the necessary infrastructure and diagnostic capability they need. Don't forget that in the future someone is likely to reuse them. Ensure they meet all of those reusable characteristics as well.
These cool tools can be as simple as a small program that changes a bit mask in the system to masquerade as something else, or as complex as one entire operating system emulating another. See if you can leverage the power of virtualized environments to your advantage.
This Is Chaos
Uncontrolled testing can quickly spin out of control. But it doesn’t have to be that
way. The test team can utilize a large range of techniques to get things back in line.
Run logs and checklists can help generate a script of testing activities, so that you
don’t miss those subtle or not so subtle problems that customers will somehow spot
immediately.
What if the test end date nears but you aren't done? How close are you? Have you uncovered all the crippling defects, so all that remains are minor problems? Simple yet accurate status tracking can help testers get the critical bugs they identify promptly fixed. But what if you haven't even attempted all of your tests yet?
Options include delaying the product, including some code that disables the new function, or shipping it with restrictions. The project management team needs to understand all of the risks and available options, which means the test team had better have a handle on them. Keep an eye on how you are tracking against your plan and how the product is holding up.
Your test is finished and you think you did a good job. But could you have done anything better? Absolutely! Look over what you've done, what problems have escaped to your fellow testers and customers, and what others think of your product. Continuous improvement requires communication and effort, but it pays big dividends.
SUMMARY
The defects that matter really do. Your customers will be the first to tell you that. Focus your test on finding the problems that impact enterprises and their customers. Drive your test team toward processes, practices, and technical expertise that will make you all successful and envied. Don't be timid—take control of your testing objectives.

Now go find some bugs!
Glossary
ABEND ABnormal END, a mainframe term for a program crash. It is always associated
with a failure code, known as an ABEND code.
Accessibility Testing Testing that determines if the software will be usable by people with
disabilities.
Address Space A z/OS construct, analogous to a UNIX process, that provides a range of
virtual memory within which work executes. Each interactive time-sharing user, started
task, or batch job is given its own address space. Multiple tasks, analogous to a UNIX
thread, can run within a single address space.
Agile Software Development Processes used to develop software with the goal of delivering code early and often.
Algorithm Verification Testing A software development and test phase focused on the
validation and tuning of key algorithms using an iterative experimentation process.
Artistic Testing Testing that takes early experiences gained with the software and uses them to devise new tests not imagined during initial planning. It is often guided by the intuition and investigative instincts of the tester. Also known as Exploratory Testing.
Automation Software that watches for specific system events and takes predefined actions
in response to each event.
Batch Job A class of software program that does not interact directly with the user. It is
submitted by the user to the system, eventually executes (when system resources permit),
and returns a result. May be defined to run automatically at a specific time.
Batch Window A time period, usually at night, used to perform database consolidation and reconciliation activities or other disruptive tasks, often by running streams of specialized batch jobs.
Black-box Testing A technique that assumes little, if any, understanding of software's internals, but rather relies on a comprehensive view of its inputs and associated external behaviors.
Boundary Condition Testing See Limits Testing.
Breakpoint A function provided by interactive development environments, debugging
tools, and hypervisors in which a specific instruction in a program is marked, such that
when that instruction is reached the program or system is suspended to allow a person to
observe its status information. Also known on some systems as a trap.
Cluster A collection of computer servers that operate together as a single unit to process
work.
Coexistence Testing Testing that checks to see if a new version of software interoperates
successfully with older versions of that same software.
Compatibility Testing Ensuring an updated product can still interact properly with other older products.
Component Spy A tester who explores an entire software component with the goal of becoming a technical resource within the test team for anything having to do with that area.
Considerations List A list of key areas of concern that should be focused on during a test.
The descriptions are brief and are used during early test planning. They also typically feed
into the formulation of a variation list.
Contingency Factor A percentage of the total amount of time or resources required for the test. This factor represents the estimated time required for reacting to problems, waiting for fixes, handling test machine failures, and so on. It is added to the overall projected schedule as a safety cushion. See Task Contingency and Methodology Contingency.
Cookbook Scenario A test scenario description that provides complete, step-by-step details about how the scenario should be performed. It leaves nothing to chance.
Coupling Facility (CF) Licensed internal code running in a special type of logical partition in certain mainframe processors that provides a shared storage medium used in a Parallel Sysplex cluster.
Coupling Facility Rebuild The act of moving coupling facility structures from one CF to another.
Coupling Facility Structures Hardware assists within a coupling facility to enable multisystem data-sharing support. These assists support global locking, caching features, and a set of queuing constructs. Unique structures can be allocated for different software exploiters.
Data Corruption What occurs when data is incorrectly modified by some outside source
and the user is not notified.
Data Integrity The completeness and correctness of data that has not been corrupted.
Data Integrity Monitor A small test program that creates specific conditions in which
data corruption might occur, and then checks to see if it does. See Thrasher.
Emulator A software application that emulates a hardware platform different from the one it is actually executing on. It runs as an application on a host operating system, but unlike the software partitioning approach, the emulated environment is not tied to a particular operating system. Any operating system that can run on the hardware being emulated can also run under the emulator.
First Failure Data Capture (FFDC) The ability for software to automatically collect and save enough status information at the time of its failure to enable the error's root cause to be determined later by a human. Software with this trait eliminates the need to turn on special traces or other diagnostic aids and then re-create the problem in order to diagnose it.
Fix Testing Rerunning of a test that previously found a bug in order to see if a supplied
fix works.
Framework Scenario A test scenario definition that provides only enough high-level information to remind the tester of everything that needs to be covered for that scenario. The description captures the activity's essence, but trusts the tester to work through the specific steps required.
Function Verification Test (FVT) Testing of a complete, yet containable functional area
or component within the overall software package. Normally occurs immediately after Unit
Test. Also known as Integration Test (though not in this book).
Grooved Tests Tests that simply repeat the same activity against a target product from
cycle to cycle.
Host Hypervisor A hypervisor that coexists with a host operating system on the same machine.
Hypervisor Software for providing virtualization which runs the virtual computer natively, suspends it any time it executes an operation that would affect the state of the real CPU, and emulates that operation before returning control to the virtual computer. Also known as a Virtual Machine Monitor. See Virtualization.
Integration Test A test that looks at an entire solution as a whole. It moves beyond the
single-product domain of system verification test and tries to integrate the new software into
a simulated customer environment. It takes a big picture view, with the new software as
merely one of many elements in that picture—just as if it had been thrown into production.
Also known as Acceptance Testing (but not in this book).
Internationalization Testing Validates that a program which has been translated for sale
in other countries continues to operate properly. Also known as Translation Testing.
Iterative Software Development Software creation processes that are focused on providing frequent feedback in all phases.
Limits Testing Testing a piece of code to its defined limits and then a little bit more. Also
known as Boundary Condition Testing.
Load/Stress High levels of activity applied to software in order to put it under extreme
pressure to see what limits it can endure.
Logical Partition (LPAR) One subdivision of a physical computer that has been divided into a number of smaller, logical computers. The number of CPUs, memory, and I/O paths defined for each LPAR image is adjustable. Each LPAR is independent and capable of running a different operating system instance.
Mainframe A class of large computer systems architected to address the needs of commercial computing, designed to handle the constant movement of data between the processor and external storage, very large amounts of data, processor resource consumed in short bursts for each transaction or read/write operation, and large pools of simultaneous users and tasks competing for resources.
Memory Leak A type of software defect. Typically occurs when memory is repeatedly obtained and released, but the amount released is inadvertently less than what was obtained, causing memory consumption to gradually accumulate.
Operator Someone who handles the day-to-day monitoring of the production systems.
Pressure Points Areas of software that have the toughest demands placed upon them.
Privileged Instructions A carefully grouped subset of the CPU instructions that can only
be executed when the CPU is in a specially authorized state.
Program Listing Output that shows the low-level assembly and machine language instructions that make up a compiled program, along with the offset of each such instruction from the program's beginning. Can usually be generated by a compiler when it is invoked with the appropriate options.
Reactionary Iterative Model Software development process that makes use of quick and iterative reactions to issues that arise.
Real Memory Memory that is physically installed on a computer. See Virtual Memory.
Recovery Routine A chunk of software code that is automatically given control if an unexpected failure occurs so it can perform any needed cleanup activities and allow the software module it is protecting to resume operating normally.
Regression Testing Testing that determines if new code has broken, or "regressed," old functions.
Scaffolding Bits of code that surround a module to prop it up during early testing by receiving invocations from the module and responding to them according to an expected protocol. See Stub Routines.
Serialization A software construct (such as a lock or mutex) used to coordinate the actions of multiple software entities executing concurrently on the same computing system. It is used to force a serial order in the use of resources they all share, such as control structures in common memory areas.
Service Level Agreement (SLA) A contract between the provider of information technology services for the enterprise (typically the company's own I/T department, or an outsourcing vendor), and the users of those services.
Service Test Tests software fixes, both individually and bundled together, for software
that is already in use by customers.
Sniff Test A quick check to see if any major abnormalities are evident in the software.
Soft Dependency Something required in order for the dependent software to work a certain way.
Software Stack The layers of software necessary to allow an application to execute. Typically includes an operating system, one or more middleware products, and the application itself.
Spiral Development Iterative software creation process focused on performing risk review and assessment throughout the iterations of the test cycles to influence the improvements for the next phase of testing.
Streamable Test Cases Test cases which are able to run together as part of a large group.
Stub Routines Tiny routines that stand in for actual modules by imitating their responses
when invoked. See Scaffolding.
Subsystem Software that provides critical supporting services to an operating system,
such as job submission control or security management.
Symmetrical Multiprocessor (SMP) A server that contains multiple tightly coupled
processors, each of which can independently accept work.
Sympathy Sickness A situation where one system’s difficulties cause a chain reaction in
which other systems it is in contact with also get sick in “sympathy.”
System Programmer Someone who installs, sets up, configures, customizes, and maintains system software. The term programmer is included in the title due to their typical tasks of writing scripts, customization exits, etc.
System Verification Test (SVT) Testing of an entire software package for the first time, with all components working together to deliver the project's intended purpose on supported hardware platforms.
Task Contingency A cushion of time or resources added to the schedule for each set of activities performed during a test.
Test Case A software program that, when executed, will exercise one or more facets of the
software under test, and then self-verify its actual results against what is expected.
Testability An attribute of software describing how well it lends itself to being tested.
Testability Hooks Functions, integrated into the software, that can be invoked through primarily undocumented interfaces to drive specific processing which would otherwise be difficult to exercise.
Thrasher A type of program used to test for data integrity errors on mainframe systems.
The name is derived from the first such program, which deliberately generated memory
thrashing (the overuse of large amounts of memory, leading to heavy paging or swapping)
while monitoring for corruption. See Data Integrity Monitor.
Trap A debugging aid offered by some operating systems and hypervisors that allows a user to request that a particular action happen when some event occurs on a live system. The event could be the crash of the given program, the invocation of a particular module, or even the execution of a specific line of code. The action could be to force a memory dump, write a record to a log, or even freeze the entire system. This is different from the use of the same term on some systems as a means of describing a crash or ABEND. See Breakpoint.
Unit Test (UT) Testing done by software developers of their own code, prior to merging it into the overall development stream, that exercises all of its new and changed paths.
Usability Testing Ensures that software’s human interfaces are intuitive and easy to follow.
Variation A single test to be performed. Similar to the IEEE definition of a test case.
Virtualization A technique that creates one or more simulated (or virtual) computers within a single physical computer. Each virtual computer appears to the software running inside it to have all of the resources that are available on a real computer. The virtual computer may also have the ability to emulate hardware that does not actually exist on the underlying physical machine.
Virtual Machine One of the virtual computers created by a hypervisor. See Hypervisor
and Virtualization.
Virtual Network Support provided by a hypervisor that allows virtual machines on the same physical computer to communicate with each other at memory speeds as if they were on a physical network. See Hypervisor and Virtualization.
Waterfall Model A linear and sequential approach to developing software that depends
on one phase completing before the next can begin.
Waterwheel Model A modification of the waterfall software development model that retains its up-front activities of requirements collection, specification development, and product design. However, it stages the development and testing phases by shipping code in logical groupings so that testing can occur on one chunk of code while the next chunk is being created. It also implements a form of continuous feedback between the development and test phases to enable ongoing adjustment to the software's operation.
Wild Store When a program modifies memory at an incorrect location, thereby corrupting it.
Zap The dynamic alteration of computer memory. Usually intended to bypass a problem
or force software to take a particular action.
Index
disabled users, see accessibility testing
documentation
    book drafts and product information, 83
    comments, 178
    key target areas, 254
    requirements document, 81
    specifications, 81
    testers and customer documentation, 66
    testing, 253-254
    test plan document, 108-114
domains, 258
DOS (denial-of-service) attacks, 128
down time costs, 17
dumps, 119
dynamic address translation (DAT) tables, 286

emulators
    Bochs emulator, 261
    defined, 341
    determining tests to run, 220-227
    FLEX-ES emulator, 261
    goals for tests, 227-232
    managing test environments, 232-234
    porting customer workloads, 220-224
    recoverability, 142
    virtual computers, testing with, 261-262
Enabling Technologies Group, 17
end-of-day checklist, 295
end users, 318, see also customers
enhancements, 101, 102
enterprise, 341
entry criteria, 48, 156-161
environment
    beta test, 42, 311
    environmental failures, 145-146, 147
    FVT (Function Verification Test), 31-32
    heterogeneity, 16
    integration test, 38-39
    managing test, 232-234
    mirroring customer production objectives, 230-232
    native, 343
    production, see production environments
    PVT (Performance Verification Test), 35
    saving test, 267
    service test, 40
    SVT (System Verification Test), 33-34, 145-146
    UT (unit test), 29
error codes and dumps, 119
error injection, 141-142, 164-165, 274
error logs, 119-120, 249-250
errors
    component-level recovery from anticipated, 143
    component-level recovery from unanticipated, 144
    injecting, 141-142, 164-165, 274
escape analysis, 102-103, 324-326, 341
escape of defect, case study, 312-314
evidence trail, 248-250
EVO (evolutionary model) of software development, 55-56, 341
examples, see case studies
exit criteria, 48, 156-161
exits, user, 316-317
expanded storage, CPU, 282
explicit testing
    regression, 123
    usability, 129
exploitation of system services, 214
exploiters, 82, 341
exploratory testing, 240, 339
extendability, 209-213
eXtreme Programming (XP), see XP (eXtreme Programming)

fence off functions, 303-304
FFDC (First Failure Data Capture), 128, 231, 342
file thrasher case study, 194-195, 197-200
fix testing, 122, 342
FLEX-ES emulator, 261
flow and function, 81
frameworks, 206, 208-215
framework scenario, 111, 342
FVT (Function Verification Test)
    authentication, 115
    authorization, 115
    avoid "big-bang" tests, 182-183
    callable services, 183-184
    compared to SVT, 185
    confidentiality, 115
    costs and efficiencies, 32