Developer Testing
Building Quality into Software
Alexander Tarlinder
Boston • Columbus • Indianapolis • New York • San Francisco • Amsterdam • Cape Town
Dubai • London • Madrid • Milan • Munich • Paris • Montreal • Toronto • Delhi • Mexico City
São Paulo • Sydney • Hong Kong • Seoul • Singapore • Taipei • Tokyo
Many of the designations used by manufacturers and sellers to distinguish their products are
claimed as trademarks. Where those designations appear in this book, and the publisher was
aware of a trademark claim, the designations have been printed with initial capital letters or
in all capitals.
The author and publisher have taken care in the preparation of this book, but make no
expressed or implied warranty of any kind and assume no responsibility for errors or omis-
sions. No liability is assumed for incidental or consequential damages in connection with or
arising out of the use of the information or programs contained herein.
For information about buying this title in bulk quantities, or for special sales opportunities
(which may include electronic versions; custom cover designs; and content particular to your
business, training goals, marketing focus, or branding interests), please contact our corporate
sales department at [email protected] or (800) 382-3419.
For government sales inquiries, please contact [email protected].
For questions about sales outside the U.S., please contact [email protected].
Visit us on the Web: informit.com/aw
Library of Congress Control Number: 2016944434
Copyright © 2017 Pearson Education, Inc.
All rights reserved. Printed in the United States of America. This publication is protected
by copyright, and permission must be obtained from the publisher prior to any prohibited
reproduction, storage in a retrieval system, or transmission in any form or by any means,
electronic, mechanical, photocopying, recording, or likewise. For information regarding per-
missions, request forms and the appropriate contacts within the Pearson Education Global
Rights & Permissions Department, please visit www.pearsoned.com/permissions/.
ISBN-13: 978-0-13-429106-2
ISBN-10: 0-13-429106-9
Text printed in the United States on recycled paper at RR Donnelley in Crawfordsville, Indiana.
1 16
To my grandfather Romuald, who taught me about books.
Contents
Preface xvii
Acknowledgments xxiii
Challenges 206
Test First or Test Last? 209
Summary 210
Bibliography 289
Index 295
Foreword by Jeff Langr
Ten years ago, I became the manager and tech lead for a small development team at a
local, small start-up after spending some months developing for them. The software
was an almost prototypically mired mess of convoluted logic and difficult defects. On
taking the leadership role, I began to promote ideas of test-driven development (TDD)
in an attempt to improve the code quality. Most of the developers were at least willing
to listen, and a couple eventually embraced TDD.
One developer, however, quit two days later without saying a word to me. I was
told that he said something to the effect that “I’m never going to write a test, that’s not
my job as a programmer.” I was initially concerned that I’d been too eager (though I’d
never insisted on anything, just attempted to educate). I no longer felt guilty after see-
ing the absolute nightmare that was his code, though.
Somewhat later, one of the testers complained to me about another developer—a
consultant with many years of experience—who continually submitted defect-riddled
code to our QA team. “It’s my job to write the code; it’s their job to find the prob-
lems with it.” No amount of discussion was going to convince this gentleman that he
needed to make any effort to test his code.
Still later and on the same codebase, I ended up shipping an embarrassing defect
that the testers failed to catch—despite my efforts to ensure that the units were well
tested. A bit of change to some server code and an overlooked flipping of a bool-
ean value meant that the client—a high-security chat application—no longer rang the
bell on an incoming message. We didn’t have end-to-end tests comprehensive enough
to catch the problem.
Developer tests are tools. They’re not there to make your manager happy—if that’s all
they were, I, too, would find a way to skip out on creating them. Tests are tools that give
you the confidence to ship, whether to an end customer or to the QA team.
Thankfully, 10 years on, most developers have learned that it’s indeed their job
to test their own code. Few of you will embark on an interview where some form of
developer testing isn’t discussed. Expectations are that you’re a software development
professional, and part of being a professional is crafting a high-quality product. Ten
years on, I’d squash any notions of hiring someone who thought they didn’t have to
test their own code.
Developer testing is no longer as simple as “just do TDD,” or “write some inte-
gration tests,” however. There are many aspects of testing that a true developer must
embrace in order to deliver correct, high-quality software. And while you can find
a good book on TDD or a good book on combinatorial testing, Developer Testing:
Building Quality into Software overviews the essentials in one place. Alexander sur-
veys the world of testing to clarify the numerous kinds of developer tests, weighing in
on the relative merits of each and providing you with indispensable tips for success.
In Developer Testing, Alexander first presents a case for the kinds of tests you
need to focus on. He discusses overlooked but useful concepts such as programming
by contract. He teaches what it takes to design code that can easily be tested. And
he emphasizes two of my favorite goals: constructing highly readable specification-
based tests that retain high documentation value, and eliminating the various flavors
of duplication—one of the biggest enemies to quality systems. He wraps up the topic
of unit testing with a pragmatic, balanced approach to TDD, presenting both classical
and mockist TDD techniques.
But wait! There’s more: In Chapter 18, “Beyond Unit Testing,” Alexander pro-
vides as extensive a discussion as you could expect in one chapter on the murky world
of developer tests that fall outside the range of unit tests. Designing these tests to be
stable, useful, and sustainable is quite the challenge. Developer Testing doesn’t disap-
point, again supplying abundant hard-earned wisdom on how to best tackle the topic.
I enjoyed working through Developer Testing and found that it got even better as
it went along, as Alexander worked through the meaty coding parts. It’s hard to come
up with good examples that keep the reader engaged and frustration free, and Alex-
ander succeeds masterfully with his examples. I think you’ll enjoy the book too, and
you’ll also thank yourself for getting a foundation of the testing skills that are critical
to your continued career growth.
Foreword by Lisa Crispin
The subtitle says it all—“Building Quality into Software.” We’ve always known that
we can’t test quality in by testing after coding is “done.” Quality has to be baked in.
To do that, the entire delivery team, including developers, has to start building each
feature by thinking about how to test it. In successful teams, every team member has
an agile testing mind-set. They work with the delivery and customer teams to under-
stand what the customers need to be successful. They focus on preventing, rather
than finding, defects. They find the simplest solutions that provide the right value.
In my experience, even teams with experienced professional testers need devel-
opers who understand testing. They need to be able to talk with designers, product
experts, testers, and other team members to learn what each feature should do. They
need to design testable code. They need to know how to use tests to guide coding,
from the unit level on up. They need to know how to design test code as well as—or
even better than—production code, because that test code is our living documenta-
tion and our safety net. They need to know how to explore each feature they develop
to learn whether it delivers the right value to customers.
I’ve encountered a lot of teams where developers are paid to write production
code and pushed to meet deadlines. Their managers consider any time spent testing
to be a waste. If these organizations have testers at all, they’re considered to be less
valuable contributors, and the bugs they find are logged in a defect tracking system
and ignored. These teams build a mass of code that nobody understands and that is
difficult to change without something breaking. Over time they generally grind to a
halt under the weight of their technical debt.
I’ve been fortunate over the years to work with several developers who really
“get” testing. They eagerly engage in conversations with business experts, design-
ers, testers, analysts, data specialists, and others to create a shared understanding of
how each feature should behave. They’re comfortable pairing with testers and hap-
pily test their own work even before it’s delivered to a test environment. These are
happy teams that deliver solid, valuable features to their customers frequently. They
can change direction quickly to accommodate new business priorities.
Testing’s a vast subject, and we’re all busy, so where do you start? This book deliv-
ers key testing principles and practices to help you and your team deliver the qual-
ity your customers need, in a format that lets you pick up ideas quickly. You’ll learn
the language of testing so you can collaborate effectively with testers, customers, and
other delivery team members. Most importantly (at least to me), you’ll enjoy your
work a lot more and be proud of the product you help to build.
Preface
I started writing this book four years ago with a very clear mental image of what I
wanted it to be and who my readers were going to be. Four years is quite a while, and
I’ve had to revise some of my ideas and assumptions, both in response to other work
in the field and because of deepening understanding of the subject. The biggest thing
that has happened during the course of those years is that the topic has become less
controversial. Several recent books adopt a stance similar to this one, and there’s some
reassuring overlap, which I interpret as being on the right track.
This is why I could have used a book written with these goals in mind a decade
ago. But why today? Hasn’t the world changed? Hasn’t there been any progress in the
industry? And here comes the truly interesting part: this book is just as applicable
today as it would have been 10 years ago. One reason is that it’s relatively technol-
ogy agnostic. Admittedly it is quite committed to object-oriented programming,
although large parts hold true for procedural programming, and some contents apply
to functional programming as well. Another reason is that progress in the field it cov-
ers hasn’t been as impressive as in many others. True, today, many developers have
grasped the basics of testing, and few, if any, new popular frameworks and libraries
are created without testability in mind. Still, I’d argue that it’s orders of magnitude
easier to find a developer who’s a master in writing isomorphic JavaScript applica-
tions backed by NoSQL databases running in the cloud than to find a developer who’s
really good at unit testing, refactoring, and, above all, who can remain calm when the
going gets tough and keep applying developer testing practices in times of pressure
from managers and stressed-out peers.
Being a consultant specializing in software development, training, and men-
toring, I’ve had the privilege to work on several software development teams and to
observe other teams in action. Based on these experiences, I’d say that teams and
developers follow pretty much the same learning curve when it comes to quality
assurance. This book is written with such a learning curve in mind, and I’ve done my
best to help the reader overcome it and progress as fast as possible.
Target Audience
This is a book for developers who want to write better code and who want to avoid
creating bugs. It’s about achieving quality in software by acknowledging testability
as a primary quality attribute and adapting the development style thereafter. Readers
of this book want to become better developers and want to understand more about
software testing, but they have neither the time nor support from their peers, not to
mention from their organizations.
This is not a book for beginners. It does explain many foundations and basic
techniques, but it assumes that the reader knows how to work his development envi-
ronment and build system and is no stranger to continuous integration and related
tooling, like static analysis or code coverage tools. To get the most out of this book,
the reader should have at least three years of experience creating software profession-
ally. Such readers will find the book’s dialogues familiar and should be able to relate
to the code samples, which are all based on real code, not ideal code.
I also expect the reader to work. Even though my ambition is to make lots of
information readily available, I leave the knowledge integration part to the reader.
This is not a cookbook.
The chapters are quite independent and can be read in isolation. However, starting with
the first four chapters is recommended, as they lay a common ground for the rest of
the material.
Here’s a quick overview of the chapters:
Acknowledgments
Writing a book is a team effort. The author is the one who writes the text and spends
the most time with it, but many people make their contributions. This book is no excep-
tion. My first thanks go to Joakim Tengstrand, an expert in software development with a
unique perspective on things, but above all, my friend. He’s been giving me continual and
insightful feedback from very early stages of writing to the very end.
Another person who needs a special mention is Stephen Vance. He helped me by
doing a very exhaustive second-pass technical review. Not only did he offer extensive
and very helpful feedback, he also found many, if not all, places where I tried to make
things easy for myself. In addition, he helped me broaden the book by offering alter-
natives and perspectives.
As a matter of fact, this entire book wouldn’t exist in its present form without
Lisa Crispin’s help. She’s helped me to get it published, and she has supported me
whenever I needed it throughout the entire process. I’m honored to have her write one
of the forewords. Speaking of which, Jeff Langr also deserves my deepest gratitude
for writing a foreword as well and for motivating me to rewrite an important section
that I had been postponing forever. Mike Cohn, whom I’ve never had the pleasure of
meeting, has accepted this book into his series. I can’t even express how grateful I am
and what it means to me. Thanks!
While on the topic of publication, I really need to thank Chris Guzikowski at
Addison-Wesley. He’s been very professional throughout the process and, above all,
supportive beyond all limits. I don’t know how many e-mails I started with some-
thing akin to: “Thanks for your patience! There’s this thing I need to do before hand-
ing in the manuscript . . .” During the process of finalizing the book, I’ve had the
pleasure to work with very professional and accommodating people, who really made
the end of the journey interesting, challenging, and quite fun. Many thanks to Chris
Zahn, Lisa McCoy, Julie Nahil, and Rachel Paul.
My reviewers, Mikael Brodd, Max Wenzin, and Mats Henricson, have done a
huge job going through the text while doing the first-pass technical review.
Carlos Blé deserves special thanks for taking me through a TDD session that
ended up producing a solution quite different from the one in the chapter on TDD.
It sure gave me some things to think about, and it eventually led to a rewrite of the
entire chapter. Ben Kelly has helped me enormously in getting the details of the test-
ing terminology right, and he didn’t let me escape with dividing some work between
developers and testers. Dan North has helped me get the details straight about BDD
and ATDD. Frank Appel has helped me around the topic of unit testing and related
material. His well-grounded and thorough comments really made me stop and think
at times. Many thanks. Alex Moore-Niemi has widened the book’s scope by provid-
ing a sidebar on types, a topic with which I’m only superficially familiar.
I’d also like to extend my thanks to Al Bagdonas, my first-pass proofreader and
copy editor, for his dedication to this project.
In addition, I’d like to thank other people who have helped me along the way
or served as inspiration: Per Lundholm, Kristoffer Skjutare, Fredrik Lindgren, Yassal
Sundman, Olle Hallin, Jörgen Damberg, Lasse Koskela, Bobby Singh Sanghera, Gojko
Adzic, and Peter Franzen.
Last, but not least, I’m joining the scores of authors who thank their wives and
families. Writing a book is an endeavor that requires a lot of passion, dedication, and
above all, time away from the family. Teresia, thanks for your patience and support.
About the Author
Alexander Tarlinder wrote his first computer program around the age of 10, some-
time in the early nineties. It was a simple, text-based role-playing game for the Com-
modore 64. It had lots of GOTO statements and an abundance of duplicated code.
Still, to him, this was the most fantastic piece of software ever conceived, and an
entry point to his future career.
Twenty-five years later, Alexander still writes code and remains a developer at
heart. Today, his professional career stretches over 15 years, a time during which
he shouldered a variety of roles: developer, architect, project manager, Scrum-
Master, tester, and agile coach. In all these roles, he has gravitated toward sus-
tainable pace, craftsmanship, and attention to quality, and he eventually got test
infected around 2005. In a way, this was inevitable, because many of his projects
involved programming money somehow (in the banking and gaming industry),
and he always felt that he could do more to ensure the quality of his code before
handing it over to someone else.
Presently, Alexander seeks roles that allow him to influence the implementa-
tion process on a larger scale. He combines development projects with training
and coaching, and he shares technical and nontechnical aspects of developer test-
ing and quality assurance in conferences and local user groups meetings.
Chapter 1
Developer Testing
Developers Test
Developers always have tested their software, and they always will. Imagine the beginners writing
their first “Hello, World” program. No doubt they will execute it to verify that it actu-
ally outputs the everlasting words that have been echoed decade after decade by thou-
sands of programmers around the globe (see Figure 1.1).
Developers don’t need to be testing experts. Some types of testing require specific
skills or some distance from the tested software in order to mitigate any bias its cre-
ators may be subject to. This is why testing is a separate area of expertise.
Before embarking further into the field, let’s pause for a moment and get the
meaning of the word “developer” clarified. In some teams, most notably the ones
doing Scrum, all members of the development team are developers, and they spe-
cialize in programming, testing, interface design, or architecture (Sutherland &
Schwaber 2013). In this book the word “developer” refers to a person whose primary
responsibility is to write source code.
Regardless of whether all testing is done within the team or by someone from
outside, the output of the developers should be working software, not just something
that compiles. Whether to fulfill the quality standards set by the team or to avoid
handing software of inferior quality to whoever does the final testing, developers
must ensure the correctness of their code. In order to do that, they have to write their
code in a way that makes verification possible. Enter developer testing!
Unit Testing
Developers write unit tests. It’s their easiest, fastest, and most consistent way to verify
their assumptions about the code they produce. Either they do it before writing the
code to drive its design, or they do it after having written the code to verify that it
works as expected. In the first case, the testing and verification aspect may not be as
apparent as in the second. Nevertheless, unit tests are 100 percent developer-owned.
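As a minimal sketch of such a test, consider the following JUnit example; the PriceCalculator class and its discount rule are hypothetical, invented purely for illustration.

import static org.junit.Assert.assertEquals;
import org.junit.Test;

public class PriceCalculatorTest {

    // The code under test; in a real codebase it would live in production code.
    static class PriceCalculator {
        double priceWithDiscount(double price) {
            return price > 100.00 ? price * 0.90 : price;
        }
    }

    @Test
    public void appliesTenPercentDiscountAboveOneHundred() {
        // The test states an assumption about the code and verifies it.
        assertEquals(135.00, new PriceCalculator().priceWithDiscount(150.00), 0.001);
    }

    @Test
    public void leavesSmallOrdersUndiscounted() {
        assertEquals(99.00, new PriceCalculator().priceWithDiscount(99.00), 0.001);
    }
}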
Integration Testing
In this chapter, the exact definition of the term “integration test” will remain a bit
vague (it’ll be defined in Chapter 3, “The Testing Vocabulary”). For now, let’s just
acknowledge that some tests are more complex than unit tests and benefit from being
written by developers. Such tests require more sophisticated setup and may execute
significantly slower. Running them manually would be both hard, because of their
coupling to the source code and implementation details, and impractical because of
their sheer number.
Maintenance
That the majority of a system’s life cycle is about maintenance isn’t a closely guarded
secret in the industry. It’s a well-known fact. Once a piece of software has been rolled
out into production, it goes into maintenance, which falls into one of two categories:
Patching and bug fixing—The system has been stable for quite a while and
requires relatively little intervention, but once in a while a defect pops up and
a bug fix is required.
Changes are introduced carefully, and their scope is limited to addressing
the defect, while leaving everything else intact. A well-proven technique for
fixing bugs is restraining oneself from rushing ahead to implement a fix, and
first writing a test that’ll fail because of the bug’s presence. In the absence of
the bug, that test would pass. Once the test is in place, the bug is fixed. If the
fix is correct, the test passes. That test is now in the codebase and ensures the
presence and correctness of the fix. This is also developer work.
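A sketch of this workflow, assuming a hypothetical invoice formatter whose reported defect was a missing thousands separator: the test below fails while the bug is present, passes once the fix is in place, and then guards the fix for good.

import static org.junit.Assert.assertEquals;
import java.util.Locale;
import org.junit.Test;

public class InvoiceFormatterTest {

    // The (already fixed) code under test; the defect was a missing
    // grouping separator for amounts of 1,000.00 and above.
    static class InvoiceFormatter {
        String format(double amount) {
            return String.format(Locale.US, "%,.2f", amount);
        }
    }

    // Written before the fix was attempted, so that it demonstrably
    // failed because of the bug's presence.
    @Test
    public void formatsOneThousandWithGroupingSeparator() {
        assertEquals("1,000.00", new InvoiceFormatter().format(1000.00));
    }
}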
Both types of maintenance require that the code be written with testability in
mind. The opposite—code that turns all attempts to change it into a mixture of one
part guessing game and one part nightmare—is called legacy code. Michael Feathers,
the author of Working Effectively with Legacy Code, defines legacy code as code with-
out tests.
A safe way of working with legacy code is adding tests to it retroactively to pin
down its behavior before making any changes. Such tests are called characterization
tests (Feathers 2004). Doing this is time consuming, sometimes hard, and not always
a very exciting activity, but the alternative is reading the code carefully before making
any changes and wishing that nothing breaks.1
Adding the missing tests and making the actual changes fall on the developers.
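As an illustration, a characterization test might look like the following sketch; the rounding helper stands in for any piece of legacy logic whose intent has been lost.

import static org.junit.Assert.assertEquals;
import org.junit.Test;

public class LegacyRoundingTest {

    // Imagine this buried deep in a legacy codebase, with nobody left
    // who remembers why it rounds the way it does.
    static int legacyRound(double value) {
        return (int) (value + 0.5);
    }

    // The expected values below weren't derived from any specification;
    // they were obtained by running the code and recording what it
    // actually returned. The test pins today's behavior before changes.
    @Test
    public void documentsCurrentRoundingBehavior() {
        assertEquals(2, legacyRound(1.5));
        assertEquals(-1, legacyRound(-1.5)); // surprising, but preserved on purpose
    }
}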
Continuous Integration
Continuous integration (CI) is the practice of integrating frequently and always keep-
ing the main build stable (Duvall, Matyas & Glover 2007). There are two sides to this
practice—the technical side and the social side. The technical side of continuous inte-
gration is made up of the process and infrastructure needed to achieve an automated
stable build:
The social dimension is about following the practices to the letter by actually run-
ning the tests locally before committing, by committing frequently, and, above all, by
reacting to broken builds and fixing them immediately before committing any other
work. This requires discipline and a dedicated team pulling in the same direction.
Getting this right is often harder than setting up the infrastructure and automation.
1. Actually, legacy code can be attacked by pair programming or working with reviews or formal
code inspections. However, they are only as good as the moment they are performed in. Tests
live longer and can be run over and over again.
2. Continuous integration can get arbitrarily complex depending on the type of system and the
expertise of the team. Experienced teams include deployment of a new version of the system
and end-to-end tests that require the system to be up and running in their CI build. This is
where continuous integration starts becoming continuous delivery (CD). For a more in-depth
description of continuous delivery, see Humble and Farley (2010).
So where do developers come in? They’re the ones writing and running the tests
before committing, and they’re the ones fixing the build if it breaks. More often than
not, they’ll be the ones to set up the CI server, especially when they need to run the
unit and integration tests.
Test Automation
In many cases, test automation is a developer activity. Only time and imagination set
the bounds for what kind of work we can automate: test data and environment gen-
eration, scripted execution, or automated checking, to name a few examples.
Acceptance test-driven development is also a good example, because it boils
down to authoring a test that’s readable to nontechnical users, implementable by
developers, and executable by a dedicated framework. There are different opinions
on exactly who should write the test, using what format and what tool. However, from
the developer’s point of view, these differences can be thought of as minor. In the
end, it’s the developer’s job to provide the infrastructure that will execute the tests.
In many cases it’s quite a body of code. The same goes for the other aforementioned
automation activities.
Due to the complexity of both professions, it’s impossible to say exactly when
developer work becomes tester work. That depends entirely on the context and on
factors like application domain, complexity, legal regulations, or team composition.
However, there are cases where it’s quite clear that a developer’s verification yields
diminishing returns.
to refine the theory and practices underlying developer testing even further. Here are
some of them:
Summary
Developers perform activities related to verification and quality assurance more often
than they may realize. In addition to running their code to check that it seems to
behave correctly, they
Each of these activities will benefit from the developer having some fundamental
testing knowledge and skills.
Developer testing is everything developers do to test their code, and this book
describes helpful behaviors, activities, and tools related to building quality into the code.
Although developers can and should do as much as possible to ensure the cor-
rectness and quality of their software, some testing-related activities are still best
performed by someone with a skill set slightly different from the developer’s. Such
activities include
Performance testing
Security testing
Usability testing
Testing the untypical and pathological cases
Nothing prevents the developer from doing any of these activities, but they aren’t
covered in this book.
Chapter 2
Testing Objectives,
Styles, and Roles
Organizations may differ enormously in their views on testing and development and
above all, in their opinions on how these two activities should be combined. In this
chapter we’ll take a quick look at what testing and quality assurance may look like in
different settings and see how developer testing fits into the picture.
Testing Objectives
Another way to look at testing is to examine its underlying objectives. At the extremes,
there are two fundamental approaches to testing: critiquing and supporting. They
come with different objectives and different vocabularies. Few, if any, organizations
operate in either extreme, but one of the perspectives usually dominates and gives
rise to the processes and the in-house vocabulary.
Testing to Critique
Testing to critique means to test something that’s finished and needs evaluating.
Once the software to be tested exists, the objective of the testing is to obtain informa-
tion about it. Such information can be used to answer questions like: “Does it deviate
from the specification?” or “Are there any defects in it?” In many people’s eyes, this is
the archetype of testing: verifying that something works.
If the information gathering happens in a wider scope and targets areas beyond
defects and deviations from the specification, questions like the following may be
answered:
The vocabulary of testing to critique includes the tester mind-set and the devel-
oper mind-set, according to which developers want to build and testers want to break.
After all, the majority of a tester’s time and skill set is spent investigating how the
product might fail, whereas the developer’s energy is channeled into constructing it.
As a consequence, developers may fall victims to viewing their code as an extension
of themselves. If so, they will work very hard to prove that the code is correct, even
though it’s full of obvious bugs. If a bug is found, they’re imperfect—they may suf-
fer from cognitive dissonance, a psychologically inconvenient state, and try to reduce it.
Testing to Support
Testing to support is about safety, sustainable pace, and the team’s ability to work fast
and without fear of introducing defects during development. Its purpose is to pro-
vide feedback and help the team achieve immediate and constant confidence in the
software it produces. To gain such confidence, the team, and especially those whose
primary responsibility is to be quality champions, will sometimes perform testing
activities that critique. That said, their emphasis won’t be on obtaining information
based on supposedly completed software, but rather on obtaining information as
quickly as possible in parallel with the ongoing implementation. So, although infor-
mation gathering does take place and defects are being found, these activities are part
of the team’s quality feedback loop, which ultimately supports the whole team’s devel-
opment effort.
Test automation, test-driven development, and activities that aim at stabilizing
the development process and introducing fail-safes also belong in the domain of sup-
port testing.
By now it should be obvious that developer testing, as described in this book, is
testing meant to support.
Testing Styles
In some environments the style of testing is more noticeable than the underlying
objectives. Certain testing styles are more coupled to specific processes than others.
Traditional Testing
Traditionally, testing is thought of as a verification phase occurring after a construc-
tion phase. First something gets built and then it’s verified to make sure that it works.
What “built” and “verified” mean and how much effort these phases require vary
between industries and products.
This view often goes hand in hand with the building metaphor for systems and
their architectures. It assumes that there’s a master blueprint or specification to guide all
aspects of the construction (see Figure 2.1). Given this assumption, it makes perfect sense
to have a verification phase after the construction phase. Because a lot of effort has been
put into creating the blueprint,1 building the system should be only about following it. In
that sense, traditional testing is an embodiment of testing to critique.
While theoretically guaranteeing independent testing and immunity to all forms
of author bias, this setup comes with an inherent risk of fragmentation and diver-
gence. Because of the clear division of labor, employing traditional testing may create
an environment where developers and testers develop quite an adversarial view of
each other. Therefore, it’s not uncommon that developers and testers start using the
blueprint in isolation from each other and with very little communication between
the groups. While the developers try to implement it or create some kind of design
document out of it, the testers start deriving test cases from it. Once all features are
implemented, the resulting system is tested, and it comes as a surprise that the blue-
print has diverged and that there’s a mismatch between the produced software, the
test cases, and the original intent.
Well-defined processes are crucial for traditional testing to work. One such process
is the fundamental test process, which involves the following activities (ISTQB 2011):
1. Business analysts (BAs), architects, and customer representatives have spent many meeting
hours in creating an exhaustive specification.
likely, that will be the extent of your verification activities, apart from reading bug
reports created by a separate quality assurance (QA) group or department. I’d argue
that nothing in the process says that it has to be this way, but my experience is that
this is how it plays out.
Agile Testing
Agile testing is testing that enables agile development. In essence, it’s about empower-
ing the tester and increasing collaboration within the team and with external stake-
holders (Gregory & Crispin 2008). In agile testing, the role of the tester is shifted from
reactive to proactive. Instead of writing test cases, waiting for something to test, or
executing manual tests, the tester becomes the team’s quality champion and contrib-
utes to a successful release in any way she can. For example, by helping the customer
or product owner to specify desired functionality, by making sure that testing activi-
ties are taken into account during planning and estimation meetings, by educating
and assisting the developers in test design and test automation, or by pair program-
ming or pair testing. Thus the tester’s role blends with the developer’s in the sense
that both take part in the development process, but from different angles. Having
testing experts on the development team provides several immediate advantages:
2. The wording is important here. In traditional testing, tests are supposed to be planned and
created in parallel with the development. The difference is that collaboration, joint planning,
and common success/completion criteria aren’t emphasized.
Instead, developers will likely be notified about any errors they’ve introduced as soon
as they’re found.
Developers will still write unit tests, but they always have a colleague to ask about test
design. Imagine always being able to ask: “How will you test this?” or “What else will you
test?” Such an environment stimulates learning about testing and quality assurance.
and constantly allows tests, or sometimes3 even source code, to be written in such a
way that nontechnical stakeholders can verify them.
A ubiquitous language is one pillar of shared understanding; concrete examples
are another. They replace the vague language often seen in specifications that make
too much use of words like “shall,” “must,” and “should.” The team will use the exam-
ples in its conversations, workshops, and planning meetings to uncover assumptions,
corner cases, ambiguities, and inconsistencies that would remain hidden behind the
high-level wording of a user story or requirements document.
Concrete examples are either written as textual scenarios:
Or in tabular form:
Previous orders    Order amount ($)    Free gift?
1                  150.00              No
2                  100.00              No
3                  99.00               No
3                  99.01               Yes
10                 99.01               Yes
10                 99.00               No
Here we see that a seemingly trivial story can contain magic words like “loyal
customer” and “exceeds,” which are easily clarified using actual values. In this case,
customers are considered loyal if they’ve placed at least three orders in the past, and
they qualify for gifts if they exceed the $99 threshold by as little as one cent.
Concrete examples can easily evolve into tests, which will serve to enforce the
acceptance criteria. If the new functionality behaves as illustrated by the examples
3. One of my reviewers suggested that I get rid of this “sometimes.” I wish that I could, but
unfortunately, using a ubiquitous language and having a shared understanding don’t prevent
us from messing up the code. On the other hand, teams that have successfully embraced these
practices are likely to have good coding practices as well.
after having been implemented, it’s most likely correct. Therefore, the next step is
to turn the examples into executable specifications. This is done using tools like Fit-
Nesse, Concordion, Cucumber, or SpecFlow, which all allow binding a textual arti-
fact—a scenario or table—to executable code. The tests run from outside the system,
or at least against the business layer, which is why they are often called automated
acceptance tests. Their function is to serve as proof that the new functionality has been
implemented, and they’re written ahead of the production code.
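As an illustration, here is one possible shape of such a binding, sketched with Cucumber-JVM. The scenario mirrors the table above; the step definitions and the tiny domain model are hypothetical.

// free_gift.feature -- the scenario bound to the steps below:
//
//   Scenario: Loyal customer exceeds the gift threshold
//     Given a customer who has placed 3 previous orders
//     When the customer places an order of 99.01
//     Then the customer receives a free gift

import static org.junit.Assert.assertTrue;
import io.cucumber.java.en.Given;
import io.cucumber.java.en.Then;
import io.cucumber.java.en.When;

public class FreeGiftSteps {

    // Minimal hypothetical domain model, inlined to keep the sketch self-contained.
    static class Customer {
        final int previousOrders;
        Customer(int previousOrders) { this.previousOrders = previousOrders; }
        boolean qualifiesForGift(double amount) {
            // The rule illustrated by the table: at least three previous
            // orders, and the $99 threshold exceeded by at least one cent.
            return previousOrders >= 3 && amount > 99.00;
        }
    }

    private Customer customer;
    private boolean gift;

    @Given("a customer who has placed {int} previous orders")
    public void aCustomerWhoHasPlaced(int orders) {
        customer = new Customer(orders);
    }

    @When("the customer places an order of {double}")
    public void theCustomerPlacesAnOrderOf(double amount) {
        gift = customer.qualifiesForGift(amount);
    }

    @Then("the customer receives a free gift")
    public void theCustomerReceivesAFreeGift() {
        assertTrue(gift);
    }
}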
You know what? None of these factors really matter. If you’re the only developer,
or your team doesn’t have any testers, or you’re being rushed by others, or the system
is old and crappy, your quality assurance process is the only one you have, and it will
make or break your software.
Conversely, if your code will be tested by someone else, do you want that person to
find obvious and plainly stupid bugs in it? Do you want to waste that person’s time and
your employer’s money by turning trivial checks that are easily automated into manual
test cases or subjects of an exploratory testing session? Probably not. For many develop-
ers, the harsh reality is that professional testers who know their craft are a rare com-
modity, which is why we don’t want to waste their time and effort by creating software
that’s flawed by design and full of bugs that could easily have been avoided.
Every organization, team, and project is different, and provocative as it may sound,
that shouldn’t affect how the developers work. At the end of the day, it’s you who’ll make
changes to the software and fix the bugs, irrespective of the quality assurance process.
Therefore, it’s in your interest that the software be both testable and tested.
Summary
There’s a difference between testing and checking. The former assumes curiosity and
creativity, whereas the latter is mechanical and can safely be delegated to a computer.
Testing can be performed either to critique or to support. The contents of the
developer role and tester role are greatly affected by the organizational culture and
beliefs about what the two roles are about and how they should contribute. In cross-
functional teams, smaller companies, or agile-minded organizations, the developers
will be more involved in quality assurance, either by collaborating with testers on a
daily basis or by doing the verification and other QA activities themselves.
In larger companies or in companies that separate testing from development, the
developer may be at the mercy of the QA or testing department. There will be test
plans, and bugs will be called defects in a bug-tracking tool.
Most organizations will most likely adopt one of the following stances on testing:

Chapter 3
The Testing Vocabulary

What do people mean when they say that software should be tested? What activities,
performed when, and by whom do they refer to? The previous chapter described the
objectives and styles of testing. This chapter will get more concrete and take on
actual testing activities and the vocabulary of testing. Unfortunately, the language
of testing is quite elusive and the terminology rather ambiguous at times. The use
of terms and employment of techniques vary not only across different organiza-
tions, but chances are that as soon as a new person enters your team, that person may
attach a different meaning to some of the words that you use when you speak about
testing and quality assurance.
This chapter is organized as a taxonomy of different types of testing and a dic-
tionary of some terms frequently used by testers. As a developer, it’s crucial to be well
familiar with the nuances of this vocabulary. There’s a high probability that it has
affected the way your colleagues approach quality assurance, so you’d better know
where the stuff in the walls comes from. This is especially true in organizations in
which development and testing have been, or still are, disconnected.
In addition, knowing about various types of testing gives a developer a more
solid understanding of the work needed to ensure correctness and other desirable
properties of the software. Thus, it helps to decompose the mystical task of testing
into very concrete activities, some of which are performed by developers, and some
by team members with other specialties. Estimating testing activities gets easier and
it becomes clear when the software is “good enough.”
Putting this material together was challenging, because getting just one pre-
cise definition of a certain type of test is hard and maybe not even meaningful. The
important fact to be aware of is that there are variations and differences. As you read
this chapter, please keep this in mind: what’s really important is that you agree on
the terminology in your organization. Ideally, your team decides on how its testing
is conducted and how it uses the vocabulary, after which it documents the results so
that they’re visible to everybody, like on a poster in the team’s room. In a not so ideal
world, an architect or test manager makes these decisions and writes them down in a
document (where they’ll likely never be found and read).
1. According to lore, Rear Admiral Grace Murray Hopper found a moth trapped in a relay of a
Mark II computer.
Classifying Tests
There are numerous ways to test software. Depending on the type of information
we want to discover about it and the kind of feedback we’re interested in, a certain
way of testing may be more appropriate than another. Tests are traditionally classified
along two dimensions: test level and test type (see Figure 3.1). Combining them into a
matrix provides a helpful visualization of the team’s testing activities.
Test Levels
A test level can be thought of as expressing the proximity to the source code and the foot-
print of the test. As an example, unit tests are close to the source code and cover a few
lines. Acceptance tests, by contrast, aren’t concerned with implementation details
and may span multiple systems and processes, thus having a very large footprint.
Unit Test
Unit testing refers to authoring fast, low-level tests that target a small part of the sys-
tem (Fowler 2014). Because of their natural coupling to the code, they’re written by
developers and executed by unit testing frameworks.
This sounds simple enough, but the term comes with its gray areas: size and scope
of a unit of work, collaborator isolation, and execution speed. Where the boundary of
a unit is drawn depends on the programming language and type of system. A unit
test may exercise a function or method, a class, or even a cluster of collaborating
classes that provide some specific functionality. This description may seem fuzzy, but
given some experience, it’s easy to spot unit tests that don’t make sense or are too
complicated. Collaborator isolation, along with speed of execution, is subject to more
intense debate. There are those who mandate that a unit test isolate all collaborators
of the tested code. Others strive for a less ascetic approach and isolate only collabora-
tors that, when invoked, would make the test fail because of unavailable or unreach-
able resources or external hosts. In either case, execution speed isn’t an issue. Finally,
some people argue that unit tests don’t have to replace slower collaborators at all as
long as the test is otherwise simple and to the point. This book uses a definition of
unit testing that fits the second of the three aforementioned variants.
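The following sketch illustrates that second variant using JUnit and Mockito: only the collaborator that would reach an external resource is replaced, and the rest of the code runs for real. All names are hypothetical.

import static org.junit.Assert.assertEquals;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.when;
import java.util.Arrays;
import java.util.List;
import org.junit.Test;

public class OrderServiceTest {

    // A collaborator that would normally hit a database.
    interface OrderRepository {
        List<Double> lineAmountsFor(long orderId);
    }

    static class OrderService {
        private final OrderRepository repository;
        OrderService(OrderRepository repository) { this.repository = repository; }
        double totalFor(long orderId) {
            return repository.lineAmountsFor(orderId)
                    .stream().mapToDouble(Double::doubleValue).sum();
        }
    }

    @Test
    public void computesTotalFromPersistedOrderLines() {
        // Isolate only the collaborator backed by an unreachable resource.
        OrderRepository repository = mock(OrderRepository.class);
        when(repository.lineAmountsFor(42L)).thenReturn(Arrays.asList(20.0, 5.0));

        assertEquals(25.0, new OrderService(repository).totalFor(42L), 0.001);
    }
}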
When doing research for this book, I found that some sources used the terms
unit and component more or less interchangeably, in which case both referred to a
rather small artifact that can be tested in isolation. To a developer, a unit and a com-
ponent mean different things. As stated previously, a unit of work is a small chunk
of functionality that can be tested in a meaningful way. Components have a more
elusive definition, but the authors of Continuous Delivery—Reliable Software Deliv-
ery through Build, Test, and Deployment Automation nail it quite well: “. . . a rea-
sonably large-scale code structure within an application, with well-defined API, that
could potentially be swapped out for another implementation” (Humble & Farley
2010). This definition happens to coincide with how components are described in the
literature about software architecture. Thus, components are much larger than units
and require more sophisticated tests.
Integration Test
The term integration test is unfortunately both ambiguous and overloaded. The
ambiguity comes from the fact that “integration” may refer to either two systems or
components talking to each other via some kind of remote procedure call (RPC), a
database, or message bus; or it may mean “an integration test is that which is not a
unit test and not a system test.”
Actually there’s a point in maintaining this distinction. Testing whether two sys-
tems talk to each other correctly is a black box activity. Because the systems com-
municate through a (hopefully) well-defined interface, that communication is most
likely to be verified using black box testing. Traditionally, this would fall into the tes-
ter’s domain.
It’s the second definition, encountered frequently enough, that gives rise to the
overloading. The common reasoning goes something like the following, where Tracy
Tester and David Developer argue about a test:
Tracy: Have you tested that the complex customer record is written correctly to
the database?
David: Sure! I wrote a unit test where I stubbed out the database. Piece of cake!
Tracy: But the database contains both some triggers and constraints that could
affect the persistence of the customer record. I don’t think your unit test can
account for that.
David: Then it’s your job to test it! You’re responsible for the system tests.
Tracy: I’m not sure whether the database is a “system.” After all it’s your way of
implementing persistence. And besides, wouldn’t you want to be certain that
persisting the complex customer record won’t be messed up by somebody else
on the team? Sure, I can test this manually, but there are only so many times I
can do it.
David: You’re right, I guess. I need a test that runs in an automated manner, like
a unit test, but more advanced. It must talk to the database. Hmm . . . Let’s
call this an integration test! After all, we’re integrating the system with the
database.
Tracy: . . .
Based on the preceding logic, a test that opens a file to write “Hello world” to it
or just outputs the same string on the screen isn’t a unit test. Because it’s definitely
not a system test, it must be an integration test by analogy. After all, something is
integrated with the file system. Confused yet?
Integration tests, as per the second definition, are often intimately coupled to
the source code. Given that the line where a test stops being a unit test and becomes
something else is blurry and debated, many integration tests will feel like advanced or
slower unit tests. Because of this, it shouldn’t be controversial that integration testing
really is a developer’s job. The hard part is defining where that job starts and ends.
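As a sketch of the kind of test David ends up wanting, the following talks to a real (here: in-memory H2) database over JDBC, so constraints actually execute. In a real suite, the inline SQL would be replaced by calls to the persistence code under test.

import static org.junit.Assert.assertEquals;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import org.junit.Test;

public class CustomerPersistenceIT {

    // Unlike a unit test with a stubbed-out database, this test runs real
    // SQL against a real database, so constraints (and triggers, if any)
    // actually execute. Table layout is hypothetical.
    @Test
    public void persistsAndReadsBackACustomerRecord() throws Exception {
        try (Connection connection = DriverManager.getConnection("jdbc:h2:mem:test");
             Statement statement = connection.createStatement()) {

            statement.execute("CREATE TABLE customer ("
                    + "id INT PRIMARY KEY, name VARCHAR(100) NOT NULL)");
            statement.execute("INSERT INTO customer VALUES (1, 'Tracy')");

            try (ResultSet rs = statement.executeQuery(
                    "SELECT name FROM customer WHERE id = 1")) {
                rs.next();
                assertEquals("Tracy", rs.getString("name"));
            }
        }
    }
}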
System Test
Systems are made up of finished and integrated building blocks. They may be compo-
nents or other systems. System testing is the activity of verifying that the entire system
works. System tests are often executed from a black box perspective and exercise inte-
grations and processes that span large parts of the system. A word of caution about
system testing: if the individual systems or components have been tested in isolation
and have gone through integration testing, system testing will actually target the
overall functionality of the system. However, if the underlying building blocks have
remained untested, system tests will reveal defects that should have been caught by
simpler and cheaper tests, like unit tests. In the worst cases, organizations with infe-
rior and immature development processes, that is, where the developers just throw
code over the wall for testing, have to compensate by having dedicated QA people
run nothing but system tests.2
Acceptance Test
In its traditional meaning acceptance testing refers to an activity performed by the
end users to validate that the software they received conforms to the specifications
and their expectations and is ready for use. Alas, the term has been kidnapped. Now-
adays the aforementioned activity is called user acceptance testing (UAT) (Cimper-
man 2006), whereas acceptance testing tends to refer to automated black box testing
performed by a framework to ensure that a story or part of a story has been correctly
implemented. The major acceptance test frameworks gladly promote this definition.
Test Types
Test type refers to the purpose of the test and its specific objective. It may be to verify
functionality at some level or to target a certain quality attribute. The most prevalent
distinction between test types is that between functional and nonfunctional testing.
The latter can be refined to target as many quality attributes as necessary. Regression
testing is also a kind of testing that can be performed at all test levels, so it makes
sense to treat it as a test type.
Functional Testing
Functional testing constitutes the core of testing. In a striking majority of cases,
when someone says that something needs testing, they mean functional testing. Functional test-
ing is the act of executing the software and checking whether its behavior matches
explicit expectations, feeding it different inputs and comparing the results with the
specification,3 and exploring it beyond the explicit specification to see if it violates
any implicit expectations. Depending on the scope of the test, the specification may
be an expected value, a table of values, a use case, a specification document, or even
tacit knowledge. At its most fundamental, functional testing answers the questions:
Developers will most often encounter functional tests at the unit test level, simply
because they create many more of such tests in comparison to other types of tests.
However, functional testing applies to all test levels: unit, integration, system, and
acceptance.
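When the specification is a table of values, a parameterized test can express it almost verbatim. A sketch in JUnit, with a hypothetical shipping-cost function as its subject:

import static org.junit.Assert.assertEquals;
import java.util.Arrays;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.junit.runners.Parameterized;
import org.junit.runners.Parameterized.Parameters;

@RunWith(Parameterized.class)
public class ShippingCostTest {

    // Hypothetical function under test: a flat fee plus a per-kilogram rate.
    static double shippingCost(double kilograms) {
        return 3.00 + 2.00 * kilograms;
    }

    // The "table of values" from the specification: input and expected result.
    @Parameters(name = "{0} kg ships for {1}")
    public static Iterable<Object[]> examples() {
        return Arrays.asList(new Object[][] {
                { 0.5, 4.00 },
                { 2.0, 7.00 },
                { 10.0, 23.00 },
        });
    }

    private final double kilograms;
    private final double expectedCost;

    public ShippingCostTest(double kilograms, double expectedCost) {
        this.kilograms = kilograms;
        this.expectedCost = expectedCost;
    }

    @Test
    public void matchesTheSpecifiedCost() {
        assertEquals(expectedCost, shippingCost(kilograms), 0.001);
    }
}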
Behavior
You will see the word behavior many times in this book. One reviewer, Frank Appel,
pointed out that this term is used very often in the industry without really being
defined. He suggested defining a component’s behavior as the outcome produced by its
functionality under certain preconditions.
I think this is a great definition that captures the meaning of this elusive term.
Because this is a chapter on terminology, I feel obliged to warn about the use of the
word component, though. Later in the book, I introduce the term program element,
which I think is a better fit.
3. Here the word specification doesn’t need to refer to a thick document. It could mean a user story
or any other way of expressing what the software should do.
Nonfunctional Testing
Nonfunctional testing, which by the way is a very unfortunate name, targets a solu-
tion’s quality attributes such as usability, reliability, performance, maintainability,
and portability, to name a few. Some of them will be discussed further later on.
Quality attributes are sometimes expressed as nonfunctional requirements,
hence the relation to nonfunctional testing.
Performance Testing
Performance testing focuses on a system’s responsiveness, throughput, and reliabil-
ity given different loads. How fast does a web page load? If a user clicks a button on
the screen, are the contents immediately updated? How long does it take to process
10,000 payment transactions? All of these questions can be asked for different loads.
Under light or normal load, they may indeed be answered by a performance
test. However, as the load on the system is increased—let’s say by more and more
users using the system at the same time, or more transactions being processed per
second—we’re talking about load testing. The purpose of load testing is to determine
the system’s behavior in response to increased load. When the load is increased
beyond the maximum “normal load,” load testing turns into stress testing. A special
type of stress testing is spike testing, where the maximum normal load is exceeded
very rapidly, as if there were a spike in the load. Running the aforementioned tests
helps in determining the capacity, the scaling strategy, and the location of the
bottlenecks.
Performance testing usually requires a specially tailored environment or soft-
ware capable of generating the required load and a way of measuring it.
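Dedicated tools are the usual choice here, but the underlying idea can be sketched in a few lines of plain Java: fire concurrent requests and measure response times. Everything below, including the stand-in for the system under test, is hypothetical.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class NaiveLoadTest {

    public static void main(String[] args) throws Exception {
        int concurrentUsers = 50;   // raise stepwise: load, then stress, then spike
        int requestsPerUser = 100;

        ExecutorService pool = Executors.newFixedThreadPool(concurrentUsers);
        List<Future<Long>> worstTimes = new ArrayList<>();
        for (int user = 0; user < concurrentUsers; user++) {
            worstTimes.add(pool.submit(() -> {
                long worst = 0;
                for (int i = 0; i < requestsPerUser; i++) {
                    long start = System.nanoTime();
                    callSystemUnderTest();
                    worst = Math.max(worst, System.nanoTime() - start);
                }
                return worst;   // worst response time seen by this "user"
            }));
        }
        long worstOverall = 0;
        for (Future<Long> worstTime : worstTimes) {
            worstOverall = Math.max(worstOverall, worstTime.get());
        }
        pool.shutdown();
        System.out.printf("Worst response time: %.1f ms%n", worstOverall / 1_000_000.0);
    }

    private static void callSystemUnderTest() {
        // Stand-in for a real request, e.g., an HTTP call to the system under test.
    }
}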
Security Testing
This type of testing may require a very mixed set of skills and is typically performed
by trained security professionals. Security testing may be performed as an audit, the
purpose of which is to validate policies, or it may be done more aggressively in the
form of a penetration test, the purpose of which is to compromise the system using
black hat techniques.
There are various aspects of security. The security triad known as CIA is a com-
mon model that brings them all together (Stallings & Brown 2007). Figure 3.2 pro-
vides an illustration of the concepts in the triad. They include the following:
Confidentiality
Data confidentiality—Private or confidential information stays that way.
Privacy—You have a degree of control over what information is stored
about you, how, and by whom.
Integrity
Data integrity—Information and programs are changed only by trusted sources.
System integrity—The system performs the way it’s supposed to without
being compromised.
Availability
Resources are available to authorized users and denied to others.
Each leg of the CIA triangle can be subject to an infinite number of attacks.
Whereas some of them will assume the shape of social engineering or manipu-
lation of the underlying operating system or network stack, many of them will
make use of exploits that wouldn’t be possible without defects in the software (devel-
oper work!). Therefore, it follows that knowing at least the basics of how to make an
application resilient to the most common attacks is something that a developer should
know by profession.
Most network protocols are not secure, and sending sensitive data over
the network is usually a bad idea.
Searching for Joe accounts, that is, accounts with easily guessed
credentials, is a common practice among digital villains.
Computers are fast; cracking a simple password may take minutes or
even seconds.
SQL injections wouldn’t be possible without developer ignorance, or
most likely laziness.a
The same is true for various file system traversal vulnerabilities.
If your program contains a fixed-size buffer for user input and that input
isn’t truncated, someone will send too much of it and either crash the
program or escalate privileges.
People can get very creative in attempts to put JavaScript code in HTML
forms, which is known as cross-site scripting (XSS).
a. Even in 2013, SQL injections were still the number-one threat according to the OWASP Top 10 list (OWASP 2015).
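The SQL injection item deserves a concrete illustration. The classic countermeasure is to keep user input strictly as data, which the following sketch contrasts; the table and query are hypothetical.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class CustomerLookup {

    // Vulnerable: user input is concatenated into the query, so input like
    //   ' OR '1'='1
    // changes the structure of the query itself.
    public ResultSet findUnsafe(Connection connection, String name) throws SQLException {
        return connection.createStatement().executeQuery(
                "SELECT * FROM customer WHERE name = '" + name + "'");
    }

    // Safer: a prepared statement treats the input strictly as data.
    public ResultSet findSafe(Connection connection, String name) throws SQLException {
        PreparedStatement statement =
                connection.prepareStatement("SELECT * FROM customer WHERE name = ?");
        statement.setString(1, name);
        return statement.executeQuery();
    }
}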
The way security testing has been described so far really makes it sound like non-
functional testing. However, there is such a thing as functional security testing
(Bath & McKay 2008). It refers to security testing as performed by a “regular” tester.
A functional security test may, for example, be about logging in as a nonprivileged
user and attempting to do something in the system that only users with administra-
tive privileges are allowed to do.
Normally, when we talk about security testing, we refer to the nonfunctional kind.
Regression Testing
How do we know that the system still behaves like it’s supposed to once we’ve changed
some functionality or fixed a bug? How do we know that we haven’t broken anything?
Enter regression testing.
The purpose of regression testing is to establish whether changes to the system
have broken existing functionality or caused old defects to resurface. Traditionally,
regression testing has been performed by rerunning a number of, or all, test cases
on a system after changes have been made. In projects where tests are automated,
regression testing isn’t much of a challenge. The test suite is simply executed once
more. In fact, as soon as a test is added to an automated suite of tests, it becomes a
regression test.
The true challenge of regression testing confronts organizations that have neither a traditional QA department or tester group nor automated tests. In such organizations, regression testing quickly turns into the Smack-a-Bug game.
4. Ideally your team can perform all its testing always, constantly, and continually. In my
experience, such cases are rare. Even great cross-functional teams may lack competence or
resources to perform certain kinds of nonfunctional testing.
If a customer uses direct bank payments to pay for our product and pays too
much, does he or she get a refund, or is the excess amount stored and used in
the next transaction?
If validation of the credit card fails, the transaction enclosing the purchase is
rolled back, nothing is stored in the database, and the event is logged.
Another dimension of the testing quadrants is the distinction between tests that
guide development, like tests written by developers to ensure that the produced code
is correct, and tests that critique the product. The latter are directed toward the fin-
ished product and attempt to find deficiencies in it.
In my opinion, this is one of the most usable models in the domain of software
testing. No, it’s the most usable. It facilitates teamwork by turning testing into a coop-
erative activity, instead of an adversarial one, while at the same time reminding us
of the duality of guiding/supporting testing and the critiquing kind. The model also
tells us that in order for a team to deliver a product that functions correctly, delights
the users, and solves the business problem, it must view its testing activities from sev-
eral disparate perspectives.
When projected onto the Agile Testing Quadrants, developer tests cover the
whole of the lower left quadrant, large parts of the upper left quadrant, and a fair
share of the lower right quadrant.
5. The model was originally created by Brian Marick (2003) and has been popularized by Lisa
Crispin and Janet Gregory (2008). It has been challenged, adapted, and revised, so there’s plenty
of material available online. Gojko Adzic’s (2013) and Michael Bolton’s (2014) work on the topic
are good entry points to this material.
Figure 3.3 Agile Testing Quadrants as presented in the book More Agile Testing by Lisa
Crispin and Janet Gregory (2014).
Smoke Testing
The term smoke testing originated from engineers testing pipes by blowing smoke
into them. If there was a crack, the smoke would seep out through it. In software
development, smoke testing refers to one or a few simple tests executed immediately
after the system has been deployed. The “Hello World” of smoke testing is logging
into the application.6 Trivial as it may seem, such a test provides a great deal of infor-
mation. For example, it will show that
6. Because the “Hello World” of applications is an application that requires logging in.
The database could be reached (because user credentials are usually stored in
the database)
The application starts, which means that it isn’t critically flawed
Smoke tests are perfect candidates for automation and should be part of an automated
build/deploy cycle. Earlier we touched on the subject of regression tests. Smoke tests
are the tests that are run first in a regression test suite or as early as possible in a con-
tinuous delivery pipeline.
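As a sketch of such automation—assuming a hypothetical login URL—a smoke test can be a single HTTP round-trip executed right after deployment:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class LoginSmokeTest {

    public static void main(String[] args) throws Exception {
        // One trivial end-to-end touch: can the login page be served at all?
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(
                URI.create("https://example.com/login")).build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new AssertionError(
                    "Smoke test failed with status " + response.statusCode());
        }
        System.out.println("Deployment looks alive.");
    }
}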
End-to-End Testing
Sometimes we encounter the term end-to-end testing. Most commonly, the term
refers to system testing on steroids. The purpose of an end-to-end test is to include
the entire execution path or process through a system, which may involve actions
outside the system. The difference from system testing is that a process or use case
may span not only one system, but several. This is certainly true in cases where the
in-house systems are integrated with external systems that cannot be controlled. In
such cases, the end-to-end test is supposed to make sure that all systems and subsys-
tems perform correctly and produce the desired result.
What’s problematic about this term is that its existence is inseparably linked to
one’s definition of a system and system boundary. In short, if we don’t want to make
a fuss about the fact that our e-commerce site uses a payment gateway operated by a
third party, then we’re perfectly fine without end-to-end tests.
Characterization Testing
Characterization testing is the kind of testing you’re forced to engage in when chang-
ing old code that supposedly works but it’s unclear what requirements it’s based on,
and there are no tests around to explain what it’s supposed to be doing. Trying to
figure out the intended functionality based on old documentation is usually a futile
attempt, because the code has diverged from the scribblings on a wrinkled piece of
paper covered with coffee stains long ago.7 In such conditions, one has to assume
that the code’s behavior is correct and pin it down with tests (preferably unit tests),
so that changing it becomes less scary. Thus, the existing behavior is “characterized.”
7. My experience is that truly old specifications always come in paper form only! It’s not that they
predate text files, but the original document has been lost forever in a disk crash, reorganization
of the shared network drive, or somebody’s project directory cleanup frenzy.
Characterization tests differ from regression tests in that they aim at stabilizing exist-
ing behavior, and not necessarily the correct behavior.
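For illustration, a characterization test might look like the following sketch. The legacy class and its magic numbers are invented; the expected value is whatever the code produces today, not what any specification says it should produce.

// Imagine this is legacy code whose intended behavior is unknown.
class LegacyPriceCalculator {
    static int priceWithFees(int basePrice) {
        return basePrice + basePrice / 7 + 12;   // why 7? why 12? nobody knows
    }
}

public class LegacyPriceCalculatorTest {

    @org.junit.Test
    public void characterizesCurrentBehavior() {
        // 126 was obtained by running the existing code and copying its
        // output. The test pins down actual, not necessarily correct, behavior.
        org.junit.Assert.assertEquals(126,
                LegacyPriceCalculator.priceWithFees(100));
    }
}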
Tests can also be classified by their size:

Small tests—Correspond closely to unit tests; they're small and fast. They're
not allowed to access networks, databases, file systems, and external systems.
Neither are they allowed to contain sleep statements or test multithreaded
code. They must complete within 60 seconds.
Medium tests—May check the interactions between different tiers of the
application, which means that they can use databases, access the file system,
and test multithreaded code. They should stay away from external systems
and remote hosts, though, and should execute for no longer than 300 seconds.
Large tests—Not restricted by any limitations.
Summary
Many of the terms in this chapter have multiple meanings and can be interpreted dif-
ferently in different contexts. The purpose of this chapter is to bring to light several
key terms that are used during discussions about software development and testing.
Human mistakes are called errors in testing speak. Errors frequently lead to soft-
ware defects—bugs. Bugs may lead to software failures.
White box testing assumes having access to the source code and targets the inter-
nal structure of a system, whereas black box testing is done “from the outside” and
targets the functionality.
Unit tests ensure that a small unit of code, like a function, a class, or a group
of classes, works as expected. Integration tests verify that components/systems can
talk to each other, but sometimes the term is used to describe tests that are some-
where between unit tests and system tests. System tests are run to verify an entire
system. Finally, acceptance tests are performed by the customer to make sure that the
expected system has been delivered, whereas automated acceptance tests are written
by the team and executed by a testing framework to verify that a story or scenario has
been implemented.
The Agile Testing Quadrants is a model that divides tests along two dimensions: technology-facing versus business-facing, and guiding development versus critiquing the product.
Classifying tests can clarify discussions about responsibility and what to test,
when, and how. The important thing is to use a classification that everybody in the
organization agrees on (or at least is familiar with).
Chapter 4
Testability from a Developer's Perspective
Testability means different things to different people depending on the context. From
a bird’s eye view, testability is linked to our prior experience of the things we want to
test and our tolerance for defects: the commercial web site that we’ve been running
for the last five years will require less testing and will be easier to test than the insu-
lin pump that we’re building for the first time. If we run a project, testability would
be about obtaining the necessary information, securing resources (such as tools and
environments), and having the time to perform various kinds of testing. There’s also
a knowledge perspective: How well do we know the product and the technology used
to build it? How good are our testing skills? What’s our testing strategy? Yet another
take on testability would be developing an understanding of what to build by having
reliable specifications and ensuring user involvement. It’s hard to test anything unless
we know how it’s supposed to behave.1
Before breaking down what testability means to developers, let’s look at why
achieving it for software is an end in itself.
Testable Software
Testable software encourages the existence of tests—be they manual or automatic.
The more testable the software, the greater the chance that somebody will test it, that
is, verify that it behaves correctly with respect to a specification or some other expec-
tations, or explore its behavior with some specific objective in mind. Generally, peo-
ple follow the path of least resistance in their work, and if testing isn’t along that path,
it’s very likely not going to be performed (Figure 4.1).
That testable software will have a greater chance of undergoing some kind of
testing may sound really obvious. Equally apparent is the fact that lack of testability,
often combined with time pressure, can and does result in bug-ridden and broken
software.
Whereas testable software stands on one side of the scale, The Big Ball of Mud
(Foote & Yoder 1999) stands on the other. This is code that makes you suspect that
1. For an in-depth breakdown of testability, I recommend James Bach’s work on the subject (2015).
(to say nothing of waiting for the application to start up), or if you must take a coffee
break every time you want to check if your batch program behaves correctly for that
special almost-never-occurring edge case?
Testers approaching a system with The Big Ball of Mud architecture also face a
daunting task. Their test cases will start with a long sequence of instructions about
how to put the system in a state the test expects. This will be the script for how to
fill in the values in the UI or how to set the system up for the 20-minute-long batch
execution. Not only must the testers author that script and make it detailed enough,
they must also follow it . . . many times, if they are unlucky. Brrr.
Benefits of Testability
Apart from shielding the developers and testers from immediate misery, testable soft-
ware also has some other appealing qualities.
Arguments about what the software actually does take place if its functionality isn't verifiable and is expressed as guesses instead. Lack of testability makes confirming these guesses hard and time consuming; therefore, there's a strong probability that it won't be done.
And because it won’t be done, some of the software’s features will only be found
in the lore and telltales of the organization. Features may “get lost” and, even worse,
features may get imagined and people will start expecting them to be there, even
though they never were. All this leads to “this is not a bug, it’s a feature” type of argu-
ments and blame games.
It Can Be Changed
Software can always be changed. The trick is to do it safely and at a reasonable cost.
Assuming that testable software implies tests, their presence allows making changes
without having to worry that something—probably unrelated—will break as a side
effect of that change.
Changing software that has no tests makes the average developer uncomfort-
able and afraid (and it should). Fear is easily observed in code. It manifests itself as
duplication—the safe way to avoid breaking something that works. When doing code
archaeology, we can sometimes find evidence of the following scenario:
At some point in time, the developer needed a certain feature. Alas, there wasn’t
anything quite like it in the codebase. Instead of adapting an existing concept, by gener-
alizing or parameterizing it, he took the safe route and created a parallel implementa-
tion, knowing that a bug in it would only affect the new functionality and leave the rest
of the system unharmed.
2. A slight variation of this is nicely described in the book Pragmatic Unit Testing by Andrew Hunt
and David Thomas (2003). They plot productivity versus time for software with and without
tests. The productivity is lower for software supported by tests, but it’s kept constant over time.
For software without tests, the initial productivity is higher, but it plummets after a while and
becomes negative. Have you been there? I have.
This is but one form of duplication. In fact, the topic is intricate enough to deserve
a chapter of its own.
Domain-Specific Languages
Domain-specific languages (DSLs) have promise. They simplify the work for
their users and avoid the repetitive creation of similar code. They bring us
closer to being able to say exactly what we mean in the language of the
problem we are solving by encapsulating potentially complex logic in a higher-
order vocabulary. If the author guarantees the correctness of the elements of
the DSL, whole layers of code are correct before we try to use them.
However, good DSLs are notoriously hard to write. Arguably, almost
every API we use should be a good DSL, but how many are? Creating a good
DSL requires not only taking the time to understand the domain, but also
playing with different models of the domain and its interactions to optimize
its usability and utility. Additionally, there may be multiple characteristic
usage patterns, differing levels of relevant abstractions, varying levels of user
expertise, and impactful technological changes over time.
Take, for example, the Capybara acceptance test framework for Ruby,
often cited as an example of a well-crafted DSL in the context of its host
language. With a set of actions like visit, fill_in, click_button and
matchers like have_content, it is well suited to static web pages. Under the
covers, it has adapted to the rapid evolution of underlying tools like Selenium,
but not without challenges at times. However, it still has difficulty dealing with
the dynamic, time-dependent behaviors of single-page applications.
Formal Methods
Formal methods sound good. They provide formal proof of the correctness
of the code. Unfortunately, we have had a hard time adapting them to larger
problems, they are very labor intensive, and most programmers I’ve met
prefer not to deal in that level of mathematical rigor. The research continues,
but we’re not there yet.
Types
In my opinion, types bridge the gap between mainstream languages and formal methods. As a subset of formal specification, they help you ensure correctness by expressing cleanly and compactly which "corner cases" are illegal, in the very context where that knowledge can most readily be applied.
Others
Other approaches provide partial, complex, or laborious solutions. If you’re so
inclined, maybe you can find that great breakthrough. Until then, keep testing.
Testability Defined
Testability is a quality attribute among other “ilities” like reliability, maintainability,
and usability. Just like the other quality attributes, it can be broken down into more
fine-grained components (Figure 4.2). Observability and controllability are the two
cornerstones of testability. Without them, it’s hard to say anything about correctness.
The remaining components described next made it to the model based on my practi-
cal experience, although I hope that their presence isn’t surprising or controversial.
When a program element (see “Program Elements”) is testable, it means that it
can be put in a known state, acted on, and then observed. Further, it means that this
can be done without affecting any other program elements and without them inter-
fering. In other words, it’s about making the black box of testing somewhat transpar-
ent and adding some control levers to it.
Program Elements
From time to time I’ll be using the term program element. The meaning of the term
depends on the context. Sometimes it’s a function, sometimes a method, sometimes a
class, sometimes a module, sometimes a component, or sometimes all of these things.
I use the generic term to avoid clumsy sentences.
Using a catch-all term also solves the problem of emphasizing the difference
between programming paradigms. Although the book favors object-oriented code,
many techniques apply to procedural and functional constructs too. So instead of
writing “class” and “method” everywhere, I can use “program element” and refer to
“function” or “module” as well, like a C file with a bunch of related functions.
Observability
In order to verify that whatever action our tested program element has been subjected
to has had an impact, we need to be able to observe it. The best test in the world isn’t
worth anything unless its effects can be seen. Software can be observed using a vari-
ety of methods. One way of classifying them is in order of increasing intrusiveness.
The obvious, but seldom sufficient, method of observation is to examine whatever
output the tested program element produces. Sometimes that output is a sequence of
characters, sometimes a window full of widgets, sometimes a web page, and some-
times a rising or falling signal on the pin of a chip.
Then there’s output that isn’t always meant for the end users. Logging statements,
temporary files, lock files, and diagnostics information are all output. Such output is
mostly meant for operations and other more “technical” stakeholders. Together with
the user output, it provides a source of information for nonintrusive testing.
To increase observability beyond the application’s obvious and less obvious out-
put, we have to be willing to make some intrusions and modify it accordingly. Both
testers and developers benefit from strategically placed observation points and vari-
ous types of hooks/seams for attaching probes, changing implementations, or just
peeking at the internal state of the application. Such modifications are sometimes
frowned upon, as they result in injection of code with the sole purpose of increasing
observability. At the last level, there’s a kind of observability that’s achievable only by
developers. It’s the ability to step through running code using a debugger. This cer-
tainly provides maximum observability at the cost of total intrusion. I don’t consider
this activity testing, but rather writing code. And you certainly don’t want debugging
to be your only means of verifying that your code works.
Too many observation points and working too far from production code may
result in the appearance of Heisenbugs—bugs that tend to disappear when one tries to
find and study them. This happens because the inspection process changes something
in the program’s execution. Excessive logging may, for example, hide a race condition
because of the time it takes to construct and output the information to be logged.
Logging, by the way, is a double-edged sword. Although it’s certainly the easiest
way to increase observability, it may also destroy readability. After all, who hasn’t
seen methods like this:
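Picture something along these lines (the names and the SLF4J-style logging calls are assumptions), where two lines of actual logic drown in observability code:

void transferFunds(Account from, Account to, Money amount) {
    log.debug("Entering transferFunds");
    log.debug("Arguments: from={}, to={}, amount={}", from, to, amount);
    from.withdraw(amount);
    log.debug("Withdrawal from {} succeeded", from);
    to.deposit(amount);
    log.debug("Deposit to {} succeeded", to);
    log.debug("Leaving transferFunds");
}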
Although all of this is true, the root cause of the problem isn’t really information
hiding or encapsulation, but poor design and implementation, which, in turn, forces
us to ask the question of the decade: Should I test private methods? 3
Old systems were seldom designed with testability in mind, which means that
their program elements often have multiple areas of responsibility, operate at differ-
ent levels of abstraction at the same time, and exhibit high coupling and low cohesion.
Because of the mess under the hood, testing specific functionality in such systems
through whatever public interfaces they have (or even finding such interfaces) is a
laborious and slow process. Tests, especially unit tests, become very complex because
they need to set up entire “ecosystems” of seemingly unrelated dependencies to get
something deep in the dragon’s lair working.
In such cases we have two options. Option one is to open up the encapsulation by
relaxing restrictions on accessibility to increase both observability and controllabil-
ity. In Java, changing methods from private to package scoped makes them accessible
to (test) code in the same package. In C++, there’s the infamous friend keyword,
which can be used to achieve roughly a similar result, and C# has its Internals-
VisibleTo attribute.
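As a sketch of the Java variant, with an invented PriceCalculator: dropping the private modifier makes the method package scoped, so a test in the same package can call it directly.

// src/main/java/com/example/pricing/PriceCalculator.java
package com.example.pricing;

public class PriceCalculator {

    // Was private; package scope lets test code in the same package reach it.
    double applyVolumeDiscount(double price, int quantity) {
        return quantity >= 100 ? price * 0.9 : price;
    }
}

// src/test/java/com/example/pricing/PriceCalculatorTest.java
package com.example.pricing;

import static org.junit.Assert.assertEquals;
import org.junit.Test;

public class PriceCalculatorTest {

    @Test
    public void largeOrdersGetTenPercentOff() {
        assertEquals(90.0,
                new PriceCalculator().applyVolumeDiscount(100.0, 100), 0.001);
    }
}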
The other option is to consider the fact that testing at a level where we need to
worry about the observability of deeply buried monolithic spaghetti isn’t the course
of action that gives the best bang for the buck at the given moment. Higher-level tests,
like system tests or integration tests, may be a better bet for old low-quality code that
doesn’t change that much (Vance 2013).
With well-designed new code, observability and information hiding shouldn’t be
an issue. If the code is designed with testability in mind from the start and each pro-
gram element has a single area of responsibility, then it follows that all interesting
abstractions and their functionality will be primary concepts in the code. In object-
oriented languages this corresponds to public classes with well-defined functionality
(in procedural languages, to modules or the like). Many such abstractions may be
too specialized to be useful outside the system, but in context they’re most meaning-
ful and eligible for detailed developer testing. The tale in the sidebar contains some
examples of this.
3. Or functions, or modules, or any program element, the accessibility to which is restricted by the
programming language to support encapsulation.
So what happens if, let’s say, the parsing code is replaced with a third-
party implementation? Numerous tests will become worthless, because the new component happens to be renowned for its stability and correctness, as well as thoroughly tested.
public interface. Well, this is the “soft” in software—it changes. The tests that
are going to get thrown away once secured the functionality of the parser,
given its capabilities and implementation. The new parsing component comes
with new capabilities, and certainly a new implementation, so some tests will
no longer be relevant.
Controllability
Controllability is the ability to put something in a specific state and is of paramount
importance to any kind of testing because it leads to reproducibility. As developers,
we like to deal with determinism. We like things to happen the same way every time,
or at least in a way that we understand. When we get a bug report, we want to be able
to reproduce the bug so that we may understand under what conditions it occurs.
Given that understanding, we can fix it. The ability to reproduce a given condition in
a system, component, or class depends on the ability to isolate it and manipulate its
internal state.
Dealing with state is complex enough to mandate a section of its own. For now,
we can safely assume that too much state turns reproducibility, and hence control-
lability, into a real pain. But what is state? In this context, state simply refers to what-
ever data we need to provide in order to set the system up for testing. In practice, state
isn’t only about data. To get a system into a certain state, we usually have to set up
some data and execute some of the system’s functions, which in turn will act on the
data and lead to the desired state.
Different test types require different amounts of state. A unit test for a class that
takes a string as a parameter in its constructor and prints it on the screen when a
certain method is called has little state. On the other hand, if we need to set up thou-
sands of fake transactions in a database to test aggregation of cumulative discounts,
then that would qualify as a great deal of state.
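As a small sketch of what controllability looks like in code—using java.time.Clock to make a time-dependent class fully deterministic (the class itself is invented)—the test injects the source of nondeterminism so that the object can be put in a known state:

import java.time.Clock;
import java.time.Instant;
import java.time.LocalTime;
import java.time.ZoneId;

public class GreetingService {

    private final Clock clock;

    public GreetingService(Clock clock) {
        this.clock = clock;
    }

    public String greeting() {
        int hour = LocalTime.now(clock).getHour();
        return hour < 12 ? "Good morning" : "Good afternoon";
    }
}

// In a test, time is fully controlled:
// Clock nineAm = Clock.fixed(
//         Instant.parse("2016-01-01T09:00:00Z"), ZoneId.of("UTC"));
// assertEquals("Good morning", new GreetingService(nineAm).greeting());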
Deployability
Before the advent of DevOps, deployability seldom made it to the top five quality attri-
butes to consider when implementing a system. Think about the time you were in a
large corporation that deployed its huge monolith to a commercial application server.
Was the process easy? Deployability is a measure of the amount of work needed to
deploy the system, most notably, into production. To get a rough feeling for it, ask:
“How long does it take to get a change that affects one line of code into production?”
(Poppendieck & Poppendieck 2006).
Deployability affects the developers’ ability to run their code in a production-like
environment. Let’s say that a chunk of code passes its unit tests and all other tests on
the developer’s machine. Now it’s time to see if the code actually works as expected in
an environment that has more data, more integrations, and more complexity (like a
good production-like test environment should have). This is a critical point. If deploy-
ing a new version of the system is complicated and prone to error or takes too much
time, it won’t be done. A typical process that illustrates this problem is manual deploy-
ment based on a list of instructions. Common traits of deployment instructions are that
they’re old, they contain some nonobvious steps that may not be relevant at all, and
despite their apparent level of detail, they still require a large amount of tacit knowledge.
Furthermore, they describe a process that’s complex enough to be quite error prone.
Being unable to deploy painlessly often punishes the developers in the end. If
deployment is too complicated and too time consuming, or perceived as such, they
may stop verifying that their code runs in environments that are different from their
development machines. If this starts happening, they end up in the good-old “it works
on my machine” argument, and it never makes them look good, like in this argument
between Tracy the Tester and David the Developer:
Tracy: I tried to run the routine for verifying postal codes in Norway. When I
entered an invalid code, nothing happened.
David: All my unit tests are green and I even ran the integration tests!
Tracy: Great! But I expected an error message from the system, or at least some
kind of reaction.
David: But really, look at my screen! I get an error message when entering an
invalid postal code. I have a Norwegian postal code in my database.
Tracy: I notice that you’re running build 273 while the test environment runs
269. What happened?
David: Well . . . I didn’t deploy! It would take me half a day to do it! I’d have to
add a column to the database and then manually dump the data for Norway.
Then I’d have to copy the six artifacts that make up the system to the
application server, but before doing that I’d have to rebuild three of them. . . .
I forgot to run the thing because I wanted to finish it!
The bottom line is that developers are not to consider themselves finished with
their code until they’ve executed it in an environment that resembles the actual pro-
duction environment.
Poor deployability has other adverse effects as well. For example, when prepar-
ing a demo at the end of an iteration, a team can get totally stressed out if getting the
last-minute fixes to the demo environment is a lengthy process because of a manual
procedure.
Last, but not least, struggling with unpredictable deployment also makes critical
bug fixes difficult. I don’t encourage making quick changes that have to be made in a
very short time frame, but sometimes you encounter critical bugs in production and
they have to be fixed immediately. In such situations, you don’t want to think about
how hard it’s going to get the fix out—you just want to squash the bug.
Isolability
Isolability, modularity, low coupling—in this context, they’re all different sides of the
same coin. There are many names for this property, but regardless of the name, it’s
about being able to isolate the program element under test—be it a function, class,
web service, or an entire system.
Isolability is a desirable property from both a developer’s and a tester’s point of
view. In modular systems, related concepts are grouped together, and changes don’t
ripple across the entire system. On the other hand, components with lots of depen-
dencies are not only difficult to modify, but also difficult to test. Their tests will
require much setup, often of seemingly unrelated dependencies, and their interac-
tions with the outside world will be artificial and hard to make sense of.
Isolability applies at all levels of a system. On the class level, isolability can be
described in terms of fan-out, that is, the number of outgoing dependencies on other
classes. A useful design rule of thumb is trying to achieve a low fan-out. In fact, high
fan-out is often considered bad design (Borysowich 2007). Unit testing classes with
high fan-out is cumbersome because of the number of test doubles needed to isolate
the class from all collaborators.
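As a sketch—collaborator types invented, test doubles created with a mocking library such as Mockito—just getting a high fan-out class instantiated in a unit test looks like this:

import static org.mockito.Mockito.mock;

interface CustomerRepository {}
interface PaymentGateway {}
interface InventoryClient {}
interface EmailSender {}
interface AuditTrail {}

class OrderService {
    OrderService(CustomerRepository customers, PaymentGateway payments,
                 InventoryClient inventory, EmailSender email,
                 AuditTrail audit) {
        // store the references...
    }
}

public class OrderServiceTest {

    @org.junit.Test
    public void fiveDoublesBeforeAnyBehaviorCanBeVerified() {
        OrderService service = new OrderService(
                mock(CustomerRepository.class),
                mock(PaymentGateway.class),
                mock(InventoryClient.class),
                mock(EmailSender.class),
                mock(AuditTrail.class));
        // ...and the behavior under test hasn't even started yet.
    }
}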
Poor isolability at the component level may manifest itself as difficulty setting
up its surrounding environment. The component may be coupled to other compo-
nents by various communication protocols such as SOAP or connected in more indi-
rect ways such as queues or message buses. Putting such a component under test may
require that parts of it be reimplemented to make the integration points interchange-
able for stubs. In some unfortunate cases, this cannot be done, and testing such a
component may require that an entire middleware package be set up just to make it
testable.
Systems with poor isolability suffer from the sum of their components' shortcomings. So if a system is composed of one component that makes use of an
enterprise-wide message bus, another component that requires a very specific direc-
tory layout on the production server (because it won’t even run anywhere else), and a
third that requires some web services at specific locations, you’re in for a treat.
Smallness
The smaller the software, the better the testability, because there’s less to test. Simply
put, there are fewer moving parts that need to be controlled and observed, to stay
consistent with this chapter’s terminology. Smallness primarily translates into the
quantity of tests needed to cover the software to achieve a sufficient degree of con-
fidence. But what exactly about the software should be “small”? From a testability
perspective, two properties matter the most: the number of features and the size of
the codebase. They both drive different aspects of testing.
Feature-richness drives testing from both a black box and a white box perspec-
tive. Each feature somehow needs to be tested and verified from the perspective of the
user. This typically requires a mix of manual testing and automated high-level tests
like end-to-end tests or system tests. In addition, low-level tests are required to secure
the building blocks that comprise all the features. Each new feature brings additional
complexity to the table and increases the potential for unfortunate and unforeseen
interactions with existing features. This implies that there are clear incentives to keep
down the number of features in software, which includes removing unused ones.
A codebase’s smallness is a bit trickier, because it depends on a number of fac-
tors. These factors aren’t related to the number of features, which means that they’re
seldom observable from a black box perspective, but they may place a lot of burden on
the shoulders of the developer. In short, white box testing is driven by the size of the
codebase. The following sections describe properties that can make developer testing
cumbersome without rewarding the effort from the feature point of view.
Singularity
If something is singular, there’s only one instance of it. In systems with high singu-
larity, every behavior and piece of data have a single source of truth. Whenever we
want to make a change, we make it in one place. In the book The Pragmatic Program-
mer, this has been formulated as the DRY principle: Don’t Repeat Yourself (Hunt &
Thomas 1999).
Testing a system where singularity has been neglected is quite hard, especially
from a black box perspective. Suppose, for example, that you were to test the copy/
paste functionality of an editor. Such functionality is normally accessible in three
ways: from a menu, by right-clicking, and by using a keyboard shortcut. If you
approached this as a black box test under a limited time constraint, you might settle for testing only one of these three ways, assuming that the others work by analogy. Unfortunately, if this particular functionality had been implemented by two different developers on two different occasions, you wouldn't be able to make that assumption.
A third version?
This example is a bit simplistic, but this scenario is very common in systems that
have been developed by different generations of developers (which is true of pretty
much every system that’s been in use for a while). Systems with poor singularity
appear confusing and frustrating to their users, who report a bug and expect it to be
fixed. However, when they perform an action similar to the one that triggered the bug
by using a different command or accessing it from another part of the system, the
problem is back! From their perspective, the system should behave consistently, and
explaining why the bug has been fixed in two out of three places inspires confidence
in neither the system nor the developers’ ability.
To a developer, nonsingularity—duplication—presents itself as the activity of imple-
menting or changing the same data or behavior multiple times to achieve a single
result. With that comes maintaining multiple instances of test code and making sure
that all contracts and behavior are consistent.
Level of Abstraction
The level of abstraction is determined by the choice of programming language and
frameworks. If they do the majority of the heavy lifting, the code can get both smaller
and simpler. At the extremes lie the alternatives of implementing a modern applica-
tion in assembly language or a high-level language, possibly backed by a few frame-
works. But there’s no need to go to the extremes to find examples. Replacing thread
primitives with thread libraries, making use of proper abstractions in object-oriented
languages (rather than strings, integers, or lists), and working with web frameworks
instead of implementing Front Controllers4 and parsing URLs by hand are all exam-
ples of raising the level of abstraction. For certain types of problems and constructs,
employing functional or logic programming greatly raises the level of abstraction,
while reducing the size of the codebase.
The choice of the programming language has a huge impact on the level of
abstraction and plays a crucial role already at the level of toy programs (and scales
accordingly as the complexity of the program increases). Here’s a trivial program
that adds its two command-line arguments together. Whereas the C version needs to
worry about string-to-integer conversion and integer overflow . . .
#include <stdio.h>
#include <stdlib.h>
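#include <limits.h>

/* A reconstruction sketch of the kind of program the text describes;
   the conversion and the overflow check must both be done by hand. */
int main(int argc, char *argv[])
{
    long a, b;

    if (argc < 3) {
        fprintf(stderr, "Usage: add <a> <b>\n");
        return EXIT_FAILURE;
    }
    a = strtol(argv[1], NULL, 10);
    b = strtol(argv[2], NULL, 10);
    if ((b > 0 && a > LONG_MAX - b) || (b < 0 && a < LONG_MIN - b)) {
        fprintf(stderr, "Overflow!\n");
        return EXIT_FAILURE;
    }
    printf("%ld\n", a + b);
    return EXIT_SUCCESS;
}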
4. https://en.wikipedia.org/wiki/Front_Controller_pattern
. . . its Ruby counterpart will work just fine for large numbers while being a little more
tolerant with the input as well.
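A one-line sketch of such a Ruby counterpart: Ruby's integers are arbitrary precision, and to_i shrugs off trailing garbage in the input.

puts ARGV[0].to_i + ARGV[1].to_i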
From a developer testing point of view, the former program would most likely
give rise to more tests, because they’d need to take overflow into account. Gener-
ally, as the level of abstraction is raised, fewer tests that cover fundamental building
blocks, or the “plumbing,” are needed, because such things are handled by the lan-
guage or framework. The user won’t see the difference, but the developer who writes
the tests will.
Efficiency
In this context, efficiency equals the ability to express intent in the programming lan-
guage in an idiomatic way and making use of that language’s functionality to keep the
code expressive and concise. It’s also about applying design patterns and best prac-
tices. Sometimes we see signs of struggle left in codebases by developers who have fought valorously to reinvent functionality already provided by the language or its libraries. You know inefficient code when you see it: right after spotting it, you delete 20 lines and replace them with a one-liner that turns out idiomatic and simple.
Inefficient implementations increase the size of the codebase without providing
any value. They require their tests, especially unit tests, because such tests need to
cover many fundamental cases. Such cases wouldn’t need testing if they were handled
by functionality in the programming language or its core libraries.
Reuse
Reuse is a close cousin of efficiency. Here, it refers to making use of third-party com-
ponents to avoid reinventing the wheel. A codebase that contains in-house implemen-
tations of a distributed cache or a framework for managing configuration data in text
files with periodic reloading5 will obviously be larger than one that uses tested and
working third-party implementations.
This kind of reuse reduces the need for developer tests, because the functionality
isn’t owned by them and doesn’t need to be tested. Their job is to make sure that it’s
plugged in correctly, and although this, too, requires tests, they will be fewer in number.
5. Now this is a highly personal experience, but pretty much all legacy systems that I’ve seen have
contained home-grown caches and configuration frameworks.
Mind Maintainability!
All of the aforementioned properties may be abused in a way that mostly hurts
maintainability. Singularity may be taken to the extreme and create too tightly
coupled systems. Too high a level of abstraction may turn into some kind of “meta
programming.” Efficiency may turn into unmotivated compactness, which hurts
readability. Finally, reuse may result in pet languages and frameworks being brought
in, only to lead to fragmentation.
Summary
If the software is designed with testability in mind, it will more than likely be tested.
When software is testable, we can verify its functionality, measure progress while
developing it, and change it safely. In the end, the result is fast and reliable delivery.
Testability can be broken down into the following components:

Observability—the ability to observe the effects of actions on the tested code
Controllability—the ability to put the tested code in a specific, known state
Deployability—the amount of work needed to deploy the system into a production-like environment
Isolability—the ability to test a program element in isolation from its collaborators
Smallness—the fewer the features and the smaller the codebase, the less there is to test

Chapter 5
Programming by Contract
Structuring code so that it's testable, and thereby increasing its probability of being tested, isn't the only way to aim for correct software. Another approach would be to
go down the road of formal methods, that is, mathematical proofs. In this chapter,
we examine yet another alternative, which is modeling the software as transactions
between a client and supplier, who agree on a contract that forces them to uphold
certain obligations to each other (see Figure 5.1). In exchange, both get some benefits.
If the contract is violated, the application stops. For such an approach to be effective,
the contract must be constantly checked at runtime, as opposed to running a suite of
tests now and then or proving a fact about the program on paper.
When software is written in this way, we’re talking about Programming by Con-
tract.1 This technique is quite characteristic for Eiffel, where it’s built right into the
language.2 However, even without full language support it’s still quite usable.
1. Actually, the more well-known term is “Design by Contract,” but the term is trademarked and
won’t be used in this book.
2. According to Wikipedia, roughly 15 languages have built-in contract support (http://en
.wikipedia.org/wiki/Design_by_contract).
Postconditions are constraints on the supplier’s internal state and often the return
value. They need to be met prior to returning from a call to the supplier. If such a
constraint isn’t met, the supplier terminates before returning. Postconditions are also
short lived. They apply only when returning to the calling client. All of the following
would make reasonable postconditions:
When transferring funds between accounts, the same amount is added to one
account and subtracted from the other.
When creating an object, its member variables have all been initialized to
legal values.
When adding an element to a linked list, the new element becomes the list’s
head and it points to the previous head of the list.
Invariants are the third building block of contracts. Two common types of invari-
ants are class invariants and loop invariants. Class invariants are constraints that are
always upheld for a class’s internal state. For example, if we have a class that repre-
sents time and uses integers to store hours and minutes, a reasonable class invariant
would require that they be in the ranges 0–23 and 0–59. Constraints upheld by class
invariants can live as long as the executing program. For example, consider a class
invariant on a collection of bank accounts stating that the sum of all transactions
must equal the total balance.
Contracts in Eiffel
The following routine, written in Eiffel, uses preconditions to check that its
parameters are valid and a postcondition to verify that the return value is
reasonable. As we see, contract checking is clearly supported by the language.
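A sketch of such a routine (the names and the computation are invented):

square_root (x: REAL): REAL
        -- Approximate square root of x.
    require
        non_negative: x >= 0
    do
        Result := ... compute it here ...
    ensure
        reasonable_result: (Result * Result - x).abs <= 0.001
    end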
Thinking in Contracts
Irrespective of your favorite language’s support for contracts, the major shift when
employing them comes from having to think about the produced code in terms of cli-
ents and suppliers and the consequences of formalizing responsibilities. In languages
where contracts are supported natively, specifying the contract prior to writing any
code is an established design practice.
Establishing preconditions, postconditions, and maybe even invariants for the
program elements that we create slows us down—but in a good way. We need to think
about where responsibilities lie and which part of the code should do what. Whether
we strive to uphold the contract at runtime or not is secondary in my opinion. Speci-
fying the contract is the critical aspect of this technique.
This may sound obvious, but think about this: How many times have you had to
consider where to put the responsibility for ensuring that the arguments passed to a
method/function/routine are valid? In the majority of systems that I’ve worked with,
this question has been ignored or subject to heated debate, thereby producing the full
spectrum of possibilities:
The caller ensures that the arguments are correct—This stance is typically
taken by libraries and reusable components, which are supposed to be clean
and easy to understand, as opposed to sprinkled with various null and range
checks. Routines in such libraries may crash badly if incorrect arguments are
supplied. Thus, the contracts are clearly stated, but not enforced.
The callee checks the arguments—It’s perfectly logical for the callee to check
the values of the input parameters (and, in fact, a must) in code exposed to
public use. Publicly available remote procedure calls (RPCs) or web services
make good examples. Because they don’t know the intentions of the caller,
whose objective may be to crash the callee for fun or privilege escalation, they
must take their own protective measures. Routines that are to be called by
unknown and potentially malicious clients should be crafted appropriately
and apply additional checks to their input parameters. Common vulnerabili-
ties like buffer overflows and SQL injections are often a result of missing or
too lenient parameter checking.
Legacy systems maintained by generations of developers, where nobody
can trust anything, are another example. Such systems tend to have islands
where defensive programming has been applied and where arguments are
checked more thoroughly. The person who wrote the code probably thought:
“Everything is so buggy. I can’t trust anything, but at least I can make sure that
my routine doesn’t swallow the garbage without a fight.” This is a brave attempt
to enforce some kind of contract.
The responsibility isn’t formalized—Different generations of developers and
programming styles, combined with lack of conventions, typically lead to a
clear absence of argument checking, duplicated effort, or a mishmash of the two preceding strategies.
Contracts, be they part of the language or just a mental model, blend naturally
with object-oriented design. If it’s clear what kind of contract each object honors,
especially its construction logic, many tedious and verbose checks and validations
may be omitted. Suppose that we implement the classic time difference function:
given two dates, it returns the time difference between them. A naïve implementa-
tion using integers as arguments would have to start by checking that the arguments
indeed are valid dates—for example, that they follow the format yyyymmdd. On the
other hand, if the same function accepted two date objects, it could stop worrying about validating them and just perform the computation. In other words, the
contract of the class representing the date would save the date difference function
from performing extraneous checks. In fact, this example also illustrates how con-
tracts can help us to follow the Single Responsibility Principle (Martin 2002) by tak-
ing validation and parameter checking off the table.
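Sketched with java.time, where the types carry exactly this kind of contract, the point looks like this:

import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

public class Dates {

    // No format or range checks needed: a LocalDate cannot exist in an
    // invalid state, so the argument types uphold the precondition for us.
    public static long daysBetween(LocalDate from, LocalDate to) {
        return ChronoUnit.DAYS.between(from, to);
    }
}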
Enforcing Contracts
Once we decide to actually adopt contracts as a design technique, we have multiple
options at our disposal for how to enforce them. Our choice will be affected by which techniques our programming language supports and by whether we aim for runtime enforcement, as opposed to merely expressing the intention of contracts and enforcing them by more indirect means.
Assertions
Assertions are by far the most common way to achieve contract verification. They’re
runtime checks that verify a boolean condition and make the program terminate
with a diagnostic message if that condition isn’t satisfied. A feature of assertions is
that they can be turned off, which means that code executed in them mustn’t be criti-
cal to the execution of the program.
The fact that failed assertions terminate in a way that aborts the execution of
the application without further ado makes them totally inappropriate for verifying
parameters to public functions or input supplied by the user. This is, by the way, in
line with the philosophy of Programming by Contract, according to which contract
checking should be preceded by normal validation logic. Imagine a typical construc-
tor for a simple time class:
public Time(int hour, int minute) {
    assert 0 <= hour && hour <= 23 : "Hour out of range: " + hour;
    assert 0 <= minute && minute <= 59 : "Minute out of range: " + minute;
    this.hour = hour;
    this.minute = minute;
}
Using assertions like this would be incorrect, because we don’t want the program
to crash just because invalid parameters have been passed to a public constructor.
Also, we don’t want the constructor to start accepting arbitrary values just because we
decided to deactivate assertions.
In short, precondition verification and assertions apply to situations where we
want to guard against programming errors and incorrect caller behavior, which isn’t
the case for public APIs. Such APIs should use normal error or exception handling to
reject bad input.
Assertions have a slight impact on performance. The cost varies from language to
language and platform to platform, but we can expect at least an additional condi-
tional to be executed for every assertion.3
3. Saving nanoseconds at the cost of turning off assertions may be a bad idea, but the point of this
argument is that we don’t want assertions that aren’t used. They use up the few nanoseconds,
but they also clutter the code if used incorrectly and in excess.
4. There are dedicated Programming by Contract libraries for Java, like Cofoja, but I haven’t seen
them used in practice.
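For example, with a utility library such as Guava's Preconditions (one of several offering this style), a precondition check might read:

import static com.google.common.base.Preconditions.checkArgument;
import static com.google.common.base.Preconditions.checkNotNull;

public void registerBirth(String name, int year) {
    checkNotNull(name, "name must not be null");
    checkArgument(year > 1900, "Unreasonable year: %s", year);
    // ...actual logic...
}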
In my opinion, this better shows the intent and is more readable than if state-
ments. In addition, it indicates the presence of a contract and its preconditions.
C# developers have a more versatile tool at their disposal in Code Contracts,
which is a package that adds pretty much full-fledged contract support to C# (RiSE
2015). This highly configurable package allows verifying the different building blocks
of contracts at both runtime and to some extent statically.
Discussing the full functionality of Code Contracts is beyond the scope of this
book, but as a teaser, the following snippet shows that the library can be used to both
validate arguments by throwing a developer-specified runtime exception and to
truly enforce a precondition by throwing an unrecoverable ContractException
(Microsoft 2013):
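A sketch of what that can look like (the class and its members are invented):

using System;
using System.Diagnostics.Contracts;

public class Account
{
    public decimal Balance { get; private set; }

    public void Withdraw(decimal amount)
    {
        // A violation throws the developer-specified ArgumentException.
        Contract.Requires<ArgumentException>(
            amount > 0, "amount must be positive");
        // A violation throws an unrecoverable ContractException.
        Contract.Requires(Balance - amount >= 0);
        Balance -= amount;
    }
}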
Unit Tests
My personal experience is that neither assertions nor specialized libraries have had
a major breakthrough or have reached the large masses. Hopefully, it’s not because
developers don’t know or care about these techniques and building blocks, but
because they specify and verify their contracts with unit tests. Using tests to express
a contract is an indirect means of enforcement, but that doesn’t make the technique
less effective. After all, unit tests are perfectly capable of verifying preconditions,
postconditions, and invariants once the hard work—specifying them—has been
done. Obviously, the test-based approach takes away the runtime checking and, more
important, the explicit documentation of the contract in the production code, but in
spite of these drawbacks, it’s the most popular choice.
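A sketch of a precondition being verified this way, assuming a Time constructor that, as recommended earlier for public APIs, rejects bad input with an exception rather than an assertion:

import org.junit.Test;

public class TimeContractTest {

    @Test(expected = IllegalArgumentException.class)
    public void rejectsHoursOutsideTheContract() {
        new Time(25, 0);
    }
}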
Static Analysis
If runtime enforcement of contracts is at one side of a scale, static analysis is on the
other. Still, static analysis together with type metadata can be used to express the
intention of a contract. When using languages that allow annotating types somehow,
we can make the IDE or a static analysis tool help us to uphold some rudimentary
constraints for variables, method arguments, and return values (depending on which
of these can be annotated). This can be considered a method of enforcing, or at least
expressing, contracts in a way, but it’s limited to the level of sophistication of the type
metadata and compile-time checking.5
The flagship of this technique is some form of null check, like the @Nonnull
or @NotNull annotations in Java (JCP 2006) and the [NotNull] attribute in C#.6
While other annotations exist, this is the one that seems to have caught on the most
at the time of writing.
Summary
Programming by Contract is a technique complementary to testing and is about run-
time verification of constraints defined by contracts. Such constraints may be pre-
conditions, postconditions, and different types of invariants. The constraints ensure
that calls are made using valid parameters and that the program is in a sound state. A
constraint violation is an unrecoverable error.
Methods designed with a contract in mind (either explicitly enforced or just as a
design aid) will have clearer responsibilities and will be easier to understand. This, in
turn, simplifies testing.
The majority of languages don’t support contracts directly; rather, they use asser-
tions to achieve the effect of contract checking. Caution should be exercised in such
cases, because assertions don’t necessarily make it into production.
The big takeaway from this chapter is that designing program elements with con-
tracts in mind helps give these elements clear responsibility and helps determine what
kinds of tests, and how many, we need in order to verify that a contract is indeed sup-
ported. Once a contract has been defined, we can verify it using secondary techniques
like unit testing or static analysis.
5. Actually, one can use aspect-oriented programming to provide runtime checks, but I’ve never
seen it done in practice.
6. This attribute comes from the JetBrains.Annotations package and is interpreted by ReSharper
(JetBrains 2016).
Chapter 6
Drivers of Testability
Some constructs and behaviors in code have great impact on its testability. This chap-
ter is about exploring and harnessing them. Let’s start by looking at two snippets of
code. The first one—matrix multiplication—is a typical programming exercise for
fresh computer science students.
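In sketch form, the two snippets run along these lines. The exact bodies are illustrative; the names m1, m2, dispatchInvoice, invoiceRepository, invoiceQueue, and processedInvoices come from the discussion that follows.

double[][] multiply(double[][] m1, double[][] m2) {
    if (m1[0].length != m2.length) {
        throw new IllegalArgumentException(
                "width of m1 must equal height of m2");
    }
    double[][] result = new double[m1.length][m2[0].length];
    for (int i = 0; i < m1.length; i++) {
        for (int j = 0; j < m2[0].length; j++) {
            for (int k = 0; k < m2.length; k++) {
                result[i][j] += m1[i][k] * m2[k][j];
            }
        }
    }
    return result;
}

void dispatchInvoice(Invoice invoice) {
    invoiceRepository.save(invoice);   // indirect output
    invoiceQueue.add(invoice);         // indirect output
    processedInvoices++;               // state change, treated later
}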
If your brain is wired like mine, you’ll find the second snippet more readable and
easier to understand. However, from a testability point of view, the differences aren’t
in the variable names, nested loops, and opportunities for off-by-one errors. What
truly makes these snippets different is the amount of direct and indirect input and
output in each of them and how they handle state.
1. It’s one of those properties that comes with trade-offs, however. In some cases, relying on only
direct input may conflict with object-oriented design and encapsulation.
2. Equivalence classes and boundary values are mentioned a few times in this chapter, but are
properly introduced in Chapter 8, “Specification-based Testing Techniques.”
1. It’s consistent—Given the same set of input data, it always returns the
same output value, which doesn’t depend on any hidden information,
state, or external input.
2. It has no side effects—The function doesn’t change any variables or
data of any type outside of the function. This includes output to I/O
devices.
Given this definition, functions that have no indirect input or output are pure. As for side effects, these typically involve changing something outside the function: writing to member or global variables, updating files or databases, sending messages over a network, or producing output on a screen.
3. Actually, we can count three. A counter is incremented too, but this will be treated later.
State
Let’s return to the dispatchInvoice method once more. Its last line, where a
counter is incremented, presents a challenge in itself when it comes to testing. The
code is written so that we don’t know whether processedInvoices is a class
variable or a member variable, but we do know that some state is changed. The coun-
ter may have numerous uses, spanning from plain simple logging to triggering some
critical business rule.
What if the last line of dispatchInvoice were changed to this instead:
if (++processedInvoices == BATCH_SIZE) {
    invoiceRepository.archiveOldInvoices();
    invoiceQueue.ensureEmptied();
}
Suddenly the state triggers something important, and any tests written against the method need to take that into account. A test that wants to trigger the condition needs to do one of the following:

Manipulate the counter directly, for example by relaxing its accessibility or using reflection
Call dispatchInvoice as many times as it takes for the counter to reach BATCH_SIZE
Rework the code so that the counter or the batch size can be controlled from the outside
None of these options is trivial. You have to make the trade-off between
violating encapsulation, writing a more complex test, or reworking the code. Do keep
in mind that the example was about something as simple as a class or member vari-
able and that there are many more elaborate and intricate ways to introduce state.
Databases, by nature, are piles of state. If you’ve ever had to debug an invoicing
algorithm that applied a myriad of business rules to tens of thousands of customers,
all of which had unique purchasing histories, you know the meaning of both state
and pain. The same goes for reports, network-aware applications, page navigation,
and so on.
The point is that all but the most trivial applications will have state, and we need
to take that into account when designing testable code. The question we must ask
ourselves is: “How do I set up a test so that I reach the correct state prior to verify-
ing the expected behavior?” Or a better question may in fact be: “How do I keep the
amount of state down and isolated so that I don’t have to ask myself the former ques-
tion too often?”
Temporal Coupling
Temporal coupling is a close cousin of state. “Temporal” means that something has to
do with time. In this case, it’s the time of invocation or, more specifically, the order of
invocation. Given a program element with functions f1 and f2, there exists a temporal
coupling between them if, when f2 is called, it expects that f1 has been called first—
that is, it relies on state set up by f1.
Imagine the multiply function from the example at the beginning of this
chapter being moved to a class and the parameters being set using an old-fashioned
initializer method.
class MatrixMultiplier {
    private double[][] m1;
    private double[][] m2;

    void initialize(double[][] m1, double[][] m2) {
        if (m1[0].length != m2.length) {
            throw new IllegalArgumentException(
                "width of m1 must equal height of m2");
        }
        this.m1 = m1;
        this.m2 = m2;
    }

    double[][] multiply() {
        // Same as before, but with member variables
    }
}
This change, deliberately crude to get your attention, introduces temporal cou-
pling. A call to multiply now requires first calling initialize. Otherwise, it
will reward you with a NullPointerException. Code in the wild will be just as
ruthless. It will either crash if things are called out of order or perform some convo-
luted initialization spanning different layers of abstraction while violating a whole
host of design practices and all forms of logic—all to make it impossible for you to
even dare move a single line within a method.
In essence, temporal coupling arises as soon as one program element needs some-
thing to have happened in another program element in order to function correctly.
Usually, this isn’t the end of the world. In many cases, it’s quite apparent that there’s
some kind of life cycle or otherwise intuitive order of execution. Temporal coupling
becomes dangerous if the succession of invocations isn’t apparent and if calling a
method out of order puts the application in an invalid state or results in some kind of
error, like a NullPointerException.
Temporal coupling is quite common. Many libraries, especially those written in
procedural languages, rely on it for initialization. Knowing what it looks like, there’s
no glory in creating more of it, especially in object-oriented languages that have
constructors.
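To contrast, here's what the example looks like when the constructor does the initializing; the temporal coupling disappears because the object can no longer exist in a half-initialized state:

class MatrixMultiplier {
    private final double[][] m1;
    private final double[][] m2;

    MatrixMultiplier(double[][] m1, double[][] m2) {
        if (m1[0].length != m2.length) {
            throw new IllegalArgumentException(
                "width of m1 must equal height of m2");
        }
        this.m1 = m1;
        this.m2 = m2;
    }

    // multiply() as before
}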
Data Types and Testability
Consider a business rule that operates on a person's age, which is passed around as a plain integer. Integers in modern languages are usually 32-bit numbers ranging from roughly minus 2 billion to plus 2 billion. This means that the age parameter needs some more thorough checking to make sure that the value stored in the oversized data type is reasonable. How about this?
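// A sketch; the exact bounds are assumptions.
boolean isOfLegalAge(int age) {
    if (age < 0 || age > 125) {
        throw new IllegalArgumentException("Not a valid age: " + age);
    }
    return age >= 18;
}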
Now the code ensures that the business rule is applied to a reasonable value. This
needs to be done at every place in the code where age is used.4 But what about valida-
tion? Sure, validating age someplace else would do the trick, provided that it is done
4. Actually it doesn’t, but would you feel comfortable about people of age 432544 years passing the
check?
everywhere that age is being checked, but this introduces temporal coupling between
the validation and any logic that relies on age. Now, given that validation usually
resides in another layer and may be written in a different language by—heaven
forbid—another person, this type of coupling isn’t something you want to rely on.
The age example may seem trivial, so let’s list some other candidates for this type
of behavior:
Currency
National identification numbers
Date of birth
Date/time
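One way to tame such values is to wrap the primitive in a small abstraction that enforces its invariants. A minimal sketch for age (the bounds and names are assumptions):

public class Age {
    private final int years;

    public Age(int years) {
        if (years < 0 || years > 125) {
            // The invariant now lives in one place only.
            throw new IllegalArgumentException("Not a valid age: " + years);
        }
        this.years = years;
    }

    public int inYears() {
        return years;
    }
}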
This class can be extended to handle comparisons with other age objects or inte-
gers, and in some designs putting an isOfLegalAge method there would make
real sense.
class CHECK
feature -- Divvy up.
split_by (num_of_people: INTEGER) -- Split a check by diners.
require
non_negative: num_of_people >= 1
do
... split it up here ...
ensure
split_checks: check_count = old check_count + 1
end
Isn’t require inside the routine body? Technically no, because the routine
body starts with do. It still seemed weird to me, though, semantically, to have
the precondition anywhere but immediately before the feature name.
Better yet, I wondered: Can I get this precondition entirely out of my
function definition?
Looking again at our precondition num_of_people >= 1, it’s nothing
magical, just a predicate. How else can we encode a predicate instead of as
a precondition inside a function? In Eiffel, the answer would be to encode it in a type.
Now we don’t need to test for each case because it’s literally impossible
in our system to be an employee and not have the necessary ID.e Instead,
we incrementally build valid data, always guaranteed to have a “legal” state.
Essentially we have implicit preconditions living in our types. How? It’s
hiding right in plain sight: we actually have logical operators operating on
our types! In F#, sum types are represented with | and product types are represented with *. These correspond to the logical operators ∨ (or) and ∧ (and), respectively.
The more you work in this way, the easier it becomes to see how to
encode valuable business logic out of preconditions in functions, or out of
functions entirely, and into our data types. Does this remove the need for unit
tests entirely? In my experience, no. But the more logic you move into your
type system, the less you will need to test.
a. https://docs.eiffel.com/book/method/et-design-contract-tmassertions-and-exceptions.
b. Even functions have types because in a functional language you have higher-order functions that operate on other functions!
c. Formally, not all operations are "closed under" a type.
d. https://blogs.janestreet.com/effective-ml-revisited/.
e. You may ask: What if we need to represent people who aren't yet employees? (And thus don't have ID.) You'd create a new type!
Domain-to-Range Ratio
Speaking of data types and their ranges naturally takes us to this chapter’s last piece
of theory. How would we test a function, f, that supposedly says whether a number is
odd or even and returns 1 if it's odd and 0 if it's even?
Given that we accept that 0 is an even number, the first test that comes to mind
is calling the function with 0 and comparing the result to 0. Next, we’d probably
call it with a 1, and expect 1 in return. Then what? Is f(10) = 0 a good test? Or
maybe f(9999) = 1? That depends.
Let’s leave the world of software and go for a more mathematical definition. f
maps the set of natural numbers to {0, 1}. This means that we no longer have to concern ourselves with things like f("hello world"). The range of f is the set consisting of
0 and 1, whereas its domain is the set of natural numbers. Given these definitions,
the domain-to-range ratio (DRR) can be introduced. It’s the quotient of the number
of possible inputs over the number of different outputs. In a more mathematical lan-
guage, we can state this as the cardinality of the function’s domain over the cardinal-
ity of its range (Woodward & Al-Khanjari 2000):
DRR = D / R
Why is that interesting? Let’s reduce the size of our problem and replace the infi-
nite set of natural numbers with the set of numbers from 1 to 6. Thus, the size of
the domain is 6 and the size of the range is still 2, which makes the Domain-to-Range Ratio equal to 6/2 = 3. The measure
tells us something about the information loss that occurs when multiple values in the
input map to the same output. In the example, three input values map to the same
output value, three to another. It would be tempting to create only two test cases for
this scenario; after all, there are two reasonable equivalence classes here—odd and
even numbers.
Now, suppose that f looks like this:
f(1) = 1
f(2) = 0
f(3) = 1
f(4) = 1
f(5) = 1
f(6) = 0
It’s almost a function that determines whether a number is even or odd, but it has
an exception built in. If there’s no test for f(4), we’re in for a surprise. This is an exam-
ple of how bugs can creep into areas that suffer from information loss. The problem
is amplified if the input domain (and consequently the DRR) grows. Without getting
too formal, we can say that the DRR is a measure of risk; the higher it is, the more
unsafe it’ll be to have very few tests.
The previous example illustrates how a trivial function with obvious equivalence
partitions can include surprises that may remain unfound unless the DRR is considered. Naturally this doesn't mean that we should throw equivalence partitioning
out the window. Rather, it means that we should be careful both in situations involv-
ing discontinuous large input domains that cannot be easily partitioned and in
situations where there’s information loss (as indicated by the DRR). It’s also yet
another reason for keeping data type sizes close to the range of the variable that
they hold and to introduce abstractions that uphold invariants and keep the size
of the domain down.
Summary
Several constructs and behaviors in code affect testability. Direct input/output is
observable through a program element’s public interface. This makes testing easier,
because the tests need only be concerned about passing in interesting arguments and
checking the results, as opposed to looking at state changes and interactions with
other program elements.
Conversely, indirect input/output cannot be observed through the public inter-
face of a program element and requires tests to somehow intercept the values coming
in to and going out from the tested object. This usually moves tests away from state-
based testing to interaction-based testing.
The more complex state a program element allows, the more complex the tests
need to become. Therefore, keeping state both minimal and isolated leads to simpler
tests and less error-prone code.
Temporal coupling arises if one method requires another method to be invoked
first. Typical examples are initializer methods. Temporal coupling is actually state in
disguise and should therefore be avoided if possible.
The Domain-to-Range Ratio is a measure of information loss in functions that
map large input domains to small output domains, which may hide bugs. It’s yet
another tool when determining what abstractions to use and how many tests there
should be.
Chapter 7
Unit Testing
Unit testing is the professional developers’ most efficient strategy for ensuring that they
indeed complete their programming tasks, that the code they write works in accor-
dance with their assumptions,1 and that it can be changed by them and their peers.
A hobby hack written and used by one person doesn’t need to have unit tests. One
person suffers the consequences of bugs, and if any refactorings take more time than
necessary or totally break the project, that’s probably fine too. If the project is more
about coding for fun than producing something that an actual customer is willing to
pay money for and that can be developed and maintained by more than one person
for a longer period of time, having no unit tests is a viable strategy.
Why Do It?
Why should you invest time in writing unit tests when working with software profes-
sionally? Here are a couple of reasons. Some of them echo arguments made previously
in the book, but that doesn't make them any less true. Among other things, unit tests
1. It’s tempting to write “works correctly” instead of “works in accordance with their assump-
tions,” but proving that a program is correct is impossible, except for simplistic snippets used in
a university course on formal methods.
Relieve testers of simple checks, such as boundary values, input validation, or invocation of the happy path. This allows testing performed
manually to uncover things that are far more interesting than, let’s say, off-by-
one errors. Conversely, teams and organizations that lack unit tests will have to
compensate by manual means, which translates into manual checking.
Specify behavior and document the code—Ideally, a unit test is a descrip-
tion of some behavior of the tested code; that is, an example of how the code
should work or implement a specific business rule. It’s documentation. And
what documentation tends to actually get read—a dryly written, autogen-
erated method description or working code?
2. The following definition is inspired by Osherove (2009); Langr, Hunt, and Thomas (2015); and
Feathers (2004).
3. This is one of these rules that has exceptions, but stop and think before testing encapsulated,
nonpublic behavior.
By this book's definition, a unit test is fully automated and self-verifying, it's repeatable, consistent, and fast, and it tests a single logical concept while running in isolation.
The last two points are sometimes subject to debate, but in this book they're part of the definition.
When writing a test that adheres to the preceding definition, it’s quite hard to
make it complex. Another good reason for following the aforementioned constraints
is environment independence (portability). Unit tests have to be portable across all
developers’ environments. They also have to be runnable in environments used for
continuous integration. Such environments will most likely be quite different from
the average developer machine. They may run another operating system, establish
network connections to other hosts, use a different directory layout, and so on. For
these reasons it’s vital that unit tests don't involve external resources.
Test Methods
Different frameworks use different mechanisms for test discovery. If the program-
ming language in which the tests are written supports metadata (such as annotations
or attributes), this mechanism tends to be the first choice. Such is the case for JUnit
4.x, NUnit, and MSTest, to mention just a few. In other cases, the framework relies
on naming conventions. Methods prefixed with "test" are considered tests in JUnit
3.x, Ruby’s Test::Unit, PHPUnit,4 and XCTest for Objective-C and Swift. Finally, in
some frameworks everything must be done “by hand,” like in CUnit, where the test
methods are added to the test suite programmatically. In general, the frameworks
make no guarantees about the order of execution of the individual test methods, and
some even randomize the execution order on purpose. This is a good thing, because it
makes it virtually impossible to create tests that are coupled to each other.
4. Actually PHPUnit uses a hybrid approach. It relies on naming conventions, but also supports
annotations.
Figure 7.1 The life cycle of a unit testing framework. Most frameworks don’t provide the
outermost initializers/cleanup methods.
Many tests in a suite run the same or very similar setup code. Moving such code to a common initializer eliminates duplication and enhances readability. Think of this initializer as the tests' constructor.
There’s one downside to test initializers: they spread out a test’s code across dif-
ferent locations. When reading a test, you must also take the code in the initializer
into account, and if it doesn’t fit on the screen together with the test’s code, you’ll
need to scroll back and forth. For many tests this won’t matter, whereas some will
really suffer. I can’t give a general pointer, but consider extracting common setup
code to well-named methods and call them at the beginning of the test if you feel that
it makes the test more readable and easier to understand.
After each test a cleanup method, commonly called teardown, is called (again
analogous to a destructor). A good rule of thumb is to avoid using cleanup methods
when writing unit tests. Since unit tests are supposed to run in isolation, the mere
presence of a cleanup method should raise suspicion, especially when working with a
language that has automatic garbage collection.
Initializer methods are called once per test, although many unit testing frame-
works support initializers that are called once per class or even less frequently.5 Such
initializers are rarely needed by unit tests and are meant for tests that require lengthy
setup, like connecting to a database or setting up some lightweight server. Such tests
are, by this book’s definition, not unit tests.
Naming Tests
Naming tests is difficult. Coming up with a name that conveys both what’s specific
about the test and an expected outcome may often be quite a challenge. Furthermore,
the name of a test should make it distinguishable from other tests in the same suite or
category.
5. For example, MSTest supports the AssemblyInitialize attribute, which enables calling
an initializer method once for an entire assembly.
Most test names you’ll encounter will be influenced by one of the following nam-
ing conventions, or a variation thereof (see Kumar [2014] for more naming schemes).
Naming the test after the method it exercises, sometimes with a mandatory "test" prefix
Describing the expected behavior in a sentence, often starting with "should"
Combining the unit of work, the state under test, and the expected behavior

Some frameworks enforce the "test" prefix for discovery purposes. Therefore, just pretend that the prefix isn't there and use one of the two remaining naming conventions after the prefix. Names following the second convention read like this:
ShouldAlertUserIfAccountBalanceIsExceeded
or
ShouldFailForNegativeAmount
If you’re quite new to unit testing and just need a “do like this” pointer, then
use the third naming style. It forces you to think about both what makes the
test interesting (state under test) and the expected outcome.
Combine! After having written some tests you’ll realize that the rigid form
of the third naming style may not actually be the best for the type of test
you’re about to write, and you may start questioning the rationale behind the
“should” (“what if it doesn’t?”). I often find that the best test names are defini-
tive statements about the conditions of the test and the outcome. See the next
code snippet—the one about the magic hat—for an example.
Let the context decide. Often, the type of code that you’re writing tests for
and their design will push you toward a preferred naming standard for these
particular tests. Don’t be surprised if other code and tests in the same code-
base will make you want to choose another naming convention.
Experiment!
Don’t be afraid of experimenting with test names. I certainly wasn’t when writing
the sample code for this book. I ended up using different naming conventions and
variations on purpose throughout the book to illustrate how they play out.
Structuring Tests
A common way of organizing code in a test method is following the “triple A” struc-
ture: Arrange. Act. Assert. It helps in dividing a test into three distinct phases, where
the first is dedicated to setting things up, the second to executing the code to be
tested, and the third to verifying the outcome.
[TestMethod]
public void MagicHatConvertsRedScarfIntoWhiteRabbit()
{
// Arrange
var magicHat = new MagicHat();
magicHat.PutInto(new Scarf(Color.Red));
// Act
magicHat.TapWithMagicWand();
var itemFromHat = magicHat.PullOut();
// Assert
var expectedItem = new Rabbit(Color.White);
Assert.AreEqual(expectedItem, itemFromHat);
}
You might want to agree on the terminology in your team or organization, but the
important thing is the structure, not the name, especially because the words seldom
make it to the tests.
Assertion Methods
Because unit tests are self-verifying, they must somehow communicate success or
failure. Assertion methods provide a standardized way to express the outcome of
the test so that the checking can be automated by the test framework, while the test
remains readable to the developer (Meszaros 2007). An assertion method that fails
will make the framework fail the test—and produce the dreaded red bar in the major-
ity of frameworks. For a test to pass, none of its assertions6 may fail.
Types of Assertions
Assertions come in different flavors. The types and number of assertions vary from
framework to framework. Table 7.1 presents a lowest common denominator of one C#
and one Java-based framework.
Table 7.1 Assertion methods in MSTest and JUnit

MSTest          JUnit
AreEqual        assertEquals
AreNotEqual     assertNotEquals
AreSame         assertSame
AreNotSame      assertNotSame
IsTrue          assertTrue
IsFalse         assertFalse
IsNull          assertNull
IsNotNull       assertNotNull

Functionality starts to differ beyond this minimal subset, so it's always worthwhile to read the framework's documentation. For example, some frameworks have
6. From now on the term assertion will be used instead of assertion method. Although it clashes
with how the word was used in Chapter 5, “Programming by Contract,” it does make the text
more fluent.
more “core” assertions, whereas others make use of helper classes, like MSTest’s
CollectionAssert and StringAssert classes. What’s important is that you
should use the assertions from your framework that best communicate your intent.
Finally, although assertion methods have been around from the dawn of time and
are the foundations of an absolute majority of unit testing frameworks, you can get
away without using them. Groovy’s Spock Framework (or just “Spock”) is designed
around blocks, such as given:, when:, then:, or expect:. This structure allows
it to treat everything in the then: and expect: blocks as assertions, which means
that Spock tests use normal comparisons (or any kind of predicates) where an xUnit
framework would employ an assertion method. The subsequent chapters contain some tests written using Spock.
A common guard assertion (an assertion verifying that it's safe for the rest of the test to proceed) is checking the size of a collection before examining its contents. For example, before examining some property of the second element of a tested collection, a guard assertion is used to ensure that the collection indeed contains two elements.
This example illustrates how one semantic concept may or may not require sev-
eral assertions depending on the syntax. In just a few paragraphs, I’ll describe the
AssertThat mechanism, which allows lumping together arbitrarily complex logic
into a single assertion. This is yet another reason not to slavishly insist on ending every test with a single assertion.
A pragmatic developer may identify a third category of exceptions to the “one
assertion per test” guideline—the tedious tests, those that don’t exercise an intricate
piece of logic or a clever algorithm. Such tests are necessary, because they protect
from copy and paste mistakes, off-by-one errors, and other bugs easily introduced
when working with repetitive patterns. They’re usually not software engineering
masterpieces and may contain multiple assertions without suffering too much. Often
these tests verify one thing semantically, but the syntactic implementation may be quite offensive.
[TestMethod]
public void CreatePersonEntityFromTransferObject()
{
var dto = new PersonDTO { FirstName = "Brian", LastName = "Brown", Age = 25 };
var newEntity = PersonCreator.CreateEntity(dto);
Assert.IsNotNull(newEntity.Id);
Assert.AreEqual("Brian", newEntity.FirstName);
Assert.AreEqual("Brown", newEntity.LastName);
Assert.AreEqual(25, newEntity.Age);
Assert.AreEqual(DateTime.Now.ToShortDateString(),
newEntity.Created.ToShortDateString());
}
Verbosity of Assertions
Did you notice how the values of the names and the age were repeated in the previous
example? Due to their very nature, tests tend to contain some duplicated code. Con-
sider the following test of a method that loops through a list of people and puts their
first names in a comma-separated list:
[TestMethod]
public void CollectFirstNames_ThreePersons_ResultContainsThreeNames()
{
var adam = new Person { FirstName = "Adam", LastName = "Anderson" };
var brian = new Person { FirstName = "Brian", LastName = "Brown" };
var cecil = new Person { FirstName = "Cecil", LastName = "Clark" };

    var result = NameUtils.CollectFirstNames(
        new List<Person> { adam, brian, cecil });

    var expected = "Adam,Brian,Cecil";
    Assert.AreEqual(expected, result);
}
Notice that the first names appear in both the setup code and the verification.
Is this duplication annoying? Sometimes we might feel tempted to rewrite the test
to eliminate such duplication. In the preceding example, the line containing the expected value could be rewritten to:

var expected = adam.FirstName + "," + brian.FirstName
    + "," + cecil.FirstName;
Surely this would eliminate the duplication, but it would introduce another problem. Suppose that the tested method looks something like this:
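// A plausible implementation; the essential parts are the loop
// over the persons and the joining of their first names.
public static string CollectFirstNames(IEnumerable<Person> persons)
{
    var firstNames = new List<string>();
    foreach (var person in persons)
    {
        firstNames.Add(person.FirstName);
    }
    return string.Join(",", firstNames);
}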
Now, suppose that some time passes and in a few weeks another developer
decides to modify the method so that it also capitalizes the first names. Ignorant of
Command/Query Separation principle (Meyer 1997), lazy, or just human, that devel-
oper adds a line of seemingly clever code—and introduces a bug by modifying the
incoming names!
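The hypothetical line, placed inside the loop, might read:

// Capitalize while we're at it... and accidentally mutate the caller's object.
person.FirstName = char.ToUpper(person.FirstName[0])
    + person.FirstName.Substring(1);
firstNames.Add(person.FirstName);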
This modification doesn’t break the test, because the value of the expected
variable is a result of concatenating all first names after they have been accidentally
modified. With this behavior, the test is dubious at best, or simply utterly wrong. In
this particular case, putting the assignment of expected prior to the call would
repair the situation. This, however, would introduce temporal coupling in the test.
Instead, by allowing a small amount of duplication, we can protect the test from code
that introduces side effects that may fool the verification.
Is allowing a degree of duplication a rule then? No! At the end of the day, it boils
down to communicating intent. In some tests, it’s better to use constants in both
input and expected values to highlight correlated values, whereas others may be made
more readable and understandable by some duplication.
Asserting Equality
The most commonly used assertion is by far that which checks for object equality. In
many cases this is very unproblematic. For example:

Assert.AreEqual(10, quantity);
But what would happen if we had to assert that two Person objects from one of
the previous examples were equal?
[TestMethod]
public void TwoPersonsWithIdenticalAttributesAreIdentical()
{
var aPerson = new Person { FirstName = "Adam",
LastName = "Anderson", Age = 21};
var anotherPerson = new Person { FirstName = "Adam",
LastName = "Anderson", Age = 21};
Assert.AreEqual(aPerson, anotherPerson);
}
Does the previous test succeed or fail? Whether it succeeds depends entirely on
whether the Person class has an Equals method with a reasonable implementa-
tion; one that tells whether two persons are equal in the context of the domain. For-
getting to provide the Equals method or its equivalent is an extremely common
source of errors in unit tests.
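What a reasonable implementation looks like depends on the domain. A minimal sketch for this Person could be:

public override bool Equals(object obj)
{
    var other = obj as Person;
    if (other == null)
    {
        return false;
    }
    // In this domain, two persons with identical attributes are equal.
    return FirstName == other.FirstName
        && LastName == other.LastName
        && Age == other.Age;
}

public override int GetHashCode()
{
    return (FirstName ?? "").GetHashCode() ^ Age;
}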
In some rare cases7 we can’t implement the equality method in a way that makes
it usable for testing. In other cases, initializing an object just to make a comparison,
as in the preceding example, seems to defeat the very purpose of the test. If the Per-
son class contained 10 more fields, like gender, address, and some flags that some-
how always make it to such classes, it would do more harm than good to set up such
an object and then rely on one assertion. In such cases, having multiple assertions per
test is quite acceptable. Or it could be an opportunity to make use of more sophisti-
cated assertions.
Specialized Assertions
Remember the Person class from the previous examples? It contained an Age attri-
bute. What if you wanted to test whether a person is an adult, that is, not underage or
retired? As a first test, an example with a reasonable adult age would do fine8:
[Test]
public void PersonAged45_IsAnAdult()
{
var person = new Person { Age = 45 };
Assert.IsTrue(person.Age >= 18 && person.Age < 65);
}
7. I’m mostly thinking of objects that are persisted in a database and where the database generates
a surrogate key. In such cases “equality” may become the subject of debate: Are the objects
equal if all their fields are equal, or are they equal if their “primary keys” are the same?
8. At the time of writing, Microsoft’s unit testing framework didn’t support custom constraint
assertions, so the tests in this section are written with NUnit.
But what if you wanted to make your test even more explicit? How about chang-
ing the test to this?
[Test]
public void PersonAged45_IsAnAdult()
{
var person = new Person { Age = 45 };
Assert.That(person, Aged.Adult);
}
When this test fails, it's going to fail with a message that speaks in domain terms, saying that an adult person was expected, rather than just reporting that a Boolean comparison came out false.
To get this rather detailed output, some work is required. First, we need a constraint based on NUnit's Constraint class.
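A sketch of such a constraint, using NUnit's Matches/WriteDescriptionTo API (the class names are assumptions):

public class AdultConstraint : Constraint
{
    public override bool Matches(object actual)
    {
        this.actual = actual;
        var person = actual as Person;
        return person != null
            && person.Age >= 18 && person.Age < 65;
    }

    public override void WriteDescriptionTo(MessageWriter writer)
    {
        writer.Write("a person of adult age");
    }
}

// Syntactic sugar that makes Assert.That(person, Aged.Adult) read nicely.
public static class Aged
{
    public static Constraint Adult
    {
        get { return new AdultConstraint(); }
    }
}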
In Java and JUnit 4, constructs with assertThat and matchers come out even
nicer. Because of static imports, the assertion would look like the following, given
that there was a Person object at hand:
assertThat(person, isAdult());
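A sketch of the Java matcher, built on Hamcrest's TypeSafeMatcher (the getAge accessor is an assumption):

import org.hamcrest.Description;
import org.hamcrest.TypeSafeMatcher;

public class IsAdult extends TypeSafeMatcher<Person> {

    public static IsAdult isAdult() {
        return new IsAdult();
    }

    @Override
    protected boolean matchesSafely(Person person) {
        return person.getAge() >= 18 && person.getAge() < 65;
    }

    @Override
    public void describeTo(Description description) {
        // Used when producing the failure message.
        description.appendText("an adult person");
    }
}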
Fluent Assertions
Specialized assertions aren’t the most popular use of custom constraints. Most of us
actually start out by using fluent assertions. The fluency is achieved by switching the
order of arguments9 in Assert.That-style calls, combined with the kind of syntactic sugar we've seen so far. So
Assert.AreEqual(10, quantity);
becomes
Assert.That(quantity, Is.EqualTo(10));
Apart from increasing readability, which becomes evident when combining sev-
eral constraints, the fluent syntax produces better messages.
Assert.IsTrue("Hello World!".Contains("Worlds"));
fails with
Expected: True
But was: False
whereas

Assert.That("Hello World!", Is.StringContaining("Worlds"));

fails with

Expected: String containing "Worlds"
But was: "Hello World!"
Different unit testing frameworks come with different fluent assertions. As the pre-
ceding example shows, they may contain some quite convenient features.
Tip
There are specialized fluent assertions libraries! In C#, extension methods provide a
very elegant way of implementing fluent assertions, which are utilized by the Fluent
Assertions library. In Java, AssertJ provides a set of custom assertThat methods
that return assertion objects with methods that can be chained to form fluent assertions.
9. To be precise, not all unit testing frameworks want the expected value as the first parameter
and the actual value as the second parameter to the assertion. In some, the order is the opposite,
and some don’t document any preference.
“Partial” Verification
A third area of use for custom constraints could be described as “partial” verification.
In an earlier example, a Person object was constructed by copying values from a
data transfer object (DTO). Then a GUID and a date were added. The test that veri-
fied the object had been constructed correctly coped with these two fields by using
rather loose assertions. The code is repeated here for convenience:
Assert.IsNotNull(newEntity.Id);
Assert.AreEqual("Adam", newEntity.FirstName);
Assert.AreEqual("Anderson", newEntity.LastName);
Assert.AreEqual(21, newEntity.Age);
Assert.AreEqual(DateTime.Now.ToShortDateString(),
newEntity.Created.ToShortDateString());
This code looks the way it does because there’s virtually no way to construct
a Person object that would be equal to the object created by the factory.10 The
GUID is “random” and there’s a time instance. On the other hand, these values may
not be very interesting from the perspective of the test. At least that’s what the test
indicates by just checking for a non-null GUID and performing coarse matching of
the creation time.
In such cases, a custom constraint might come in handy. Because we can’t make
persons equal (not in the sense of an equality method), we can at least try to make
them “similar.” The following test shows how to achieve that by ignoring the Id and
Created attributes in the comparison.
[Test]
public void AllValuesAreCopiedFromPersonDtoToNewEntity()
{
    var personDto = new PersonDTO { FirstName = "Adam",
        LastName = "Anderson", Age = 21};
    var expectedPerson = new Person { FirstName = "Adam",
        LastName = "Anderson", Age = 21};
    Assert.That(PersonCreator.CreateEntity(personDto),
        new IsSamePersonConstraint(expectedPerson));
}
The Matches method of the constraint is implemented the way one would
expect:
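Here's a sketch, assuming the constraint keeps the expected person in a field:

public override bool Matches(object actual)
{
    this.actual = actual;
    var person = actual as Person;
    // Id and Created are deliberately left out of the comparison.
    return person != null
        && person.FirstName == expectedPerson.FirstName
        && person.LastName == expectedPerson.LastName
        && person.Age == expectedPerson.Age;
}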
10. Of course, PersonCreator, the factory, could be “opened up” and its GUID and timestamp
functions controlled by the unit test in one way or another, but that’s not the point here.
Testing Exceptions
Error conditions change the execution flow and must therefore be tested. Most lan-
guages in use today rely on exceptions to communicate that an error has occurred. Not only does this have the benefit of actually altering the flow of control so that there's no
question whether the operation has succeeded or not, it also saves the developer from
clunky checks of return values, calling things like GetLastError, inspecting the
value of errno,11 or the like.
The generic way to test for an exception is:
[TestMethod]
public void OperationBlowsUpWithADramaticException()
{
try
{
DoSomethingThatBlowsUp();
        Assert.Fail("Expected an exception");
    }
    catch (CrashBoomBangException e) { }
}

11. GetLastError is a function in the Win32 API that returns the last-error code value on the calling thread, whereas errno is a global variable or function used in UNIX C programs for the same purpose.
This is the oldest way of verifying that an exception has been thrown, and this
technique still has two benefits:
It’ll always work. Because nothing in the test uses any fancy features of the
unit testing framework, this technique can be applied in Java, C#, C++, Java-
Script, PHP, and Ruby (with slightly different keywords), to mention some
widely used languages.
It’s still the most flexible and intuitive way if you need to scrutinize the
caught exception, if you need to verify the exception message in a sophisti-
cated way, if you need to inspect a chain of nested exceptions, or if the excep-
tion carries some payload, like the offending object.
That said, this is the oldest way, and there are better options for most cases. Nowadays, frameworks come with annotations like @Test(expected=), [ExpectedException(...)], or @expectedException, which enable condensing tests of exception code to something like this:
[TestMethod]
[ExpectedException(typeof(CrashBoomBangException))]
public void OperationBlowsUpWithADramaticException()
{
DoSomethingThatBlowsUp();
}
Because this book contains a lot of Java code, I feel obliged to mention that JUnit
has taken things in the right direction by introducing the ExpectedException
rule,12 which brings back the flexibility to do more advanced processing of the caught
exception (the second benefit of the generic approach). For example:
@Rule
public ExpectedException thrownException
= ExpectedException.none();
@Test
public void operationBlowsUpWithADramaticException() {
    thrownException.expect(CrashBoomBangException.class);
    thrownException.expectCause(isA(IllegalStateException.class));
    // The expected message prefix is only an illustration.
    thrownException.expectMessage(startsWith("Dramatic"));
    doSomethingThatBlowsUp();
}

12. http://junit.org/apidocs/org/junit/rules/ExpectedException.html
This test not only verifies that the CrashBoomBangException has been
thrown, but also that the exception causing it is IllegalStateException and
that the exception message starts with a specific string. Because Hamcrest matchers
are used, arbitrarily sophisticated analysis of the exception is possible—something
that’s lost or limited when using an annotation.
Finally, languages that support higher-order functions offer yet another option.
In such languages you can pass a block of code that’s expected to fail with an excep-
tion to a function that will execute that block in a surrounding try-catch. This is what
the technique would look like if implemented by hand.
[TestMethod]
public void OperationBlowsUpWithADramaticException()
{
ExpectCrashBoomBang(() => DoSomethingThatBlowsUp());
}
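The helper isn't part of any framework; it could be a small one-off like this:

private static void ExpectCrashBoomBang(Action codeThatShouldBlowUp)
{
    try
    {
        codeThatShouldBlowUp();
        Assert.Fail("Expected a CrashBoomBangException");
    }
    catch (CrashBoomBangException) { }
}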
JUnit 5 provides the same convenience out of the box in the form of the assertThrows assertion:13
13. At the time of writing, this assertion was in the alpha version of JUnit 5, so the final version may differ somehow.
@Test
public void operationBlowsUpWithADramaticException() {
assertThrows(CrashBoomBangException.class, () ->
doSomethingThatBlowsUp());
}
Test Structure
BDD-style frameworks use a test structure that reminds the developer about focus-
ing on the behavior, rather than the details of the tested implementation. RSpec for
Ruby and Jasmine and Mocha for JavaScript do this by enclosing tests in a function
called it.
context "Bitcoin" do
# And digital currency here
end
end
Each context provides its own scope, and thus variables declared in differ-
ent contexts get different lifetimes relative to the tests. The order_to_pay vari-
able is created once and outlives the three payment method contexts and any tests
that would execute within them. Powerful as this may seem, I urge you to count to
12 before constructing a complex hierarchy of nested contexts with tests that depend
on different variable scopes. Not only is it easy to introduce temporal coupling in this
way, but such tests are hard to read and understand.
Test initializers also exist in BDD-style frameworks. They work quite similarly to
those of xUnit frameworks (per test method and per test class initialization). In addi-
tion, there are two caveats to keep in mind:
How does initialization/fixture setup interact with nested contexts?
Some frameworks provide more options for fixture initialization.14
Naming Tests
Using the it function encourages naming the tests in a certain way. Look at the test
name and the output of the framework to decide whether the name makes sense.
Because the name is just a string, it can contain whitespace and punctuation characters.
This isn’t the place to get too creative though. The test name should succinctly commu-
nicate the expected behavior, given the conditions that are specific to the test. If the name
becomes too long, we can consider using contexts to make them more concise.
Matchers
To make tests pass or fail, BDD-style frameworks use functions that are more verbose
and often read in a more natural way than assertion methods. To illustrate, I’ll revisit the
test of a simple utility function that just picks out the first names of the supplied persons.
14. RSpec, for example, provides two methods called subject and let, both of which in essence
evaluate a block and store the result between tests. subject is used to initialize the tested
object. This functionality is most useful when used implicitly, like in the coming magic wand
example, in which the magic wand becomes the subject. let may be used to change the context
of each test. This is a very superficial treatment of two quite powerful concepts, but the point is
that they can both compete with and complement initialization methods. This has the potential
to make the fixture setup very advanced and very complicated.
describe("NameUtils", function() {
describe("collectFirstNames()", function() {
it("creates a comma-separated list of first names", function()
{
var adam = new Person("Adam", "Anderson");
var brian = new Person("Brian", "Brown");
var cecil = new Person("Cecil", "Clark");
expect(NameUtils.collectFirstNames([adam, brian,
cecil])).toEqual("Adam,Brian,Cecil");
});
});
});
[Table: Jasmine matchers and their RSpec counterparts, e.g., toBeFalse() versus be false]
RSpec can take matchers to the extreme with one-line tests in which the subject is implicit:

describe MagicWand do
it { is_expected.to be_doing_magic }
end
BDD-style frameworks aren’t that different from xUnit family frameworks, espe-
cially when it comes to unit test design and implementation. They’re built around a
different terminology and encourage thinking about behavior rather than implemen-
tation, but at the end of the day, they execute a comparison between an actual value
and an expected value.
Summary
Unit tests are created to
Allow scaling
Lead to better design
Enable change
Prevent regressions
If code can be unit tested, its quality can't be all that poor. Some bad constructs will simply
not make it into the codebase if unit tests are in place. Ultimately, if a feature isn’t
testable, it won’t be tested.
Defining unit tests isn’t uncontroversial. In this book, unit tests are fully auto-
mated, self-verifying, repeatable, consistent, and fast. They test a single logical con-
cept and run in isolation.
There are three common naming standards for test methods: naming the test after the tested method, phrasing the expected behavior as a sentence (often starting with "should"), and combining the unit of work, the state under test, and the expected behavior.
Chapter 8
Specification-based Testing Techniques
Equivalence Partitioning
Let’s say that you’re facing the daunting task of implementing an integer-based calcu-
lator—the kind of program one would write in an introductory programming class.
When it comes to checking that it works, is it meaningful to test whether it can com-
pute the sum of 5 + 5 if it computes the sums of 3 + 3 and 4 + 4 correctly? Or 10,000 +
20,000? Probably not, but why?
Consider, for example, all the ways in which a system's customers could be partitioned:
Males/females
Those aged 0 to 17, 18 to 28, 29 to 44, 45 to 69, and 70 to 110
Those whose national identification number is known
Those registered in the system before the year 2000 and those after
Prospects, regular, or premium customers
Those who pay with Visa, MasterCard, or PayPal
Those who have returned some merchandise and those who haven’t
There are pretty much endless possibilities, and it’s the specification and test sce-
nario that should guide the choice of relevant equivalence partitions.
Equivalence partitioning is a very helpful tool for the developer. Suppose we want
to ensure that a function that computes the risk premium for insured drivers works
1. Let’s not get academic and dig out some ancient 8-bit integer type. Let’s think 32-bit.
Figure 8.1 Two ways of partitioning input to an integer calculator. Is there a way to
reach the partitions outside the range of the integer type? There could be, if the calculator
accepted its input as strings that would be converted to integers.
correctly. According to this function, young drivers run a higher risk of accidents, the
middle-aged have mastered driving, and older drivers tend to start getting involved
in accidents again. A simple version of this function could look like the following:
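// A sketch; the age limits and the factor values are assumptions
// based on the partitions described in the text.
double riskPremiumFactor(int age) {
    if (age < 18 || age > 100) {
        throw new IllegalArgumentException("Not an insurable age: " + age);
    }
    if (age < 25) {
        return 1.75;   // Young drivers run a higher risk.
    }
    if (age < 65) {
        return 1.0;    // The middle-aged have mastered driving.
    }
    return 1.25;       // Older drivers start having accidents again.
}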
Armed with a new tool, we immediately see three valid partitions, hence three
tests. We also spot two partitions with illegal input. Ages below 18 and above, say,
100, don’t make any sense. Thus this particular function needs at least five tests.
Another benefit of this technique is that it allows us to think about input
visually, which hopefully lets us discover partitions that haven’t been covered by
tests yet. Sometimes drawing the input and partitions on paper or a whiteboard
really helps (see Figure 8.2).
Dividing data into equivalence partitions will only get us so far. In order to
achieve reliable test coverage, tests at the boundaries of the partitions are required.
Figure 8.2 Dividing input into equivalence partitions can sometimes be quite a visual
technique. Here, each partition has been illustrated with an avatar that could evolve into a
persona in other test cases.
Numbers
Finding boundary values for numbers is rather easy. If your input is valid for the
range m-n, check what happens at m − 1, m, n, and n + 1. In some cases, try m + 1,
and n − 1. Using 0 might or might not be a boundary value, but it’s usually a good
idea to investigate what happens around it.
For primitive integer types, it usually pays to look at what happens near the maximum represented by the data type, such as 2³¹ − 1 or 2⁶³ − 1, and the minimum, like −2³¹ or −2⁶³, while remembering that the sign bit causes an asymmetry between
them. For certain types of programs, nasty bugs can be introduced because of integer
overflow.
Many languages have constants that represent minimum and maximum values for their data types, for example, Integer.MAX_VALUE in Java and int.MaxValue in C#. Use them or introduce your own with care if the language doesn't have them. Don't put yourself in a position where you have to remember whether the maximum value of a signed 32-bit integer is 2³¹ − 1 or 2³² − 1.
In the case of floats, verify that a reasonable precision is used. A partition may
change as the precision of the floating point number is adjusted.
Strings
The empty string is an obvious edge case. It can usually be traced back to blank user
input or fixed-record file formats. It has a cousin called null2 that may be returned
from many standard libraries or functions in your legacy application. Irrespective of
your personal feeling about nulls, you have to be prepared for them. Whereas one
half of the code you’re working with may go to great lengths to avoid nulls by using
Null Objects and exceptions in creative ways, the other half, written by that other guy,
won’t exhibit this property and will throw nulls right in your face. Hence, add null
to your list of edge cases.
In languages where strings are allocated directly on the stack or stored in fixed-
size buffers on the heap, developers have to worry about memory corruption and buf-
fer overflow. In newer languages the developer doesn’t need to worry about strings
2. Or nil or undef.
overwriting part of the heap that belongs to another process, but checking around
maximum input length is still a good idea.
Strings, especially in Unicode, may contain all sorts of characters. But in an aver-
age system, the partition of allowed characters is rather small in comparison to the
entire Unicode character set. The challenge usually lies in the encoding. UTF-8, the
most widely used encoding on the Web,3 uses one byte to encode standard ASCII, but
may use up to four bytes when encoding less common Unicode characters. Make sure
that your parsing and string routines take this into account.
Collections
The empty collection is a common edge case worth checking (because you do use
empty collections and not nulls, right?). Too often, we encounter code that really
relies on a collection actually having one element, like this archetypal piece of older
code using Hibernate.
3. According to Wikipedia, 85 percent of all web pages were encoded using UTF-8 in 2015 (https://
en.wikipedia.org/wiki/UTF-8).
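It might have looked something like this (a sketch of the anti-pattern):

// Blindly grab the first element; there had better be exactly one customer.
Customer customer = (Customer) session
        .createQuery("from Customer where id = 12345678")
        .list()
        .get(0);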
This code will fail miserably when there’s no customer with an id equal to
12345678. Also, there be dragons where developers balance on the edge between
fetching and iterating over a collection. A close cousin to the preceding code, the iter-
ation over a multivalued collection may at least have a fighting chance.
Constructs like the above aren’t really a problem if you check the sizes of your collec-
tions or just iterate over them (while being prepared for the fact that they may actu-
ally be empty), but even if you do this, then somebody else won’t have done it in the
legacy code that you’re maintaining. Ergo, paying extra attention to empty collections
and those with one element usually pays off. Iterations over collections may also suf-
fer from off-by-one errors if they rely on indexes and the collection’s size. All of this is
best summarized as 0-1-many.
For some more ideas, see Hendrickson, Lyndsay, and Emery (2006).
Figure 8.3 A simple installation wizard modeled as a state machine. In reality, there
would be more states before Installed.
In the example, most transitions are triggered by the user, with the exception of a disk space check performed by the installer.
transitions. Again, in the example, most actions consist of showing a certain screen to
the user, except for the final action, in which files are copied and some configuration
is stored in the system.
Sometimes it may be helpful to rewrite the state transition diagram into a table.
Personally, I’ve always found the diagram more understandable, but for exhaustive
testing, a table might help.
State diagrams can be drawn at different levels of abstraction. That’s probably the
greatest strength of this technique. On one end, there’s the detailed diagram depict-
ing transitions between states in a regular expression matcher, where each encounter
with a letter is a state transition. On the other end is the huge business application
modeled with three states: logged in, working, and logged out. This flexibility trans-
lates directly to developer testing. Detailed, low-level state transitions fit nicely in unit
tests. A diagram will help determine what tests to write. Sometimes the number of
states and transitions will require using parameterized tests or theory tests (described
in Chapter 10, “Data-driven and Combinatorial Testing”) to avoid repetition. Coarse-
grained state diagrams help when developing high-level tests, like browser-based UI
tests, or just doing manual testing.
When working with state transition testing, we encounter the term switch cover-
age. 0-switch coverage refers to testing the individual transitions, 1-switch coverage
means that pairs of transitions are tested, and so on. Exercising various switch cover-
ages exhaustively may be very helpful in weeding out race conditions.
Decision Tables
Let’s revisit the car insurance premium example one last time. This time the premium
is also affected by the driver’s gender. After all, statistics show women to be safer driv-
ers. In addition, certain combinations of age and gender trigger a fraud investigation
in the event of a claim.
To get an overview of these business rules, we can use decision tables, which cap-
ture all combinations of variables and possible outcomes.
Age group            Young  Young  Middle Middle Older  Older
Gender               Male   Female Male   Female Male   Female
Premium factor       1.75   1.65   1.05   1      1.35   1.25
Fraud investigation  N      N      Y      Y      Y      N
Why should developers care about decision tables? Obviously they can show gaps
or inconsistencies in business rules, but there’s another reason. Remember that the
different flavors of behavior-driven development emphasize shared understanding
and concrete examples. Tables, and among them decision tables, are a good format
for capturing such concrete examples. Hence, a good decision table, or parts thereof,
can be fed right into a tool like FitNesse, Concordion, or Cucumber as a first building
block of an automated acceptance test.
At the unit test level,4 turning the contents of a decision table into arguments to a
parameterized test is a good foundation for achieving exhaustive coverage of a busi-
ness rule.
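For example, the table above could drive a JUnit 4 parameterized test along these lines (the representative ages, the Gender type, and the gender-aware riskPremiumFactor method are assumptions):

import static org.junit.Assert.assertEquals;

import java.util.Arrays;

import org.junit.Test;
import org.junit.runner.RunWith;
import org.junit.runners.Parameterized;
import org.junit.runners.Parameterized.Parameters;

@RunWith(Parameterized.class)
public class PremiumFactorTest {

    @Parameters(name = "age {0}, {1}")
    public static Iterable<Object[]> decisionTableRows() {
        // One entry per column in the decision table.
        return Arrays.asList(new Object[][] {
                { 20, Gender.MALE, 1.75 },
                { 20, Gender.FEMALE, 1.65 },
                { 40, Gender.MALE, 1.05 },
                { 40, Gender.FEMALE, 1.0 },
                { 70, Gender.MALE, 1.35 },
                { 70, Gender.FEMALE, 1.25 }
        });
    }

    private final int age;
    private final Gender gender;
    private final double expectedFactor;

    public PremiumFactorTest(int age, Gender gender, double expectedFactor) {
        this.age = age;
        this.gender = gender;
        this.expectedFactor = expectedFactor;
    }

    @Test
    public void appliesThePremiumFactorFromTheDecisionTable() {
        assertEquals(expectedFactor,
                PremiumCalculator.riskPremiumFactor(age, gender), 0.001);
    }
}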
Summary
Specification-based techniques are a great source of inspiration for developer tests. By
being aware that such techniques will constitute the first wave of testing, developers
can build software that is prepared to handle these tests. This increase in quality lets
testers engage in more high-value testing.
The key specification-based techniques to consider when developing software are equivalence partitioning, boundary value analysis, edge cases, state transition testing, and decision tables.
4. I recommend running parameterized tests at the unit level only, because of their execution
time. We don’t want a slow test running off a huge table of values.
Just as the name implies, specification-based techniques provide the fuel for
discussions about concrete examples when doing specification by example (pun
intended) or behavior-driven development.
Chapter 9
Dependencies
Developers who are new to unit testing and have just grasped its mere basics soon hit
a barrier. From their perspective, the systems they encounter bear no resemblance to
the examples in an introductory text or online tutorial on unit testing or test-driven
development. In my experience, this can be very demoralizing and lead to conclu-
sions like: “Our system can’t be tested” or “Unit testing/test-driven development only
works in green field projects.” There are numerous reasons for such beliefs, some
being a complex or botched architecture, inconsistent design, or simply code written
with everything but testability in mind. However, in the majority of cases, the prob-
lem is much simpler and is spelled dependencies. Different parts of a system depend
on each other in different ways, and the exact nature of these dependencies affects
testability.
A white box developer test—most often a unit test—exercises a very small part of
the system. It does this by creating the object it wants to exercise and calling meth-
ods on it. In object-oriented systems, the tested object will make use of other objects,
from now on called collaborators, to provide its services.1 Some collaborators are
heavyweight and deeply entrenched in the system; others are simple and provide very
narrow functionality. When dealing with either kind, we turn to test doubles, the
topic of Chapter 12, “Test Doubles,” but before skipping ahead, let’s look at different
kinds of dependencies and what challenges they present.
1. If the language isn’t object oriented, there will obviously be no objects and no collaborating
objects. However, a tested function will still call code from other modules or libraries. Such
dependencies will have to be dealt with within the constraints and functionality of the language
in question. Michael Feathers touches on this topic in Working Effectively with Legacy Code
(Feathers 2004).
Consider the following class:

public class Raffle
{
    private readonly ISet<int> tickets;

    public Raffle()
    {
        tickets = new HashSet<int> { 3, 10, 6 };
    }

    public int TicketCount
    {
        get { return tickets.Count; }
    }
}
Okay, I confess. This isn’t much of a raffle, but it’s my way of trying to make a
three-element set wrapped by another class appear exciting. An actual abstraction of
a raffle would most likely shuffle its tickets, assign prizes to them somehow, and do the
drawing. Here, I simplify all of this to just creating a fixed set of tickets and count-
ing them. The point here is to make the constructor create another object, and
thus rely on indirect input, to produce a class that’s small and yet hard to test. By
“hard to test,” I mean that there’s no way to write a unit test that would be able to
establish a relation between the object created in the constructor and the class’s public
interface—in this case the TicketCount property. So, although it’s plain to see that
three tickets are created, writing a test that would expect three tickets would be a bad
idea due to the nonexistent controllability.
In this example, there’s no obvious way to control the indirect input; the code
lacks a seam—a place in which the behavior of the code can be altered without editing
it (Feathers 2004). The bulk of making code testable is dealing with such constructs
in the most appropriate manner by adding seams at which dependencies can be bro-
ken. There are some generic ways of doing this, all of which can be applied to this
particular piece of code with varying degrees of success and complications. To gain
control of this dependency we need to make it explicit, which would involve one of
the following:

Passing the collaborator in from the outside, through a constructor or a setter
Creating the collaborator in a factory method that a test can override
Moving the construction out of the class, into an external factory or builder

Let's explore all three and learn what costs, benefits, and trade-offs each approach brings to the table.
Passing in Collaborators
Making collaborators explicit by passing them around is the simplest and most obvi-
ous way to increase testability. The downside is the increase in complexity and some-
times decrease in intuitiveness, especially in trivial cases. In the current example,
instead of creating the set of tickets in the constructor, we can pass it as an argument.
Alternatively, it can be provided using a setter4 (property or method).
[TestMethod]
public void RaffleHasFiveTickets()
{
var testedRaffle = new Raffle
(new HashSet<int> { 1, 2, 3, 4, 5 });
Assert.AreEqual(5, testedRaffle.TicketCount);
}
4. One of my reviewers pointed out that this creates temporal coupling. This is usually not a
problem, unless you’re working with legacy spaghetti code, where it may be hard to find a good
spot for calling that setter.
The second option is to create the collaborator in a factory method:

public Raffle()
{
    tickets = CreateTickets();
}

protected virtual ISet<int> CreateTickets()
{
    return new HashSet<int> { 3, 10, 6 };
}
The factory method would be made overridable so that any test code would be
able to provide its own implementation.
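A test-local subclass could then replace the tickets (the subclass name just mirrors the test below):

class FiveTicketRaffle : Raffle
{
    protected override ISet<int> CreateTickets()
    {
        // Five known tickets instead of the production default.
        return new HashSet<int> { 1, 2, 3, 4, 5 };
    }
}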
[TestClass]
public class RaffleWithFactoryMethodTest
{
[TestMethod]
public void RaffleHasFiveTickets()
{
var testedRaffle = new FiveTicketRaffle();
Assert.AreEqual(5, testedRaffle.TicketCount);
}
}
This approach often saves the day in legacy code, as it turns out to be a reasonable
trade-off between complexity and readability. In this case, though, it may become
catastrophic. Calling overridable methods from a constructor is bad practice because
such methods can easily reference uninitialized member variables and crash the
application by doing so. A static analysis tool would warn about this. That said, it’s
a fantastic example of constraints to think about when dealing with dependencies.
In classes with more functionality, this wouldn’t usually be a problem; the factory
method would be called after the object has been created.
Controversy Warning
Some people feel very strongly about any changes to code that are made solely to simplify
testing, such as changing the accessibility of some methods. In some cases, especially in
legacy code, this sometimes has to be done. Whenever I do this, I remind myself that the
code has two clients: the system that runs in production and the test code.
However, like everything else, this approach may be misused and lead to code
where everything is public or protected, which virtually makes access modifiers
meaningless.
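Another option is handing the Raffle a small factory that encapsulates the creation of the tickets. A sketch of such a factory (its internals are assumptions):

public class TicketsFactory
{
    private readonly int numberOfTickets;

    public TicketsFactory(int numberOfTickets)
    {
        this.numberOfTickets = numberOfTickets;
    }

    public ISet<int> CreateTickets()
    {
        var tickets = new HashSet<int>();
        for (int i = 1; i <= numberOfTickets; i++)
        {
            tickets.Add(i);
        }
        return tickets;
    }
}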
[TestMethod]
public void RaffleHasFiveTickets()
{
var testedRaffle = new Raffle(new TicketsFactory(5));
Assert.AreEqual(5, testedRaffle.TicketCount);
}
Finally, for our tiny set of integers representing ticket numbers, employing the
Builder pattern would be way off target, but here’s what it would look like.5
5. This builder is slightly more elaborate than it needs to be. A minimal builder could have its
defaults set to starting at 1 and stopping at 5, but what’s the fun in using a builder if we’re just
going with the defaults?
[TestMethod]
public void RaffleHasFiveTickets()
{
var builder = new TicketsBuilder().StartingAt(1).EndingWith(5);
Raffle testedRaffle = new Raffle(builder);
Assert.AreEqual(5, testedRaffle.TicketCount);
}
Obviously, the small class with a three-element set didn’t improve from throw-
ing an external builder at it, so what designs do? Factories and builders are both cre-
ational patterns (Gamma et al. 1994). We normally turn to them when we need to
construct complex objects.
The previous examples have illustrated that the basic relation between two
objects can be handled in a number of ways. The solution will depend on the type
and complexity of the objects and their exact relation. In addition, this kind of depen-
dency will most likely be managed differently in new code, written with seams and
testability in mind, and legacy code.
Files
Nowadays not too many programs actually require direct access to files. Being Web
applications, mobile apps, or cloud friendly, they tend to fetch their data or configura-
tion in a different way. However, there are still lots of batch applications out there that
read and write raw files.
Consider the first lines of a method that parses a file containing some payment
transactions. Being written without testability in mind, it presents a tricky kind of file
dependency—a filename.
public List<Payment> parsePaymentFile(String filename)   // signature assumed
        throws IOException {
    BufferedReader reader
            = new BufferedReader(new FileReader(filename));
    String line;
    while ((line = reader.readLine()) != null) {
        // Logic for parsing the file goes here...
The remedy is to confine the filename to a thin wrapper and let the actual parsing
work on a stream. This new abstraction may even hide the fact that there's a file
involved at all. Although line is used as the abstraction (as in lines in a file), it
could just as well be changed to unparsed payment.
public List<Payment> parsePaymentFile(String filename) throws
        IOException {
    return readFileContents(new FileInputStream(filename));
}

List<Payment> readFileContents(InputStream paymentData)   // parameter name assumed
        throws IOException {
    List<Payment> parsedPayments = new ArrayList<>();
    BufferedReader reader
            = new BufferedReader(new InputStreamReader(paymentData));
    String line;
    while ((line = reader.readLine()) != null) {
        String[] values = line.split(";");
        parsedPayments.add(new Payment(parseReference(values[0]),
                parseAmount(values[1]),
                parseDate(values[2])));
    }
    return parsedPayments;
}
The corresponding test would set up the file contents as a string and create a
stream from it:
@Test
public void parseLineIntoPayment() throws Exception {
    String line = "912438784;1000.00;20151115\n";
    List<Payment> payments = new PaymentFileReader()
            .readFileContents(new ByteArrayInputStream(line.getBytes()));
    assertEquals(1, payments.size());   // assertion assumed
}
A Newer Version
This solution looks roughly the same in any language that has an I/O stream
library, which is why I presented it here. Had this been a Java book, I’d have the
readFileContents method take a Stream<String> instead, and the test
would start with the following:
String line = "912438784;1000.00;20151115";
List<Payment> payments = new PaymentFileReader()
        .readFileContents(Arrays.stream(new String[]{line}));
The standard way of dealing with this kind of dependency is introducing a sim-
ple “time source” that wraps the class that provides the time.
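In Java, such a wrapper can be as small as the following sketch (the names are assumptions of mine):
import java.time.LocalDate;

public interface TimeSource {
    LocalDate today();
}

// Production implementation: delegates to the system clock.
class SystemTimeSource implements TimeSource {
    @Override
    public LocalDate today() {
        return LocalDate.now();
    }
}

// Test implementation: always reports the same date.
class FixedTimeSource implements TimeSource {
    private final LocalDate date;

    FixedTimeSource(LocalDate date) {
        this.date = date;
    }

    @Override
    public LocalDate today() {
        return date;
    }
}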
A test making use of such a time source would just set its date to match the date
of the payment. As always, there’s the price of complexity. Adding an interface and a
trivial implementation just to make code testable may increase the overall complex-
ity of the program. Depending on the implementation language and platform, there
may be other options. Ruby, for example, has several gems6 for controlling its primary
time abstraction, the Time class. In Java, testing of time-dependent code has finally
been simplified as of JDK 1.8 with the appearance of the abstract Clock class. The
purpose of this class is to make providing different clock implementations easy, and it
has been introduced with testing in mind. In the absence of such alternatives, intro-
ducing an abstraction for the time source is a simple technique, which will nearly
always work.
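To illustrate the JDK 8 option just mentioned: a test can hand the tested code a fixed clock, which makes "now" deterministic.
Clock fixedClock = Clock.fixed(
        Instant.parse("2015-11-15T12:00:00Z"), ZoneOffset.UTC);

// Anything that asks this clock for the time gets the same answer.
LocalDate today = LocalDate.now(fixedClock);   // always 2015-11-15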
Dependencies between Layers
Layers present a twofold challenge to developer testing. The first problem is
intertwining. For various reasons, often best summarized as technical debt, layers never
stack nicely on top of each other, as they would do in a design document.
Although a truly layered architecture enforces strict separation between the lay-
ers and dependencies in one direction, I’d say that in the majority of cases, such archi-
tectures tend to be more “flexible” and contain some bypasses (see Figure 9.1). Typical
examples are the circumvented business layer or the data access layer that knows the
workings of the presentation layer to the last bit.
For instance, consider the following data access method. Like pretty much every
single example in this book, this one is also “based on a true story.” In fact, it’s typical
legacy code, a decade old, and with more problems than just layer violations. What
kind of dependency is this?
Figure 9.1 The layers of a typical Web application: To the left, a textbook version. To the
right, something that resembles reality.
        // Body elided: it assembles HTML <li> items for the customers
        // straight from the ResultSet...
    } finally {
        DbUtils.closeQuietly(rs);
        DbUtils.closeQuietly(ps);
        DbUtils.closeQuietly(conn);
    }
}
Nasty, isn’t it? This old DAO knows that the customers will be presented in an
HTML list.
The second problem is that the quality of how layers are connected to each other
may vary greatly. Sometimes decoupling layers from each other will be a walk in the
park, and sometimes it’ll require extensive refactoring.
An approach that I wholeheartedly recommend for rescuing convoluted and fragile
layered designs is to start applying the Dependency Inversion Principle in conjunction
with conservative use of dependency injection. This is where dependency injection
frameworks come in handy. Such frameworks are put to best use when wiring together
components from different layers or even tiers (if the technology permits it). Although
dependency injection is a great pattern and the frameworks that support it are good
tools, they can be overused.
Figure 9.2 A layered version of “Hello World” implemented without and with
dependency inversion.
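In code, the difference shown in Figure 9.2 boils down to which way the compile-time dependency points; a minimal sketch with invented names:
// Without dependency inversion: the service layer depends directly on a
// concrete class in the layer below.
class GreetingService {
    private final GreetingDao dao = new GreetingDao();   // hard-wired

    String greet() {
        return dao.fetchGreeting() + ", World!";
    }
}

// With dependency inversion: the service owns an abstraction that the
// lower layer implements, giving tests (and frameworks) a seam.
interface GreetingProvider {
    String fetchGreeting();
}

class InvertedGreetingService {
    private final GreetingProvider provider;

    InvertedGreetingService(GreetingProvider provider) {
        this.provider = provider;
    }

    String greet() {
        return provider.fetchGreeting() + ", World!";
    }
}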
Summary
Various kinds of dependencies may make systems seem untestable. The trick is to
recognize them and handle them in the right manner. This chapter speaks of four kinds
of dependencies: dependencies between individual objects, system resource dependencies
(such as files and the system clock), dependencies between layers, and dependencies
across tiers.
Chapter 10
Data-driven and Combinatorial Testing
Occasionally we end up writing a lot of tests that look strikingly similar. It almost
feels like we’ve turned a table containing inputs and expected outputs into identi-
cal test cases. In Chapter 8, “Specification-based Testing Techniques,” in the “Bound-
ary Value Analysis” section, there was an example of logic for computing a factor
that would determine the cost of car insurance premiums. It was a discontinuous
function, which means that thorough testing of it would involve several equivalence
classes and strict boundary values. Here’s the function again:
Age      Premium factor
18–23    1.75
24–59    1.0
60+      1.35
Given the importance age has on the final premium factor and the fact that the
devil is in the details, it would seem rather prudent to focus some tests on the bound-
aries of the age intervals. However, doing it with normal unit tests would just produce
a bunch of similar-looking examples and would quickly become repetitive and prone
to error.
To illustrate how this would play out, let’s revisit a slightly less trivial version of
the car insurance premium calculation engine. This time, it’s been extended to take
gender into account, but it still remains very simple:
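A sketch of such an engine, consistent with the factors exercised by the tests in the rest of the chapter (the guard clause for uninsurable ages is an assumption):
public class PremiumRuleEngine {

    public double getPremiumFactor(int age, Gender gender) {
        // Guard clause assumed: the text mentions exceptions for
        // too low or too high ages.
        if (age < 18 || age > 100) {
            throw new IllegalArgumentException("Uninsurable age: " + age);
        }
        double factor;
        if (age <= 23) {
            factor = 1.75;
        } else if (age <= 59) {
            factor = 1.0;
        } else {
            factor = 1.35;
        }
        // Female drivers get a 10 percent lower factor.
        return gender == Gender.FEMALE ? 0.9 * factor : factor;
    }
}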
In this form, careful reading could provide enough confidence in the code. On
the other hand, most rule engines don’t come as 10-line methods, and their rules and
parameters tend to change. Assuming that the computed factor has a significant impact
on the final premium a customer would pay, off-by-one errors and simple arithmetic
miscalculations aren’t tolerated. Therefore, we would duly start by writing a test:
@Test
public void maleDriversAged18() {
assertEquals(1.75, new PremiumRuleEngine()
.getPremiumFactor(18, Gender.MALE), 0.0);
}
@Test
public void maleDriversAged23() {
assertEquals(1.75, new PremiumRuleEngine()
.getPremiumFactor(23, Gender.MALE), 0.0);
}
At this point, an observant reader may have noticed that the test names don't
follow any of the naming conventions presented previously. Because the tested
function only returns a floating-point number with no special significance, I felt
that encoding an expectation in the test names would be contrived.
Actually, things got interesting right away. When writing the second test, I stopped
for a second, wondering whether it shouldn't be something like this:
@Test
public void maleDriversAged23HaveTheSameFactorAsMaleDriversAged18() {
PremiumRuleEngine prl = new PremiumRuleEngine();
assertEquals(prl.getPremiumFactor(18, Gender.MALE),
prl.getPremiumFactor(23, Gender.MALE), 0.0);
}
This approach would have the superficial advantage of explicitly tying the two
factors together. Conversely, it could also lead to a cascade of bugs if the boundaries
were to change. In addition, it would hide the fact that the essence of the function is
to provide a numerical value.
Now, what about female drivers? They have a lower premium factor, which could
be expressed as yet another test, but would start to feel awkward because of the dupli-
cation and similar structure of the tests. Here, it could be tempting to dodge the “one
assert per test guideline” by grouping similar assertions into one test:
@Test
public void driversAged18() {
PremiumRuleEngine prl = new PremiumRuleEngine();
assertEquals(1.75, prl.getPremiumFactor(18, Gender.MALE), 0.0);
assertEquals(1.575, prl.getPremiumFactor(18, Gender.FEMALE), 0.0);
}
A better way of doing this—and this was done in times when unit testing frame-
works didn’t support parameterized tests—is to extract the code that’s common to
all test cases and let the tests contain only the different arguments and expectations:
@Test
public void maleDriversAged18() {
verifyPremiumFactor(1.75, 18, Gender.MALE);
}
@Test
public void maleDriversAged23() {
verifyPremiumFactor(1.75, 23, Gender.MALE);
}
@Test
public void femaleDriversAged18() {
verifyPremiumFactor(1.575, 18, Gender.FEMALE);
}
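The extracted helper is nothing more than:
private void verifyPremiumFactor(double expectedPremiumFactor,
                                 int age, Gender gender) {
    assertEquals(expectedPremiumFactor,
            new PremiumRuleEngine().getPremiumFactor(age, gender), 0.0);
}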
The invocation of the tested method is a one-liner, which makes this approach
overkill. However, the example illustrates the technique and applies equally to
cases where a bigger chunk of code is extracted into a parameterized method. This
technique can be used in practically any testing framework to achieve a degree of
parameterization.
Parameterized Tests
Nowadays many unit testing frameworks come with support for parameterized tests
out of the box. Using Spock, a test that would cover 10 different premium factors
would look like this:
@Unroll
def "drivers aged #age of gender #gender have premium factor #expectedPremiumFactor"() {   // method name assumed
    expect:
    new PremiumRuleEngine().getPremiumFactor(age, gender) == expectedPremiumFactor

    where:
age | gender || expectedPremiumFactor
18 | Gender.MALE || 1.75
23 | Gender.MALE || 1.75
24 | Gender.MALE || 1.0
59 | Gender.MALE || 1.0
60 | Gender.MALE || 1.35
18 | Gender.FEMALE || 1.575
23 | Gender.FEMALE || 1.575
24 | Gender.FEMALE || 0.9
59 | Gender.FEMALE || 0.9
60 | Gender.FEMALE || 1.215
}
This test works by expanding the table into 10 separate test instances (which is
made explicit through the @Unroll annotation). As illustrated, the values fed to
the test may be both primitive types and objects, and may be generated by arbitrary
Groovy constructs. The JUnit equivalent is much more verbose and clunky, which is
why I put it in the appendix.
NUnit’s implementation is also quite elegant. The unnamed parameters of the
TestCase attribute are fed directly to the method it annotates, and that method’s
return value is compared with the ExpectedResult parameter.
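Presumably along these lines (a sketch, not the original listing):
[TestCase(18, Gender.Male, ExpectedResult = 1.75)]
[TestCase(24, Gender.Male, ExpectedResult = 1.0)]
[TestCase(60, Gender.Male, ExpectedResult = 1.35)]
[TestCase(18, Gender.Female, ExpectedResult = 1.575)]
public double PremiumFactor(int age, Gender gender)
{
    return new PremiumRuleEngine().GetPremiumFactor(age, gender);
}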
Theories
Parameterized tests are ideal when a bunch of inputs can be compared to a bunch
of known expected results. For example: 1 + 1 = 2, 2 + 3 = 5, 4 + 8 = 12, and so on.
The same was true for the premium factor computation, where it was quite easy to
determine the expected value. Thus, parameterized tests help in expressing tabular
examples in a compact way, but are constrained by the number of available examples
(rows in the parameter table).
Theories, on the other hand, offer a different approach. Instead of focusing on
parameters and expected results, they provide a way of verifying a statement about the
tested code (Saff & Boshernitsan 2006). This is extremely useful when the expected
result is unknown, hard to compute, or just irrelevant. In such cases, verifying a state-
ment, as opposed to an exact value, may be the most effective thing to do. Whereas
normal tests and parameterized tests rely on singular examples, theories express “for
all instances of . . .” type of reasoning.
So, how is the input determined? In reality, proving a theory on the entire input
domain can be time consuming and unnecessary. Doing exhaustive testing also
defeats the purpose of using equivalence classes. In practice, a theory test is executed
on a number of data points that represent interesting values for which proving the
theory is particularly important. It should be no surprise that boundary values make
good data points.
Running an unconstrained theory test on parameters from different input
domains is equivalent to verifying a statement on their Cartesian product.
Children from Europe, the United States, and Asia, blue-eyed, green-eyed, and
brown-eyed, both boys and girls, like candy.
This example talks about three inputs: countries (three of them), eye colors (three
as well) and genders (two). This theory would result in 3 × 3 × 2 = 18 verifications. It’s
unconstrained, because all combinations are tried. Conversely, expressing this seem-
ingly trivial test as a parameterized test would end up in a long and repetitive param-
eter table.
How would a theory test be applicable in the case of the premium factors? Let’s
assume that we want to verify that the premium factor always remains between 0.5
and 2.0 for a number of ages between 18 and 100 and for both genders. This would
be done by choosing some data points and running a theory test that matches all ages
with both genders and checks that the premium factor remains valid.
For example, if we sampled age at 18, 24, and 99 years, running a theory test
would result in the following combinations being checked:
Gender Age
FEMALE 18
FEMALE 24
FEMALE 99
MALE 18
MALE 24
MALE 99
Both JUnit and NUnit support theory tests and both use the same nomenclature.
Theories rely on data points and use assumptions to establish conditions under which
the theory is relevant (i.e., to constrain input).
[Datapoints]
public int[] ages
= new int[] {17, 18, 19, 23, 24, 25, 59, 60, 61, 100, 101};
[Theory]
public void PremiumFactorsAreBetween0_5and2_0(Gender gender, int age)
{
Assume.That(age, Is.GreaterThanOrEqualTo(18));
Assume.That(age, Is.LessThanOrEqualTo(100));
Assume.That(gender == Gender.Female || gender == Gender.Male);
var premiumFactor = new PremiumRuleEngine()
.GetPremiumFactor(age, gender);
Assert.That(premiumFactor, Is.InRange(0.5, 2.0));
}
}
This example illustrates a theory that will be applied 18 times; there are nine valid
values for age and two genders.1 Sometimes not all combinations of data points make
sense, or we want to filter out input that’s irrelevant to the tested theory or handled in
a way that would break the test (i.e., the tested code throws an exception). This would
correspond to the case of the tested premium rule engine throwing exceptions if too
low or too high ages were supplied. By the same token, we don’t want to pass in null
or unknown genders to the tested algorithm.
Assumptions are used to achieve this kind of filtering. Syntactically they look
like assertions, but instead of failing a test, they just prevent it from running. Notice
that the data points in the example contain values that will immediately be filtered
out by the assumptions. This doesn’t make sense if there’s only one theory test that
runs against one set of data points, although it’s quite useful if different tests make
use of the same data. We could, for instance, write a negative theory test that would
“assume out” all valid ages and just run on the invalid ones and expect exceptions to
be thrown. Alternatively, assumptions also protect from combinations of parameters
that make no sense. Finally, one could also argue that stating the tested theory’s pre-
conditions as assumptions documents the test.
Assumptions are not unique to theory tests. They can be used whenever there’s
need to state a nonfailing precondition in a test.
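For example, a plain JUnit test can use an assumption to skip itself, rather than fail, on environments where it doesn't apply (the locale check is just an illustration):
@Test
public void premiumsArePrintedInLocalCurrency() {
    // If the precondition doesn't hold, the test is ignored, not failed.
    Assume.assumeTrue(Locale.getDefault().equals(Locale.US));
    // ... assertions that only make sense for U.S. locales ...
}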
Generative Testing
Theory tests are quite powerful. Still, they’re limited by the number of data points
and the way they’re selected. If bad data points are chosen, a theory test will do little
1. The code would translate directly to Java/JUnit if [Datapoints] were swapped for @DataPoints,
[Theory] for @Theory, and the class was annotated with @RunWith(Theories.class).
Adjusting Assume and Assert should be easy for the keen reader. If we’re willing to
implement our own annotations, we can get rid of @DataPoints altogether. An example of
this can be found in the appendix.
good. Suppose that we want to verify that an encryption algorithm works correctly. If it’s a
symmetric algorithm,2 it can be verified by checking that decrypting encrypted plaintext
produces the plaintext again. Testing an algorithm like this by using a parameterized test
would require putting together a table of examples of interesting inputs.
Plaintext        decrypt(encrypt(plaintext))
A                A
BB               BB
/()=^.-@%<       /()=^.-@%<
Using a theory test would look more compact and mathematical, but would still
suffer from the limitations imposed by selecting a few samples.
Data points: empty string, a very long string, A, BB, CCC CCC CCC, Hello
World!, /()=^.-@%<
Theory: Given the data points, plaintext = decrypt(encrypt(plaintext))
In any case, when would we feel that we’ve provided enough samples to achieve
confidence in the algorithm? What are the equivalence classes and boundary values?
Does the mathematical nature of the algorithm require testing some inputs extra
carefully?
Besides parameterized tests and theory tests, there’s a third option: keep the the-
ory, but let the computer generate the data points. Tell it how many, using what con-
straints, and whether they should be generated deterministically (so that the test can
be repeated) or randomly (to cover different inputs for each test run).
@Test
public void encryptionRoundTrip() {
    Generator<String> plainTextGenerator
            = strings(integers(1, 128), characters());
    // Encrypt and decrypt 100 generated strings (loop reconstructed
    // from the description in the text).
    for (int i = 0; i < 100; i++) {
        String plainText = plainTextGenerator.next();
        assertEquals(plainText, decrypt(encrypt(plainText)));
    }
}
2. Symmetric encryption algorithms use the same key to turn plaintext into ciphertext and vice
versa.
In this example, a Java version3 of QuickCheck (Claessen & Hughes 2016) has
been used. In essence, this implementation of QuickCheck provides a simple way to
generate values, often randomized, in a convenient and controlled way. The test uses
a generator in conjunction with a loop to generate 100 random strings that will be
encrypted and decrypted.
A generator provides values in accordance with some rules, like minimum/
maximum length or size, range, or statistical distribution. The preceding test com-
bines three generators to produce randomized strings. The strings generator will
generate strings of the specified length using the supplied character generator. An
integers generator is used to produce a random value between 1 and 128, which
will determine each string’s length. The character generator will produce ran-
dom characters from the latin1 character set, unless configured differently. There are
many other generators in the library. There's also another library called
junit-quickcheck that extends JUnit theory tests with generator annotations.
Trying a similar approach on the premium rule engine example would make lit-
tle sense. After all, there are only roughly 80 interesting ages and two genders. Still,
this is what it would look like in NUnit, which has rudimentary support for data gen-
eration out of the box and can manage without extra libraries.
[Test]
public void PremiumFactorsAreBetween0_5and2_0(
[Values(Gender.Female, Gender.Male)] Gender gender,
[Random(18, 100, 100)] int age)
{
double premiumFactor = new PremiumRuleEngine()
.GetPremiumFactor(age, gender);
Assert.That(premiumFactor, Is.InRange(0.5, 2.0));
}
3. https://bitbucket.org/blob79/quickcheck
something we don’t want, because we want to be able to rerun a test if it fails. On the
other hand, generative testing is a powerful technique, provided that we know how to
verify the results of tests that are based on generated values. Here are some strategies:
Combinatorial Testing
Until now, the assumption has been that executing tests in large numbers—in the
form of parameterized tests, theory tests, or with generated values—would be useful
and feasible. This would certainly be true for unit tests and a reasonable number of
test runs. Not all tests are unit tests, though! Some tests will remain manual, whereas
some tests written by developers may involve a slow resource, like a database, a file
system, or a network connection. In such cases the kind of close-to-exhaustive test-
ing presented so far in this chapter won’t work, which is why choosing which and how
many tests to run becomes the real issue.
To illustrate the point, let’s continue building on the premium rule engine exam-
ple and make it more realistic by having it take yearly mileage, car model, safety fea-
tures, and driving record into account. At this point the actual implementation isn’t
relevant. What’s relevant is the fact that bringing in more parameters increases the
complexity of the rule engine. To deal with this increased complexity, the new vari-
ables are divided into equivalence classes, just as the age was.
Yearly mileage
Only owner: 0 km
Sunday driver: 1–1000 km
Casual driver: 1001–3000 km
Car enthusiast: 3001–6000 km
Professional driver: 6001+ km
Safety features
For the sake of the example, they’re constrained to five classes:
No safety features
Airbag
Antilock Brake System (ABS)
Head Injury Protection (HIP)
Two or more safety features from the previous list
Car models
In a real application there would be hundreds; in this example only six:
Nissan
Volvo
Ferrari
Toyota
Ford
Volkswagen
Driving record
Analyzing a driving record can be arbitrarily complex. Here, just a few simple
equivalence partitions are considered:
Model Driver (MD): no parking fines, no accidents, no other violations
Average Joe (AJ): 1–5 parking fines, no other violations, no accidents
Unlucky Ursula (UU): 1–2 parking fines, 1–2 accidents, no other violations
Bad Judgment Jed (BJJ): 1–2 parking fines, no accidents, drunk driving
Dangerous Dan (DD): >5 parking fines or >2 accidents or several cases of
drunk driving or any other car-related violation.
This slightly more realistic rule engine would produce quite a few test cases if we
went for total coverage:
Variable           # values
Gender             2
Age interval       3
Yearly mileage     5
Safety features    5
Car model          6
Driving record     5

Going for all combinations would mean 2 × 3 × 5 × 5 × 6 × 5 = 4,500 test cases.
Single-mode Faults
A single-mode fault is a fancy name for a bug that occurs if a single variable’s state
isn’t handled correctly. In this context, it could mean that the rule engine froze when-
ever Volvos were fed to it, or it returned a negative value for drivers aged 75. To guard
against such faults, we need to ensure that every possible value is tried at least once.
This can be done by just listing all parameters and their values in a table. It’s usually
easier to put the ones with the largest number of possible values to the left.
Car model     Driving record   Mileage   Safety features   Age     Gender
Nissan        MD               0         No features       18–23   Male
(three more rows, pairing Volvo, Ferrari, and Toyota with the remaining values)
Ford          DD               6000+     Two or more       –       –
Volkswagen    –                –         –                 –       –
The table shows what combinations of parameters are needed to test for single-
mode faults (which is also called “achieving all singles”), and it says that six tests
are required to do it. Had there been no Volkswagens in the example, only five tests
would be required (because the last row only contains a value for the car model vari-
able—Volkswagen). This technique may seem painfully obvious, but tends to be for-
gotten in the heat of battle.
Double-mode Faults
Often enough a combination of two parameters triggers a bug. Not surprisingly, this
is called double-mode faults. Testing for double-mode faults is equivalent to testing
all pairs of values; hence the name of the technique is pairwise testing (Bolton 2007).
Finding all pairs is cumbersome in all but the simplest cases, and even a relatively
straightforward scenario, like the premium rule engine, would necessitate the help of
a computer. For instance, if we started out with Nissans as car models, we would need
to ensure that they were paired with all driving record types, mileage intervals, safety
features, and so on.
For a few variables with few values, finding all pairs can be done by hand. Let’s
look at this table made up of three binary variables, V1, V2, and V3:
row V1 V2 V3
1 A X Q
2 A X R
3 A Y Q
4 A Y R
5 B X Q
6 B X R
7 B Y Q
8 B Y R
To find all pairs, let’s start from the top of the table and see how many rows can
be removed. The pairs in the first row, (A, X), (A, Q), and (X, Q), can be found in rows
2, 3, and 5, so row 1 can be deleted. Row 2 must remain, because there’s no other row
that contains the pair (A, X) once row 1 has been deleted. The pair (A, Y) is in both
rows 3 and 4, and (Y, Q) can be found in rows 3 and 7. However, (A, Q) only remains
in row 3 after row 1 has been dropped, so row 3 has to stay. Row 4 can be dropped.
(A, Y) has been kept in row 3, (A, R) in row 2, and (Y, R) can be found in row 8. Row
5 has to stay; after removing row 1, there’s no other row with (X, Q). Following this
procedure, rows 6 and 7 can be dropped. The final table looks like this:
row V1 V2 V3
2 A X R
3 A Y Q
5 B X Q
8 B Y R
This isn’t the only solution. If the same algorithm were applied from the bot-
tom of the table going up, different rows would remain (1, 4, 6, 7). Doing this exer-
cise by hand for a small table, like this one, quickly convinces us that performing
this for bigger tables (like the one for the extended premium rule engine) is a task
for the computer.
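For this small table, though, the greedy elimination just performed by hand fits in a few lines of Java; running this sketch prints rows 2, 3, 5, and 8:
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class PairwiseReduction {

    // Greedy top-down elimination: drop a row if every pair of values
    // in it also occurs in some other remaining row.
    static List<String[]> reduce(List<String[]> rows) {
        List<String[]> remaining = new ArrayList<>(rows);
        for (Iterator<String[]> it = remaining.iterator(); it.hasNext(); ) {
            String[] candidate = it.next();
            if (allPairsCoveredElsewhere(remaining, candidate)) {
                it.remove();
            }
        }
        return remaining;
    }

    static boolean allPairsCoveredElsewhere(List<String[]> rows,
                                            String[] candidate) {
        for (int i = 0; i < candidate.length; i++) {
            for (int j = i + 1; j < candidate.length; j++) {
                boolean found = false;
                for (String[] row : rows) {
                    if (row != candidate
                            && row[i].equals(candidate[i])
                            && row[j].equals(candidate[j])) {
                        found = true;
                        break;
                    }
                }
                if (!found) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String[]> table = List.of(
                new String[] {"A", "X", "Q"}, new String[] {"A", "X", "R"},
                new String[] {"A", "Y", "Q"}, new String[] {"A", "Y", "R"},
                new String[] {"B", "X", "Q"}, new String[] {"B", "X", "R"},
                new String[] {"B", "Y", "Q"}, new String[] {"B", "Y", "R"});
        for (String[] row : reduce(table)) {
            System.out.println(String.join(" ", row));
        }
    }
}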
Writing a program to compute all pairs for larger tables is a fun exercise, but
if that isn’t what you want to spend your time doing, there’s both commercial and
Summary 149
free software that will do it for you. Two freely available programs are James Bach’s
pairwise.pl and ACTS from the National Institute of Standards and Technology.
Running these two tools on the updated premium rule engine reveals that somewhere
between 30 and 40 tests are needed to capture all pairs of variables. Compared to the
initial 4,500 tests, it makes quite a difference! Armed with yet another tool, we see
how valuable it is to be able to give parameterized tests a reasonably sized parameter
table or a theory test a manageable number of data points. In this light, finding all
singles and all pairs isn’t only a technique for keeping down the number of manual
tests, but also a technique for data selection in developer tests.
Summary
This chapter is about scenarios that require executing many tests. The first, more tool-
oriented part, talks about some features of the more mature unit testing frameworks.
Parameterized tests help when the tests are mainly about matching input values
with predefined expected values.
A theory is a statement about a property of the program. Theories answer the
question: “Given a function f(x), is property p true for some different values of x?”
Data points are used to provide the different values. Specialized libraries exist to sup-
ply generators that produce values for either theory tests or just normal unit tests. The
values can be randomized or deterministic.
The second part of the chapter describes what to do when not everything can be
tested. Single-mode faults occur when the handling of a single variable’s state fails.
Double-mode faults occur when a combination of two variables is handled incorrectly.
Pairwise testing is a technique for dealing with combinatorial explosions in sce-
narios where all combinations of several unrelated variables must be tested. In such
cases, testing only the unique pairs of variables tends to give a rather high payoff.
Chapter 11
Almost Unit Tests
Developers must do more than just write unit tests to ensure that their code indeed
works. In the first chapter, I mentioned several other activities, two of which were
to write integration tests and to automate tests in general. Such tests, which I’ll refer
to as “higher-level tests,” are discussed later in the book. In the meantime, it’s time to
visit a family of tests that shares some characteristics with unit tests and some with
higher-level tests, which tends to cause confusion and discussions among developers.
A common trait of such tests is that they’re not unit tests—at least not according to
the definition advocated by this book, but they execute just fast enough—in the range
of 1 to 2 seconds—to make it into the unit test suite. If it were up to me, I’d call them
“bastard tests,” the reason being that they look deceivingly simple and are fast, but
they’re often integration tests, or even system tests. In Google’s nomenclature,1 they’d
be typical “Medium” tests, although they’d execute far below the upper time bound-
ary, which is recommended to be 300 seconds for such tests. Therefore, I believe that
it’s only fair to call them fast medium tests. How do they make it into the unit test
suite? Often simply because they look deceptively like unit tests and run fast enough
for nobody to object.
Examples
The easiest way to get a feeling for what the tests I speak of look and feel like is to look
at some concrete examples. There are plenty to be found out there, and these are the
ones that have been popping up rather consistently in my projects over the years.
@Shared
private Connection conn
def setupSpec() {
Class.forName("org.hsqldb.jdbc.JDBCDriver")
conn = DriverManager.getConnection("jdbc:hsqldb:mem:db", "SA", "")
Sql.newInstance(conn).execute(
"CREATE TABLE users(id BIGINT IDENTITY, " +
"name VARCHAR(255), "+
"password_hash VARCHAR(255))")
}
def "authenticates an existing user given the correct password"() {   // method name assumed
    expect:
    new AuthenticationManager(conn).authenticate("joe", "secret")
}
This test assumes that whatever database is being used can be swapped out for
HSQLDB, a database that can run in memory only and is implemented in Java. This
is quite convenient if your code just relies on standard SQL statements without mak-
ing use of vendor-specific features and extensions.
Given that the authentication is complicated, this is quite a good test. It shows
that the AuthenticationManager class uses the database correctly and that
password hashing seems to work as expected. However, it’s not a unit test. It loads
classes, starts a database, and establishes a connection to it. At the time of writing, it
ran in less than a second.
[TestInitialize]
public void Setup()
{
smtpServer = SimpleSmtpServer.Start(25);
}
[TestCleanup]
public void TearDown()
{
smtpServer.Stop();
}
[TestMethod]
public void CompanyInformationIsPresentInEmail()
{
MailService testedService = new MailService("localhost");
testedService.SendMail(new MailAddress("[email protected]"),
"Dear customer", "We care!");
3. This example uses the Dumbster library, which is available in both Java and C#. See Appendix A,
“Tools and Libraries.”
    Assert.AreEqual(1, smtpServer.ReceivedEmailCount);
}
@BeforeClass
public static void setUpOnce() throws Exception {
server = new Server(8080);
final String pathToWarFile = "/tmp/myapp.war";
server.setHandler(new WebAppContext(pathToWarFile, "/webapp"));
server.start();
}
@Test
public void applicationIsUp() throws Exception {
    HtmlPage mainPage = new WebClient()
            .getPage("http://localhost:8080/webapp");
    // Reaching this point without an exception means the application responded.
    assertNotNull(mainPage);
}
@AfterClass
public static void tearDownOnce() throws Exception {
server.stop();
}
This test is even “worse” than the previous two. Here an entire server is started,
a web application of arbitrary complexity contained in myapp.war is deployed, and
an HTTP request is made using HtmlUnit. On the other hand, these few lines of code
are sufficient to verify the deployment of an entire web application. In fact, it’s a great
test, but it’s just not a unit test. At the time of writing, this test took no more than two
seconds to execute.
@Rule
public WireMockRule wireMockRule = new WireMockRule()
given:
def monitoredStock = 'ACME'   // symbol assumed
def notificationReceiver = new ContactInformation(
    phoneNumber: '+1 202-555-0165', email: '[email protected]')
def testedStockMonitor = new StockMonitor(notificationReceiver, monitoredStock)   // constructor assumed
stubFor(post(urlMatching("/.*"))
.willReturn(aResponse().withStatus(200)));
stubFor(get(urlPathEqualTo("/quotes"))
.withHeader("Accept", equalTo("application/json"))
.withQueryParam("s", equalTo(monitoredStock))
.willReturn(aResponse()
    .withStatus(200)
    .withHeader("Content-Type", "application/json")
    .withBody('{"price": "0.01"}')))   // response body assumed: cheap enough to trigger an alert
when:
testedStockMonitor.pollMarket()
then:
verify(postRequestedFor(urlEqualTo("/alert"))
.withRequestBody(containing("[email protected]"))
.withRequestBody(containing(monitoredStock
+ " is cheap enough")))
}
This creation packs quite some power into relatively few lines of code by testing
the following: On behalf of a user identified by an e-mail address and a phone num-
ber, the stock monitor queries a service that provides a price quote that’s attractive
enough to trigger a notification. The service exposed as /alert is then invoked to
notify the user somehow. The WireMock library allows invoking two fake REST end-
points in less than one and a half seconds and provides constructs for both stubbing
and mocking. This test will work beautifully until the local firewall is reconfigured or
it’s executed in an environment that already runs another server on port 8080 (which
is the current default).
Impact
Any nontrivial application hosts a multitude of opportunities to create tests that run
almost as fast as unit tests but require a degree of environmental coupling. I hope that
the aforementioned examples have been inspiring and given you a sense of what such
a test may look like. The least common denominator of this chapter’s tests is that they
all start a server somehow. However, that server took relatively little time to start, so
waiting has been acceptable. Still, I’d argue that running such tests as unit tests is a
bad idea. Here are some reasons:
Slower developer feedback—These tests run fast, but they’re slower than unit
tests. Impatient developers, used to quick feedback, may stop running them
while writing code. This is not a good thing, especially if they abandon the
habit of using tests to get feedback about their code.
Running tests that are almost unit tests along with actual unit tests isn’t the end
of the world, but they do make the test suite slower, more brittle, and more sensitive to
environment settings. In time, such suites risk crumbling under their own weight as
they grow, and they’ll be abandoned eventually. That said, such tests are usually rela-
tively simple to write and they may give great bang for the buck, although I strongly
suggest that they be kept separate from tests that can never fail because of environ-
mental issues.
Summary
Some tests run almost as fast as unit tests but do things unit tests shouldn’t do and
pay the price by environmental dependence. Unless monitored and eventually moved
to another suite, they’ll devour the unit test suite and make it slow, sensitive to the
executing environment, and possibly brittle.
4. The first test takes one second because of initialization. The following nine tests don’t suffer
from this delay.
Chapter 12
Test Doubles
Stubs
The simplest and most generic test of an object that depends on another object looks
like what is shown in Figure 12.1.
This gives rise to an almost canonical test method:
[TestMethod]
public void CanonicalTest()
{
var tested = new TestedObject(new Collaborator());
Assert.AreEqual(?, tested.ComputeSomething());
}
1. For more in-depth descriptions and more rigorous definitions of the various types of test doubles,
see XUnit Test Patterns: Refactoring Test Code by Gerard Meszaros (2007). In this chapter, I try to
follow his nomenclature, but I do make some slight deviations and sometimes emphasize differ-
ent things.
Figure 12.1 1: The test code calls the tested object. 2: The tested object invokes its
collaborating object. 3: The collaborator performs a computation and returns a value.
4: The tested object uses that value and returns a result that can be derived from it.
This is the simplest case, where the tested object just calls its collaborator,
which returns some value. That value is in turn refined somehow and returned to
the calling test. From previous chapters we know that the value supplied by the
collaborator is called indirect input. To keep the example simple, this value is just
multiplied by a number.
To take control of a dependency like this, a stub is needed. The primary moti-
vation behind stubbing is to control the tested object’s indirect input. Because the
collaborator is injected in the constructor of the tested object, creating a stub is very
straightforward. All that’s needed is an implementation that returns a hard-coded
value.
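A sketch of such a stub (the collaborator's interface and the hard-coded value are assumptions):
public class CollaboratorStub : ICollaborator
{
    public int ComputeValue()
    {
        // Controlled indirect input; the tested object is assumed to
        // multiply it by 10, which yields the 420 asserted below.
        return 42;
    }
}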
This stub can now be used instead of the real object and the test can be rewritten
as follows:
[TestMethod]
public void CanonicalTestWithStub()
{
var tested = new TestedObject(new CollaboratorStub());
Assert.AreEqual(420, tested.ComputeSomething());
}
Stub Flexibility
A stub that returns a single value is the least complicated, but also the least intelligent
kind. Sooner or later, another test will require a different value to be returned. This is
a crossroads. From here one can either decide to implement a new stub that returns
another hard-coded value or to extend the existing one:
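Extending it could look like this sketch, where the test decides the return value:
public class ConfigurableCollaboratorStub : ICollaborator
{
    private readonly int value;

    public ConfigurableCollaboratorStub(int value)
    {
        this.value = value;
    }

    public int ComputeValue()
    {
        return value;
    }
}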
Once embarking on this journey, the possibilities are endless. For example, if a
test requires an exception to be thrown, a tiny if will save the day:
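public int ComputeValue()
{
    // Sketch: shouldThrow is a flag set by the test when constructing
    // the stub.
    if (shouldThrow)
    {
        throw new InvalidOperationException("Thrown on request");
    }
    return value;
}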
Fakes
In some cases, stubbing isn’t enough. The behavior that would be stubbed away is
required by the tested object. On the other hand, it comes with side effects and she-
nanigans that would break a unit test. In such cases, a fake object may be a reasonable
trade-off. Fake objects are lightweight implementations of collaborators, and their
primary purpose is to provide something that’s self-consistent from the perspective
of the caller.
In Figure 12.3, the tested object makes several calls to another object. These calls
not only affect its state, but also result in side effects. Afterward, the object expects a
Figure 12.2 1: The test code calls the tested object. 2: The tested object invokes its
collaborator. 3: The collaborator performs an operation that results in one or more
unobservable side effects. 4: The tested object returns a value that’s relevant to the test,
but isn’t based on the interaction with the collaborating object.
Figure 12.3 1: The test code calls the tested object. 2, 4: The tested object invokes its
collaborator. 3, 5: The collaborator performs an operation that results in one or more
unobservable side effects. 6: The tested object queries the collaborator. 7, 8: The result of
that query is based on the internal state of the collaborator and is passed on to the caller of
the tested object.
nontrivial result that is somehow based on those calls. A typical example fitting this
structure would be a sequence of operations that first persist and manipulate data some-
how and then query it. In an average business application, it could be this type of code:
public Invoice CompletePurchase(Purchase purchase, Discount discount)
{
    // Names assumed: the facade first persists the purchase, and the
    // invoice is then created from the persisted data.
    purchaseFacade.Save(purchase);
    Invoice invoice = purchaseFacade.CreateInvoice(purchase);
    if (discount != null)
    {
        invoice.ApplyDiscount(discount);
    }
    return invoice;
}
After the customer and the items making up a purchase are persisted, a discount
may optionally be applied. The ApplyDiscount method refreshes the invoice object
based on the data in the database and the supplied discount, and is thus equivalent
to a query. Code like this usually
contains a lot of magic in a legacy system and makes a great candidate for faking. In
this example, purchaseFacade would be a fake implementation that would pro-
duce correct enough invoices, while avoiding persistence and all complicated busi-
ness rules that usually govern the creation of such entities.
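A sketch of what such a fake might look like (the interface and all names are assumed):
public class FakePurchaseFacade : IPurchaseFacade
{
    private readonly List<Purchase> savedPurchases = new List<Purchase>();

    public void Save(Purchase purchase)
    {
        // No database: the fake just remembers what it was given.
        savedPurchases.Add(purchase);
    }

    public Invoice CreateInvoice(Purchase purchase)
    {
        // "Correct enough" for the caller: the invoice is derived from
        // the in-memory purchase, skipping persistence and the usual
        // business-rule magic.
        return new Invoice(purchase);
    }
}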
Mock Objects
Stubs provide indirect input in a controlled manner. Fakes replace collaborators with
simpler self-consistent implementations. Given these, the missing piece of the puzzle
is the ability to verify indirect output. This is the purpose of a mock object, or more
commonly just “mock.”
Mock objects are game changers in a way, as they shift a test’s focus from state
to behavior. Tests that focus on state end with assertions that check return values or
somehow query the tested object’s state. They typically look like this:
assertEquals(expectedValue, tested.computeSomething())
or
Assert.AreEqual(expectedValue, tested.Value).
Figure 12.4 1: The test code calls the tested object. 2: The tested object invokes its
collaborator, which may not return anything or possibly just produce a side effect.
This scenario illustrates the primary use of mock objects: verification of interac-
tions. The simplest case is determining whether an interaction actually has occurred.
A typical interaction test verifies the arguments to the mock objects to some degree,
whereas less typical tests may focus on counting the number of times the interaction
has happened.
Let’s assume that we’re modeling a shopping workflow, the kind that you go
through when buying things online: You pick the items you want to buy, identify
yourself, and finally you apply a discount code (if you have one) before checking out.
In code, this sequence could be implemented like so:
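Pieced together from the tests that follow, the sequence presumably reads:
new PurchaseWorkflow(new Books10PercentOffCampaign())
        .addItem(getBookByTitle("Developer Testing"))
        .usingExistingCustomer(1234567)
        .enterDiscountCode("DEAL");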
Now, suppose that we want to test how this purchase flow interacts with the
object that represents a campaign. 2 We want to make sure that the campaign’s
applyDiscount method is indeed invoked and that its arguments are correct.
Thus, a test using a mock object instead of a real campaign object verifies the indi-
rect output of the PurchaseWorkflow class when applying a campaign discount.
The indirect output can be verified with a different amount of rigor, which will
be illustrated by three mock objects that become more and more elaborate. In this
chapter, these mock objects are implemented “by hand” to illustrate that there’s noth-
ing magic about interaction testing and that mocking frameworks aren’t mandatory.
2. The campaign object implements a simple interface, Campaign, which contains one
method—applyDiscount. When implemented, it’s responsible for modifying the price of
purchased items and updating the customer’s bonus points. In the preceding code snippet, the
name Books10PercentOffCampaign suggests that the campaign applies a discount to
any purchased books.
@Test
public void useLenientMock() {
LenientMock campaignMock = new LenientMock();
new PurchaseWorkflow(campaignMock)
.addItem(getBookByTitle("Developer Testing"))
.usingExistingCustomer(1234567)
.enterDiscountCode("DEAL");
campaignMock.verify();
}
The corresponding mock object confirms the interaction without caring about
the parameters passed to applyDiscount. Note that the verify method contains
an assertion! This is the mock object’s way of telling that it knows what to verify.
class LenientMock implements Campaign {
    private boolean wasInvoked;

    @Override
    public void applyDiscount(Long customerNumber,
                              String discountCode,
                              Purchase purchase) {
        wasInvoked = true;
    }

    public void verify() {
        assertTrue(wasInvoked);
    }
}
Scenario 2—Here we want to verify that the interaction takes place and that the
indirect output of PurchaseWorkflow is within reasonable bounds.
@Test
public void useAverageMock() {
Purchase expectedPurchase
= new Purchase(getBookByTitle("Refactoring"));
AverageMock campaignMock = new AverageMock(expectedPurchase);
new PurchaseWorkflow(campaignMock)
.addItem(getBookByTitle("Refactoring"))
.usingExistingCustomer(1234567)
.enterDiscountCode("WEEKEND DEAL");
campaignMock.verify();
}
The mock object verifies that the customer number is positive at least, that
the campaign code is propagated, and that the workflow actually adds items to the
purchase.
@Override
public void applyDiscount(long customerNumber, String discountCode,
Purchase purchase) {
assertThat(customerNumber, greaterThan(0L));
assertEquals("WEEKEND DEAL", discountCode);
assertEquals(expectedPurchase, purchase);
wasInvoked = true;
}
Scenario 3—The final test performs rather rigorous checks on the parameters
passed to applyDiscount and it counts the number of invocations.
@Test
public void useDemandingMock() {
DemandingMock campaignMock = new DemandingMock();
new PurchaseWorkflow(campaignMock)
    .usingExistingCustomer(1234567)
    .addItem(getTraining("TDD 101"))
    .addItem(getTraining("TDD 201"))       // further items assumed,
    .addItem(getTraining("TDD 301"))       // matching the three expected calls
    .enterDiscountCode("DISCOUNT_042X");   // code assumed; it matches the regex below
campaignMock.verify();
}
This last mock has very precise expectations: applyDiscount should have
been called exactly three times with customer numbers in the range [1000000,
9999999], the discount code matching a regular expression, and the purchase being
approved by a custom argument matcher.3
@Override
public void applyDiscount(long customerNumber, String discountCode,
Purchase purchase) {
assertThat(customerNumber,
allOf(greaterThanOrEqualTo(1000000L),
lessThanOrEqualTo(9999999L)));
assertTrue(discountCode.matches("DISCOUNT_\\d{3,10}[X-Z]?"));
assertThat(purchase, new PremiumPurchaseMatcher());
timesInvoked++;
}
Do these mock objects make sense? How useful are they? It depends. The mock
from the first scenario just verifies whether applyDiscount has been invoked.
This is a pure interaction test. If you trust everything else, this might suffice. The
second mock adds some basic sanity checks. This makes sense if many things happen
in the tested object before it produces its indirect output or if the quality of the code
is low and you want to be extra defensive in your test. However, a test using this mock
no longer only fails if the interaction doesn’t happen, but may also fail for many other
reasons. Finally, the third mock starts applying business rules to its verification, like
the format of the customer number and discount code, and the composition of the
purchase. Verification like this leads to brittle tests and usually indicates problems
with the tested code or other tests. In this particular example, if the format of the
customer number and discount code really were that important, then they probably
would deserve their own classes. The last matcher would probably only be useful if
the goal of the test was to verify indirect input supplied by another collaborator.
When using mock objects, it’s very tempting to verify as much as possible and
as strictly as possible. The general rule of thumb for maintainable tests is: Don’t. Or
rather, understand the trade-off between strict and thorough verification and the
test’s sensitivity to changes to the code. This topic will be covered in greater detail in
the next chapter.
Figure 12.5 1: The test code calls the tested object. 2–3: The tested object invokes a
collaborator (a test double or the actual implementation), which returns a value. 4: The
value is processed somehow by the tested object. 5: The value returned by the other
collaborator and processed by the tested object is used as a parameter when calling the
collaborator that's of interest to the test.

if (displayMode == DisplayMode.CELSIUS) {
    display.output(formatForDisplay(temperature));
} else {
    display.output(formatForDisplay(
            celsiusToFahrenheit(temperature)));
}
Spies
The distinction between spies and mock objects is quite academic, in my opinion.
Whereas mock objects are implemented so that they fail a test if their expectations
aren’t met (I put various asserts in the mock objects in the previous section to empha-
size this), spies capture their interactions and the associated parameters for later use.
Mock objects are, in fact, spies too, because they record the behavior of the program
element involved in the interaction (Martin 2014). However, the difference is that the
mock itself uses the captured values to determine whether the interaction happened
correctly, whereas the spy leaves this decision to the test. As we’ll see in the next chap-
ter, this doesn’t necessarily apply to mocks created by a mocking framework. Spies
constructed dynamically by frameworks get less coupled to the tested code than do
mock objects, which reduces the likelihood of making tests brittle.
Time for an example. If the test making use of the “average” mock object were
rewritten to use a spy instead, it would look like this:
@Test
public void demonstrateSpy() {
Purchase expectedPurchase
= new Purchase(getBookByTitle("Refactoring"));
CampaignSpy campaignSpy = new CampaignSpy();
new PurchaseWorkflow(campaignSpy)
.addItem(Inventory.getBookByTitle("Refactoring"))
.usingExistingCustomer(1234567)
.enterDiscountCode("WEEKEND DEAL");
assertThat(campaignSpy.customerNumber, greaterThan(0L));
assertEquals("WEEKEND DEAL", campaignSpy.discountCode);
assertEquals(expectedPurchase, campaignSpy.purchase);
}
class CampaignSpy implements Campaign {
    long customerNumber;
    String discountCode;
    Purchase purchase;

    @Override
    public void applyDiscount(long customerNumber,
                              String discountCode,
                              Purchase purchase) {
        this.customerNumber = customerNumber;
        this.discountCode = discountCode;
        this.purchase = purchase;
    }
}
The test looks strikingly similar to its mock counterpart, except for the place-
ment of the assertions. In cases where I can’t use a framework to create a mock object
and I have to craft it by hand, I always resort to the spy-based approach, the reason
being that it allows me to keep the assertions in the test.
Dummies
Dummy is the final term in the test double nomenclature. Dummies are values you
don’t care about from the perspective of the test. They’re typically passed as argu-
ments, although they can be injected or referenced statically at times. There’s little
science around dummies, but I’d like to point out two things about them. First,
naming them appropriately often helps. If a test is all but trivial, its readability isn’t
increased by the presence of nulls, zeroes, or empty strings. It might be a matter of
taste, but personally I prefer:
[TestMethod, ExpectedException(typeof(ArgumentOutOfRangeException))]
public void ShouldFailForTooYoungCustomers()
{
int age = 10;
string ignoredFirstName = "";
string ignoredLastName = "";
CustomerVerifier.Verify(age, ignoredFirstName,
ignoredLastName);
}
. . . to the version following, or something similar with nulls instead of the empty
strings.
[TestMethod, ExpectedException(typeof(ArgumentOutOfRangeException))]
public void ShouldFailForTooYoungCustomers()
{
CustomerVerifier.Verify(10, "", "");
}
There is, of course, a middle ground, but it only works for strings:
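Presumably self-describing literals, such as:
CustomerVerifier.Verify(10, "ignored first name", "ignored last name");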
Although one can guess that nulls and simple default values indicate a dummy, I
still think it’s worthwhile to highlight what’s not important. This is a matter of pro-
gramming language. If the language supports named arguments somehow, naming
dummies is less of an issue. Because the example happens to be C#, which happens to
support named arguments . . .
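. . . the dummies can be labeled right at the call site (the parameter names are assumed):
CustomerVerifier.Verify(age: 10, firstName: "", lastName: "");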
Second, if you feel that you’re using too many dummies and that it doesn’t feel
right, then your instincts probably serve you well. Overuse of dummies often indi-
cates that the tested code probably does too much or that the test verifies something
irrelevant, most likely the former.
State Verification
State verification is employed when the final outcome of interacting with the object
of the test is best observed by examining a value or a data structure produced by that
object. A state-based test performs one or more operations on the target object and
then queries it and possibly some of its collaborators to assess whether the outcome
of the operations was correct. In an object-oriented environment, the simplest case of
state verification is invoking a mutator followed by an accessor (which many would
consider “too simple to break”), whereupon the result is fed to an assertion.
given:
Car testedCar = new Car()
when:
testedCar.setSpeed(40)
then:
testedCar.getSpeed() == 40
Apart from confirming that the tested car has the ability to accelerate to 40 mph
instantaneously (hmm), this example also shows that the speed is stored in the tested
object and is thus part of its state. If such a state is made up of many variables’ values,
a state-based test may easily fall victim to checking too many seemingly unrelated
values or digging too deeply into the tested object. Consider this:
given:
Car testedCar = new Car()
when:
testedCar.setSpeed(40)
then:
testedCar.getSpeed() == 40
testedCar.getGear() == 2
testedCar.getTachometer().getValue() == 2000
Behavior Verification
When the expected outcome of an operation cannot be observed by querying the
object of the test, behavior verification is used instead. Most often behavior verifi-
cation is synonymous with using mock objects verifying interactions. At times, the
tested object may store much of its state in collaborating objects. In such cases, what
would normally be a state-based test can turn into a test of behavior.
Behavior verification feels most natural when the tested object exposes no state;
nothing is returned and few or no methods expose whatever state it may have. This
is usually true for code that contains many command-type calls (in the sense of Command-
Query Separation; Fowler 2005). Hence, interaction tests are often encountered in
larger systems made up of several layers, where some of the layers contain little logic
or state and are mostly responsible for orchestrating calls to other layers and compo-
nents, like in this BillingService class:
4. Although the framework used in this example, Spock, uses conditions rather than assertions.
public Invoice BillForPurchase(Customer customer,
                               params Product[] products)
{
    // Names assumed: pure orchestration with little logic of its own.
    Invoice invoice = new Invoice(customer, products);
    mailService.SendInvoice(customer, invoice);
    return invoice;
}
Unit-testing this method would amount to making sure that it indeed manages
to call SendInvoice for the correct customer with an invoice that reflects the sup-
plied products.
The Arguments
Those who argue against behavior testing will have a point when they say that such
tests won’t detect algorithmic errors. After all, checking that an algorithm has been
called with certain parameters doesn’t guarantee that it’s been implemented correctly.
The SendInvoice method in the last example could be completely wrong. It could
send the invoice to a print shop using some batch file transfer mechanism instead
of e-mailing it to the customer. If mailService were a mock object, this blunder
would pass unnoticed.
Another case against behavior testing is about the tests knowing too much about
the internals of the tested code, that is, being too tightly coupled. After all, if interac-
tions are to be verified, the tests need to know about them. Should some of the inter-
actions change, the tests will break. This argument is similar to that of poking too
extensively into the internal representation of an object. It, too, may change. A way
of making behavior-based tests more stable is to keep them coarse-grained. The test
wants to know that mailService was indeed called, but it doesn’t have to dissect
the invoice passed to SendInvoice and verify that it’s correct to the last bit.
Testing Behavior
The phrase “testing behavior” doesn’t always refer to verifying interactions using a
mock object. Instead, it refers to testing the actual behavior of a program element,
which was defined as “the outcome produced by its functionality under certain
preconditions” in an earlier chapter.
From this it follows that a program element’s behavior may be to return something
suitable for state verification, or to perform a number of invocations, which would be
tested by verifying interactions. Phew.
Summary
Different kinds of test doubles are used when dealing with dependencies in unit tests:
Stubs are used to control indirect input and sometimes to get rid of side
effects.
Fakes provide self-consistent implementations of collaborators, which in prac-
tice means that they’re lightweight implementations.
Mock objects are used to verify indirect output and occasionally indirect input
from other collaborators.
Spies record the interactions and their parameters for later checking.
Dummies are values that are irrelevant to the test—usually arguments.
The discussion of stubs, fakes, and mocks brings into the foreground the dis-
tinction between state and behavior testing. State verification is about querying the
tested object’s (and possibly its collaborators’) state after having invoked some of its
operations. Behavior verification is about checking whether a certain interaction has
occurred between a mock object and the tested object or other collaborators.
State-based tests are good for finding algorithmic errors, but they run the risk
of being too invasive. Behavior-based tests won’t find any algorithmic errors and
are vulnerable to being too coupled to the implementation. Both types of tests can
become brittle. State-based tests may look at too much state or dig too deeply into an
object, whereas behavior-based tests may be too strict when verifying the interaction.
Chapter 13
Mocking Frameworks
Today the number of cases in which we want to implement test doubles by hand is
quite limited. Mocking frameworks have been evolving for several years and have
reached full maturity by now. At the time of writing, they’ve gone through genera-
tions of evolution and have reached a point where they offer very rich functionality
and truly simplify many aspects of interaction testing. A case in point is the fact that
mocking frameworks, despite their names, not only construct mock objects, but also
stubs and spies.1 My experience is that this versatility often leads to confusing tests,
where the role of the test double is ambiguous and unclear. That’s why I emphasize
the type of test double wherever possible. To avoid this confusion altogether, some
people prefer the term isolation frameworks, because the name carries with it the
promise that the frameworks may create different kinds of test doubles that isolate
the code under test from its collaborators.
Frills aside, mocking frameworks provide three fundamental kinds of operations:

1. Test double creation
2. Expectation setup
3. Interaction verification
1. Although the mocking framework may use the term spy differently from how it was described
in the previous chapter.
The two mocking frameworks used most prevalently in the examples in this section
don't even distinguish between stubs and mocks during the construction stage.
This first example is based on Moq for C#. Using Java’s Mockito, the construction
would be almost identical.
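A sketch of such construction one-liners, with the collaborator type (Dependency/IDependency) assumed for the sake of the example:

var dependencyStub = new Mock<IDependency>();          // Moq, C#

Dependency dependencyStub = mock(Dependency.class);    // Mockito, Java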
A framework that does make the distinction between stubs and mocks is Spock,
but the syntax is still quite similar; a sketch using Spock's Stub and Mock factory
methods (collaborator type assumed again):
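Dependency dependencyStub = Stub()
Dependency dependencyMock = Mock()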
That’s it, if all we need is a stub that returns a simple default value, like 0 for
numerical types and null for objects. The previous one-liners indeed produce work-
ing stubs, and at this point only the variable name offers a clue about the type of
test double.
Test double creation has advanced way beyond simple proxying and has, with
time, been sugarcoated with extra features—annotation-driven creation, partial
mocks, and deep stubs, to name a few common ones. The list of nifty features varies
among frameworks and changes and evolves constantly. Spend some time reading
your favorite framework's documentation.2
2. This book doesn’t contain any details about the mocking frameworks it uses. I don’t want the
contents to become obsolete because of some latest and greatest API changes.
Once a test double has been created, we need to decide whether we'll use it as a
mock, a stub, or both. The third option isn't encountered that frequently, because it
implies that the test double will serve as a provider of indirect input and as an observer
of indirect output/interactions at the same time. In the majority of cases, this is
something you don't want, though you may find yourself doing this when testing legacy
code (which, unconstrained by things like Command-Query Separation or the Single
Responsibility Principle, may pile interaction on interaction in long sequences).
Setting Expectations
An expectation is a statement that tells the test double how to respond to an
invocation. Historically, setting up expectations was a crucial step in configuring a mock.
Older mocking frameworks relied on first setting up, or "recording," a number of
expectations, then having the test interact with the mock, and finally verifying that
the expectations were fulfilled. Creating true mock objects, they immediately failed
the test if they encountered an interaction with the mock that didn't match any
expectations. Reusing one of the examples (the one using the "average" mock) from
the previous chapter, an interaction test using a true mocking framework (jMock)
would look like this:
@Test
public void discountCodeIsAppliedInThePurchaseWorkflow() {
final Campaign campaignMock = context.mock(Campaign.class);
final Purchase expectedPurchase
= new Purchase(getBookByTitle("Refactoring"));
context.checking(new Expectations() {{
oneOf(campaignMock).applyDiscount(
with(greaterThan(0L)),
with(equal("WEEKEND DEAL")),
with(equal(expectedPurchase)));
}});
new PurchaseWorkflow(campaignMock)
.addItem(getBookByTitle("Refactoring"))
.usingExistingCustomer(1234567)
.enterDiscountCode("WEEKEND DEAL");
context.assertIsSatisfied();
}
This test would fail during the call to applyDiscount if the expectation weren’t
satisfied, that is, the parameters didn’t match, and it would fail during the verification
phase (context.assertIsSatisfied()) if the method wasn’t called at all.
Mockito, Moq, and Spock all construct mocks that behave like spies (or nice
mocks); that is, they just record the interactions and let them happen, which means
that no predefined expectations are required. Instead, the interactions are verified at
the end of the test. This will be apparent in the upcoming examples.
Stubbing
Expectations are typically associated with mocks, but I'll be using the word in a
broader sense, which will allow me to speak of expectations as a means of configuring
stubs. Being just a proxy created on the fly, a stub without any expectations only
returns default values—in practice zero—for methods that return a primitive numerical
data type, and nulls for methods that return objects (and maybe even empty
collections for methods that return collections). Invocations of methods that don't
return anything will just pass through. To become more usable, the stub needs to be
told how to behave, which is equivalent to implementing logic in a hand-coded stub.
Methods that don’t take any arguments are quite easy to set up. Using Moq, the
setup would look like the following:
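dependencyStub.Setup(d => d.ComputeAndReturnValue()).Returns(10);

Mockito's equivalent reads: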
when(dependencyStub.computeAndReturnValue()).thenReturn(10);
These expectations tell the stubs to return a fixed value and correspond to just
implementing a single line method with a return statement.
When the method to be stubbed takes one or more arguments, we have to start
thinking about what to do with them. Consider a single-argument method. If
"handcrafted," it would look like this:
Mockito has its own methods for matching primitive data types and a simple
interface to Hamcrest matchers for more demanding cases. This gives developers the
freedom to implement any predicates they need.
when(dependencyStub.computeAndReturnValue(anyInt())).thenReturn(10);
Spock has the interesting capability to match any argument (or arguments),
without even caring about the type, which is quite powerful when you just want to get
your stub up and returning values.
dependencyStub.computeAndReturnValue(_) >> 10
The most common way of matching arguments is using the equals method. It’s
like saying: “if the method is called with an argument that is this value, then return
that value.” In fact, it’s so common that it’s implicit (in the frameworks used here at
least). To achieve the equivalent of
3. It’s the same mechanism as that used by the AssertThat method and has been covered in
Chapter 7, “Unit Testing.”
182 Chapter 13 Mocking Frameworks
return arg == 42 ? 10 : 0;
}
no argument matcher is needed—the expectation is set up using the exact value. Note
that zero, expected in all cases except when arg is 42, will be returned as a result of
the stub’s default behavior.
class Banana {
    public String color = "yellow";

    // The implicit matching relies on equals; with Object's default
    // implementation the two Banana instances below would never match.
    @Override
    public boolean equals(Object other) {
        return other instanceof Banana
            && ((Banana) other).color.equals(color);
    }
}

interface Monkey {
    boolean likes(Banana banana);
}
@Test
public void monkeysLikeBananas() {
Monkey monkeyStub = mock(Monkey.class);
when(monkeyStub.likes(new Banana())).thenReturn(true);
assertTrue(monkeyStub.likes(new Banana()));
}
Sometimes—although less often than you might think—you need the stub to
return different values on consecutive calls. Mockito lets you stack thenReturn
directly:
when(dependencyStub.computeAndReturnValue(42))
.thenReturn(10).thenReturn(99);
Moq needs you to swap Setup for SetupSequence to allow this kind of
stacking. When using Spock's stubbing facilities, you just need to specify a list of
values to return.
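In Spock, a sketch of consecutive return values uses the triple-shift operator:

dependencyStub.computeAndReturnValue(42) >>> [10, 99]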
Last, but not least, you’d want stubs to throw exceptions to allow you to verify
your waterproof error handling, right? Using Mockito’s short-hand syntax, a stub
that would throw an exception would be set up like so:
Dependency dependencyStub =
when(mock(Dependency.class).computeAndReturnValue(42))
.thenThrow(new IllegalArgumentException("42 isn't the answer!"))
.getMock();
Moq’s syntax resembles Mockito’s original syntax (which you can find in the
documentation).
These are the basics of setting up expectations. One can construct infinitely
complex custom constraints/matchers to set up stubs that reply very intelligently to a
variety of invocations. However, just as with stubs implemented by hand (which were
discussed in the previous chapter), keep it simple. Overly intelligent stubs are a
danger sign.
Verifying Interactions
The main purpose of a mock object is to verify interactions. A fundamental building
block of all mocking frameworks is a verify operation. Whereas a test that focuses on
state will end with an assertion method, a test that revolves around a mock object will
end with a verification.
Verifications also use constraints or matchers to decide whether the parameters
passed to the mock's method are correct enough to qualify the invocation as a
successful interaction. Because matchers have been covered already, we'll go straight
on to examples and revisit the discount scenarios from the previous chapter. Let's
see how they would be implemented using the three mock frameworks presented in
this chapter.
Scenario 1—Here we just want to verify that the PurchaseWorkflow class
indeed calls a campaign's applyDiscount method. Mockito is used in this
example, whereas Moq and Spock equivalents have been put in Appendix B, "Source Code."
@Test
public void useLenientMock() {
Campaign campaignMock = mock(Campaign.class);
new PurchaseWorkflow(campaignMock)
.addItem(getBookByTitle("Developer Testing"))
.usingExistingCustomer(1234567)
.enterDiscountCode("DEAL");
verify(campaignMock).applyDiscount(anyLong(),
anyString(), any(Purchase.class));
}
Scenario 2—Here we want to verify that the interaction takes place and that the
indirect output of PurchaseWorkflow is within reasonable bounds. This time,
Moq gets to shine, and Mockito and Spock have been deferred to Appendix B.
[TestMethod]
public void UseAverageMock() {
var campaignMock = new Mock<ICampaign>();
Purchase expectedPurchase = new Purchase(
Inventory.GetBookByTitle("Refactoring"));
new PurchaseWorkflow(campaignMock.Object)
.AddItem(Inventory.GetBookByTitle("Refactoring"))
.UsingExistingCustomer(1234567)
.EnterDiscountCode("WEEKEND DEAL");
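    // A sketch of the verification that concludes the test (matcher choice
    // assumed, mirroring the "average mock" from the previous chapter):
    campaignMock.Verify(m => m.ApplyDiscount(
        It.Is<long>(id => id > 0),
        "WEEKEND DEAL",
        expectedPurchase));
}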
Scenario 3—This last test performs rather rigorous checks on the parameters to
applyDiscount and it counts the number of invocations. Spock is used to
demonstrate this scenario (Moq and Mockito are in Appendix B yet again).5
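// A plausible opening for this Spock feature method (name and setup assumed):
def "applyDiscount is called once per entered discount code"() {
    given:
    Campaign campaignMock = Mock()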
when:
new PurchaseWorkflow(campaignMock)
.usingExistingCustomer(1234567)
.addItem(getTraining("TDD for dummies (5 days)"))
.addItem(getBookByTitle("TDD from scratch"))
.enterDiscountCode("DISCOUNT_123X")
.enterDiscountCode("DISCOUNT_234Y")
.enterDiscountCode("DISCOUNT_999Z");
then:
3 * campaignMock.applyDiscount(
{ it >= 1000000L && it <= 9999999L },
{ it =~ "DISCOUNT_\\d{3,10}[X-Z]?" },
{ it.getPrice() > 1000 && it.getItemCount() < 5 })
}
These examples show some capabilities of modern (at the time of writing)
mocking frameworks and illustrate how similar they are. Many features have been left
out, especially the framework-specific gold plating. Spend time getting to know your
framework! Once you've done that, read the next section, which talks about misuse,
overuse, and other pitfalls.
Oververifying
Every time a verification is executed on a mock object, the test gets coupled to the
internal implementation of a program element, thus becoming sensitive to changes
and refactorings of that program element. Instead of remaining green during
refactoring and acting as a safety net, it will turn red and break for seemingly mysterious
reasons. This is, in fact, an argument in favor of spy-like or nice mocks. By not
expecting every single detail about every single interaction, they make the tests less
coupled to the internals of the tested code, and thus less sensitive to change. On
the whole, verification of interactions is best kept coarse-grained.
Just as keeping the number of assertions down in a state-based test is usually a
good thing, the same goes for verify statements. As a rule of thumb, a test involving
a mock object should verify only one interaction, and that should be the focal point
of the test. Thus, if it breaks, it will be obvious that something vital and important
has stopped working. A corollary to this is that the test should employ as few mocks
as possible—preferably only one. However, just as multiple assertions may verify one
logical concept, so can multiple verified interactions. Tests of typical orchestration
methods will most likely need both several mocks and multiple verifications.
Oververification comes in several forms, as the following paragraphs explain.
Mock tests that verify too many interactions suffer from a similar problem: they
become coupled to multiple program elements, which makes them even more
sensitive to changes in those elements and makes error localization harder.
Tests that set up many expectations or engage in heavy verification may have a
hard time communicating their intent. A test that verifies this, then that, and finally
something else will probably just lock down the implementation while providing
little value.
A similar argument goes for mocks that are configured to expect interactions to
occur in a specific order. Somewhere deep in the codebase, there may be a piece of
code that truly benefits from having the order of interactions verified; however, in all
other cases—an overwhelming majority—this is the equivalent of inviting a vampire
into your home.
6. One of the creators of Mockito has written an interesting blog post on a similar topic (Faber 2008).
Regardless of how an object of this class would be used in a test, a real object
should be created, not a mock. Sadly, this is what you might see instead:
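The following is merely a sketch of the pattern—the names are invented here—showing a deep-stubbed mock of a class that would be trivial to instantiate for real:

Customer customerMock = mock(Customer.class, RETURNS_DEEP_STUBS);
when(customerMock.getAddress().getCity()).thenReturn("Springfield");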
This doesn’t only look bad. More objective arguments against doing this might
include the following:
Summary
Creating stubs and mocks is easy with today’s frameworks. They host functionality
for test double creation, expectation setup, and interaction verification. Constraints or
argument matchers are important building blocks, because they determine whether a
stub will respond to a query and whether a mock counts an interaction as successful.
Mocks come as nice, normal, and strict. Nice mocks tolerate unexpected interactions,
whereas strict mocks don’t, and additionally require that all expected interactions
occur—sometimes in a specific order.
Mocking frameworks provide their own matchers, and adding new ones is easy.
However, overly complex matchers may know too much about the interactions they
verify and thus make the tests unnecessarily rigid.
Mock tests can easily be overspecified. Overly restrictive verification doesn’t
automatically imply correctness, but is often a sign of poor code; things that should
be tested somewhere else end up being matched during verification. On the whole,
constantly be aware of the trade-off between depth of verification and coupling to the
implementation.
Finally, mocking concrete classes and mocks returning mocks should set off your
alarm bells.
7. http://docs.mockito.googlecode.com/hg/latest/org/mockito/Mockito.html#RETURNS_DEEP_STUBS
Chapter 14
Test-driven Development—
Classic Style
Test-driven development (TDD) is the practice of driving the design of code with
tests. In contrast to the traditional "write code – verify code" workflow, TDD
mandates that the first task in any development undertaking be to write a test. Only then
can the code that will make that test pass be written. If faithfully applied, no
production code will ever come into existence unless it's preceded and accompanied by at
least one test. This doesn’t “auto-magically” guarantee correctness of the code, and
many people would claim that TDD has nothing to do with testing. That said, test-
driven code is, by definition, testable, and after reading the opening paragraphs of
Chapter 4, “Testability from a Developer’s Perspective,” you know that such code stands
a better chance of being tested—either by developers, who would add more tests to it to
cover all equivalence partitions, edge cases, and possible error scenarios, or testers, who
would be able to focus on an observable and controllable part of the system.
Test-driven development is performed in short cycles, each consisting of three
phases—red, green, refactor. Red and green refer to the color of the bar (or any other
visual indicator of failure or pass) displayed by many testing frameworks and IDEs
when the test is executed. These are the steps of the workflow:
1. Red—Write a test. The test will fail because the functionality needed to make
it pass doesn’t exist. Often the test won’t even compile, because it’ll include
references to program elements that haven’t been created yet.
2. Green—Make the test pass. Take any shortcuts necessary, even if they make
your eyes and heart bleed.
3. Refactor—Remove the badness introduced when making the test pass.
Working like this pushes us toward very short iterations—on the order of minutes
or even seconds—which results in near-instantaneous feedback about the state of the
code and our progress.
Actually, this is all there is to it, but test-driven development is one of those
practices that are simple in theory but that explode into a bunch of questions and
technicalities when applied in practice. To illustrate some of them, I'll use a TDD session
that happens to demonstrate different practical aspects of the technique.
1. For the sake of the example, I decided on an in-memory implementation, although the design
would work for a disk-based solution as well with some adaptations.
2. “Very fast” means constant time with respect to the number of documents, that is, O(1).
Test-driving a Simple Search Engine
Figure 14.1 A simple index. The article "the" occurs in all three documents: twice in the
third document and once in the other two. In the index, this is represented as "the: [3, 1, 2],"
where the document with the most occurrences of "the" comes first—document 3. Note
that there's a clash between documents 1 and 2 that both contain "the" once. It's been
resolved by sorting the contending documents in ascending ID order.
@Test
void searchingWhenNoDocumentsAreIndexedGivesNothing() {
SearchEngine searchEngine = new SearchEngine()
assert [] == searchEngine.find("fox")
}
The test obviously didn’t compile, because it referenced a class and a method that
didn’t exist. However, it forced me to express in code what a part of the API would
look like—searching returns a list of something. Now, to make this test pass, I just
added the class and a next-to-empty method.
class SearchEngine {
List<Integer> find(String word) {
return []
}
}
At this point the objective was to make the test pass, even in a way hurtful to the
eyes and the heart. I made it pass using a hard-coded empty list, and I had completed
two out of the three elements of the TDD cycle—write a failing test and make it pass.
Now it was time for refactoring. Alas, I didn’t find anything worth refactoring at this
point, so I moved on to the next test.
@Test
void searchingForADocumentsOnlyWordGivesThatDocumentsId() {
SearchEngine searchEngine = new SearchEngine()
searchEngine.addToIndex(1, "fox")
assert [1] == searchEngine.find("fox")
}
Here I made a rather significant decision that would affect the entire example:
I chose to represent the supposed documents as just strings. If this code were to live
outside the pages of this example, it would most likely work on streams. In
production, these streams would be file streams; in tests, they'd be in-memory streams
feeding off strings. Acknowledging this, I decided that there was little to be learned from
juggling streams and strings at this point, and it would just hurt the readability of the
example code.
class SearchEngine {
def index = []
void addToIndex(int documentId, String contents) {
index << 1
}
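The find method isn't part of this listing; to keep both tests green, it could simply have kept returning the index list itself—empty before anything is indexed, [1] afterward. A sketch of the presumed implementation:

List<Integer> find(String word) {
    return index
}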
Another hard coding, and index was coming into existence. Progress! This
"production" code didn't offer too many opportunities for refactoring. The test code,
on the other hand, could be improved by removing the duplicated creation of the
searchEngine object. Because I had the feeling that every test would start with
this same line of code, I decided to move it to a test initializer method.
@Before
void setUp() {
searchEngine = new SearchEngine()
}
@Test
void searchingWhenNoDocumentsAreIndexedGivesNothing() {
assert [] == searchEngine.find("fox")
}
@Test
void searchingForADocumentsOnlyWordGivesThatDocumentsId() {
searchEngine.addToIndex(1, "fox")
assert [1] == searchEngine.find("fox")
}
@Test
void allIndexedDocumentsAreSearched() {
searchEngine.addToIndex(1, "fox")
searchEngine.addToIndex(2, "dog")
assert [2] == searchEngine.find("dog")
}
Changing the list to a map and adding the storing of a one-element list of
document IDs in that map did the trick. The test passed.
class SearchEngine {
def index = [:]
void addToIndex(int documentId, String contents) {
index[contents] = [documentId]
}
I was just getting ready to move on, so I ran the entire test suite to make sure that
I was on solid ground, and boom! It turned out that the first test was now failing. It
complained about null being returned when a word wasn’t present in the index. It was
easy to fix. The lookup in the find method needed to return something reasonable if
there were no matches, like an empty list.
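In Groovy, that's a one-line change; a sketch of the presumed fix:

List<Integer> find(String word) {
    return index[word] ?: []
}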
@Test
void documentsMayContainMoreThanOneWord() {
searchEngine.addToIndex(1, "the quick brown fox")
assert [1] == searchEngine.find("brown")
assert [1] == searchEngine.find("fox")
}
I didn’t even need to run this to know how miserably it would fail. There was no
reading multiple words in the code, so failure was imminent. The good news was that
it was easy to fix—just split the input up.
class SearchEngine {
def index = [:]
void addToIndex(int documentId, String contents) {
contents.split(" ").each { word -> index[word] = [documentId] }
}
Anything to refactor? Not really, but I wanted to help myself by spelling out what
the index actually was, so I introduced the type: Map<String, List<Integer>>
index = [:].
@Test
void searchingForAWordThatMatchesTwoDocumentsGivesBothDocumentsIds() {
searchEngine.addToIndex(1, "fox")
searchEngine.addToIndex(2, "fox")
assert [1, 2] == searchEngine.find("fox")
}
This looked quite intimidating at first . . . anticlimax. Only one line of code
needed changing. Resolving words to lists put me one step closer to the envisioned
design.
class SearchEngine {
Map<String, List<Integer>> index = [:]
void addToIndex(int documentId, String contents) {
contents.split(" ").each { word ->
index.get(word, []) << documentId
}
}
After having written the code, I realized that the test had passed out of sheer
luck. Nothing in the code implied any ordering of document IDs, so what I was
getting back was a list that reflected the insertion order. Had I started by adding "fox" to
the second document, the test would have failed, because find would return [2, 1]. I had a
number of options here, ranging from a custom matcher that would ignore list order
to comparing the lists as sets, but I decided on the simplest one, which was to sort
the output before comparing. I just changed the assertion to assert [1, 2] ==
searchEngine.find("fox").sort().
@Test
void multipleMatchesInADocumentProduceOneMatch() {
searchEngine.addToIndex(1,
"the quick brown fox jumped over the lazy dog")
assert [1] == searchEngine.find("the")
}
How does one implement uniqueness? My first idea was that sets can't hold
duplicates, so I quickly rushed ahead and changed the implementation of the index.
class SearchEngine {
Map<String, Set<Integer>> index = [:]
void addToIndex(int documentId, String contents) {
contents.split(" ").each { word ->
index.get(word, [] as Set) << documentId
}
}
All tests passed! Now about refactoring . . . the transformation of a set into a list
in find didn’t turn out too beautiful. Should something be done about that? This
was one of the most difficult moments in this session. The tests were all green, but
based on the design, I knew that I wouldn’t be able to make this work using sets.3
Therefore, I decided to refactor, not so much in response to the current state of the
code, but to prepare for the things to come. As a side effect, the find method became
uncluttered again.
class SearchEngine {
Map<String, List<Integer>> index = [:]
void addToIndex(int documentId, String contents) {
contents.split(" ").each { word ->
def documentIds = index.get(word, [])
if (!documentIds.find {i -> i == documentId} ) {
documentIds << documentId
}
}
}
Instead of using a set, I implemented the uniqueness “by hand” by only adding
a document ID to a word’s list of document IDs if it wasn’t already in that list. This
proved to be helpful in the upcoming step.
@Test
void documentsAreSortedByWordFrequency() {
searchEngine.addToIndex(1, "fox fox dog")
searchEngine.addToIndex(2, "fox fox fox")
searchEngine.addToIndex(3, "dog fox dog")
assert [2, 1, 3] == searchEngine.find("fox")
assert [3, 1] == searchEngine.find("dog")
}
This meant that the index had to store the number of times a word occurred
in a given document. The underlying data structure needed changing again. Here
I needed to stop. Even though I had just refactored the code in the previous step, I
decided that I needed another refactoring4; I wanted my implementation of the index
to support what I was about to do next. The good thing about this, though, was that
it allowed me to demonstrate an important aspect of TDD: Never refactor with a red
bar. Obediently, I @Ignored the failing test before proceeding.
Guided by my design idea, I knew roughly what to do. I wanted to store the word
frequencies somehow. I didn’t perceive the upcoming code change as entirely trivial,
so I implemented the most naïve solution that I could think of: instead of just storing
the document ID for each word, I started storing two values—the document ID and
the number of times the current word has appeared in the document with that ID. I
chose to call this class WordFrequency.5
class SearchEngine {
Map<String, List<WordFrequency>> index = [:]
void addToIndex(int documentId, String contents) {
contents.split(" ").each { word ->
def wordFrequencies = index.get(word, [])
if (!wordFrequencies.find {wf -> wf.documentId == documentId})
{
wordFrequencies << new WordFrequency(documentId, 1)
} else {
def wordFrequency =
    wordFrequencies.find { wf -> wf.documentId == documentId }
wordFrequency.count++
}
}
}
}

class WordFrequency {
    int documentId
    int count

    WordFrequency(int documentId, int count) {
        this.documentId = documentId
        this.count = count
    }
}
Now the code reflected the design idea completely. It was neither aesthetically
pleasing nor efficient, but it worked! To implement ranking from this vantage point
was easy—all I had to do was to re-enable the test and sort the list of word
frequencies. I added the following line of code after the if-else in the addToIndex
method:
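Presumably the added line sorted each word's frequency list in place, most frequent document first—something like:

wordFrequencies.sort { wf1, wf2 -> wf2.count <=> wf1.count }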
5. A smaller step here would be to represent the pair as an array. However, I’ve never been a
fan of arrays where the location of the element has a meaning, like it would here: arr[0] =
documentId, arr[1] = frequency. It’s just confusing.
Red, green, refactor. Now, here were opportunities. I started by removing the
obvious duplication of wordFrequencies.find. Restructuring the code that
added a new word frequency to the list of frequencies allowed me to simplify the
WordFrequency class's constructor by dropping the count parameter. Finally, I pulled out
all of this code into a new method that I called bumpWordFrequencyForDocument.
Next, I did something that some might call "premature optimization." Yes, a part
of me really suffered inside because of the superfluous sorting that was taking place,
though the main reason was that I wanted better readability. I moved the sorting
away from the loop and put it into its own method (with some minor adjustments to
the target of the sorting). This change made the addToIndex method quite small
and readable. It also had the advantage of raising the level of abstraction of
addToIndex. Instead of dealing with rather atomic operations on maps and lists, it now
started to communicate its intent quite clearly.
private resortIndexOnWordFrequency() {
    index.each { k, wfs ->
        wfs.sort { wf1, wf2 -> wf2.count <=> wf1.count }
    }
}
@Test
public void caseDoesNotMatter() {
searchEngine.addToIndex(1, "FOX fox FoX");
searchEngine.addToIndex(2, "foX FOx");
searchEngine.addToIndex(3, "FoX");
assert [1, 2, 3] == searchEngine.find("fox")
assert [1, 2, 3] == searchEngine.find("FOX")
}
Making this pass wasn't very exciting. It was a matter of adding toUpperCase()
in two places. In my eyes it didn't break the code enough to mandate any
refactoring.
@Test
public void punctuationMarksAreIgnored() {
searchEngine.addToIndex(1, "quick, quick: quick.");
searchEngine.addToIndex(2, "(brown) [brown] \"brown\" 'brown'");
searchEngine.addToIndex(3, "fox; -fox fox? fox!");
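// The assertions were presumably along these lines—each word resolves
// to exactly one document despite the punctuation:
assert [1] == searchEngine.find("quick")
assert [2] == searchEngine.find("brown")
assert [3] == searchEngine.find("fox")
}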
I let the test spell out what punctuation marks I cared about. Again, I went with
what I thought was the obvious solution. After all, test-driven development isn't about
taking tiny steps all the time. It's about being able to take such steps when needed (Beck 2002).
As soon as I had finished typing the regular expression for replacement, I saw
the refactoring I needed to do, but first I ran all tests and was rewarded with the
green bar. Now, what would the last refactoring of the session be? It struck me that I
had added similar logic in two different places. I had placed conversion to uppercase
after splitting the document into words, but for some reason, I had decided that the
stripping of punctuation marks should be done before breaking the document up into
words. Both of these operations are in fact preprocessing. I made that clear in code by
extracting them into a method.
// (a sketch; the name of the extracted preprocessing method is assumed)
void addToIndex(int documentId, String contents) {
    preprocess(contents).each { word ->
        bumpWordFrequencyForDocument(documentId, word)
    }
    resortIndexOnWordFrequency()
}
This concludes this book’s TDD session. Now I’ll bring in some TDD theory to
explain some decisions and turns I’ve made throughout it.
Note
All source code produced in this session can be found in Appendix B.
Order of Tests
Deciding in what order to write tests (and what tests to write) is often quite a
challenge for developers new to test-driven development. Ironically, the order is rather
important. Your sequence of tests should not only help you make progress, but also
help you learn as much as possible and avoid the inherent risks of your implementation
while doing it. Conversely, if you have no strategy for picking the next test to write,
you're likely to start spinning around interesting or easy cases, or to run out of ideas.
Next time, try writing your tests in the following order:

1. Degenerate case
2. One or more happy path tests
3. Tests that provide more information
4. Negative tests
Challenges
When adopting test-driven development, a team faces some challenges that it must
overcome rather quickly. If most of the issues I'm about to describe aren't swiftly
resolved, they turn the adoption into a painful process and a team trauma. Not
convinced? Try this scenario and send me a penny for every line you've heard at work.
Imagine Monday morning. Positive Peter and Negative Nancy are just getting
their morning coffee from the machine. Barry Boss bounces in . . .
Barry Boss: I went to this cool conference last week. They did TDD maaagic. So
must we! It’ll make us ten times as productive!
Positive Peter: Our team has been experimenting a little (without telling you),
but our codebase hasn’t been designed with testability in mind and is a mess.
We need to make some structural changes to it first, or start on a new system.
Barry Boss: What would that cost me?
Positive Peter: Well, we’ve always been rushing toward the next release and
accumulating technical debt without addressing it, so I’d say . . . a couple of weeks.
Barry Boss: What? Weeks without productivity! Start doing this TDD thing on
the next project, which is due in eight months.
Positive Peter: (Sighs and starts walking away thinking about how to update his
resume)
Negative Nancy: That’s right! Our code is special. It’s like no other code in the
world. Our business rules are uniquely complex. Therefore, they cannot be
unit tested, so trying this test-driving thing is doomed to fail. Others can do
it, but their code isn’t as mission critical as ours.
Barry Boss (in a solemn voice): Indeed. Our code is special and mission critical.
Negative Nancy (feeling victorious): And besides, even if we had tried this thing,
it wouldn’t have given us complete testing anyway!
This short dialog embodies four very common challenges facing a team that’s on
its way to adopt TDD.
6. Monster method: A complicated method of high cyclomatic complexity with many areas of
responsibility. Most likely, at least 100 lines long.
an undertaking. In such cases we can only opt for refactoring away one or
a few antitestability constructs and postpone 100 percent TDD for another
occasion. This is an incarnation of the Boy Scout Rule.7
Often, this challenge is of the chicken and the egg nature: in order to make code
testable, we need to write enough tests to get a feeling for what testable code looks
like. And conversely, in order to write tests, we need a testable codebase.
7. Boy Scouts are supposed to leave the campground cleaner than they found it. So should
developers do with code.
Test First or Test Last?
and pair programming, sometimes formal methods, and eventually various types of
manual testing. Test-driven development, with its emphasis on unit tests, provides a
good foundation for many quality assurance activities.
(By the way, did you notice that this book just happens to be about these topics?)
In such circumstances, taking the step toward test-driven development is an
enormous challenge. Many practices have to be learned, revised, and improved at once.
However . . .
Learning what testable code looks and feels like takes quite some time. Learning
it from theory alone may also be hard; it's best experienced in practice. In this regard,
starting with test-driven development offers a gentle and stepwise introduction. In
addition, the practice helps in maintaining the discipline needed to get the tests written.
Tests that are supposed to be written after the production code may be forgotten or
omitted in the heat of battle. This will never happen when working test first.
Then there’s the issue of applying TDD to drive the design of the system, not
the individual modules. Test-driving at this level competes with old-school design
work. Yes, a developer experienced in producing good interaction protocols and
interfaces is likely to get them right to some extent, but that might be a gamble with
no feedback loops.
On the other hand . . .
Test-driven development requires being able to visualize both the solution and
how to test the solution, which can be an obstacle with technologies that are new or
unfamiliar to the developer.
To summarize, code following reasonable contracts written in a testable way
may be just as “good” as code written using test-driven development. However,
working test first definitely makes achieving testability, correctness, and good design
a lot easier.
Summary
Test-driven development is a way of using tests to drive the design of the code. By
writing the test before the code, we make the code decoupled and testable.
Test-driven development is performed in a three-phase cycle: red—write a failing
test; green—make it pass, taking any shortcuts necessary; refactor—remove the
badness introduced when making it pass.
The refactoring stage is crucial to the technique’s success, because this is where
many principles of good design are applied. When adding tests, the following order of
doing it usually helps:
1. Degenerate case
2. One or more happy path tests
3. Tests that provide more information
4. Negative tests
There’s nothing magical about code created using test-driven development. Such
code can be crafted without writing tests first. However, doing this requires a lot of
experience.
Chapter 15
Test-driven Development—
Mockist Style
The kind of test-driven development that was presented in the prior chapter will get
us far, but truth be told, there are situations in which it's hard to apply. Many
developers work with large enterprise systems—often much larger than necessary due to
overinflated design and accidental complexity—composed of several layers.
Test-driving a new feature starting at the boundary of an enterprise system using the
techniques we've seen so far is challenging, even for seasoned TDD practitioners. This
type of complexity is also demoralizing to those who are just beginning to learn
test-driven development.
A Different Approach
Let’s say that we’ve been tasked with implementing a simple web service for register-
ing new customers and their payment details. Such functionality is common enough
in a typical customer-facing enterprise system. The overall requirements for this
first version of the solution are that customers should be able to pay with direct bank
transfers and the major credit cards (PayPal and Bitcoin will appear in version 2.0).
A quick session at the whiteboard reveals the design idea shown in Figure 15.1,
guided by the system’s existing architecture and design conventions.
Now, suppose that we want to test-drive a customer registration endpoint, which
happens to be a RESTful web service that interacts with other services, which, in
turn, call repositories1 and client code that communicates with external parties.
What would the assertEquals of the first test look like? What if the customer
registration endpoint doesn’t even return anything except for HTTP status codes?
Fortunately, there is a solution.
The quick design session exposes a couple of components with different roles
and responsibilities. Some of them may already exist in the current system; some may
need adding. Nevertheless, the sketch tells us how the different objects should interact
and collaborate. From here we can test-drive this design, and the various interactions
between the objects, before getting to details such as persistence and external
integrations.

Figure 15.1 Components required to implement customer registration, while staying true
to the system's architecture and design guidelines.

The sketch also hints that the majority of operations are what one would
call “commands,” that is, instructions to do something, not to return something. In
other words, most of the design follows the “Tell, Don’t Ask” principle (or Law of
Demeter, if you will).
A situation like this is ideal for the mockist style of test-driven development,
which focuses on interfaces and interactions and favors the use of mock objects to do
so. It also encourages doing some design thinking before writing the tests. Rather
than starting with the implementation of the customer registration endpoint, we
focus on its interface and interactions with its closest collaborators instead. Hence,
the purpose of the first test is to drive these interactions.
@Test
public void personalAndCardDetailsAreSavedForCreditCardCustomers() {
CustomerRegistrationEndpoint testedEndpoint
= new CustomerRegistrationEndpoint();
CustomerService customerServiceStub = mock(CustomerService.class);
PaymentService paymentServiceMock = mock(PaymentService.class);
testedEndpoint.setCustomerService(customerServiceStub);
testedEndpoint.setPaymentService(paymentServiceMock);
when(customerServiceStub.registerCustomer(customer))
.thenReturn(newCustomerId);
testedEndpoint.registerCustomer(details);
CreditCardDetails cardDetails
= new CreditCardDetails(CreditCardType.VISA,
1111222233334444L, 123);
verify(paymentServiceMock)
.registerCreditCard(newCustomerId, cardDetails);
}
This is a gigantic test (it took around 15 minutes to write). True, it could have
been simpler, but because we already have a design idea, we don’t need to strive for the
simplest thing that could possibly work. Given some building blocks and a general
feeling for the solution, aiming for an intuitive API feels more natural, to me at least.
In an actual system, some of the classes would already exist and there would be
less work putting everything together, but nonetheless, the test would still require a
lot of work. What can we deduce from this first test?
Missing Fields
Some fields have been left out from the registration details, like address, maybe date of
birth, card holder’s name, and card expiration date. In real code they would be there,
but I wanted to keep the example short and relevant.
Now, notice that there’s only one verification, so to make this test pass, we could
just use the simplest of the red-green bar strategies—faking (Beck 2002).
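A faked version could look something like this—a sketch that simply hard-codes values satisfying the test (names borrowed from the surrounding tests):

public void registerCustomer(RegistrationDetails details) {
    // Ignore the incoming details; register hard-coded values instead.
    CustomerId newCustomerId
        = customerService.registerCustomer(new Customer("Joe", "Jones"));
    paymentService.registerCreditCard(newCustomerId,
        new CreditCardDetails(CreditCardType.VISA,
            1111222233334444L, 123));
}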
This code will make the test pass; the customer details are completely ignored
and hard-coded credit card details are being registered. However, by initializing both
the registration details and customer details to consistent reasonable values in the
test, and by providing a stub of CustomerService that really uses them, I wanted
to create some maneuvering room for the upcoming production code.
Using faking to make the test pass is fine, but if you trust your design and are
comfortable with mock objects, I suggest nailing the entire chain of interactions in
one sweep.2 After all, this style of test-driven development is best suited for driving
the interactions between objects, and a well-written test should provide enough
groundwork for an obvious implementation (the second red-green bar strategy). In
this case, it would be along these lines:
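// A sketch (field and accessor names assumed):
public void registerCustomer(RegistrationDetails details) {
    Customer customer
        = new Customer(details.getFirstName(), details.getLastName());
    CustomerId newCustomerId = customerService.registerCustomer(customer);
    CreditCardDetails cardDetails = new CreditCardDetails(
        CreditCardType.valueOf(details.getCardType()),
        Long.parseLong(details.getCardNumber()),
        Integer.parseInt(details.getCvc()));
    paymentService.registerCreditCard(newCustomerId, cardDetails);
}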
Not so scary, right? Nothing's faked, and all values are faithfully shuffled
between the interacting objects. Still, this is pretty rough code. It contains no error
handling, and the parsing looks crude3 (which means that
CustomerRegistrationEndpoint definitely needs some more tests). However, it does illustrate the
interactions needed for registering a customer who pays with a credit card. In the
refactoring stage, I'd probably move the creation of the CreditCardDetails
domain object to a separate method to get rid of the parsing, which looks out of place
because it's on a different level of abstraction than the rest of the code. What's more
interesting is the next test!
What that would be is far from obvious. It could be one of these:

- A test that registers a customer paying by direct bank transfer—the other payment type
- A test of CustomerService
- A test of PaymentService
2. This again is a way of saying that we can take larger steps while doing test-driven development
if we feel secure.
3. So crude that one of my reviewers objected to even using the word parsing.
In the previous chapter, it was said that we should pick tests along the happy path
or tests that provide us with more information and knowledge. At this point, testing
registration of another payment type would provide little information. The design
sketch tells us that no new collaborators would be introduced (both services are already
used in the first test), so the test would be quite similar to the one for registering
customers paying with credit cards. Although it wouldn't be wrong in any way to explore
that alley, going forward with one of the other tests should be more enlightening.

Discovering what persistence of nonsensitive data in the database would look like
seems easy enough, so a test of CustomerService it is.
@Test
public void validCustomerIsPersistedDuringRegistration() {
CustomerServiceImpl testedService = new CustomerServiceImpl();
CustomerRepository customerRepositoryMock
= mock(CustomerRepository.class);
testedService.setCustomerRepository(customerRepositoryMock);
Customer customer = new Customer("Joe", "Jones");
testedService.registerCustomer(customer);
verify(customerRepositoryMock).save(customer);
}
This is a typical "pass-through" test; it verifies that one layer calls another layer.
You'll be writing a lot of these in enterprise applications (which should really make
you start thinking about design and architecture). Still, it takes us in the right
direction. It brings the CustomerServiceImpl4 class to life and defines the
interaction between the service and the repository.
Because we’re still concerned with credit card registrations, the next test would
tease out a concrete implementation of PaymentService, which would be of a
pass-through nature as well.
@Test
public void registerNewCardDetailsAndStoreSecureIdentifier() {
final CustomerId customerId = new CustomerId(12345);
PaymentServiceImpl testedService = new PaymentServiceImpl();
CreditCardRepository cardRepositoryMock
    = mock(CreditCardRepository.class);
CreditCardGateway creditCardGatewayStub
    = mock(CreditCardGateway.class);
testedService.setCreditCardRepository(cardRepositoryMock);
testedService.setCreditCardGateway(CreditCardType.VISA,
    creditCardGatewayStub);
CreditCardDetails cardDetails
    = new CreditCardDetails(CreditCardType.VISA,
        1111222233334444L, 123);
when(creditCardGatewayStub.registerCreditCard("1111222233334444",
    "123")).thenReturn("FA04BC12");
testedService.registerCreditCard(customerId, cardDetails);
verify(cardRepositoryMock).save(
    new SecureCreditCardId(customerId, "FA04BC12"));
}

4. Many people consider naming classes ending with "Impl" to be an antipattern, and I agree.
However, I'm not trying to present perfect code, but code that we've all seen time after time and
that we can relate to.
Being more complex than the test for CustomerService, this test forces us to
start thinking about how data is represented. For example, the interface to the credit
card gateway seems to be string oriented, whereas our code uses domain objects like
SecureCreditCardId.
5. Actually, we could write a mock test here as well, but it would have to be accompanied by an
integration test.
Figure 15.2 When using the mockist style in a layered system, we are in practice adding
mock-based tests breadth first. Then, we switch strategy at the fringes.
Double-loop TDD
Developing code using only mock-based interaction tests should make you feel a little
uneasy. At the end of the day, such tests won't determine whether the program works
as a whole. It's great that the design has been driven by tests and that all interactions
are verified, but does it all come together? Remember that each test only checks
interactions between collaborators in adjacent layers.

The authors of the book Growing Object-Oriented Software, Guided by Tests
describe a great solution to this (Freeman & Pryce 2009). Although, if I recall correctly,
they never use the term themselves, what they propose is double-loop TDD: each new
piece of functionality starts with a failing end-to-end "acceptance test,"6 which is then
made to pass by test-driving the underlying units in the familiar red-green-refactor
loop. The outer test serves several purposes:

- It lets us verify that all interaction tests, and any other unit tests for that matter, add up to a working solution.
- It tells us when a larger chunk of functionality is actually finished.
- It forces us to deploy and invoke the new feature in a realistic way.
In our example, the outer test would roughly have to do the following:

1. Start the framework or container that would provide a web service for performing the registration
2. Deploy/start the registration endpoint
3. Post registration details to the endpoint
4. Verify the result of the registration
5. Tear everything down again
6. I’ve put the words in quotes, because the test is technical and has little to do with user acceptance.
Figure 15.3 Double-loop TDD: The outer feedback loop consists of an end-to-end
“acceptance test,” and classic and mockist TDD provide the inner loop.
Points 1, 2, and 5 are essential plumbing, although they force you to decide how
to deploy the endpoint.7 To get past point 3, you’d have to figure out whether the
test should use a framework to invoke the service, or take a more “raw” approach,
like a hand-crafted HTTP POST request. Point 4 is the interesting one. How would
you verify the result? In an end-to-end test, querying the underlying database for the
secure identifier would be cheating under normal circumstances. However, because
it’s a secure identifier, it might be the only way (unless the test grows even larger to
include a call to the credit card gateway where the identifier is present). Such
conundrums are the topic of Chapter 18, "Beyond Unit Testing."
7. This is no easy decision by any means. Many options have become available in recent years.
First, you need to decide on whether to deploy to the cloud, a local virtual machine, or good old
bare metal. Second, you may decide on using a virtual machine manager, like Vagrant. Third,
you may go for a tool that does the provisioning, like Chef or Puppet, or to use lightweight
virtualization. Docker is the de facto standard at the time of writing. Then there’s the choice of
bootstrapping the application . . .
Summary
Mockist TDD is an alternative approach to test-driven development (in contrast to
“classic” TDD). This style primarily focuses on the design of the system, not the imple-
mentation of individual classes. As the name suggests, mock objects play a large role,
as they are used to drive the interactions and to establish interfaces between objects.
Double-loop TDD means that unit tests are preceded by automated end-to-end
acceptance tests, which require the entire infrastructure and deployment process of
the feature to be in place. Such tests will fail until the entire feature is implemented.
Adding this safety net to test-driven development is particularly helpful if the
majority of the tests are based on mock objects and it's hard to decide whether the sum of
all interactions equals a correct implementation of the feature.
Chapter 16
Duplication
1. Broken window theory (short and simplified): An abandoned building will start getting
vandalized once one of its windows gets broken. The broken window signals that nobody cares
and invites further damage.
2. Then there’s the exception to the rule, as one of my reviewers pointed out. He followed this tip
once, and he removed the only code that had unit tests.
3. I’d wager that such organizations’ admiration of quality is proportionally inverse.
Mechanical Duplication
There are several ways to introduce duplication into code and stress a developer's
short-term memory. Mechanical duplication is my fancy term for copy and paste
programming, which may be performed in adjacent sections of the source code or across
different modules. The outcome is still the same: suddenly two or more instances of
the same code must be maintained. The results of working with these copies may
range from irritation and bugs (as in the case of Ursula and David) to confusion and
misunderstanding. In the following pages, some examples of such duplication are
provided, along with examples of typical bugs.
The obvious opportunity to introduce bugs here lies in changing one instance of
the duplicated code and not the other. Such simple bugs would normally be caught
by unit tests, but systems where this kind of duplication is practiced usually don’t
impress when it comes to unit test coverage.
// Sketch: the same guard duplicated in a sibling method (method names assumed).
public void save(Customer customer) {
    if (customer.getGender() == Gender.UNKNOWN
            || customer.getDateOfBirth() == null) {
        throw new IllegalArgumentException(customer + " not fully initialized");
    }
}

public void update(Customer customer) {
    if (customer.getGender() == Gender.UNKNOWN
            || customer.getDateOfBirth() == null) {
        throw new IllegalArgumentException(customer + " not fully initialized");
    }
}
Large blocks not only increase the chance of diverging implementations, but are
also tiring to read (both in source code and in a book).
// A sketch of the surrounding constructor (class name and parameter list assumed):
NetworkConfiguration(String ipAddress, String netMask, String broadcast,
        String defaultRoute, String ipV6Address, String ipV6NetMask,
        String ipV6DefaultRoute) {
    this.ipAddress = ipAddress;
    this.netMask = netMask;
    this.broadcast = broadcast;
    this.defaultRoute = defaultRoute;
    this.ipV6Address = ipV6Address;
    this.ipV6NetMask = ipV6NetMask;
    this.ipV6DefaultRoute = ipV6DefaultRoute;
}
The more assignments in the duplicated constructors and the more constructors,
the greater the chance of forgetting an assignment in one of them.
Method Duplication
This could also be called "method copy and paste." It means that a method has been
copied from one context to another. In object-oriented systems, the most obvious
context would be a class. However, a context may also be a namespace, module, or project.
Typically, “utility” methods become victims of this flavor of duplication. The
method in the following example is one such. It’s actually a reconstruction of some-
thing I once found in a codebase in eight different classes.
The obvious problem here is divergence, and the danger lies in the methods having
the same name, while behaving differently. It’s not hard to imagine that someone
would try to “repair” the diffTime method to look like this . . .
. . . in seven out of eight places in the code. Imagine the kind of bugs this would give
rise to and how the end users would perceive the system’s behavior.
Knowledge Duplication
In contrast to mechanical duplication’s stamping of the same lines of code across the
system, knowledge duplication is the result of deliberate design decisions. It may be
the kind of duplication needed to achieve decoupling, independence, or maneuvering
space for redesign and rewriting. Unfortunately, it may also be a result of ignorance
and conflict, in which case the effects are the same as those of mechanical
duplication, but on a larger scale.
Knowledge duplication is about reintroducing existing concepts and functionality,
but doing so not by copying existing code, but by writing new code. Such code
may use different names and abstractions, look different, be more testable, or just
be better—but it still duplicates existing functionality. This has consequences for both
development and testing.
In codebases with true collective code ownership, where teams take turns working
on different parts of the system depending on their current focus, developers
must both know about all instances and versions of the duplicated functionality and
actively choose how to act on this knowledge. Do they change or add things in one
instance or in both? Do they try to delete one instance? Where do they write unit tests?
Not knowing about all the duplicates also has its costs, the obvious one being the risk
of introducing yet another one.
Testing systems with a high degree of knowledge duplication also becomes harder,
especially from a black box perspective. Without knowing how many "solutions" there
are behind common functionality and the implementation of business rules, more
testing is required—as in the copy and paste example in Chapter 4.
Next follow some variations of knowledge duplication, starting with the simple
cases and progressing to the more sophisticated ones.
There are several reasons why such methods may come into being. Some are
good, some are bad.
system, they’re even harder to get rid of than overlapping methods. Classes leave a
larger footprint and may get deeply entangled via their methods.
The reasons for creating competing and overlapping classes are the same as those
for creating duplicated methods, but they lean more toward ignorance and deliberate
design choices. If a concept has a totally alien name in an older part of a system, then
surely a new developer might create a new class for it with a more intuitive name.
Competing Implementations
This duplication lies in solving a similar problem differently in the same system. It’s
easiest to spot on the architectural or design level. For example
- Module A uses this logging framework, and module B uses that logging framework.
- Module C relies on handwritten SQL, whereas module D uses an O/R mapper.
- Module E uses a date library, whereas date computations have been implemented from scratch in module F.
- Module G performs client-side validation, whereas server-side validation is preferred by module H.
This list easily grows. From the developer’s point of view, this switching can be
interesting, dreary, or just a fact of life. However, I’d argue that it has a certain impact
on testing. Would you test a system that you knew was built using an O/R mapper
differently from one that relied on handwritten SQL?
Using different frameworks or idioms across the system doesn't automatically
decrease testability. Here, maintainability and consistency are more of an issue. If the
system is really loosely coupled and there's a deliberate strategy that says that teams
get to pick their own stacks, then roll with it. If the system is more of a monolith
containing four generations of logging frameworks, three unit testing frameworks,
five web frameworks, and two dependency injection frameworks, then obviously a
conscious effort to reduce this fragmentation will benefit maintainability,
performance, and most likely testability as well.
later, they no longer support the business model or the needs of their users. Still, they
keep a decade of data hostage, and there’s no way that they’re going to be rewritten.
In such cases, starting afresh with a new domain model, new concepts, new
technology, and new everything may save the system and allow the business to function
without any interruptions. This comes at the cost of the ultimate duplication of
knowledge: everybody working with the system—at least from the internal point of
view—must be aware of what model they're working with. So when a new business
rule is introduced, care must be taken to implement it in either both models or just
the new one, while deprecating or deleting functionality in the old one.
Naturally, both models and their related concepts will require different kinds of
testing, because they’ll have been built by different people using different technolo-
gies, which, no doubt, will make them have their own different quirks.
My advice on competing domain model duplication is that it shouldn’t drag on
forever. The process of transition between domain models is a slow one—in large sys-
tems in particular—and you don’t want to be stuck in the middle. Being there has
certain distinct disadvantages. In terms of testability, you’re most likely in a place
where you have to maintain two stacks of testing tools, and you have to be on your
toes when it comes to requirements: Which model supports which functionality?
Development-wise, all good things happen in the new code (both the production and
test code), and morale plummets when working with the old code. Bad morale is sel-
dom good for correctness. Therefore, make the transition as swift as possible.
Summary
From a testability point of view, duplication is the developer’s and the tester’s enemy.
It makes the codebase larger and more difficult to navigate, it breeds more duplica-
tion, it leads to specific kinds of bugs that are about changing something in x out of
y places—and forgetting about the remaining y – x instances—and it messes up met-
rics. Allowing a certain degree of duplication may increase a development organiza-
tion’s throughput, though, as bottlenecks may be removed.
Duplication can be divided into mechanical and knowledge. The former is the
result of copying and pasting code in various ways and is easy to fix. The latter is
about overlapping and competing concepts, and can be very challenging to get rid
of if unwanted, because it may reside in the very core of the system’s architecture.
Knowledge duplication may be fueled by ignorance, fear, laziness, conflict, or a com-
bination thereof. It may also be a result of deliberate actions taken to reduce a team’s
need for synchronization around hot spots in the code.
Chapter 17
Working with Test Code
Apart from following principles of good design, just like production code, test code
has an extra area of responsibility—to explain and to describe what the production
code is supposed to be doing. Also, just as with production code, some people may
feel uncomfortable deleting it. This chapter contains some pointers about how to
work with existing test code, how to improve it, and when to delete it.
Commenting Tests
Should test code be commented? That depends. On one hand, the quality of the test
code should be on par with the quality of the production code. It should be well struc-
tured, follow all the principles of good design, and test names should be accurate and
descriptive (and so should the variable names) (Tarnowski 2010). On the other hand,
some tests will still be difficult to understand, even though they live in nicely named
methods with clean code and good variable names. In certain cases, cause-and-effect
relations cannot be deduced from good intent-revealing names alone. Sometimes
a specific combination of input and state will trigger a business rule that’s hard to
describe without using some well-placed comments. However, these cases should be
quite rare; if the production code is so cryptic that its test code must be commented to
explain the business rules, then some lights should go red.
In many cases, intention-revealing variable names can do the commenting for you.
Instead of:
@Test
public void simpleMisspellingsAreTolerated() {
ParsedAddress address = addressParser.parse("Sesame streat 123", 1);
assertEquals("Sesame street", address.streetName);
}
Write:
@Test
public void simpleMisspellingsAreTolerated() {
String misspelledStreetAddress = "Sesame streat 123";
int toleratedNumberOfErrors = 1;
ParsedAddress address =
        addressParser.parse(misspelledStreetAddress, toleratedNumberOfErrors);
assertEquals("Sesame street", address.streetName);
}
Another way to document a tricky test is to supply assertion messages, which are
printed when the assertion fails.
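For illustration, here's a sketch that borrows the free-shipping scenario appearing later in this chapter:

@Test
public void productsInHistoryWithTotalPriceGreaterThan100_GetFreeShipping() {
    Customer customer = customerWithTotalPurchaseAmount(150);
    assertTrue("Customer with purchases over $100 should have free shipping",
            customer.hasFreeShipping());
}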
Needless to say, this can result in bloat as well. Adding obvious messages to asser-
tions just clutters the code, so pick your battles. Your general rule should be to use
neither comments nor assertion messages.
If you still really need an assertion message, make the test fail just to see what the
combined message looks like. Watch the phrasing to make sure that it’s informative
and that the string supplied by you concatenated with that of the assertion doesn’t
produce a confusing message. For example, this message:

java.lang.AssertionError: IP address

isn't helpful at all, and a better message would be needed to actually help
you understand why the assertion failed. As a final note, don't get fancy with these
messages; plain text only. No clever logic to construct the message string.
Comments that explain test data can often be replaced by well-named helper
methods. Instead of:
@Test
public void productsInHistoryWithTotalPriceLessThan100_NoFreeShipping() {
Customer customer = new Customer(1, "Mary", "King");
Purchase purchase = new Purchase();
// Not eligible for free shipping.
purchase.addProduct(new Product(1, "Product", new Money(99)));
customer.getPurchaseHistory().add(purchase);
assertFalse(customer.hasFreeShipping());
}
@Test
public void productsInHistoryWithTotalPriceGreaterThan100_GetFreeShipping() {
Customer customer = new Customer(1, "Mary", "King");
Purchase purchase = new Purchase();
// This time the customer has passed the threshold
// for free shipping by exceeding $100.
purchase.addProduct(new Product(1, "Product", new Money(150)));
customer.getPurchaseHistory().add(purchase);
assertTrue(customer.hasFreeShipping());
}
Write:
@Test
public void productsInHistoryWithTotalPriceLessThan100_NoFreeShipping() {
Customer customerWithoutFreeShipping
= customerWithTotalPurchaseAmount(99);
assertFalse(customerWithoutFreeShipping.hasFreeShipping());
}
@Test
public void productsInHistoryWithTotalPriceGreaterThan100_GetFreeShipping() {
Customer customerWithFreeShipping
= customerWithTotalPurchaseAmount(150);
assertTrue(customerWithFreeShipping.hasFreeShipping());
}
Factory methods, factory classes, and builders have certain effects on test code.
Occasional factory methods sprinkled throughout the codebase tend to introduce
duplication. Many tests will want to construct central objects or data structures in a
simple way, and you’ll end up with 10 different factory methods doing pretty much
the same thing. This, if not sooner, is a good time to refactor the code and create one
factory or builder that will be used by all tests. On the other hand, such helper classes
may introduce coupling between previously unrelated tests. This shouldn’t be a prob-
lem, but rather an opportunity to think about the design of the test code and some
more refactoring.
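To make this concrete, here's a minimal sketch of such a shared helper, assembled from the factory method used in the free-shipping tests (the class name is illustrative):

public class TestCustomers {
    // Creates a customer whose purchase history totals the given amount.
    // Customer, Purchase, Product, and Money come from the earlier examples.
    public static Customer customerWithTotalPurchaseAmount(int amount) {
        Customer customer = new Customer(1, "Mary", "King");
        Purchase purchase = new Purchase();
        purchase.addProduct(new Product(1, "Product", new Money(amount)));
        customer.getPurchaseHistory().add(purchase);
        return customer;
    }
}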
Comments sometimes creep into setup methods as well, dividing the fixture between
different kinds of tests:

@Before
public void setUp() {
    testedService = new PaymentService();
    // Collaborators below are used only by the interaction tests
    // ...
}
Even if this isn’t the case and the problem isn’t in the comments, splitting a test
class that mixes state tests with interaction tests into at least two test classes is usually
a step toward better maintainability.
Deleting Tests
Test code being regular code, it should be quite apparent when to refactor, redesign, or
delete tests. There’s nothing about test code that gives it permission to ignore design
principles and patterns or to disregard guidelines such as those in the book Clean
Code (Martin 2008) or the like. If all test code out there lived up to this, this entire sec-
tion would be superfluous. However, this is not the case, and in my personal experi-
ence, there’s something about deleting test code that sparks arguments that wouldn’t
be brought up in a conversation about “regular” code. With this in mind, I’m wrap-
ping up with some pointers about when to delete test code.
Tests that verify nothing—Some tests contain assertions that cannot fail, or no
real assertions at all; whoever wrote them was absorbed in ending the test with some
kind of assertion and forgot about actually testing anything useful. Such tests just
confuse and must go.
Tests that don’t compile—In some extreme cases that unfortunately exist,
some tests don’t even compile. This may not even be perceived as a problem,
because teams/organizations in which this happens usually don’t compile and
run their tests anyway. Such tests should be deleted. Making them compile is
often not worth the effort, because the compiled result will probably fall into
one of the preceding categories anyway.
Tests that are commented out—Do you keep code that’s commented out?
Then why keep tests that are?
Redundant tests—When two tests verify the same thing they are, by defini-
tion, redundant. Tests are usually not created redundant, but they become
redundant after rounds of refactoring and redesign. Compared to the earlier
points, this isn’t the worst that can happen to you. However, redundant tests
add to the overall number of tests and create a false feeling of safety. A bigger
concern is the fact that multiple tests may start failing because of a single bug.
Having many redundant tests in the codebase also encourages the existence
of tests that “test everything”—that is, that verify irrelevant state or inter-
actions because they can. After all, oververification is just another type of
redundancy. Needless to say, such tests are often useless for defect localiza-
tion. For these reasons, I really recommend that redundancy among tests be
reduced, which may imply refactoring, rewriting, or removing tests.
Tests tied to abandoned frameworks—Sometimes a handful of tests keep an
otherwise retired framework alive (this has happened to me). The difficulty lies in
keeping the syntax and various quirks and oddities of the different frameworks in
your head. Also, that's demanding a lot from developers who enter your team. So,
if a dozen unit tests use EasyMock1 whereas several thousand rely on Mockito,2
either fix the ones that chain you to EasyMock or delete them (and drop EasyMock
from the project entirely) to achieve consistency.
Outgrown tests—This is an interesting category of tests. These are tests that
were once useful, but that have been replaced by more useful tests. This is
often true of tests that were created when using triangulation (described in
the chapter on test-driven development). When trying to triangulate the solu-
tion, a number of tests come into existence, and once the algorithm is found,
they may no longer be needed. They’re not really 100 percent redundant, but
they feel awkward. Some people prefer to delete such tests.
Summary
Test code follows the same conventions as production code and should be of equal
quality. Use comments sparingly and only in cases where a well-written test may not
illuminate some intricacies of the tested code.
Before commenting test code, try these strategies:

Picking accurate and descriptive test names
Introducing intention-revealing variable names
Extracting well-named helper methods
As a last resort, adding assertion messages

Chapter 18
Beyond Unit Testing
As you grow accustomed to writing unit tests, you’ll most likely appreciate the secu-
rity and feedback they provide, and you’ll want the same for bigger building blocks
and their interactions.
Until now, many topics have been illustrated with unit tests. It’s quite natural,
because they constitute the basis of developer testing and embody many of the prin-
ciples behind more complex tests. Besides, they can be kept small and to the point,
which is rather helpful when explaining a concept or technique. If you get the low-level
unit tests right, shifting toward higher-level tests, like integration tests or end-to-end
tests, is relatively easy. Still, there are some differences and pitfalls worth mentioning.
This closing chapter will get you started on the journey toward advanced devel-
oper testing, for unit tests are but the first step. A word of caution: the topics covered
in the following few pages can easily fill an entire book. I’ve tried my best to cherry-
pick and highlight the things that I consider most important and helpful to the
reader at this point.
1. Maybe settling for Medium and Large tests isn’t such a bad idea . . .
Tests that Aren't Unit Tests
The first example is an integration test that exercises hand-written persistence code
inside a transaction that's rolled back when the test finishes.2
@ContextConfiguration(classes = {TestContextConfiguration.class})
public class CustomerRepositoryTest extends
AbstractTransactionalJUnit4SpringContextTests {
@Autowired
private CustomerRepository customerRepository;
@Test
public void readBackStoredCustomer() {
long newCustomerId = customerRepository.nextIdentity();
Customer customer = new Customer(newCustomerId, "John", "Smith",
"[email protected]", "100 Main St., Phoenix AZ 85236");
customerRepository.save(customer);
Customer savedCustomer =
customerRepository.findById(newCustomerId);
assertThat(savedCustomer, equalsIgnoringCreationDate(customer));
}
}
2. Technically, such tests work for message queues or any other artifacts that support transactions,
but let’s keep to the most common case.
Tests that Aren’t Unit Tests 247
This test is quite benign. It verifies that a customer saved using the Customer-
Repository class can be read back by the same repository. If a persistence frame-
work is used, such tests give relatively little return on investment, because they’re
mostly testing that framework. They start making sense if the persistence opera-
tions are implemented by hand (which they were here, using JdbcTemplate). I can
testify first-hand that even something as trivial as saving a few fields in a straight-
forward table is prone to error because of misplaced commas and missing values in
constructors. My personal failures aside, tests like this really start to shine when they
exercise logic that’s hard to test in other ways—methods that call stored procedures,
database triggers, or persistence abstractions that hide business logic. All of this can
happen within the confines of a transaction, and all traces of it disappear upon roll-
back. The magic happens when the AbstractTransactionalJUnit4Spring-
ContextTests class is given a transaction manager and data source that references
the test database. Setting this up is the responsibility of the TestContextConfig-
uration class.
@Configuration
@ComponentScan("repository")
public class TestContextConfiguration {
@Bean
public PlatformTransactionManager transactionManager(
DataSource dataSource) {
return new DataSourceTransactionManager(dataSource);
}
@Bean
public DataSource dataSource() {
DriverManagerDataSource dataSource
= new DriverManagerDataSource();
dataSource.setDriverClassName("com.mysql.jdbc.Driver");
dataSource.setUrl("jdbc:mysql://192.168.0.128/testdb");
dataSource.setUsername("tester");
dataSource.setPassword("secret");
return dataSource;
}
}
Short as it is, the test does rely on a database being available. On the whole, the actual
complexity of such tests usually lies in how the database is set up and what kind of
data it contains. This particular test is simple, because it doesn’t require any data to
be present in the database before it’s executed. In an actual integration test suite, such
tests would be in the minority, and many tests would start by populating tables or
require that the database contain some base dataset.
Either way, the build that would run the integration test suite would be respon-
sible for orchestrating both the execution of the tests and the availability of the data-
base. Depending on the infrastructure and database type, this may be relatively easy
or quite challenging.
The second example uses Spock and Spring Boot to start the application and test a
RESTful service over real HTTP.

@SpringApplicationConfiguration(TestContextConfiguration.class)
@WebIntegrationTest
class CustomerServiceTest extends Specification {
@Autowired
private CustomerTestRepository customerTestRepository;
private RestTemplate restTemplate = new TestRestTemplate()

def "creating a customer adds exactly one customer"() {
given:
customerTestRepository.deleteAll()
when:
def newCustomer = new Customer(firstName: "John",
lastName: "Smith",
email: "[email protected]",
shippingAddress: "100 Main St., Phoenix AZ 85236")
URI location = restTemplate.postForLocation(
"http://localhost:8080/customers", newCustomer)
Tests that Aren’t Unit Tests 249
then:
location.path =~ /.*\/customers\/\d+$/
and:
customerTestRepository.customerCount() == 1
}
}
class Customer {
String firstName
String lastName
String email
String shippingAddress
}
Here I almost feel like cheating. Again, I’ve used the Spring framework to start an
entire server running a RESTful service by using one line of code—@WebIntegration-
Test. On the other hand, wrestling with server start-up and service deployment
isn’t the key focus here. Instead, let me direct your attention to the fact that this test
makes use of a repository3 tailored specifically for testing to delete all customers and
to count them. The implementation details of the deleteAll method aren’t impor-
tant. Its purpose is to delete all customers and their data (customers being the aggre-
gate roots), so that observing creation of a new customer is easy.
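As an illustration, here's a minimal sketch of such a test-only repository, assuming a single customer table and Spring's JdbcTemplate (the table and column names are made up):

import javax.sql.DataSource;
import org.springframework.jdbc.core.JdbcTemplate;

public class CustomerTestRepository {
    private final JdbcTemplate jdbcTemplate;

    public CustomerTestRepository(DataSource dataSource) {
        this.jdbcTemplate = new JdbcTemplate(dataSource);
    }

    public void deleteAll() {
        // Customers are aggregate roots; a real schema would require
        // emptying their dependent tables first.
        jdbcTemplate.update("DELETE FROM customer");
    }

    public int customerCount() {
        return jdbcTemplate.queryForObject(
                "SELECT COUNT(*) FROM customer", Integer.class);
    }
}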
When it comes to invoking the actual web service, the test is satisfied if the HTTP
response contains a Location header that seems to contain the URL of a new
customer resource. In this case, the location is verified using a regular expression.
Other alternatives would be inspecting the body of the response (as it could contain
a representation of the new customer resource) or the HTTP response code, walking
the extra mile and GETting the new customer resource, or pulling the customer out of the
database. What would be the right thing to do? Bear with me through some more
examples, and we’ll revisit this issue. (Although the short answer is: “it depends.”)
3. Of course, it doesn’t have to be domain-driven design like a repository. A good old DAO or a
helper class that digs around in the database will do.
Tests that integrate with external systems come in two flavors: those that run
against an external party's sandbox or test environment, and those that fake the
system they interact with. Both types come with their advantages and disadvantages.
Some types of services are best operated by vendors who have the know-how and
compliant environments. Payment gateways are a typical example. Not only is pro-
cessing of online payments most likely not the problem you want to solve, but stor-
ing card holder details also mandates compliance with a security standard called PCI
DSS,4 which is rather cumbersome to implement. Therefore, it’s quite natural to turn
to a third-party payment gateway provider and use their API to process payments.
The vendor will most likely provide a test environment—a sandbox—against which
you can test your integration. The sandbox will be very similar to the production
environment, but will run on test data and be totally safe to interact with.
Tests that span across hops to external parties have to be prepared for interacting
with environments they can’t control. In practice it means that such tests may fail if
the vendor’s sandbox is down, and that they have to adapt to the quirks and mechan-
ics of the third party’s test environment and API. In other words, they run with lim-
ited controllability.
If the vendor’s API is well designed, both using it and testing it shouldn’t be hard.
Therefore, both the production code and test code may look quite harmless. Have a
look at this test that incorporates PayPal and executes a credit card payment to their
test sandbox system.
def test_pay_with_visa_using_valid_payment_information
  address = Address.new({:first_name => "John",
                         :last_name => "Smith",
                         :street => "100 Main St.", :city => "Phoenix",
                         :zip => "85236", :state => "AZ"})
The test looks benevolent, and the code it tests isn’t much scarier:
4. https://www.pcisecuritystandards.org/pci_security/
Tests that Aren’t Unit Tests 251
  # ... construction of the payment object elided ...
  if payment.create
    payment.id
  else
    raise PaymentError, payment.error
  end
end
However, PayPal’s API abstracts away a sequence of two calls to a REST end-
point. The first call retrieves an authorization token, whereas the second performs
the actual payment. In conclusion, the third-party API does all the heavy lifting.
In its present form, the test verifies that the request is constructed in a way that’s
acceptable to the endpoint and that the system that runs the test can establish a con-
nection to PayPal’s sandbox. Although it makes a succinct example, I’d probably
make use of its mechanics differently on a real project. Either I’d turn this into a
test that would ensure that PayPal’s API is called correctly—I’d have the tested code
return more than the ID and I’d check the response more thoroughly. Such a test
would protect from inadvertent changes to the PayPalPaymentMethod class and
the less likely scenario of PayPal changing its API. Or, I’d let this be the last phase
of an end-to-end test that would exercise an entire workflow ending with a PayPal
payment. In either case, the point is that nontrivial systems often need to contain
tests that are at the mercy of a third party and network connectivity with the outside
world. To further prove the point, I can reveal that this test timed out a few times
while I was trying it out.
Not all integrations will involve third-party systems beyond your control. Some
of the systems your application talks to will be other systems built in-house or third-
party software executed on premises. If a test touches code that invokes these kinds of
external dependencies, they'll have to be controlled somehow. Hence, the openness of
the protocol/API used for the integration will be of critical importance to the success
of such tests. If the protocol is open enough, the external system can be replaced by
something that the test can control. For instance, I once set out to emulate a physical
network switch in software to test code that provisioned it. In this case it was plain-
text over Telnet, so all I needed to do was write a server that responded to textual
commands.
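To give a feeling for how little such a fake may require, here's a sketch of a server that answers plain-text commands in the spirit of that switch emulator (the port and the command set are invented):

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;

public class FakeSwitch {
    public static void main(String[] args) throws IOException {
        try (ServerSocket server = new ServerSocket(2323)) {
            while (true) {
                try (Socket client = server.accept();
                     BufferedReader in = new BufferedReader(
                             new InputStreamReader(client.getInputStream()));
                     PrintWriter out = new PrintWriter(
                             client.getOutputStream(), true)) {
                    String command;
                    while ((command = in.readLine()) != null) {
                        // Canned responses are all a fake needs.
                        out.println("show vlan".equals(command)
                                ? "VLAN0001 active" : "OK");
                    }
                }
            }
        }
    }
}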
When replacing an entire system with a test double, the wording becomes impor-
tant. It’s most likely going to be a fake in the terminology presented in the chapter on
test doubles, but it may equally well be a stub or a mock. If the interesting behavior is
confined to the tested component, the test double used to substitute the external sys-
tem will most likely be a stub. On the other hand, if it’s more important to verify how
the tested component interacts with the external system, the test double will obvi-
ously be implemented so that it records the interactions or behaves like a mock.
5. See the documentation of WebDriver’s WebDriverWait class to get a feeling for how
such waiting can be achieved (https://seleniumhq.github.io/selenium/docs/api/java/
org/openqa/selenium/support/ui/WebDriverWait.html).
Tests that Aren’t Unit Tests 253
For example, imagine a test that tries to find a customer that doesn't exist, or a test that
attempts to log in to the application using a blocked account. Such tests usu-
ally don’t have the word “stable” written all over them.
This next test exercises an entire online purchase workflow on a fictitious web
site that markets and sells a book about WebDriver testing. It relies solely on the out-
put of the user interface to determine whether it succeeds. It simulates a user’s jour-
ney through four web pages: a start page and three pages where the user selects the
number of books to buy, enters the shipping address, and finally the payment details.
Halfway through, the test checks that the price was computed correctly, and at the
end it looks for a confirmation message in the page and compares the presented ship-
ping address with the address details it provided.
private IWebDriver webDriver;

[TestInitialize]
public void SetUp()
{
    // DriverFactory comes from Appendix B; the method name is illustrative.
    webDriver = DriverFactory.Create();
}

[TestMethod]
public void PurchaseWorkflowEndsWithConfirmationAndCorrectAddress()
{
    // ... test data (name, streetAddress, city, state, zip) and navigation
    // through the start, quantity, and address pages elided ...
    addressDetailsPage.EnterZip(zip);
    var paymentDetailsPage = addressDetailsPage.ClickPaymentButton();
    paymentDetailsPage.EnterCardNumber("4417119669820331");
    paymentDetailsPage.EnterCVV2("874");
    paymentDetailsPage.SelectExpirationMonth(Month.January);
    paymentDetailsPage.EnterExpirationYear("2020");
    var confirmationPage = paymentDetailsPage.ClickPayButton();

    Assert.IsTrue(confirmationPage.PaymentSuccessFul);
    string expectedAddress = String.Format("{0}\n{1}\n{2}, {3} {4}",
        name, streetAddress, city, state, zip);
    Assert.AreEqual(expectedAddress, confirmationPage.Address);
}

[TestCleanup]
public void CleanUp()
{
    webDriver.Quit();
}
a. The DriverFactory class that creates the driver is in Appendix B.

The last example's setup prepares the file system, because the tested functionality
exports orders to files in an output directory:
def setup() {
outputDirectory = new File(System.properties['java.io.tmpdir'],
"outgoing")
if (outputDirectory.exists()) {
FileSystemUtils.deleteRecursively(outputDirectory)
}
if (!outputDirectory.mkdir()) {
throw new IllegalStateException(
"Couldn't create output directory")
}
}
def ordersToExport = 2
def ordersToIgnore = 1
customerRepository.add(firstCustomer, secondCustomer,
thirdCustomer)
orderRepository.add(firstCustomersOrder, secondCustomersOrder,
ignoredOrder)
The people doing the actual maintenance of the suite should also agree that testing
beyond unit testing is valuable and must be allowed to take time.
So, in addition to understanding that a test infrastructure is required, a team
venturing into the fields of integration testing, system testing, workflow testing, or
end-to-end testing must also be prepared to tackle such tests’ quality attributes and
behavior, which will be different from that of unit tests. Getting there typically
involves activities like the following:
The one and only master database, which nobody dares to touch, needs
to be broken down and its (re)creation automated so that instances of
it, running with a minimal set of reference data, can be started at will in
various test environments.
The database needs to be versioned and changes to it handled automatically
and consistently, so that deploying changes is painless and all environments
run against similar databases.
Server and container configuration need to be understood and standardized,
so that setting up new instances is easy.
Other parts of the infrastructure, like messaging middleware, load balancers,
or log servers, may need tuning and cloning, so that they, too, become
disposable and easy to spawn when needed.
The preceding activities usually result in an overhaul of the deployment
process and finally its automation.
Last but not least, the system may need some rewrites so that its
configurability improves and so that it can start in different environments
and on different infrastructures.
These are all activities that fall in the domain of continuous delivery and
DevOps, so I’ll leave in-depth treatment to other sources. However, do notice
that it’s the need for testing that pushes a team in that direction.
Complexity
The further away from low-level tests on individual program elements, the greater the
complexity. Higher-level tests contain more of everything. Often, they rely on a non-
trivial build that performs orchestration of various resources, and they may require
entire libraries to perform some specific aspects of their functionality. Selenium Web-
Driver, which I made use of in the fourth example, is one such library, and mastering
it fully is a science in itself. So is setting up test data by repopulating databases and
constructing test-specific entity graphs or stubbing out entire systems, to name some
prevalent drivers of complexity. There are more.
This inherent complexity also affects the composition and competency profile of
the development team. To cope with tests that alter and rely on the environment and
infrastructure, the developers must be no strangers to command-line magic, database
administration, virtualization, and server/container configuration. This is a rather
relevant factor when recruiting new team members.
Also, given the many moving parts of complex tests, it's of vital importance that
they be well written and that the test suite has an architecture that supports them. A hap-
hazard, shantytown test suite may easily devour much-needed development time or
even topple the project. Therefore, working with high-level tests, or at least setting up
the structure of the test suite, is best left to the more senior members of the team.
Stability
Tests that are more complex than unit tests tend to be much less stable. In the vari-
ous preceding examples, we’ve seen that they’re affected by things like the file system,
server and application state, database contents, and network connectivity. In other
words, they come with environmental preconditions. There are two generic ways to
fulfill such preconditions: code for them or nuke and pave. These strategies aren’t
mutually exclusive and it’s quite context dependent when to use either or both.
Coding for stability means that the tests contain code that checks whether the
environment is sane. Such checks may include examining the file system, inspecting
the data in the database, or verifying that a server is up and starting it if it isn’t. Such
actions are typically performed in the test initializer methods. The most fundamental
checks, such as verifying that a directory exists or that a database connection is avail-
able, don’t need to be performed for every test, so putting them in initializers that run
once per test class (or module) or even less frequently is a good idea, as it also has a
positive impact on performance.
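A sketch of such a once-per-class check, assuming JUnit and a JDBC-reachable test database (the connection details are placeholders matching the earlier configuration example):

import java.sql.Connection;
import java.sql.DriverManager;
import org.junit.Assume;
import org.junit.BeforeClass;

public class CustomerRepositoryIT {

    @BeforeClass
    public static void databaseMustBeReachable() {
        // Runs once per test class: skip, rather than fail, all tests
        // if the environmental precondition isn't met.
        try (Connection connection = DriverManager.getConnection(
                "jdbc:mysql://192.168.0.128/testdb", "tester", "secret")) {
            // Connection established; the environment looks sane.
        } catch (Exception e) {
            Assume.assumeNoException("Test database must be reachable", e);
        }
    }
}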
Nuking and paving comes from a different angle. Instead of putting effort into veri-
fying the environment, we reach a known state by resetting it; servers are restarted,
databases emptied and loaded with known data, directories removed and re-created.
The context sets the limits for what and how to reset. This is often where provision-
ing and virtualization come in. If sufficiently many or sufficiently complex resources
are involved, spinning up a fresh virtualized environment may be the fastest way to
reach a known state.
Error Localization
The more elaborate the test, the harder it is to achieve good error localization. The
reason is the decrease in observability, which is more or less unavoidable for tests of
increasing complexity. To be precise, the observability may still be quite adequate, but
the program logic needed to make sense of what’s actually happening may not. More
things can go wrong in a large heterogeneous application stack, and a computer may
have a hard time deciding what did. For example, let's think of a few reasons why
a test of a web application may fail:

The web server may be down or still starting.
The application may have crashed because of a bug or a bad deployment.
A backing service, such as the database, may be unavailable.
The network may drop the connection, or a request may time out.
The functionality under test may, of course, actually be broken.
Most of these error conditions will make the web browser output an HTTP error
code, some kind of error message, or more frequently than we’d care to admit, a stack
trace. This gives a human user with some knowledge of networking and web applica-
tions a fighting chance to make an educated guess about the cause of the problem. An
automated test, on the other hand, would have a very hard time truly understanding
what went wrong. It would need to interpret HTTP codes, parse error messages (or
even worse, stack traces), and cope with time-outs and dropped connections to arrive
at some sort of verdict.
Building intelligent automated error interpretation is something I’d really advise
against. Sure, you can program arbitrarily complex diagnostics of the environment’s
and application’s state and health, but should you? No! Any test that does this will be
bloated with extra code, and if you push this to your test infrastructure, it, too, will
become very complex. Tests with many moving parts will fail for reasons that may
be hard to understand, at least programmatically. Instead, aim for the second best
thing: take your time inspecting what went wrong in the high-level test, and write a
lower-level test, preferably a unit test that catches the bug. Conversely, if the problem
lies in the environment, investing some time in improving its stability by means of a
better setup, virtualization, or a better build process will generate higher pay-off than
complex logic in individual tests.
Performance
Tests outside the domain of unit tests tend to pay the price in performance. Integra-
tion tests against small databases on fast networks may run relatively fast, whereas
tests that run through the user interface may become painfully slow, especially if they
start with lengthy data setup and then get caught doing round trips through all layers
of the system. Tests working on larger batches of data will perform accordingly.
These differences in execution speed prompt us to divide tests into suites and
hierarchies. There’s no point in running slow tests unless the faster ones succeed first.
Slower tests also run the risk of not being executed frequently enough, so keeping
down the execution time of both the individual tests and the whole test suite will
require deliberate effort: pruning redundant tests, making slow tests run faster (by
reducing their footprint on the system), or parallelizing the suite.
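With JUnit, for instance, such a division can be expressed using categories; a sketch (the marker interface, test class, and suite are illustrative):

import org.junit.Test;
import org.junit.experimental.categories.Categories;
import org.junit.experimental.categories.Category;
import org.junit.runner.RunWith;
import org.junit.runners.Suite;

// Marker interface for tests that are too slow for the fast feedback loop.
interface SlowTests {}

@Category(SlowTests.class)
class OrderExportTest {
    @Test
    public void exportsAllPendingOrders() {
        // ... slow, environment-touching test body ...
    }
}

@RunWith(Categories.class)
@Categories.IncludeCategory(SlowTests.class)
@Suite.SuiteClasses(OrderExportTest.class)
class SlowTestSuite {}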
The following facets of performance don’t affect unit tests (apart from CPU per-
formance), but they need to be taken into account when working with more complex
tests and larger test suites:

Network latency and bandwidth
Disk and file system I/O
Database performance and the volume of test data
Start-up and warm-up times of servers, containers, and browsers
Environmental Dependence
The bigger a chunk of functionality a test exercises, the greater the chance that this
functionality will rely on components that in turn rely on the environment. Although
you can always strive to build platform-agnostic and highly configurable software, in
truth, the average application usually makes assumptions about its execution envi-
ronment. What kind of database does it use? Is it a relational database, a document
database, or a key-value store? Does it rely on some vendor-specific functionality?
What services does the application’s server or container provide? Is some kind of
messaging technology involved, and how? What external resources does the applica-
tion access, and where are they located?
Even if you deploy the application to the cloud, you’ll still make assumptions
based on the quirks and capabilities of the particular cloud’s stack, unless the applica-
tion is very small or trivial.
All of this has a bearing on the tests. The more complex the execution environ-
ment, the more effort has to be put into making such an environment easily available
for testing. Then there’s the cost. It’s cheap to have a CI server running a couple of
agents capable of executing just unit tests; it’s a matter of virtualizing a simple setup.
At the other extreme are systems that contain a mainframe, a licensed database, and
a full stack with various integrations in between. End-to-end testing in such an envi-
ronment will be both complicated and costly.
Environmental dependence has direct impact on the breakdown of a team’s
work. Although a seasoned developer will crank out unit tests in tandem with pro-
duction code without even thinking about it, addressing the aforementioned issues
takes time, deliberate actions, and an understanding that writing tests that are more
complex than unit tests introduces new tasks and responsibilities.
Target Audience
Whereas unit tests live in symbiosis with the source code and are the developers’ pets,
tests that are further away from the code have the potential of attracting a broader
target audience. System and end-to-end tests (and integration tests to some extent)
verify behavior that nontechnical stakeholders understand. Stakeholders who care
about features and progress may feel very reassured by a human-readable suite of
tests that exercise functionality they can grasp. After all, who wouldn’t feel at least
somewhat secure if it were possible to determine whether the system supports busi-
ness rules like “when buying at least three books, the shopper is given a 20 percent
discount in the next campaign” or “direct bank payments with incorrect check digits
are sent to an error queue for manual inspection” at the click of a button? To get there,
you have two options:
You start by writing tests for important functionality that the stakeholders
care about and execute them using a BDD framework, which will produce
documentation8 readable by anybody within the organization (provided that
some effort has been put into authoring understandable tests).
You take the tests you would write anyway and invest in presenting them and
their results in a format that nontechnical stakeholders can read.
Using the second approach is less collaborative and doesn’t give many of the ben-
efits of working in a BDD-like manner, but in certain settings it may be a good way of
selling the advantages of automated acceptance testing to a broader audience.
Either way, the key is to present the tests and their results in such a way that
everybody in the organization can comprehend them. If managers, the CTO, and, in
a perfect world, the CEO understands the advantages of developers automating veri-
fication of critical functionality, you’re more likely to get the support you need.
Test Independence
More complex tests should be independent of their surroundings and other tests, just
like unit tests. This rule of thumb comes with some caveats. Tests that require the sys-
tem or parts thereof to be available while they’re running will often be dependent on
the build that runs them. CI servers, with their plugins and scripts, are better suited
for orchestrating resources like databases, queues, or any other kind of middleware or
servers than home-grown utility classes in the test codebase.
Although this approach saves the tests from tinkering with peripheral, low-level
dependencies, it introduces certain coupling between the tests and the context in
which they run. In some of the examples, I avoided this to a degree by using Spring
Boot, but for older systems this won’t be an option.
Then there’s the issue of temporal coupling between tests. For tests that revolve
around some data’s life cycle, it may feel tempting to build a sequenced test suite:
I strongly advise against this. This approach makes the build complex and brittle,
and kills test isolation and independence. Then again, there are situations in which
this may be the only working approach. We had to do this once on a project where we
didn’t own the test database. We could neither empty it (because other teams relied
on it) nor insert tuples when we needed to, so this was the only way. On the whole, the
approach worked, but we paid the price in complexity.
Setup
A higher-level test’s setup is quite different from a unit test’s. It’s usually lengthier,
more elaborate, and may poke in several application layers. The exact steps will obvi-
ously be different for a business application that requires a lot of state in persistent
storage and a game that needs an interesting environment to verify some aspect of its
mechanics.
As mentioned in the section on test independence, part of the setup may be performed
by the build that runs the tests, and it will hopefully ensure that the right environ-
ment is available when the test executes. From there, it’s the test’s responsibility to
produce the state it requires. Here are some tips.
By running tests with empty databases (or files, or queues), we gain certain
advantages. One is speed. Empty or next-to-empty things are fast. Adding a record
to an empty table or file will most likely not trigger indexing, rebalancing, garbage
collection, or the like. Another advantage is simplicity. If the test needs to fetch some-
thing from a table or document that has only one record, it doesn’t even have to know
how to find it; it just has to fetch that single record. I made use of this in the second
example, where I just counted the number of tuples. A third advantage is that the data
footprint is easier to debug. Not that we want that, but should the imperfections of
reality force us to check the contents of a database or file during a debugging session,
it’s going to be a much more pleasant experience if there’s only one tuple to examine.
An obvious way to produce such state is to reuse the application's own services.11
However, this approach has several drawbacks:

The service may not be readily available to the test, which may be running
at another level of abstraction or lack access to the infrastructure needed to
invoke the service
If the setup necessitates the use of many different services, it quickly becomes
cumbersome and awkward
The service is unable to create entities with certain properties
10. Although some people may consider them “static” entity data and handle them like reference data.
11. It’s irrelevant whether this is done at the persistence abstraction level, UI level, or somewhere
between.
The alternative is to create test data through dedicated factories and builders, like
those described in Chapter 9, "Dependencies," with the constraint that the created object
is an entity that can be persisted. In fact, this is the technique I prefer to reusing
existing services.
As always, there are trade-offs to be made. The obvious disadvantage of utilities
for creating data is that they add extra code. Depending on whether they reuse the
existing entity model or not, they may get coupled to the database. Suddenly, chang-
ing something in the database requires an update to the entity model and the utilities.
In addition, independent implementations may create invalid data. They may forget
to apply a business rule or piece of validation logic, thus bringing to life entities that
would never have been created by the system. Finally, builders and factories may get
quite complicated. And yes, they need unit testing . . .
On the plus side, they make it easy to create arbitrary variations of data. Entities
produced by test factories or builders may reflect state that would be hard to reach.
For example, consider a builder that’s able to create a customer whose password has
expired. Such a customer may not even be possible to create using the application’s
existing services (and that’s a good thing), because expiration is most likely a result of
actual time passing. In this case, a snippet like this would save the day:
var customerForPasswordUpdate
= customerBuilder.withExpiredCredentials().build();
They can also contain logic that allows setting up very complex state. Finally, I’d
say that a good implementation of data helpers will make the test very readable, ver-
bose, and explicit.
Verification
Whereas unit tests should strive to fail for a single reason, more complex tests may be
a bit more forgiving in that regard. Because they take longer to execute, a consider-
able amount of time may be saved if they’re allowed to check several different things
per test. The examples at the beginning of this chapter illustrated this in an almost
provocative manner.
Personally, I think it’s perfectly fine that more complex tests contain more asser-
tions and that these assertions may operate on different layers or components, as long
as they’re related to the same concept. If an order confirmation service returns a sta-
tus code, updates something in the database, and sends an e-mail, checking all three
may be the right thing to do, especially if no other tests do it. Likewise, if we test a
sequence of operations, adding a few guard assertions and intermediate checks here
and there does more good than harm. That said, authors of tests that have a lot going
on must always be mindful of the balance between error localization, test readability/
maintainability, and the execution time of the test suite. Just because we can touch
half of our system’s features with one gigantic test doesn’t mean that we should.
Deciding on a Developer Testing Strategy
A common starting point for discussions about the mix of tests is the test automation
pyramid (Cohn 2009), one version of which appears in Figure 18.1.
Figure 18.1 To the left: the classic test automation pyramid. To the right: one of many
adaptations.
Because a pyramid’s base is much larger than its top, the model implies that there
are many more unit tests than UI tests, the motivation being that the latter tend to
be brittle, expensive to write, and time consuming. The number of service tests lies
somewhere between those two. This model may help a team visualize the test types it
uses. An ambitious team that performs both integration testing and testing at the API
level and has some smoke tests that go through the user interface may depict this as a
four-tier pyramid (the bottom tier being the unit tests).
I’ve never seen anybody put any hard figures on the pyramid’s tiers in real life,
but obviously there will be a ratio between the various types of tests. My experience
is that the system’s age is the biggest influencing factor behind this ratio. On a green-
field project, a team with a testing strategy along the lines of “all new code is devel-
oped test-first and we use acceptance test-driven development” will produce tests
with a ratio that corresponds closely to what the test automation pyramid suggests.
Such a team will obviously have many unit tests, a fair number of tests at the middle
tier—such tests will be driven by the executable specifications—and a smaller num-
ber of, or maybe even no, tests that work through the user interface.
Conversely, a team that sets out to rejuvenate a convoluted intertwined legacy sys-
tem may not even be able to visualize its tests using a pyramid. (Or using an inverted
one perhaps.) For example, testing legacy systems where no attention has been paid to
testability with unit tests ranges from unfeasible to unpractical and expensive. Retro-
actively adding unit tests takes a lot of time and often requires major refactoring that
may break untested functionality, while providing little benefit in the near term.
Instead, the team may be better off securing the critical functionality
via tests that operate through the user interface (while learning how to make such
tests stable, easy to write, and relatively fast) before thinking about how to address the
issue of limited unit test coverage and what types of service-level tests would make
sense. Teams in that position tend to adopt the stance: “develop new code with unit
tests and refactor/redesign the old code when you’re touching it to modify it.”
These are some of the bigger issues the developer testing strategy needs to
address, but there will be smaller ones too, which still need to be handled to avoid
confusion and duplicated effort. For example:
Which tests give bang for the buck and which don’t?
What types of tests are we running and how do they overlap?
What types of tests are we avoiding (and why)?
How large should a test preferably be? (Size depends on the level of abstrac-
tion too.)
How many layers is a single test allowed to touch?
Do we optimize for speed of execution or test simplicity?
How do we handle test data and its setup?
How do we approach integrations with external systems?
What testing frameworks and libraries do we use?
What trade-offs are we willing to make in the spirit of working with
legacy code?
These are but examples, and I’m sure that your team can come up with many
more questions of this sort. Answering them will help you define the context and
boundaries for your tests, and no doubt a developer testing strategy will emerge.
Make it available on an information radiator, and revisit and revise at intervals or
when something interesting happens to the test suite or the system.
Summary
Tests that aren’t unit tests—more complex tests—include integration tests, system
tests, and end-to-end tests. In Google’s simplified terminology this would be Medium
and Large tests.
For typical business applications, these are fairly common types of complex tests:
tests that exercise persistence code against a real database, tests that call services
over HTTP, tests that run against a third party's sandbox environment, and tests
that drive the system through its user interface.

Chapter 19
Test Ideas and Heuristics
In this final chapter, I gather advice and pointers about what to actually test in a com-
pact format. Bits and pieces of this information are scattered throughout the book,
but they usually appear in their own contexts, where other things may be the key
focus. Here’s the big picture. Hopefully, this material will help you to cherry-pick and
prioritize your tests, because there’s always time pressure on real projects, and “test-
ing everything” is practically impossible.
High-level Considerations
There are many decisions a team and the individual developers need to make when
choosing what to focus on when writing tests. This section should provide some fuel for
discussions about where to start and what to do, as well as some ideas about test design.
Test Effectiveness
Depending on the state of the system, a certain type of test may be more effective
than another.
End-to-end or system tests (possibly integration tests) may prove more effec-
tive, that is, provide coverage of critical functionality and catch regressions
sooner, when dealing with older systems with convoluted code that’s hard or
time consuming to unit-test or that lacks any distinguishable components.
Unit tests, conversely, tend to give the best return where the code is new and
testable, such as on greenfield projects or in recently refactored parts of older
systems.
Test Recipe
A test recipe1 helps you to pick what to test, and is especially helpful when working
with unit tests (because they contain the highest amount of detail). The three test
recipes in this section are phrased differently and maybe one of them will tickle your
fancy in particular. If so, I encourage you to consult the original source to get an accu-
rate and exhaustive description of the recipe in question.
Conditionals
Loops
1. I’ve borrowed this term from Stephen Vance’s book Quality Code: Software Testing Principles,
Practices, and Patterns.
Operations
Polymorphism
Unit tests (and possibly integration tests) should cover all low-level mechan-
ics, like different variations of input, boundary values, data-driven testing,
input validation, and exhaustive branch coverage. Such tests may use techni-
cal terminology in the test code, but they should still attempt to test behavior
that’s meaningful from a user’s point of view.
System or end-to-end tests should exercise the bigger picture and make sure
that the system works as a whole. Such tests shouldn’t concern themselves
with details and variations. They should span scenarios or use cases and use
the language of the business.
Archetype
What format does the test follow, and how many cases does it cover?
Source of Truth
How does the test know that the result is correct?
Constant—There is exactly one correct value.
Set—There are multiple correct values, and they correspond to a set of finite size.
Predicate—Whether the value is correct can be determined by a function
that says yes or no.
Cross-check—An alternative implementation can be used to determine
whether the value is correct.
Inverse function—Applying an inverse function to the result produced by the
tested code produces the input.
Low-level Considerations
This section contains things to be mindful of when working with some common ele-
ments of a program. The list is by no means exhaustive, but if you remember these
points, your tests should cover a good majority of cases.
Zero-one-many
Make sure that the tests cover the following:

Zero elements
Exactly one element
Many elements
Nulls
Stick null/nil/undef wherever you can, if the type/array/collection permits it, to
see what happens.
Ranges
For a range m–n, check the behavior at the following:
m−1
m
n
n+1
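For instance, given a hypothetical validator that accepts ages in the range 18–100, four boundary tests fall out directly:

@Test
public void rejectsAgeJustBelowLowerBound() {
    assertFalse(validator.isValidAge(17)); // m - 1
}

@Test
public void acceptsLowerBound() {
    assertTrue(validator.isValidAge(18)); // m
}

@Test
public void acceptsUpperBound() {
    assertTrue(validator.isValidAge(100)); // n
}

@Test
public void rejectsAgeJustAboveUpperBound() {
    assertFalse(validator.isValidAge(101)); // n + 1
}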
Collections
Consider the following:
Empty
With one element
With multiple elements
Containing duplicates
Alternative ordering of elements
Numbers
Keep in mind the following:
Zero
Negative
Overflow of primitive types
Floating point precision
Other representations (like hexadecimal, octal, or scientific)
Commas, periods, and spaces when represented as strings for parsing
Strings
Don’t let this surprise you:
One space
Several spaces
Special characters like \n, \r, \t, etc.
Leading/trailing whitespace or special characters
HTML entities
Non-ASCII characters
Encoding
Overflow of fixed-size string buffers
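For instance, the address parser from Chapter 17 could be fed strings like these (a sketch; the expectation assumes the parser tolerates surrounding whitespace):

@Test
public void leadingAndTrailingWhitespaceIsIgnored() {
    ParsedAddress address = addressParser.parse(" \tSesame street 123\n", 1);
    assertEquals("Sesame street", address.streetName);
}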
Dates
Be mindful of the following:
Different formats
Number of days in each month
Leap years
Time zones
Daylight saving time
Accuracy (does a date have a time component?)
Timestamp formats
Summary
When considering what test to implement next and how, think about the following:
Common data types and abstractions all come with their specific gotchas that
need to be addressed when authoring tests.
Appendix A
Tools and Libraries
junit-quickcheck, https://github.com/pholser/junit-quickcheck
Mocha, https://mochajs.org/
Mockito, https://github.com/mockito/mockito
Moq, https://github.com/Moq/moq4
netDumbster, http://netdumbster.codeplex.com/
NModel, https://nmodel.codeplex.com/
NUnit, http://nunit.org/
PowerMock, https://github.com/jayway/powermock
Puppet, https://puppet.com/
QuickCheck, https://bitbucket.org/blob79/quickcheck/
RSpec, http://rspec.info/
Sikuli, http://www.sikuli.org/
Spec#, http://research.microsoft.com/en-us/projects/specsharp/
Specflow, http://www.specflow.org/
Spock Framework, https://github.com/spockframework/spock
Spring Boot, http://projects.spring.io/spring-boot/
Timecop, https://github.com/travisjeffery/timecop
Vagrant, https://www.vagrantup.com/
WireMock, http://wiremock.org/
xUnit.net, https://github.com/xunit/xunit
Appendix B
Source Code
Test Doubles
Listing B.1 PremiumPurchaseMatcher: A custom matcher that matches specific
business rules.
import org.hamcrest.Description;
import org.hamcrest.TypeSafeMatcher;

public class PremiumPurchaseMatcher extends TypeSafeMatcher<Purchase> {

@Override
public boolean matchesSafely(Purchase purchase) {
return purchase.getPrice() > 1000 && purchase.getItemCount() < 5;
}
@Override
public void describeTo(Description desc) {
desc.appendText("A purchase with the " +
"total price > 1000 and fewer than 5 items");
}
}
Listing B.2 PremiumAgeIntervalsTest: A parameterized test.

@RunWith(Parameterized.class)
public class PremiumAgeIntervalsTest {
@Parameter(value = 0)
public double expectedPremiumFactor;
@Parameter(value = 1)
public int age;
@Parameter(value = 2)
public Gender gender;
@Parameters(name = "Case {index}: Expected {0} for {1} year old {2}s")
public static Collection<Object[]> data() {
return Arrays.asList(new Object[][]{
{1.75, 18, Gender.MALE},
{1.75, 23, Gender.MALE},
{1.0, 24, Gender.MALE},
{1.0, 59, Gender.MALE},
{1.35, 60, Gender.MALE},
{1.575, 18, Gender.FEMALE},
{1.575, 23, Gender.FEMALE},
{0.9, 24, Gender.FEMALE},
{0.9, 59, Gender.FEMALE},
{1.215, 60, Gender.FEMALE}}
);
}
@Test
public void verifyPremiumFactor() {
assertEquals(expectedPremiumFactor, new PremiumRuleEngine()
.getPremiumFactor(age, gender), 0.0);
}
}
Listing B.3 Theory test with custom ParameterSupplier. This test uses both
a user-defined parameter supplier and @TestedOn (which is the only supplier that
comes with JUnit).
import org.junit.experimental.theories.Theories;
import org.junit.experimental.theories.Theory;
import org.junit.experimental.theories.suppliers.TestedOn;
import org.junit.runner.RunWith;
import util.supplier.AllGenders;
@RunWith(Theories.class)
public class PremiumFactorsWithinRangeTestUsingTestedOn {
@Theory
public void premiumFactorsAreBetween0_5and2_0(
@AllGenders Gender gender,
@TestedOn(ints = {17, 18, 60, 100, 101}) int age) { // sample values are illustrative
assumeThat(age, greaterThanOrEqualTo(18));
assumeThat(age, lessThanOrEqualTo(100));
assumeThat(gender, isOneOf(Gender.FEMALE, Gender.MALE));
double premiumFactor
= new PremiumRuleEngine().getPremiumFactor(age, gender);
assertThat(premiumFactor,
is(both(greaterThan(0.5)).and(lessThan(2.0))));
}
}
import java.util.Arrays;
import java.util.List;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
@Retention(RetentionPolicy.RUNTIME)
@ParametersSuppliedBy(GenderSupplier.class)
public @interface AllGenders {
}
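The GenderSupplier referenced above can be a minimal ParameterSupplier; a sketch:

import java.util.Arrays;
import java.util.List;
import org.junit.experimental.theories.ParameterSignature;
import org.junit.experimental.theories.ParameterSupplier;
import org.junit.experimental.theories.PotentialAssignment;

public class GenderSupplier extends ParameterSupplier {
    @Override
    public List<PotentialAssignment> getValueSources(ParameterSignature signature) {
        // Supply every Gender value to the theory.
        return Arrays.asList(
                PotentialAssignment.forValue("male", Gender.MALE),
                PotentialAssignment.forValue("female", Gender.FEMALE));
    }
}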
Test-driven Development
JUnit Version
Listing B.6 The nine tests from the sample TDD session.
@Test
void searchingWhenNoDocumentsAreIndexedGivesNothing() {
assert [] == searchEngine.find("fox")
}
@Test
void searchingForADocumentsOnlyWordGivesThatDocumentsId() {
searchEngine.addToIndex(1, "fox")
assert [1] == searchEngine.find("fox")
}
@Test
void allIndexedDocumentsAreSearched() {
searchEngine.addToIndex(1, "fox")
searchEngine.addToIndex(2, "dog")
assert [2] == searchEngine.find("dog")
}
@Test
void documentsMayContainMoreThanOneWord() {
searchEngine.addToIndex(1, "the quick brown fox")
assert [1] == searchEngine.find("brown")
assert [1] == searchEngine.find("fox")
}
@Test
void searchingForAWordThatMatchesTwoDocumentsGivesBothDocumentsIds() {
searchEngine.addToIndex(1, "fox")
searchEngine.addToIndex(2, "fox")
assert [1, 2] == searchEngine.find("fox").sort()
}
@Test
void multipleMatchesInADocumentProduceOneMatch() {
searchEngine.addToIndex(1,
"the quick brown fox jumped over the lazy dog")
assert [1] == searchEngine.find("the")
}
@Test
void documentsAreSortedByWordFrequency() {
searchEngine.addToIndex(1, "fox fox dog")
searchEngine.addToIndex(2, "fox fox fox")
searchEngine.addToIndex(3, "dog fox dog")
assert [2, 1, 3] == searchEngine.find("fox")
assert [3, 1] == searchEngine.find("dog")
}
@Test
void caseDoesNotMatter() {
searchEngine.addToIndex(1, "FOX fox FoX");
searchEngine.addToIndex(2, "foX FOx");
searchEngine.addToIndex(3, "FoX");
assert [1, 2, 3] == searchEngine.find("fox")
assert [1, 2, 3] == searchEngine.find("FOX")
}
@Test
void punctuationMarksAreIgnored() {
searchEngine.addToIndex(1, "quick, quick: quick.");
searchEngine.addToIndex(2, "(brown) [brown] \"brown\" 'brown'");
searchEngine.addToIndex(3, "fox; -fox fox? fox!");
assert [1] == searchEngine.find("quick")
assert [2] == searchEngine.find("brown")
assert [3] == searchEngine.find("fox")
}
private resortIndexOnWordFrequency() {
index.each { k, wfs -> wfs.sort
{ wf1, wf2 -> wf2.count <=> wf1.count } }
}
WordFrequency(int documentId) {
this.documentId = documentId
}
}
Spock Version
Listing B.9 The nine tests from the sample TDD session, using Spock this time.
def "searching when no documents are indexed gives nothing"() {
expect:
searchEngine.find("fox") == []
}
def "searching for a document's only word gives that document's id"() {
setup:
searchEngine.addToIndex(1, "fox")
expect:
searchEngine.find("fox") == [1]
}
def "all indexed documents are searched"() {
// headers and setup blocks of this and the later truncated tests are
// reconstructed from the JUnit version in Listing B.6
setup:
searchEngine.addToIndex(1, "fox")
searchEngine.addToIndex(2, "dog")
expect:
searchEngine.find("dog") == [2]
}
def "documents may contain more than one word"() {
// the surviving expect: block implies a where: table; data from Listing B.6
setup:
searchEngine.addToIndex(1, "the quick brown fox")
expect:
searchEngine.find(word) == [documentId]
where:
word    | documentId
"brown" | 1
"fox"   | 1
}
def "searching for a word that matches two documents gives both documents' ids"() {
setup:
searchEngine.addToIndex(1, "fox")
searchEngine.addToIndex(2, "fox")
expect:
searchEngine.find("fox").sort() == [1, 2]
}
def "multiple matches in a document produce one match"() {
setup:
searchEngine.addToIndex(1, "the quick brown fox jumped over the lazy dog")
expect:
searchEngine.find("the") == [1]
}
def "documents are sorted by word frequency"() {
setup:
searchEngine.addToIndex(1, "fox fox dog")
searchEngine.addToIndex(2, "fox fox fox")
searchEngine.addToIndex(3, "dog fox dog")
expect:
searchEngine.find("fox") == [2, 1, 3]
searchEngine.find("dog") == [3, 1]
}
def "case does not matter"() {
setup:
searchEngine.addToIndex(1, "FOX fox FoX")
searchEngine.addToIndex(2, "foX FOx")
searchEngine.addToIndex(3, "FoX")
expect:
searchEngine.find("fox") == [1, 2, 3]
searchEngine.find("FOX") == [1, 2, 3]
}
def "punctuation marks are ignored"() {
setup:
searchEngine.addToIndex(1, "quick, quick: quick.")
searchEngine.addToIndex(2, "(brown) [brown] \"brown\" 'brown'")
searchEngine.addToIndex(3, "fox; -fox fox? fox!")
expect:
searchEngine.find("quick") == [1]
searchEngine.find("brown") == [2]
searchEngine.find("fox") == [3]
}
Bibliography
Adzic, Gojko. 2011. Specification by Example: How Successful Teams Deliver the Right
Software. New York, NY: Manning Publications.
Adzic, Gojko. 2013. “Let’s Break the Agile Testing Quadrants.” http://gojko.net/
2013/10/21/lets-break-the-agile-testing-quadrants/.
Alspaugh, Thomas A. 2015. "Kinds of Software Quality ('Ilities')." http://www
.thomasalspaugh.org/pub/fnd/ility.html.
Bach, James. 2013. “Testing and Checking Refined.” http://www.satisfice.com/blog/
archives/856.
Bach, James. 2015. “Heuristics of Software Testability.” http://www.satisfice.com/
tools/testable.pdf.
Bath, Graham and McKay, Judy. 2008. The Software Test Engineer’s Handbook: A
Study Guide for the ISTQB Test Analyst and Technical Analyst Advanced Level
Certificates. Santa Barbara, CA: Rocky Nook.
Beck, Kent. 2002. Test-driven Development: By Example. Boston, MA: Addison-Wesley.
Beck, Kent and Andres, Cynthia. 2004. Extreme Programming Explained: Embrace
Change, 2nd ed. Boston, MA: Addison-Wesley.
Bolton, Michael. 2007. “Pairwise Testing (version 1.5, November, 2007).” http://www
.developsense.com/pairwiseTesting.html.
Bolton, Michael. 2014. “The REAL Agile Testing Quadrants (As We Believe They
Should Have Always Been).” http://www.slideshare.net/EuroSTARConference/
306284037-2014-06dublinrst-agiletesting.
Borysowich, Craig. 2007. "Design Principles: Fan-In vs Fan-Out." http://it.toolbox
.com/blogs/enterprise-solutions/design-principles-fanin-vs-fanout-16088.
Cimperman, Bob. 2006. UAT Defined: A Guide to Practical User Acceptance Testing.
New York, NY: Addison-Wesley.
Claessen, Koen and Hughes, John. 2016. “QuickCheck - Automatic Specification-
based Testing.” http://www.cse.chalmers.se/~rjmh/QuickCheck/.
Cohn, Mike. 2009. Succeeding with Agile: Software Development Using Scrum. Upper
Saddle River, NJ: Addison-Wesley.
Duvall, Paul M., Matyas, Steve, and Glover, Andrew. 2007. Continuous Integration:
Improving Software Quality and Reducing Risk. Upper Saddle River, NJ:
Addison-Wesley.
Evans, Eric. 2003. Domain-Driven Design: Tackling Complexity in the Heart of Software.
Boston, MA: Addison-Wesley.
Faber, Szczepan. 2008. “Should I Worry about the Unexpected?” http://monkeyisland
.pl/2008/07/12/should-i-worry-about-the-unexpected/.
Feathers, Michael C. 2004. Working Effectively with Legacy Code. Upper Saddle
River, NJ: Prentice Hall.
Foote, Brian and Yoder, Joseph. 1999. “Big Ball of Mud.” http://www.laputan.org/
mud/.
Fowler, Martin. 1999. Refactoring: Improving the Design of Existing Code. Boston,
MA: Addison-Wesley.
Fowler, Martin. 2004. “JUnit New Instance.” http://martinfowler.com/bliki/
JunitNewInstance.html.
Fowler, Martin. 2005. “Command Query Separation.” http://martinfowler.com/bliki/
CommandQuerySeparation.html.
Fowler, Martin. 2007. “Mocks Aren’t Stubs.” http://martinfowler.com/articles/
mocksArentStubs.html.
Fowler, Martin. 2014. "Unit Test." http://martinfowler.com/bliki/UnitTest.html.
Freeman, Steve and Pryce, Nat. 2009. Growing Object-Oriented Software, Guided by
Tests. Upper Saddle River, NJ: Addison-Wesley.
Gamma, Erich, Helm, Richard, Johnson, Ralph, and Vlissides, John. 1994. Design
Patterns: Elements of Reusable Object-Oriented Software. Upper Saddle River, NJ:
Addison-Wesley.
Gregory, Janet and Crispin, Lisa. 2008. Agile Testing: A Practical Guide for Testers
and Agile Teams. Upper Saddle River, NJ: Addison-Wesley.
Gregory, Janet and Crispin, Lisa. 2014. More Agile Testing: Learning Journeys for the
Whole Team. Upper Saddle River, NJ: Addison-Wesley.
Hendrickson, Elisabeth, Lyndsay, James, and Emery, Dale. 2006. “Test Heuristics
Cheat Sheet, Data Type Attacks & Web Tests.” http://testobsessed.com/wp-content/
uploads/2011/04/testheuristicscheatsheetv1.pdf.
Humble, Jez and Farley, David. 2010. Continuous Delivery: Reliable Software
Releases through Build, Test, and Deployment Automation, Upper Saddle River, NJ:
Addison-Wesley.
Hunt, Andrew and Thomas, David. 1999. The Pragmatic Programmer: From Journeyman
to Master. Reading, MA: Addison-Wesley.
Hunt, Andrew and Thomas, David. 2003. Pragmatic Unit Testing: In Java with JUnit.
Raleigh, NC: The Pragmatic Programmers.
International Software Testing Qualifications Board (ISTQB). 2011. "Foundation
Level Syllabus." http://www.istqb.org/downloads/finish/16/15.html.
Java Community Process (JCP). 2006. “JSR 305: Annotations for Software Defect
Detection.” https://jcp.org/en/jsr/detail?id=305.
JetBrains. 2016. “Code Quality Analysis - Code Annotations.” https://www.jetbrains
.com/resharper/features/code_analysis.html#Annotated_Framework.
Kaner, Cem, Bach, James, and Pettichord, Bret. 2001. Lessons Learned in Software
Testing: A Context-Driven Approach. New York, NY: Wiley.
Kuhn, D. Richard, Kacker, Raghu N., and Lei, Yu. 2010. "Practical Combinatorial
Testing." NIST Special Publication 800-142. http://nvlpubs.nist.gov/nistpubs/Legacy/
SP/nistspecialpublication800-142.pdf.
Kumar, Ajitesh. 2014. "7 Popular Unit Test Naming Conventions." https://dzone.com/
articles/7-popular-unit-test-naming.
Langr, Jeff, Hunt, Andy, and Thomas, Dave. 2015. Pragmatic Unit Testing in Java 8
with JUnit. Dallas, TX: The Pragmatic Programmers.
Marick, Brian. 2003. “My Agile Testing Project.” http://www.exampler.com/old-blog/
2003/08/21/#agile-testing-project-1.
Martin, Robert C. 2002. Agile Software Development: Principles, Patterns, and
Practices. Upper Saddle River, NJ: Prentice Hall.
Martin, Robert C. 2008. Clean Code: A Handbook of Agile Software Craftsmanship.
Upper Saddle River, NJ: Prentice Hall.
Martin, Robert C. 2010. "The Transformation Priority Premise." http://
blog.8thlight.com/uncle-bob/2013/05/27/TheTransformationPriorityPremise.html.
Martin, Robert C. 2011. The Clean Coder: A Code of Conduct for Professional
Programmers. Upper Saddle River, NJ: Prentice Hall.
Martin, Robert C. 2014. “The Little Mocker.” http://blog.8thlight.com/uncle-bob/
2014/05/14/TheLittleMocker.html.
Meszaros, Gerard. 2007. XUnit Test Patterns: Refactoring Test Code. Upper Saddle
River, NJ: Addison-Wesley.
Meszaros, Gerard. 2011. XUnit Test Patterns. http://xunitpatterns.com.
Meyer, Bertrand. 1997. Object-Oriented Software Construction, 2nd ed. New York,
NY: Prentice Hall.
Microsoft Corporation. 2013. “Code Contracts User Manual (August 14, 2013).”
http://research.microsoft.com/en-us/projects/contracts/userdoc.pdf.
Microsoft. 2016a. “Isolating Code Under Test with Microsoft Fakes.” http://msdn
.microsoft.com/en-us/library/hh549175.aspx.
Microsoft. 2016b. “Refactoring into Pure Functions.” http://msdn.microsoft.com/
en-us/library/bb669139.aspx.
North, Dan. 2006. “Introducing BDD.” http://dannorth.net/introducing-bdd.
Oracle. 2013. "The Java Language Specification: Java SE 7 Edition, Section 14.10."
http://docs.oracle.com/javase/specs/jls/se7/html/jls-14.html#jls-14.10.
Osherove, Roy. 2005. “Naming Standards for Unit Tests.” http://osherove.com/
blog/2005/4/3/naming-standards-for-unit-tests.html.
Osherove, Roy. 2009. The Art of Unit Testing: With Examples in .NET. Greenwich,
CT: Manning Publications.
OWASP. 2013. "OWASP Top 10—2013: The Ten Most Critical Web Application
Security Risks.” http://owasptop10.googlecode.com/files/OWASP%20Top%2010%20
-%202013.pdf.
Palermo, Jeff. 2006. “Guidelines for Test-Driven Development.” http://msdn.microsoft
.com/en-us/library/aa730844(v=vs.80).aspx.
Poppendieck, Mary and Poppendieck, Tom. 2006. Implementing Lean Software
Development: From Concept to Cash. Upper Saddle River, NJ: Addison-Wesley.
Pugh, Ken. 2011. Lean-Agile Acceptance Test-Driven Development: Better Software
Through Collaboration. Upper Saddle River, NJ: Addison-Wesley.
RiSE (Microsoft). 2015. “Code Contracts for .NET.” http://visualstudiogallery.msdn
.microsoft.com/1ec7db13-3363-46c9-851f-1ce455f66970.
Ritchie, Stephen D. 2011. Pro .NET Best Practices. Berkeley, CA: Apress.
Saff, David and Boshernitsan, Marat. 2006. "The Practice of Theories: Adding
'For-all' Statements to 'There-Exists' Tests." http://shareandenjoy.saff.net/tdd-
specifications.pdf.
Skeet, Jon. 2010. “Code Contracts in C#.” http://www.infoq.com/articles/
code-contracts-csharp.
Stallings, William and Brown, Lawrence. 2007. Computer Security: Principles and
Practice. Upper Saddle River, NJ: Prentice Hall.
Stewart, Simon. 2010. “Test Sizes.” http://googletesting.blogspot.se/2010/12/test-sizes
.html.
Sutherland, Jeff and Schwaber, Ken. 2013. “The Scrum Guide (July 2013).” http://
www.scrumguides.org.
Tarnowski, Alexander. 2010. “Why Must Test Code Be Better than Production
Code.” Agile Record 4:24–25.
Vance, Stephen. 2013. Quality Code: Software Testing Principles, Practices, and
Patterns. Upper Saddle River, NJ: Addison-Wesley.
Weinberg, Gerald M. 1998. The Psychology of Computer Programming. New York,
NY: Dorset House.
Woodward, Martin R. and Al-Khanjari, Zuhoor A. 2000. “Testability, Fault Size and
the Domain-to-Range Ratio: An Eternal Triangle.” ACM SIGSOFT Software Engi-
neering Notes 25(5):168–172.
Index
Assertions (continued)
    methods, 89–90
    one per test, 90
    overview of, 89
    removing need for comments, 238–239
    specialized, 94–96
    in state-based tests, 173–174
    test-driving search engine, 196–197
    verification, 174–176
    verbosity of, 92–93
    verifying in more complex tests, 266
AssertThat method
    data-driven and combinatorial testing, 280–281
    defined, 91
    fluent assertions, 97
    mock objects, 167–168
    specialized assertions, 96
    spies, 171
    tests enclosed in transactions, 246
Assumptions, theory testing, 140–141
Asynchronicity, UI tests failing, 252
ATDD (Acceptance test-driven development), 15–17
Attacks, CIA security triad for resilience to, 29
Audit, security testing as, 28
Authentication, in-memory database, 152–153
Author bias, critique-based testing and, 11
Automation
    acceptance test, 17
    agile testing, 14
    of checks, 9–10
    deployment, 50
    providing infrastructure for, 5
    smoke test, 34
    as support testing, 11
    unit test, 82
Availability
    of CI servers, 157
    in CIA security triad model, 29
    enforcing contracts, 62
    micro-services across tiers for, 132

B
Behavior
    benefits of testable software, 39
    in characterization testing, 34–35
    defining component, 27
    mock objects testing. See Mock objects
    naming unit test for expected, 87
    unit tests specifying tested code, 81
Behavior-driven development (BDD) frameworks
    double-loop TDD as, 222
    matchers, 103–105
    more fluent syntax of, 104
    naming tests, 103
    overview of, 102
    test structure, 102–103
    testing style, 15–17
    unit testing in some languages with, 103–106
The Big Ball of Mud, testable software vs., 37–39
Black box testing
    implementing system tests, 26
    integration test vs., 25
    overview of, 22–23
    when singularity has been neglected, 52–53
Block copy and paste, 229–230
Blocks, Spock framework, 90
Blueprint, construction phase in traditional testing, 12
Boundary value testing
    defined, 116
    edge cases/gotchas for some data types, 111–113
    specification-based technique, 110
Broken window syndrome, in duplication, 225, 233
Brown-field business applications, testing, 258
Buffer overflow
    developer understanding of, 5
    from lenient/missing parameter checking, 61
    strings and, 111
Math package, testing, 47–48
Maximum values for data types, 111
Mechanical duplication
    block copy and paste, 229–230
    constructor copy and paste, 230–231
    copy and paste, 228–229
    method duplication, 231–232
    overview of, 228
    summary, 235
Medium tests, 35, 151
Memory corruption, 111
Messaging middleware, 258
Metadata, unit test methods via, 83
Method duplication, 231–232
Methods
    assertion, 89–90
    cleanup, 84
    controlling dependency using factory, 122–123
    duplication of similar functionality in different, 232–233
    limitations of testing with formal, 42–43
    test, 83–84
Metrics, duplication messing up, 226
Micro-services, dependencies across tiers, 133
Mind-set, in critique-based testing, 10–11
Minimum values for data types, 111
Mirroring business logic, complex stubs, 162
Misuse, of mocking framework, 185–189
Mobile applications, UI tests for, 252–254
Mocha for JavaScript, BDD-style test, 102
Mock objects
    for behavior verification, 174
    defined, 176
    implementing with mocking frameworks. See Mocking frameworks
    oververifying in mocking frameworks, 186–187
    response to expectations, 179–180
    returning mocks, 189
    spies vs., 170–171
    as test doubles, 164–170
    verifying interactions in mocking framework, 183–185
Mocking frameworks
    constructing test doubles, 177–179
    misuse, overuse, and other pitfalls, 185
    mocking concrete classes, 187–188
    mocking value-holding classes, 188
    mocks returning mocks, 189
    oververifying, 186–187
    overview of, 177
    setting expectations, 179–180
    stubbing, 180–183
    summary, 189
    verifying interactions, 183–185
Mockist style TDD. See Test-driven development (TDD) - mockist style
Mockito, 180–184
Modifications, increasing observability, 44
Modularity, isolability and, 51
Moq for C#
    configuring stubs in mocking framework, 180–183
    constructing test doubles, 178
    mocks behaving like spies in, 180
    verifying interactions in mocking framework, 184
MSTest unit testing framework, 83, 89–90
Multitiered applications, dependency across, 133
Mutator, state-based tests, 173–174

N
Naming conventions
    BDD-style tests, 103
    duplication of similar functionality in different methods, 232–233
    method duplication dangers, 231
    removing need for comments, 237
Naming conventions, unit tests
    behavior-driven development-style, 86
    mandated by framework, 86
    overview of, 85–86
    picking naming standard, 87–88
    structuring unit tests, 88–89
    test methods using, 83
    unit of work, state under test, expected behavior, 87
Nasty test cases, 5, 6
Negative testing, 35, 85
Nested contexts, RSpec for Ruby, 102–103
Network performance, tests outside domain of unit tests, 261
Nice mocks, 180
Nomenclature, contract programming, 58
Nonfunctional testing, 28, 30
Normal mocks, 180
Nuking, coding stability for tests that are not unit tests, 259
Null check, enforcing contracts, 65
Null value, boundary values for strings, 111
Nulls
    indicating dummy, 172
    low-level test considerations, 274
Numbers
    finding boundary values for, 111
    low-level test considerations, 275
NUnit testing framework
    constraints and assertions, 94–96
    exception testing, 101
    parameterized tests, 138–139
    test methods, 83
    theory tests, 140

O
Object equality
    asserting in BDD-style tests, 104
    unit test assertion checking for, 93–94
Object-oriented languages
    contracts blending with, 61
    data types/testability in, 72–73
    data types/testability in non, 74–76
    raising level of abstraction, 53
    temporal coupling in, 72
Objectives. See Testing objectives
Objects, dependencies between collaborating, 119–125, 133
Observability
    defined, 55
    test first or test last, 209–210
    as testability quality attribute, 44–48
Obvious implementation, classic style TDD, 205, 211
Optimization, ranking, 201–202
Oracles, 144
Outcome, naming unit tests to convey expected, 85
Outgrown tests, deleting, 243
Output
    of developers, 1
    observability via developer, 44
Overprocessing waste, incurring in testing, 42
Overuse, mocking framework, 185–189
Overuse, of dummies, 173
Oververifying, in mocking frameworks, 186–187

P
Page Objects, UI tests, 254
Pairwise testing
    beyond, 149
    for combinatorial explosions, 147–149
    defined, 149
Pairwise.pl program, 149
Parallel implementations, 227
Parameterized tests
    defined, 149
    overview of, 138–139
    reporting results from, 141
    theories vs., 139–141
    using parameterized stubs, 161–162
Parentheses, expressing intervals, 109
Partial verification, unit tests, 98–99
Partitioning
    boundary value analysis of, 110
    equivalence, 107–110
    knowledge duplication with deliberate, 233
Pass-through tests, mockist style TDD, 218–219
Patching, by developers, 3–4
Paving, 259–260
Payment gateways, 250–251
PCI DSS security standard, 250
Penetration tests, 28
Performance testing
    impact of assertions on, 63
    nonfunctional testing of, 28
    Spock version, TDD, 284–287
    test doubles, 279
    test levels express proximity to, 23–26
    white box vs. black box testing, 22
Special code, in test-driven development, 207
Specialized assertions, unit tests, 94–96
Specification-based testing techniques
    based on decision tables, 115–116
    boundary value analysis, 110
    edge cases/gotchas for some data types, 111–113
    equivalence partitioning, 107–110
    overview of, 107
    state transition testing, 113–114
    summary, 116–117
Specification by example
    as double-loop TDD, 222
    testing style, 15–17
    as tests exercising services/components, 248–249
Speed. See Execution speed
Spies
    defined, 176
    implementing with mocking frameworks, 177
    as test doubles, 170–171
Spike testing, performance, 28
Spock framework
    differentiating stubs and mocks, 178
    mocks behaving like spies in, 180
    parameterized tests, 138
    source code for TDD, 284–287
    using blocks as assertions, 90
    verifying interactions in mocking framework, 185
Spring Boot, starting embedded containers, 155
SQL-compliant in-memory databases, 152–153, 156
Square brackets, expressing intervals, 109
Stability, tests that are not unit tests, 259–260
Stacking, stubs in mocking framework, 182
Startup, complex test, 264
State
    controllability and, 48
    as driver of testability, 70–71
    mock objects shifting focus to, 164
    setting up higher-level tests, 264–265
    temporal coupling vs., 71–72
    unit testing from known, 83–84
    verification of, 173–174, 176
State-based tests, 173–174
State transition testing, 113–114, 116
State under test, 87
Statements, verifying tested code with theories, 139–141
Static analysis, contracts, 65
Stderr (standard err), 255
Stdin (standard input), 255
Stdout (standard output), 255
Steady pace of work, in unit tests, 80
Storage performance, tests outside domain of unit tests, 261
Stored procedures, tests enclosed in transactions, 247
Stress testing, of performance, 28
Strict mocks, 180
Strings
    finding boundary values for, 111–112
    low-level test considerations, 275–276
Structuring
    BDD-style tests, 102–103
    unit testing frameworks, 88–89
Stubs
    configuring in mocking framework, 180–183
    defined, 176
    flexibility of, 161–162
    getting rid of side effects with, 162
    implementing with mocking frameworks, 177–179
    as test doubles, 159–162
Subsystems, TDD for legacy code, 206
Suppliers, in contract programming
    contract building blocks and, 59–60
    implementing contracts, 60–62
    overview of, 57–58
Support, testing to, 11
Switch coverage, state transition testing, 114
Syntax
    BDD-style frameworks with fluent, 105
    number of assertions per unit test, 91–92
V
Validation, contracts not replacing, 57
Value-holding classes, 188
Values
    dummies indicated by simple, default, 171–172
    high-level test considerations, 273–274
    stubs, 160, 161–162
Variable delays, UI tests failing, 252
Variables, removing need for comments, 238
Verbosity, of assertions in unit tests, 92–93
Verification. See also Developer testing
    of behavior, 174–175
    The Big Ball of Mud preventing, 38–39
    of contracts, 62–63
    in generative testing, 143–144
    of indirect output with mock objects, 164–169
    in mocking framework, 183–187
    in more complex tests, 266
    of state, 173–174
    in testable software, 39
    with theories, 139–141
    in traditional testing, 11–13
    in unit testing, 82, 98–99
Verify method, 164–169
Virtualization, tests that are not unit tests, 259–260
Vocabulary, test key terms
    Agile Testing Quadrants, 32–33
    black box testing, 22–23
    characterization testing, 34–35
    end-to-end testing, 34
    errors, defects, and failures, 22
    negative testing, 35
    overview of, 21
    positive testing, 35
    putting test levels/test types to work, 31
    small, medium, and large tests, 35
    smoke testing, 33–34
    summary, 36
    test levels, 23–26
    test types, 26–31
    white box testing, 22–23

W
Waste, elimination of, 41–42
Wasteful, tests as, 41–43
Web applications
    reality of layers in, 130
    UI tests for, 252–254
Web frameworks, raising level of abstraction, 53
Web services, almost unit tests of, 155–156
WebDriver testing, 253–255, 259
“What,” functional tests targeting, 28
White box testing, 22–23, 52
Word frequency, and ranking, 200–202
Working Effectively with Legacy Code (Feathers), 3

X
XCTest unit testing framework, 83
XUnit.net framework, 85

Z
Zero-one-many, test coverage of, 274