Clean Craftsmanship: Disciplines, Standards, and Ethics
“Bob’s Clean Craftsmanship has done a great job explaining the purposes of
agile technical practices, along with a deep historical basis for how they
came into existence, as well as positioning for why they will always be
important. His involvement in the history and formation of agility, and his
thorough understanding of the practices and their purposes, are reflected
vividly throughout the manuscript.”
“Bob’s writing style is excellent. It is easy to read and the concepts are
explained in perfect detail for even a new programmer to follow. Bob even
has some funny moments, which pleasantly snap you out of focus. The true
value of the book is really in the cry for change, for something better … the
cry for programmers to be professional … the realization that software is
everywhere. Additionally, I believe there is a lot of value in all the history
Bob provides. I enjoy that he doesn’t waste time laying blame for how we
got to where we are now. Bob calls people to action, asking them to take
responsibility by increasing their standards and level of professionalism,
even if that means pushing back sometimes.”
—Heather Kanser
“Bob’s one of the very few famous developers with whom I’d like to work
on a tech project. It’s not because he’s a good developer, famous, or a good
communicator; it’s because Bob helps me be a better developer and a team
member. He has spotted every major development trend, years ahead of
others, and has been able to explain its importance, which encouraged me to
learn. Back when I started—apart from being honest and a good person—the
idea of craftsmanship and ethics was completely missing from this field.
Now, it seems to be the most important thing professional developers can
learn, even ahead of coding itself. I’m happy to see Bob leading the way
again. I can’t wait to hear his perspective and incorporate it into my own
practice.”
Robert C. Martin
Foreword
Preface
Acknowledgments
About the Author
Chapter 1 Craftsmanship
Chapter 5 Refactoring
What Is Refactoring?
The Basic Toolkit
Rename
Extract Method
Extract Variable
Extract Field
Rubik’s Cube
The Disciplines
Tests
Quick Tests
Break Deep One-to-One Correspondences
Refactor Continuously
Refactor Mercilessly
Keep the Tests Passing!
Leave Yourself an Out
Conclusion
Chapter 9 Productivity
We Will Never Ship S**T
Inexpensive Adaptability
We Will Always Be Ready
Stable Productivity
Chapter 10 Quality
Continuous Improvement
Fearless Competence
Extreme Quality
We Will Not Dump on QA
The QA Disease
QA Will Find Nothing
Test Automation
Automated Testing and User Interfaces
Testing the User Interface
Chapter 11 Courage
We Cover for Each Other
Honest Estimates
You Must Say NO
Continuous Aggressive Learning
Mentoring
PART III The Ethics
The First Programmer
Seventy-Five Years
Nerds and Saviors
Role Models and Villains
We Rule the World
Catastrophes
The Oath
Chapter 12 Harm
First, Do No Harm
No Harm to Society
Harm to Function
No Harm to Structure
Soft
Tests
Best Work
Making It Right
What Is Good Structure?
Eisenhower’s Matrix
Programmers Are Stakeholders
Your Best
Repeatable Proof
Dijkstra
Proving Correctness
Structured Programming
Functional Decomposition
Test-Driven Development
Chapter 13 Integrity
Small Cycles
The History of Source Code Control
Git
Short Cycles
Continuous Integration
Branches versus Toggles
Continuous Deployment
Continuous Build
Relentless Improvement
Test Coverage
Mutation Testing
Semantic Stability
Cleaning
Creations
Maintain High Productivity
Viscosity
Managing Distractions
Time Management
Chapter 14 Teamwork
Work as a Team
Open/Virtual Office
Estimate Honestly and Fairly
Lies
Honesty, Accuracy, Precision
Story 1: Vectors
Story 2: pCCU
The Lesson
Accuracy
Precision
Aggregation
Honesty
Respect
Never Stop Learning
Index
Foreword
I remember meeting Uncle Bob in the spring of 2003, soon after Scrum was
introduced to our company and technology teams. As a skeptical, fledgling
ScrumMaster, I remember listening to Bob teach us about TDD and a little
tool called FitNesse, and I remember thinking to myself, “Why would we
ever write test cases that fail first? Doesn’t testing come after coding?” I
often walked away scratching my head, as did many of my team members,
and yet to this day I distinctly remember Bob’s palpable exuberance for code
craftsmanship like it was only yesterday. I recall his directness one day as he
was looking at our bug backlog and asking us why on earth we would make
such poor decisions about software systems that we did not in fact own
—“These systems are company assets, not your own personal assets.” His
passion piqued our curiosity, and a year and a half later, we had refactored our
way to about 80 percent automated test coverage and a clean code base that
made pivoting much easier, resulting in much happier customers—and
happier teams. We moved lightning fast after that, wielding our definition of
done like armor to protect us from the always-lurking code goblins; we had
learned, in essence, how to protect ourselves from ourselves. Over time, we
developed a warmth for Uncle Bob, who came to truly feel like an uncle to
us—a warm, determined, and courageous man who would over time help us
learn to stand up for ourselves and do what was right. While some kids’
Uncle Bobs taught them how to ride bicycles or fish, our Uncle Bob taught us
to not compromise our integrity—and to this day, the ability and desire to
show up to every situation with courage and curiosity has been the best
lesson of my career.
I brought Bob’s early lessons with me on my journey as I ventured into the
world as an agile coach and quickly observed for myself that the best product
development teams figured out how to package up their own best practices
for their unique contexts, for their particular customers, in their respective
industries. I remembered Bob’s lessons when I observed that the best
development tools in the world were only as good as their human operators
—the teams who would figure out the best applications of those tools within
their own domains. I observed that, sure, teams can reach high percentages of
unit test coverage to check the box and meet the metric, only to find that a
large percentage of those tests are flaky—the metric was met, but value was not
delivered. The best teams didn’t really need to care about metrics; they had
purpose, discipline, pride, and responsibility—and the metrics in every case
spoke for themselves. Clean Craftsmanship weaves all of these lessons and
principles into practical code examples and experiences to illustrate the
difference between writing something to meet a deadline versus actually
building something sustainable for the future.
Clean Craftsmanship reminds us to never settle for less, to walk the Earth
with fearless competence. This book, like an old friend, will remind you of
what matters, what works, what doesn’t, what creates risk, and what
diminishes it. These lessons are timeless. You may find that you already
practice some of the techniques contained within, and I bet you’ll find
something new, or at least something that you dropped because at some point
you caved to deadlines or other pressures in your career. If you are new to
the development world—whether in business or technology—you will learn
from the very best, and even the most practiced and battle weary will find
ways to improve themselves. Perhaps this book will help you find your
passion again, renew your desire to improve your craft, or rededicate your
energy to the search for perfection, regardless of the impediments on your
horizon.
Software developers rule the world, and Uncle Bob is here again to remind
us of the professional discipline of those with such power. He picks up
where he left off with Clean Code; because software developers literally
write the rules of humankind, Uncle Bob reminds us that we must practice a
strict code of ethics, a responsibility to know what the code does, how
people use it, and where it breaks. Software mistakes cost people their
livelihoods—and their lives. Software influences the way we think, the
decisions we make, and as a result of artificial intelligence and predictive
analytics, it influences social and herd behavior. Therefore, we must be
responsible and act with great care and empathy—the health and well-being
of people depend on it. Uncle Bob helps us face this responsibility and
become the professionals that our society expects, and demands, us to be.
As the Agile Manifesto nears its twentieth birthday at the writing of this
foreword, this book is a perfect opportunity to go back to basics: a timely
and humble reminder of the ever-increasing complexity of our programmatic
world and how we owe it to the legacy of humankind—and to ourselves—to
practice ethical development. Take your time reading Clean Craftsmanship.
Let the principles seep into you. Practice them. Improve them. Mentor others.
Keep this book on your go-to bookshelf. Let this book be your old friend—
your Uncle Bob, your guide—as you make your way through this world with
curiosity and courage.
Preface
Before we begin, there are two issues we need to deal with in order to
ensure that you, my gentle reader, understand the frame of reference in which
this book is presented.
For Yourself
If you are a programmer of several years’ experience, you probably know the
satisfaction of getting a system deployed and working. There is a certain
pride you feel at having been part of such an accomplishment. You are proud
of getting the system out the door.
But are you proud of the way you got that system out the door? Is your pride
the pride of finishing? Or is your pride the pride of workmanship? Are you
proud that the system has been deployed? Or are you proud of the way you
built that system?
When you go home after a hard day of writing code, do you look at yourself
in the mirror and say, “I did a good job today”? Or do you have to take a
shower?
Too many of us feel dirty at the end of the day. Too many of us feel trapped
into doing substandard work. Too many of us feel that low quality is
expected and is necessary for high speed. Too many of us think that
productivity and quality are inversely related.
In this book, I strive to break that mindset. This is a book about working
well. This is a book about doing a good job. This is a book that describes the
disciplines and practices that every programmer should know in order to
work fast, be productive, and be proud of what they write every single day.
For Society
The twenty-first century marks the first time in human history that our society
has become dependent, for its survival, on a technology that has acquired
virtually no semblance of discipline or control. Software has invaded every
facet of modern life, from brewing our morning coffee to providing our
evening entertainment, from washing our clothes to driving our cars, from
connecting us in a world-spanning network to dividing us socially and
politically. There is literally no aspect of life in the modern world that is not
dominated by software. And yet those of us who build this software are little
more than a ragtag batch of tinkerers who barely have any idea what we are
doing.
If we programmers had had a better grasp on what we do, would the 2020
Iowa Caucus results have been ready when promised? Would 346 people
have died in the two 737 Max crashes? Would Knight Capital Group have
lost $460 million in 45 minutes? Would 89 people have lost their lives in
Toyota’s unintended acceleration accidents?
Every five years, the number of programmers in the world doubles. Those
programmers are taught very little about their craft. They are shown the tools,
given a few toy projects to develop, and are then tossed into an exponentially
growing workforce to answer the exponentially growing demand for more
and more software. Every day, the house of cards that we call software
insinuates itself deeper and deeper into our infrastructure, our institutions,
our governments, and our lives. And with every day, the risk of catastrophe
grows.
Of what catastrophe do I speak? It is not the collapse of our civilization nor
the sudden dissolution of all the software systems at once. The house of cards
that is due to collapse is not composed of the software systems themselves.
Rather, it is the fragile foundation of public trust that is at risk.
Too many more 737 Max incidents, Toyota unintended acceleration incidents,
Volkswagen California EPA incidents, or Iowa Caucus incidents—too many
more cases of high-profile software failures or malfeasance—and the lack of
our discipline, ethics, and standards will become the focus of a distrustful
and enraged public. And then the regulations will follow: regulations that
none of us should desire; regulations that will cripple our ability to freely
explore and expand the craft of software development; regulations that will
put severe restrictions on the growth of our technology and economy.
It is not the goal of this book to stop the headlong rush into ever-more
software adoption. Nor is it the goal to slow the rate of software production.
Such goals would be a waste of effort. Our society needs software, and it
will get it no matter what. Attempting to throttle that need will not stop the
looming catastrophe of public trust.
Rather, it is the goal of this book to impress upon software developers and
their managers the need for discipline and to teach those developers and
managers the disciplines, standards, and ethics that are most effective at
maximizing their ability to produce robust, fault-tolerant, effective software.
It is only by changing the way that we programmers work, by upping our
discipline, ethics, and standards, that the house of cards can be shored up and
prevented from collapse.
About the Author
Robert C. Martin (Uncle Bob) wrote his first line of code at the age of 12
in 1964. He has been employed as a programmer since 1970. He is
cofounder of cleancoders.com, offering online video training for software
developers, and founder of Uncle Bob Consulting LLC, offering software
consulting, training, and skill development services to major corporations
worldwide. He served as the Master Craftsman at 8th Light, Inc., a Chicago-
based software consulting firm.
Mr. Martin has published dozens of articles in various trade journals and is a
regular speaker at international conferences and trade shows. He is also the
creator of the acclaimed educational video series at cleancoders.com.
Mr. Martin has authored and edited many books, including the following:
Designing Object-Oriented C++ Applications Using the Booch Method
Pattern Languages of Program Design 3
More C++ Gems
Extreme Programming in Practice
Agile Software Development: Principles, Patterns, and Practices
UML for Java Programmers
Clean Code
The Clean Coder
Clean Architecture: A Craftsman’s Guide to Software Structure and Design
Clean Agile: Back to Basics
A leader in the industry of software development, Mr. Martin served three
years as the editor-in-chief of the C++ Report, and he served as the first
chairperson of the Agile Alliance.
1 Craftsmanship
The dream of flying is almost certainly as old as humanity. The ancient Greek
myth describing the flight of Daedalus and Icarus dates from around 1550
BCE. In the millennia that followed, a number of brave, if foolish, souls
strapped ungainly contraptions to their bodies and leapt off cliffs and towers
to their doom in pursuit of that dream.
Things began to change about five hundred years ago when Leonardo
DaVinci drew sketches of machines that, though not truly capable of flight,
showed some reasoned thought. It was DaVinci who realized that flight could
be possible because air resistance works in both directions. The resistance
caused by pushing down on the air creates lift of the same amount. This is the
mechanism by which all modern airplanes fly.
DaVinci’s ideas were lost until the middle of the eighteenth century. But then
began an almost frantic exploration into the possibility of flight. The
eighteenth and nineteenth centuries were a time of intense aeronautical
research and experimentation. Unpowered prototypes were built, tried,
discarded, and improved. The science of aeronautics began to take shape.
The forces of lift, drag, thrust, and gravity were identified and understood.
Some brave souls made the attempt.
And some crashed and died.
In the closing years of the eighteenth century, and for the half century that
followed, Sir George Cayley, the father of modern aerodynamics, built
experimental rigs, prototypes, and full-sized models culminating in the first
manned flight of a glider.
And some still crashed and died.
Then came the age of steam and the possibility of powered manned flight.
Dozens of prototypes and experiments were performed. Scientists and
enthusiasts alike joined the gaggle of people exploring the potential of flight.
In 1890, Clément Ader flew a twin-engine steam-powered machine for 50
meters.
And some still crashed and died.
But the internal combustion engine was the real game-changer. In all
likelihood, the first powered and controlled manned flight took place in 1901
by Gustave Whitehead. But it was the Wright Brothers who, on December 17,
1903, at Kill Devil Hills, North Carolina, conducted the first truly sustained,
powered, and controlled manned flight of a heavier-than-air machine.
And some still crashed and died.
But the world changed overnight. Eleven years later, in 1914, biplanes were
dogfighting in the air over Europe.
And though many crashed and died under enemy fire, a similar number
crashed and died just learning to fly. The principles of flight might have been
mastered, but the technique of flight was barely understood.
Another two decades, and the truly terrible fighters and bombers of World
War II were wreaking havoc over France and Germany. They flew at extreme
altitudes. They bristled with guns. They carried devastating destructive
power.
During the war, 65,000 American aircraft were lost. But only 23,000 of those
were lost in combat. The pilots flew and died in battle. But more often they
flew and died when no one was shooting. We still didn’t know how to fly.
Another decade saw jet-powered craft, the breaking of the sound barrier, and
the explosion of commercial airlines and civilian air travel. It was the
beginning of the jet age, when people of means (the so-called jet set) could
leap from city to city and country to country in a matter of hours.
And the jet airliners tore themselves to shreds and fell out of the sky in
terrifying numbers. There was so much we still didn’t understand about
making and flying aircraft.
That brings us to the 1950s. Boeing 707s would be flying passengers from
here to there across the world by the end of the decade. Two more decades
would see the first wide-body jumbo jet, the 747.
Aeronautics and air travel settled down to become the safest and most
efficient means of travel in the history of the world. It took a long time, and
cost many lives, but we had finally learned how to safely build and fly
airplanes.1
1. The 737 Max notwithstanding.
In 1972, at Bell Labs in New Jersey, Ken Thompson and Dennis Ritchie
were between projects. They begged time on a PDP 7 from a different
project team and invented UNIX and C.
Now the pace picked up to near breakneck speed. I’m going to give you a
few key dates. For each one, ask yourself: How many computers are in the
world? How many programmers are in the world? And where did those
programmers come from?
1970—Digital Equipment Corporation had produced 50,000 PDP-8
computers since 1965.
1970—Winston Royce wrote the “waterfall” paper, “Managing the
Development of Large Software Systems.”
1971—Intel released the 4004 single-chip microcomputer.
1974—Intel released the 8080 single-chip microcomputer.
1977—Apple released the Apple II.
1979—Motorola released the 68000, a 16-bit single-chip microcomputer.
1980—Bjarne Stroustrup invented C with Classes (a preprocessor that
makes C look like Simula).
1980—Xerox PARC released Smalltalk-80, the language Alan Kay and his team had been developing since the early 1970s.
1981—IBM released the IBM PC.
1983—Stroustrup renamed C with Classes to C++.
1984—Apple released the 128K Macintosh.
1985—The US Department of Defense adopted waterfall as the official
software process (DOD-STD-2167A).
1986—Stroustrup published The C++ Programming Language (Addison-
Wesley).
1991—Grady Booch published Object-Oriented Design with Applications
(Benjamin/Cummings).
1991—James Gosling invented Java (called Oak at the time).
1991—Guido van Rossum released Python.
1995—Design Patterns: Elements of Reusable Object-Oriented Software
(Addison-Wesley) was written by Erich Gamma, Richard Helm, John
Vlissides, and Ralph Johnson.
1995—Yukihiro Matsumoto released Ruby.
1995—Brendan Eich created JavaScript.
1996—Sun Microsystems released Java.
1999—Microsoft invented C#/.NET (then called Cool).
2000—Y2K! The Millennium Bug.
2001—The Agile Manifesto was written.
Between 1970 and 2000, the clock rates of computers increased by three
orders of magnitude. Density increased by four orders of magnitude. Disk
space increased by six or seven orders of magnitude. RAM capacity
increased by six or seven orders of magnitude. Costs had fallen from dollars
per bit to dollars per gigabit. The change in the hardware is hard to visualize,
but summing up all the orders of magnitude I just mentioned leads us to an
increase in capability of roughly thirty orders of magnitude.
And all this in just over fifty years since Alan Turing’s lecture.
How many programmers are there now? How many lines of code have been
written? How good is all that code?
Compare this timeline with the aeronautical timeline. Do you see the
similarity? Do you see the gradual increase in theory, the rush and failure of
the enthusiasts, the gradual increase in competence? The decades of not
knowing what we were doing?
And now, with our society depending, for its very existence, on our skills, do
we have the Sullenbergers whom our society needs? Have we groomed the
programmers who understand their craft as deeply as today’s airline pilots
understand theirs? Do we have the craftsmen whom we shall certainly
require?
Craftsmanship is the state of knowing how to do something well and is the
outcome of good tutelage and lots of experience. Until recently, the software
industry had far too little of either. Programmers tended not to remain
programmers for long, because they viewed programming as a steppingstone
into management. This meant that there were few programmers who acquired
enough experience to teach the craft to others. To make matters worse, the
number of new programmers entering the field doubles every five years or
so, keeping the ratio of experienced programmers far too low.
The result has been that most programmers never learn the disciplines,
standards, and ethics that could define their craft. During their relatively
brief career of writing code, they remain unapprenticed novices. And this, of
course, means that much of the code produced by those inexperienced
programmers is substandard, ill structured, insecure, buggy, and generally a
mess.
In this book, I describe the standards, disciplines, and ethics that I believe
every programmer should know and follow in order to gradually acquire the
knowledge and skill that their craft truly requires.
I The Disciplines
Extreme Programming
In 1970, Winston Royce published the paper that drove the waterfall
development process into the mainstream. It took almost 30 years to undo that
mistake.
By 1995, software experts started considering a different, more incremental
approach. Processes such as Scrum, feature-driven development (FDD),
dynamic systems development method (DSDM), and the Crystal
methodologies were presented. But little changed in the industry at large.
Then, in 1999, Kent Beck published the book Extreme Programming
Explained (Addison-Wesley). Extreme Programming (XP) built upon the
ideas in those previous processes but added something new. XP added
engineering practices.
Enthusiasm grew exponentially for XP between 1999 and 2001. It was this
enthusiasm that spawned and drove the Agile revolution. To this day, XP
remains the best defined and most complete of all the Agile methods. The
engineering practices at its core are the focus of this section on disciplines.
The Circle of Life
In Figure I.1, you see Ron Jeffries’ Circle of Life, which shows the practices
of XP. The disciplines that we cover in this book are the four in the center
and the one on the far left.
The four in the center are the engineering practices of XP: test-driven
development (TDD), refactoring, simple design, and pairing (which we shall
call collaborative programming). The practice at the far left, acceptance
tests, is the most technical and engineering focused of the business practices
of XP. These five practices are among the foundational disciplines of
software craftsmanship.
Test-Driven Development
TDD is the lynchpin discipline. Without it, the other disciplines are either
impossible or impotent. For that reason, the two upcoming sections
describing TDD represent nearly half the pages of this book and are intensely
technical. This organization may seem unbalanced. Indeed, it feels that way
to me too, and I struggled with what to do about that. My conclusion,
however, is that the imbalance is a reaction to the corresponding imbalance
within our industry. Too few programmers know this discipline well.
TDD is the discipline that governs the way a programmer works on a
second-by-second basis. It is neither an up-front discipline nor an after-the-
fact discipline. TDD is in-process and in-your-face. There is no way to do
partial TDD; it is an all-or-nothing discipline.
The essence of the TDD discipline is very simple. Small cycles and tests
come first. Tests come first in everything. Tests are written first. Tests are
cleaned up first. In all activities, tests come first. And all activities are
broken down into the tiniest of cycles.
Cycle times are measured in seconds, not minutes. They are measured in
characters, not lines. The feedback loop is closed almost literally as soon as
it is opened.
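To make that granularity concrete, here is a minimal sketch, with invented names, of one such pass in the Java style used later in this book: the first test line forces a class into existence, a stubbed return value turns the test red, and the real implementation turns it green.

package sketch;

import static org.junit.Assert.assertEquals;
import org.junit.Test;

// A sketch of one tiny pass around the loop. Writing the test line first
// forces the (hypothetical) Calculator class and its add function into
// existence; a stubbed "return 0;" shows the test failing; the real sum
// makes it pass. Each step is a matter of seconds and characters.
public class CalculatorTest {
    @Test
    public void twoPlusTwoIsFour() throws Exception {
        assertEquals(4, new Calculator().add(2, 2));
    }
}

class Calculator {
    int add(int a, int b) {
        return a + b;   // first written as "return 0;" to watch the test fail
    }
}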
The goal of TDD is to create a test suite that you would trust with your life. If
the test suite passes, you should feel safe to deploy the code.
Of all the disciplines, TDD is the most onerous and the most complex. It is
onerous because it dominates everything. It is the first and last thing you think
about. It is the constraint that lies thick over everything you do. It is the
governor that keeps the pace steady regardless of pressure and environmental
stresses.
TDD is complex because code is complex. For each shape or form of code,
there is a corresponding shape or form of TDD. TDD is complex because the
tests must be designed to fit the code without being coupled to the code and
must cover almost everything yet still execute in seconds. TDD is an
elaborate and complex skill that is very hard won but immeasurably
rewarding.
Refactoring
Refactoring is the discipline that allows us to write clean code. Refactoring
is difficult, if not impossible, without TDD.1 Therefore, writing clean code is
just as difficult or impossible without TDD.
1. There may be other disciplines that could support refactoring as well as TDD. Kent Beck’s test &&
commit || revert is a possibility. At the time of this writing, however, it has not enjoyed a high degree
of adoption and remains more of an academic curiosity.
Simple Design
Life on Earth could be described in terms of layers. At the top is ecology, the
study of systems of living things. Below that is physiology, the study of life’s
internal mechanisms. The next layer down might be microbiology, the study
of cells, nucleic acids, proteins, and other macromolecular systems. Those
are in turn described by the science of chemistry, which in turn is described
by quantum mechanics.
Extending that analogy to programming, if TDD is the quantum mechanics of
programming, then refactoring is the chemistry and simple design is the
microbiology. Continuing that analogy, SOLID principles, object-oriented
design, and functional programming are the physiology, and architecture is
the ecology of programming.
Simple design is almost impossible without refactoring. Indeed, it is the end
goal of refactoring, and refactoring is the only practical means to achieve that
goal. That goal is the production of simple atomic granules of design that fit
well into the larger structures of programs, systems, and applications.
Simple design is not a complex discipline. It is driven by four very simple
rules. However, unlike TDD and refactoring, simple design is an imprecise
discipline. It relies on judgment and experience. Done well, it is the first
indication that separates an apprentice who knows the rules from a
journeyman who understands the principles. It is the beginning of what
Michael Feathers has called design sense.
Collaborative Programming
Collaborative programming is the discipline and the art of working together
in a software team. It includes subdisciplines such as pair programming, mob
programming, code reviews, and brainstorms. Collaborative programming
involves everyone on the team, programmers and nonprogrammers alike. It is
the primary means by which we share knowledge, assure consistency, and gel
the team into a functioning whole.
Of all the disciplines, collaborative programming is the least technical and
the least prescriptive. Nevertheless, it may be the most important of the five
disciplines, because the building of an effective team is both a rare and
precious thing.
Acceptance Tests
Acceptance testing is the discipline that ties the software development team
to the business. The business purpose is the specification of the desired
behaviors of the system. Those behaviors are encoded into tests. If those tests
pass, the system behaves as specified.
The tests must be readable and writeable by representatives of the business.
It is by writing and reading these tests and seeing them pass that the business
knows what the software does and that it does what the business needs it to
do.
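The book’s later chapters use tools such as FitNesse for this purpose. Purely as a sketch of the idea in plain JUnit, with a hypothetical Order class invented for the illustration, such a test can at least read like the business rule it encodes:

package acceptance;

import static org.junit.Assert.assertEquals;
import org.junit.Test;

// A sketch of a business-readable acceptance test: the test name and body
// state the rule the business asked for, and a passing run proves the
// system implements it. Order and its discount rule are hypothetical.
public class BulkDiscountTest {
    static class Order {
        private int totalCents = 0;

        void addItems(int quantity, int unitPriceCents) {
            int gross = quantity * unitPriceCents;
            // The business rule under test: ten or more items earn a 10% discount.
            totalCents += (quantity >= 10) ? gross * 90 / 100 : gross;
        }

        int totalCents() {
            return totalCents;
        }
    }

    @Test
    public void tenItemsAtFiveDollarsCostFortyFiveDollars() throws Exception {
        Order order = new Order();
        order.addItems(10, 500);
        assertEquals(4500, order.totalCents());
    }
}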
2 Test-Driven Development
Overview
Zero. It’s an important number. It’s the number of balance. When the two
sides of a scale are in balance, the pointer on the scale reads zero. A neutral
atom, with equal numbers of electrons and protons, has a charge of zero. The
sum of forces on a bridge balances to zero. Zero is the number of balance.
Did you ever wonder why the amount of money in your checking account is
called its balance? That’s because the balance in your account is the sum of
all the transactions that have either deposited or withdrawn money from that
account. But transactions always have two sides because transactions move
money between accounts.
The near side of a transaction affects your account. The far side affects some
other account. Every transaction whose near side deposits money into your
account has a far side that withdraws that amount from some other account.
Every time you write a check, the near side of the transaction withdraws
money from your account, and the far side deposits that money into some
other account. So, the balance in your account is the sum of the near sides of
the transactions. The sum of the far sides should be equal and opposite to the
balance of your account. The sum of all the near and far sides should be zero.
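As a sketch of that invariant in code, with a Ledger class invented purely for illustration, every transaction posts a near-side entry and an equal, opposite far-side entry, so the sum over the whole ledger is always zero:

package ledger;

import static org.junit.Assert.assertEquals;
import org.junit.Test;
import java.util.ArrayList;
import java.util.List;

// A minimal sketch of the invariant described above: each transfer records
// a near side and an equal, opposite far side, so the sum of every entry
// in the ledger is always zero.
public class DoubleEntryTest {
    static class Ledger {
        private final List<Long> entries = new ArrayList<>();

        void transfer(long cents) {
            entries.add(-cents);  // near side: withdrawn from your account
            entries.add(+cents);  // far side: deposited into the other account
        }

        long sumOfAllEntries() {
            long sum = 0;
            for (long entry : entries)
                sum += entry;
            return sum;
        }
    }

    @Test
    public void nearAndFarSidesAlwaysSumToZero() throws Exception {
        Ledger ledger = new Ledger();
        ledger.transfer(12_500);   // write a check for $125.00
        ledger.transfer(-4_200);   // receive a deposit of $42.00
        assertEquals(0, ledger.sumOfAllEntries());
    }
}

Break either side of any transfer and the test, like the bookkeeper’s trial balance, immediately exposes the imbalance.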
Two thousand years ago, Gaius Plinius Secundus, known as Pliny the Elder,
realized this law of accounting and invented the discipline of double-entry
bookkeeping. Over the centuries, this discipline was refined by the bankers
in Cairo and then by the merchants of Venice. In 1494, Luca Pacioli, a
Franciscan friar and friend of Leonardo DaVinci, wrote the first definitive
description of the discipline. It was published in book form on the newly
invented printing press, and the technique spread.
In 1772, as the industrial revolution gained momentum, Josiah Wedgwood
was struggling with success. He was the founder of a pottery factory, and his
product was in such high demand that he was nearly bankrupting himself
trying to meet that demand. He adopted double-entry bookkeeping and was
thereby able to see how money was flowing in and out of his business with a
resolution that had previously escaped him. And by tuning those flows, he
staved off the looming bankruptcy and built a business that exists to this day.
Wedgwood was not alone. Industrialization drove the vast growth of the
economies of Europe and America. In order to manage all the money flows
resulting from that growth, increasing numbers of firms adopted the
discipline.
In 1795, Johann Wolfgang von Goethe wrote the following in Wilhelm
Meister’s Apprenticeship. Pay close attention, for we will return to this
quote soon.
“Away with it, to the fire with it!” cried Werner. “The invention does not deserve the
smallest praise: that affair has plagued me enough already, and drawn upon yourself
your father’s wrath. The verses may be altogether beautiful; but the meaning of them is
fundamentally false. I still recollect your Commerce personified; a shrivelled, wretched-
looking sibyl she was. I suppose you picked up the image of her from some miserable
huckster’s shop. At that time, you had no true idea at all of trade; whilst I could not think
of any man whose spirit was, or needed to be, more enlarged than the spirit of a genuine
merchant. What a thing it is to see the order which prevails throughout his business! By
means of this he can at any time survey the general whole, without needing to perplex
himself in the details. What advantages does he derive from the system of book-keeping
by double entry! It is among the finest inventions of the human mind; every prudent
master of a house should introduce it into his economy.”
Have you seen any code that matches that description? I’m sure you have. So
have I. Indeed, if you are like me, then you have seen far, far too much of it. If
you are like me, you have written far, far too much of it.
Now, one last look at Goethe’s words:
What a thing it is to see the order which prevails throughout his business! By means of
this he can at any time survey the general whole, without needing to perplex himself in
the details.
It is significant that Goethe ascribes this powerful benefit to the simple
discipline of double-entry bookkeeping.
Software
The maintenance of proper accounts is utterly essential for running a modern
business, and the discipline of double-entry bookkeeping is essential for the
maintenance of proper accounts. But is the proper maintenance of software
any less essential to the running of a business? By no means! In the twenty-
first century, software is at the heart of every business.
What, then, can software developers use as a discipline that gives them the
control and vision over their software that double-entry bookkeeping gives to
accountants and managers? Perhaps you think that software and accounting
are such different concepts that no correspondence is required or even
possible. I beg to differ.
Consider that accounting is something of a mage’s art. Those of us not versed
in its rituals and arcanities understand but little of the depth of the accounting
profession. And what is the work product of that profession? It is a set of
documents that are organized in a complex and, for the layperson,
bewildering fashion. Upon those documents is strewn a set of symbols that
few but the accountants themselves can truly understand. And yet if even one
of those symbols were to be in error, terrible consequences could ensue.
Businesses could founder and executives could be jailed.
Now consider how similar accounting is to software development. Software
is a mage’s art indeed. Those not versed in the rituals and arcanities of
software development have no true idea of what goes on under the surface.
And the product? Again, a set of documents: the source code—documents
organized in a deeply complex and bewildering manner, littered with
symbols that only the programmers themselves can divine. And if even one of
those symbols is in error, terrible consequences may ensue.
The two professions are deeply similar. They both concern themselves with
the intense and fastidious management of intricate detail. They both require
significant training and experience to do well. They both are engaged in the
production of complex documents whose accuracy, at the level of individual
symbols, is critical.
Accountants and programmers may not want to admit it, but they are of a
kind. And the discipline of the older profession should be well observed by
the younger.
As you will see in what follows, TDD is double-entry bookkeeping. It is the
same discipline, executed for the same purpose, and delivering the same
results. Everything is said twice, in complementary accounts that must be
kept in balance by keeping the tests passing.
TDD is governed by three laws. The first law says that you may not write any production code until you have first written a test that fails because that production code does not exist. The second law says that you may not write more of a test than is sufficient to fail, and failing to compile counts as failing. The third law says that you may not write more production code than is sufficient to make the currently failing test pass.
If you are a programmer of any years’ experience, the first law may seem foolish.
You might wonder what test you are supposed to write if there’s no code to
test. This question comes from the common expectation that tests are written
after code. But if you think about it, you’ll realize that if you can write the
production code, you can also write the code that tests the production code. It
may seem out of order, but there’s no lack of information preventing you from
writing the test first.
Again, if you are an experienced programmer, then you likely realize that the
very first line of the test will fail to compile because that first line will be
written to interact with code that does not yet exist. And that means, of
course, that you will not be able to write more than one line of a test before
having to switch over to writing production code.
And now the cycle is complete. It should be obvious to you that these three
laws lock you into a cycle that is just a few seconds long. It looks like this:
You write a line of test code, but it doesn’t compile (of course).
You write a line of production code that makes the test compile.
You write another line of test code that doesn’t compile.
You write another line or two of production code that makes the test
compile.
You write another line or two of test code that compiles but fails an
assertion.
You write another line or two of production code that passes the
assertion.
And this is going to be your life from now on.
Once again, the experienced programmer will likely consider this to be
absurd. The three laws lock you into a cycle that is just a few seconds long.
Each time around that cycle, you are switching between test code and
production code. You’ll never be able to just write an if statement or a while
loop. You’ll never be able to just write a function. You will be forever
trapped in this tiny little loop of switching contexts between test code and
production code.
You may think that this will be tedious, boring, and slow. You might think that
it will impede your progress and interrupt your chain of thought. You might
even think that it’s just plain silly. You may think that this approach will lead
you to produce spaghetti code or code with little or no design—a haphazard
conglomeration of tests and the code that makes those tests pass.
Hold all those thoughts and consider what follows.
Documentation
If you’ve ever integrated a third-party package, you know that included in the
bundle of software you receive is a PDF written by a tech writer. This
document purports to describe how to integrate the third-party package. At
the end of this document is almost always an ugly appendix that contains all
the code examples for integrating the package.
Of course, that appendix is the first place you look. You don’t want to read
what a tech writer wrote about the code; you want to read the code. And that
code will tell you much more than the words written by the tech writer. If you
are lucky, you might even be able to use copy/paste to move the code into
your application where you can fiddle it into working.
When you follow the three laws, you are writing the code examples for the
whole system. Those tests you are writing explain every little detail about
how the system works. If you want to know how to create a certain business
object, there are tests that show you how to create it every way that it can be
created. If you want to know how to call a certain API function, there are
tests that demonstrate that API function and all its potential error conditions
and exceptions. There are tests in the test suite that will tell you anything you
want to know about the details of the system.
Those tests are documents that describe the entire system at its lowest level.
These documents are written in a language you intimately understand. They
are utterly unambiguous. They are so formal that they execute. And they
cannot get out of sync with the system.
As documents go, they are almost perfect.
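As a sketch of what such a low-level document looks like, assuming a hypothetical BusinessDate utility that is not part of this book’s examples, each test states one narrow fact about the system and proves it by executing:

package docs;

import static org.junit.Assert.assertEquals;
import org.junit.Test;
import java.time.DayOfWeek;
import java.time.LocalDate;

// Each test documents one narrow fact about a hypothetical BusinessDate
// utility: how to call it, what it returns, and how it rejects bad input.
// Reading the tests is reading the manual.
public class BusinessDateTest {
    @Test
    public void addingOneBusinessDayToAFridaySkipsTheWeekend() throws Exception {
        LocalDate friday = LocalDate.of(2021, 1, 8);
        assertEquals(LocalDate.of(2021, 1, 11), BusinessDate.plusBusinessDays(friday, 1));
    }

    @Test(expected = IllegalArgumentException.class)
    public void negativeIncrementsAreRejected() throws Exception {
        BusinessDate.plusBusinessDays(LocalDate.of(2021, 1, 8), -1);
    }
}

// A small implementation so that the documentation above actually executes.
class BusinessDate {
    static LocalDate plusBusinessDays(LocalDate date, int days) {
        if (days < 0)
            throw new IllegalArgumentException("days must be non-negative");
        LocalDate result = date;
        while (days > 0) {
            result = result.plusDays(1);
            if (result.getDayOfWeek() != DayOfWeek.SATURDAY
                    && result.getDayOfWeek() != DayOfWeek.SUNDAY)
                days--;
        }
        return result;
    }
}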
I don’t want to oversell this. The tests are not particularly good at describing
the motivation for a system. They are not high-level documents. But at the
lowest level, they are better than any other kind of document that could be
written. They are code. And code is something you know will tell you the
truth.
You might be concerned that the tests will be as hard to understand as the
system as a whole. But this is not the case. Each test is a small snippet of
code that is focused on one very narrow part of the system as a whole. The
tests do not form a system by themselves. The tests do not know about each
other, and so there is no rat’s nest of dependency in the tests. Each test stands
alone. Each test is understandable on its own. Each test tells you exactly
what you need to understand within a very narrow part of the system.
Again, I don’t want to oversell this point. It is possible to write opaque and
complex tests that are hard to read and understand, but it is not necessary.
Indeed, it is one of the goals of this book to teach you how to write tests that
are clear and clean documents that describe the underlying system.
Fun
When you follow the three laws, something very different happens. First of
all, it’s fun. One more time, I don’t want to oversell this. TDD is not as much
fun as winning the jackpot in Vegas. It’s not as much fun as going to a party or
even playing Chutes and Ladders with your four-year-old. Indeed, fun might
not be the perfect word to use.
Do you remember when you got your very first program to work? Remember
that feeling? Perhaps it was in a local department store that had a TRS-80 or
a Commodore 64. Perhaps you wrote a silly little infinite loop that printed
your name on the screen forever and ever. Perhaps you walked away from
that screen with a little smile on your face, knowing that you were the master
of the universe and that all computers would bow down to you forever.
A tiny echo of that feeling is what you get every time you go around the TDD
loop. Every test that fails just the way you expected it to fail makes you nod
and smile just a little bit. Every time you write the code that makes that
failing test pass, you remember that once you were master of the universe and
that you still have the power.
Every time around the TDD loop, there’s a tiny little shot of endorphins
released into your reptile brain, making you feel just a little more competent
and confident and ready to meet the next challenge. And though that feeling is
small, it is nonetheless kinda fun.
Design
But never mind the fun. Something much more important happens when you
write the tests first. It turns out that you cannot write code that’s hard to test if
you write the tests first. The act of writing the test first forces you to design
the code to be easy to test. There’s no escape from this. If you follow the
three laws, your code will be easy to test.
What makes code hard to test? Coupling and dependencies. Code that is easy
to test does not have those couplings and dependencies. Code that is easy to
test is decoupled!
Following the three laws forces you to write decoupled code. Again, there is
no escape from this. If you write the tests first, the code that passes those
tests will be decoupled in ways that you’d never have imagined.
And that’s a very good thing.
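As a sketch of that effect, with names invented for the illustration: writing the test first makes a hard-wired dependency on, say, a real payment gateway immediately painful, so the production code ends up depending on an interface that the test can stand in for:

package decoupled;

import static org.junit.Assert.assertTrue;
import org.junit.Test;

// Writing the test first makes it awkward for Checkout to call a real
// payment gateway directly, so Checkout ends up depending on an interface
// handed in from outside. All names here are hypothetical.
interface PaymentGateway {
    boolean charge(int cents);
}

class Checkout {
    private final PaymentGateway gateway;

    Checkout(PaymentGateway gateway) {
        this.gateway = gateway;  // the dependency is handed in, not hard-wired
    }

    boolean completePurchase(int cents) {
        return gateway.charge(cents);
    }
}

public class CheckoutTest {
    @Test
    public void purchaseSucceedsWhenTheGatewayAcceptsTheCharge() throws Exception {
        // The test supplies its own gateway: no network, no coupling.
        Checkout checkout = new Checkout(cents -> true);
        assertTrue(checkout.completePurchase(1_000));
    }
}

Nothing in Checkout knows or cares which gateway is real; the decoupling fell out of the desire to test it first.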
Fear
Programming is hard. It may be the most difficult thing that humans have
attempted to master. Our civilization now depends upon hundreds of
thousands of interconnected software applications, each of which involves
hundreds of thousands if not tens of millions of lines of code. There is no
other apparatus constructed by humans that has so many moving parts.
Each of those applications is supported by teams of developers who are
scared to death of change. This is ironic because the whole reason software
exists is to allow us to easily change the behavior of our machines.
But software developers know that every change introduces the risk of
breakage and that breakage can be devilishly hard to detect and repair.
Imagine that you are looking at your screen and you see some nasty tangled
code there. You probably don’t have to work very hard to conjure that image
because, for most of us, this is an everyday experience.
Now let’s say that as you glance at that code, for one very brief moment, the
thought occurs to you that you ought to clean it up a bit. But your very next
thought slams down like Thor’s hammer: “I’M NOT TOUCHING IT!”
Because you know that if you touch it, you will break it; and if you break it, it
becomes yours forever.
This is a fear reaction. You fear the code you maintain. You fear the
consequences of breaking it.
The result of this fear is that the code must rot. No one will clean it. No one
will improve it. When forced to make changes, those changes will be made
in the manner that is safest for the programmer, not best for the system. The
design will degrade, and the code will rot, and the productivity of the team
will decline, and that decline will continue until productivity is near zero.
Ask yourself if you have ever been significantly slowed down by the bad
code in your system. Of course you have. And now you know why that bad
code exists. It exists because nobody has the courage to do the one thing that
could improve it. No one dares risk cleaning it.
Courage
But what if you had a suite of tests that you trusted so much that you were
confident in recommending deployment every time that suite of tests passed?
And what if that suite of tests executed in seconds? How much would you
then fear to engage in a gentle cleaning of the system?
Imagine that code on your screen again. Imagine the stray thought that you
might clean it up a little. What would stop you? You have the tests. Those
tests will tell you the instant you break something.
With that suite of tests, you can safely clean the code. With that suite of tests,
you can safely clean the code. With that suite of tests, you can safely clean
the code.
No, that wasn’t a typo. I wanted to drive the point home very, very hard. With
that suite of tests, you can safely clean the code!
And if you can safely clean the code, you will clean the code. And so will
everyone else on the team. Because nobody likes a mess.
Therefore, we add a new law to the three laws of TDD: the law of
refactoring. First you write a small amount of failing test code. Then you
write a small amount of passing production code. Then you clean up the mess
you just made.
The traffic light gets a new color: red → green → refactor (Figure 2.1).
The Basics
It is very hard to create effective examples of TDD in text. The rhythm of
TDD just doesn’t come through very well. In the pages that follow, I try to
convey that rhythm with appropriate timestamps and callouts. But to actually
understand the true frequency of this rhythm, you just have to see it.
Therefore, each of the examples to follow has a corresponding online video
that will help you see the rhythm first hand. Please watch each video in its
entirety, and then make sure you go back to the text and read the explanation
with the timestamps. If you don’t have access to the videos, then pay special
attention to the timestamps in the examples so you can infer the rhythm.
Simple Examples
As you review these examples, you are likely to discount them because they
are all small and simple problems. You might conclude that TDD may be
effective for such “toy examples” but cannot possibly work for complex
systems. This would be a grave mistake.
The primary goal of any good software designer is to break down large and
complex systems into a set of small, simple problems. The job of a
programmer is to break those systems down into individual lines of code.
Thus, the examples that follow are absolutely representative of TDD
regardless of the size of the project.
This is something I can personally affirm. I have worked on large systems
that were built with TDD, and I can tell you from experience that the rhythm
and techniques of TDD are independent of scope. Size does not matter.
Or, rather, size does not matter to the procedure and the rhythm. However,
size has a profound effect on the speed and coupling of the tests. But those
are topics for the advanced chapters.
Stack
Watch related video: Stack
Access video by registering at informit.com/register
We start with a very simple problem: create a stack of integers. As we walk
through this problem, note that the tests will answer any questions you have
about the behavior of the stack. This is an example of the documentation
value of tests. Note also that we appear to cheat by making the tests pass by
plugging in absolute values. This is a common strategy in TDD and has a
very important function. I’ll describe that as we proceed.
We begin:
// T: 00:00 StackTest.java
package stack;
import org.junit.Test;
It’s good practice to always start with a test that does nothing, and make sure
that test passes. Doing so helps ensure that the execution environment is all
working.
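The snippet above shows only the package and the import; a test that does nothing, of the kind just described, might look like this (the method name is merely illustrative):

// If this empty test runs and passes, the compiler, JUnit, and the
// execution environment are all wired up and working.
public class StackTest {
    @Test
    public void nothing() throws Exception {
    }
}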
Next, we face the problem of what to test. There’s no code yet, so what is
there to test?
The answer to that question is simple. Assume we already know the code we
want to write: public class Stack. But we can’t write it because we don’t
have a test that fails due to its absence. So, following the first law, we write
the test that forces us to write the code that we already know we want to
write.
Rule 1: Write the test that forces you to write the code you already know you want to
write.
This is the first of many rules to come. These “rules” are more like
heuristics. They are bits of advice that I’ll be throwing out, from time to time,
as we progress through the examples.
Rule 1 isn’t rocket science. If you can write a line of code, then you can
write a test that tests that line of code, and you can write it first. Therefore,
// T: 00:44 StackTest.java
public class StackTest {
    @Test
    public void canCreateStack() throws Exception {
        MyStack stack = new MyStack();
    }
}
I use boldface to show code that has been changed or added and highlight to
depict code that does not compile. I chose MyStack for our example because
the name Stack is known to the Java environment already.
Notice that in the code snippet, we changed the name of the test to
communicate our intent. We can create a stack.
Now, because MyStack doesn’t compile, we’d better follow the second law
and create it, but by the third law, we’d better not write more than we need:
// T: 00:54 Stack.java
package stack;

public class MyStack {
}
Ten seconds have passed, and the test compiles and passes. When I initially
wrote this example, most of that 10 seconds was taken up by rearranging my
screen so that I can see both files at the same time. My screen now looks like
Figure 2.2. The tests are on the left, and the production code is on the right.
This is my typical arrangement. It’s nice to have screen real estate.
Figure 2.2 Rearranged screen
MyStack isn’t a great name, but it avoided the name collision. Now that
MyStack is declared in the stack package, let’s change it back to Stack. That
took 15 seconds. The tests still pass.
// T: 01:09 StackTest.java
public class StackTest {
    @Test
    public void canCreateStack() throws Exception {
        Stack stack = new Stack();
    }
}

// T: 01:09 Stack.java
package stack;

public class Stack {
}
Writing code that works is hard enough. Writing code that works and is clean
is even harder. Fortunately, we can break the effort up into two steps. We can
write bad code that works. Then, if we have tests, we can easily clean up the
bad code while keeping it working.
Thus, on every circuit around the TDD loop, we take the opportunity to clean up any little messes we might have made.
Rule 2: Make it fail. Make it pass. Clean it up.
You may have noticed that our test does not actually assert any behavior. It
compiles and passes but asserts nothing at all about the newly created stack.
We can fix that in 15 seconds:
// T: 01:24 StackTest.java
public class StackTest {
    @Test
    public void canCreateStack() throws Exception {
        Stack stack = new Stack();
        assertTrue(stack.isEmpty());
    }
}
The second law kicks in here, so we’d better get this to compile:
// T: 01:49
import static junit.framework.TestCase.assertTrue;
To satisfy the compiler, we add an isEmpty() function to Stack that simply returns false, watch the assertion fail, and then change that false to true. Nine seconds later, the test passes. It has taken 9 seconds to ensure that the test both passes and fails.
When programmers first see that false and then that true, they often laugh
because it looks so foolish. It looks like cheating. But it’s not cheating, and
it’s not at all foolish. It has taken mere seconds to ensure that the test both
passes and fails as it should. Why in the world would we not do this?
What’s the next test? Well, we know we need to write the push function. So,
by Rule 1, we write the test that forces us to write the push function:
// T: 02:24 StackTest.java
@Test
public void canPush() throws Exception {
    Stack stack = new Stack();
    stack.push(0);
}
This doesn’t compile. By the second law, then, we must write the production
code that will make it compile:
// T: 02:31 Stack.java
public void push(int element) {
}
This compiles, of course, but now we have another test without an assertion.
The obvious thing to assert is that, after one push, the stack is not empty:
// T: 02:54 StackTest.java
@Test
public void canPush() throws Exception {
    Stack stack = new Stack();
    stack.push(0);
    assertFalse(stack.isEmpty());
}
To make this pass, we give Stack a boolean field named empty, initialized to true; push sets it to false, and isEmpty returns it. This passes. It has been 2 minutes since the last test passed. Now, by Rule 2,
we need to clean this up. The duplication of the stack creation bothers me, so
let’s extract the stack into a field of the class and initialize it:
// T: 04:24 StackTest.java
public class StackTest {
    private Stack stack = new Stack();

    @Test
    public void canCreateStack() throws Exception {
        assertTrue(stack.isEmpty());
    }

    @Test
    public void canPush() throws Exception {
        stack.push(0);
        assertFalse(stack.isEmpty());
    }
}
Next, by the first law, we write a test, afterOnePushAndOnePop_isEmpty, that pushes an element, pops it, and asserts that the stack is empty again. This fails because nothing sets the empty flag back to true, so
// T: 06:06 Stack.java
public int pop() {
    empty = true;
    return -1;
}
And, of course, this passes. It has been 76 seconds since the last test passed.
Nothing to clean up, so back to the first law. The size of the stack should be 2
after two pushes.
// T: 06:48 StackTest.java
@Test
public void afterTwoPushes_sizeIsTwo() throws Exception {
    stack.push(0);
    stack.push(0);
    assertEquals(2, stack.getSize());
}
The second law kicks in because of the compile errors, but they are easy to
fix. We add the necessary import to the test and the following function to the
production code:
// T: 07:23 Stack.java
public int getSize() {
    return 0;
}
The new test fails, of course; changing that 0 to a 2 makes it pass. This looks dumb, but we have now seen the test fail and pass properly, and it took only 11 seconds. So, again, why wouldn’t we do this?
But this solution is clearly naive, so by Rule 1, we modify a previous test in
a manner that will force us to write a better solution. And, of course, we
screw it up (you can blame me):
// T: 08:06 StackTest.java
@Test
public void afterOnePushAndOnePop_isEmpty() throws Exception {
    stack.push(0);
    stack.pop();
    assertTrue(stack.isEmpty());
    assertEquals(1, stack.getSize());
}
Okay, that was really stupid. But programmers make dumb mistakes from
time to time, and I am no exception. I didn’t spot this mistake right away
when I first wrote the example because the test failed just the way I expected
it to.
So, now, secure in the assumption that our tests are good, let’s make the
changes that we believe will make those tests pass:
// T: 08:56 Stack.java
public class Stack {
    private boolean empty = true;
    private int size = 0;
I was surprised to see this fail. But after composing myself, I quickly found
my error and repaired the test. Let’s do this:
// T: 09:28 StackTest.java
@Test
public void afterOnePushAndOnePop_isEmpty() throws Exception {
stack.push(0);
stack.pop();
assertTrue(stack.isEmpty());
assertEquals(0, stack.getSize());
}
And the tests all pass. It has been 3 minutes and 22 seconds since the tests
last passed.
For the sake of completeness, let’s add the size check to another test:
// T: 09:51 StackTest.java
@Test
public void afterOnePush_isNotEmpty() throws Exception {
stack.push(0);
assertFalse(stack.isEmpty());
assertEquals(1, stack.getSize());
}
That passes. It has been 1 minute and 27 seconds since the tests last passed.
Back to the first law. The stack should remember what was pushed. Let’s try
the simplest case:
// T: 11:49 StackTest.java
@Test
public void afterPushingX_willPopX() throws Exception {
stack.push(99);
assertEquals(99, stack.pop());
}
This fails because pop is currently returning -1. We make it pass by returning
99:
// T: 11:57 Stack.java
public int pop() {
if (size == 0)
throw new Underflow();
--size;
return 99;
}
By Rule 1, the next test pushes and pops a different value. That fails because pop still returns the constant 99. We make it pass by adding a field to record the last push:
// T: 12:50 Stack.java
public class Stack {
private int size = 0;
private int element;
This passes. It has been 92 seconds since the tests last passed.
At this point, you are probably pretty frustrated with me. You might even be
shouting at these pages, demanding that I stop messing around and just write
the damned stack. But actually, I’ve been following Rule 3.
Rule 3: Don’t go for the gold.
When you first start TDD, the temptation is overwhelming to tackle the hard
or interesting things first. Someone writing a stack would be tempted to test
first-in-last-out (FILO) behavior first. This is called “going for the gold.” By
now, you have noticed that I have purposely avoided testing anything stack-
like. I’ve been focusing on all the ancillary stuff around the outside of the
stack, things like emptiness and size.
Why haven’t I been going for the gold? Why does Rule 3 exist? Because
when you go for the gold too early, you tend to miss all the details around the
outside. Also, as you will soon see, you tend to miss the simplifying
opportunities that those ancillary details provide.
Anyway, the first law has just kicked in. We need to write a failing test. And
the most obvious test to write is FILO behavior:
// T: 13:36 StackTest.java
@Test
public void afterPushingXandY_willPopYthenX() {
stack.push(99);
stack.push(88);
assertEquals(88, stack.pop());
assertEquals(99, stack.pop());
}
This fails. Getting it to pass will require that we remember more than one
value, so we should probably use an array. Let’s rename the field to elements
and turn it into an array:
// T: 13:51 Stack.java
public class Stack {
private int size = 0;
private int[] elements = new int[2];
Ah, but there are compile errors. Let’s tackle them one by one. The elements
variable in push needs brackets:
// T: 14:01 Stack.java
public void push(int element) {
size++;
this.elements[] = element;
}
We need something to put between those brackets. Hmmm. There’s that size++
from the previous snippet:
// T: 14:07 Stack.java
public void push(int element) {
this.elements[size++] = element;
}
And now the tests pass again. It has been 94 seconds since the tests last
passed.
And we’re done. Oh, there’s more we could do. The stack holds only two
elements and does not deal with overflows, but there’s nothing left that I want
to demonstrate in this example. So, consider those improvements an exercise
for you, my reader.
It has taken 14 minutes and 24 seconds to write an integer stack from scratch.
The rhythm you saw here was real and is typical. This is how TDD feels,
regardless of the scope of the project.
Exercise
Implement a first-in-first-out queue of integers using the technique shown
previously. Use a fixed-sized array to hold the integers. You will likely need
two pointers to keep track of where the elements are to be added and
removed. When you are done, you may find that you have implemented a
circular buffer.
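If you want a starting point, the first degenerate test might look something like the sketch below, just enough to force the class into existence. The name IntQueue and its constructor argument are my own choices, not part of the exercise:

import org.junit.Test;
import static org.junit.Assert.assertTrue;

public class IntQueueTest {
  @Test
  public void newQueueIsEmpty() throws Exception {
    // The capacity of the fixed-size array that backs the queue.
    IntQueue queue = new IntQueue(5);
    assertTrue(queue.isEmpty());
  }
}

From there, the first and second laws drive out enqueue, dequeue, and the two wrap-around pointers one failing test at a time.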
Prime Factors
Watch related video: Prime Factors
Access video by registering at informit.com/register
The next example has a story and a lesson. The story begins in 2002 or
thereabouts. I had been using TDD for a couple of years by that time and was
learning Ruby. My son, Justin, came home from school and asked me for help
with a homework problem. The homework was to find the prime factors of a
set of integers.
I told Justin to try to work the problem by himself and that I would write a
little program for him that would check his work. He retired to his room, and
I set my laptop up on the kitchen table and started to ponder how to code the
algorithm to find prime factors.
I settled on the obvious approach of using the sieve of Eratosthenes to
generate a list of prime numbers and then dividing those primes into the
candidate number. I was about to code it when a thought occurred to me:
What if I just start writing tests and see what happens?
I began writing tests and making them pass, following the TDD cycle. And
this is what happened.
First watch the video if you can. It will show you a lot of the nuances that I
can’t show in text. In the text that follows, I avoid the tedium of all the
timestamps, compile-time errors, and so on. I just show you the incremental
progress of the tests and the code.
We begin with the most obvious and degenerate case. Indeed, that follows a
rule:
Rule 4: Write the simplest, most specific, most degenerate2 test that will fail.
2. The word degenerate is used here to mean the most absurdly simple starting point.
The most degenerate case is the prime factors of 1. The most degenerate
failing solution is to simply return a null.
public class PrimeFactorsTest {
@Test
public void factors() throws Exception {
assertThat(factorsOf(1), is(empty()));
}
private List<Integer> factorsOf(int n) {
return null;
}
}
Note that I am including the function being tested within the test class. This is
not typical but is convenient for this example. It allows me to avoid bouncing
back and forth between two source files.
Now this test fails, but it is easy to make it pass. We simply return an empty
list:
private List<Integer> factorsOf(int n) {
return new ArrayList<>();
}
The next most degenerate test is the prime factors of 2, asserting that factorsOf(2) contains 2. It fails; but again, it’s easy to make it pass. That’s one of the reasons we choose degenerate tests: they are almost always easy to make pass.
private List<Integer> factorsOf(int n) {
ArrayList<Integer> factors = new ArrayList<>();
if (n>1)
factors.add(2);
return factors;
}
If you watched the video, you saw that this was done in two steps. The first
step was to extract the new ArrayList<>() as a variable named factors. The
second step was to add the if statement.
I emphasize these two steps because the first follows Rule 5.
Rule 5: Generalize where possible.
The next most degenerate test is the prime factors of 3. We make it pass by replacing the constant in factors.add(2) with the variable n. I sat at my kitchen table and marveled that this simple one-character change, a change that replaced a constant with a variable, a simple generalization, made the new test pass and kept all previous tests passing.
I’d say we were on a roll, but the next test is going to be disappointing. The
test itself is obvious:
assertThat(factorsOf(4), contains(2, 2));
But how do we solve that by generalizing? I can’t think of a way. The only
solution I can think of is to test whether n is divisible by 2, and that just isn’t
very general. Nevertheless,
private List<Integer> factorsOf(int n) {
ArrayList<Integer> factors = new ArrayList<>();
if (n>1) {
if (n%2 == 0) {
factors.add(2);
n /= 2;
}
factors.add(n);
}
return factors;
}
Not only is this not very general; it also fails a previous test. It fails the test
for the prime factors of 2. The reason should be clear. When we reduce n by
a factor of 2, it becomes 1, which then gets put into the list.
We can fix that with some even less general code:
private List<Integer> factorsOf(int n) {
ArrayList<Integer> factors = new ArrayList<>();
if (n > 1) {
if (n % 2 == 0) {
factors.add(2);
n /= 2;
}
if (n > 1)
factors.add(n);
}
return factors;
}
This passes, and all the earlier tests keep passing. That’s a pretty good indication that we are on the right track, and it makes me feel better about those ugly if statements.
The next most degenerate test is 8, and it must fail because our solution code
simply cannot put three things into the list:
assertThat(factorsOf(8), contains(2, 2, 2));
The only way to make it pass with the current structure is to repeat the divisibility check, giving us three nearly identical if (n % 2 == 0) blocks in a row. But this is horrific. Not only is it a gross violation of Rule 5, but it’s also a huge duplication of code. I’m not sure which rule violation is worse!
And this is where the generalization mantra kicks in:
As the tests get more specific, the code gets more generic.
Every new test we write makes the test suite more specific. Every time we
invoke Rule 5, the solution code gets more generic. We’ll come back to this
mantra later. It turns out to be critically important for test design and for the
prevention of fragile tests.
We can eliminate both the duplication and the Rule 5 violation by putting the original factoring code into a loop:
private List<Integer> factorsOf(int n) {
ArrayList<Integer> factors = new ArrayList<>();
int divisor = 2;
while (n > 1) {
while (n % divisor == 0) {
factors.add(divisor);
n /= divisor;
}
divisor++;
}
if (n > 1)
factors.add(n);
return factors;
}
Once again, if you watch the video, you will see that this was done in several
steps. The first step was to extract the three 2s into the divisor variable. The
next step was to introduce the divisor++ statement. Then the initialization of
the divisor variable was moved above the if statement. Finally, the if was
changed into a while.
There it is again: that transition from if->while. Did you notice that the
predicate of the original if statement turned out to be the predicate for the
outer while loop? I found this to be startling. There’s something genetic about
it. It’s as though the creature that I’ve been trying to create started from a
simple seed and gradually evolved through a sequence of tiny mutations.
Notice that the if statement at the bottom has become superfluous. The only
way the loop can terminate is if n is 1. That if statement really was the
terminating condition of an unwound loop!
private List<Integer> factorsOf(int n) {
ArrayList<Integer> factors = new ArrayList<>();
int divisor = 2;
while (n > 1) {
while (n % divisor == 0) {
factors.add(divisor);
n /= divisor;
}
divisor++;
}
return factors;
}
And we’re done. If you watch the video, you’ll see that I added one more
test, which proves that this algorithm is sufficient.
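That final test isn’t reproduced here, but a test that demonstrates sufficiency would multiply several primes together, repeating at least one of them, and assert that they all come back out. Something along these lines:

assertThat(factorsOf(2 * 2 * 3 * 5 * 7 * 11 * 11 * 13),
           contains(2, 2, 3, 5, 7, 11, 11, 13));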
Sitting at my kitchen table, I saw those three salient lines and I had two
questions. Where did this algorithm come from, and how does it work?
Clearly, it came from my brain. It was my fingers on the keyboard after all.
But this was certainly not the algorithm I had planned on creating at the start.
Where was the sieve of Eratosthenes? Where was the list of prime numbers?
None of it was there!
Worse, why does this algorithm work? I was astounded that I could create a
working algorithm and yet not understand how it functioned. I had to study it
for a while to figure it out. What puzzled me was the divisor++ incrementer of
the outer loop, which guarantees that every integer will be checked as a
factor, including composite factors! Given the integer 12, that incrementer
will check whether 4 is a factor. Why doesn’t it put 4 in the list?
The answer is in the order of the execution, of course. By the time the
incrementer gets to 4, all the 2s have been removed from n. And if you think
about that for a bit, you’ll realize that it is the sieve of Eratosthenes—but in a
very different form than usual.
The bottom line here is that I derived this algorithm one test case at a time. I
did not think it through up front. I didn’t even know what this algorithm was
going to look like when I started. The algorithm seemed to almost put itself
together before my eyes. Again, it was like an embryo evolving one small
step at a time into an ever-more-complex organism.
Even now, if you look at those three lines, you can see the humble beginnings.
You can see the remnants of that initial if statement and fragments of all the
other changes. The breadcrumbs are all there.
And we are left with a disturbing possibility. Perhaps TDD is a general
technique for incrementally deriving algorithms. Perhaps, given a properly
ordered suite of tests, we can use TDD to derive any computer program in a
step-by-step, determinative manner.
In 1936, Alan Turing and Alonzo Church separately proved that there was no
general procedure for determining if there was a program for any given
problem.3 In so doing, they separately, and respectively, invented procedural
and functional programming. Now TDD looks like it might be a general
procedure for deriving the algorithms that solve the problems that can be
solved.
3. This was Hilbert’s “decidability problem.” He asked whether there was a generalized way to prove
that any given Diophantine equation was solvable. A Diophantine equation is a mathematical function
with integer inputs and outputs. A computer program is also a mathematical function with integer
inputs and outputs. Therefore, Hilbert’s question can be described as pertaining to computer
programs.
Bowling Game
Consider the problem of scoring a game of bowling. On the player’s first attempt, he knocked down one pin. On his second
attempt, he knocked down four more, for a total of 5 points.
In the second frame, he rolled a 4 followed by a 5, giving him a 9 for the
frame and a total of 14.
In the third frame, he rolled a 6 followed by a 4 (a spare). The total for that
frame cannot be computed until the player begins the next frame.
In the fourth frame, the player rolls a 5, and now the score can be computed for the previous frame: 15, for a total of 29 in the third frame. The player follows with another 5, making the fourth frame a spare as well.
That spare must wait until the fifth frame, for which the player rolls a strike. The fourth frame is therefore 20 points, for a total of 49.
The strike in the fifth frame cannot be scored until the player rolls the next
two balls for the sixth frame. Unfortunately, he rolls a 0 and a 1, giving him
only 11 points for the fifth frame and a total of 60.
And on it goes until the tenth and final frame. Here the player rolls a spare
and is thus allowed to roll one extra ball to finish out that spare.
Now, you are a programmer, a good object-oriented programmer. What are
the classes and relationships you would use to represent the problem of
computing the score of a game of bowling? Can you draw it in UML?4
4. The Unified Modeling Language. If you don’t know UML, don’t worry—it’s just arrows and
rectangles. You’ll work it out from the description in the text.
Perhaps you might come up with something like what is shown in Figure 2.5.
Figure 2.5 A UML diagram of the scoring of bowling
The Game has ten Frames. Each Frame has one or two Rolls, except the
TenthFrame subclass, which inherits the 1..2 and adds one more roll to make
it 2..3. Each Frame object points to the next Frame so that the score function
can look ahead in case it has to score a spare or a strike.
The Game has two functions. The roll function is called every time the player
rolls a ball, and it is passed the number of pins the player knocked down.
The score function is called once all the balls are rolled, and it returns the
score for the entire game.
That’s a nice, simple object-oriented model. It ought to be pretty easy to
code. Indeed, given a team of four people, we could divide the work into the
four classes and then meet a day or so later to integrate them and get them
working.
Or, we could use TDD. If you can watch the video, you should do that now.
In any case, please read through the text that follows.
Watch related video: Bowling Game
Access video by registering at informit.com/register
We begin, as always, with a test that does nothing, just to prove we can
compile and execute. Once this test runs, we delete it:
public class BowlingTest {
@Test
public void nothing() throws Exception {
}
}
Then we write a test that creates a Game, and we make it compile and pass by directing our IDE to create the missing class:
public class Game {
}
Next we write a test that rolls a ball, and we make that compile and pass by directing the IDE to create the roll function, giving its argument a reasonable name:
public void roll(int pins) {
}
You’re probably already bored. This should be nothing new to you by now.
But bear with me—it’s about to get interesting. There’s a bit of duplication in
the tests already. We should get rid of it, so we factor out the creating of the
game into the setup function:
public class BowlingTest {
private Game g;
@Before
public void setUp() throws Exception {
g = new Game();
}
}
This makes the first test completely empty, so we delete it. The second test is
also pretty useless because it doesn’t assert anything. We delete it as well.
Those tests served their purpose. They were stairstep tests.
Stairstep tests: Some tests are written just to force us to create classes or functions or
other structures that we’re going to need. Sometimes these tests are so degenerate that
they assert nothing. Other times they assert something very naive. Often these tests are
superseded by more comprehensive tests later and can be safely deleted. We call these
kinds of tests stairstep tests because they are like stairs that allow us to incrementally
increase the complexity to the appropriate level.
Next, we want to assert that we can score a game. But to do that, we need to
roll a complete game. Remember that the score function can be called only
after all the balls in a game are rolled.
We fall back on Rule 4 and roll the simplest, most degenerate game we can
think of:
@Test
public void gutterGame() throws Exception {
for (int i=0; i<20; i++)
g.roll(0);
assertEquals(0, g.score());
}
Making this pass is trivial. We just need to return zero from score. But first
we return -1 (not shown) just to see it fail. Then we return zero to see it pass:
public class Game {
public void roll(int pins) {
}
public int score() {
return 0;
}
}
Okay, I said this was about to get interesting, and it is. Just one more little
setup. The next test is another example of Rule 4. The next most degenerate
test I can think of is all ones. We can write this test with a simple copy paste
from the last test:
@Test
public void allOnes() throws Exception {
for (int i=0; i<20; i++)
g.roll(1);
assertEquals(20, g.score());
}
This creates some duplicate code. The last two tests are virtually identical.
When we refactor, we’ll have to fix that. But first, we need to make this test
pass, and that’s really easy. All we need to do is add up all the rolls:
public class Game {
private int score;
public void roll(int pins) {
score += pins;
}
Of course, this is not the right algorithm for scoring bowling. Indeed, it is hard to see how this algorithm could ever evolve into the real rules for scoring bowling. So, I’m suspicious—I expect squalls in the coming tests.
But for now, we must refactor.
The duplication in the tests can be eliminated by extracting a function called
rollMany. The IDE’s Extract Method refactoring helps immensely with this
and even detects and replaces both instances of the duplication:
public class BowlingTest {
private Game g;
@Before
public void setUp() throws Exception {
g = new Game();
}
private void rollMany(int n, int pins) {
for (int i = 0; i < n; i++)
g.roll(pins);
}
@Test
public void gutterGame() throws Exception {
rollMany(20, 0);
assertEquals(0, g.score());
}
@Test
public void allOnes() throws Exception {
rollMany(20, 1);
assertEquals(20, g.score());
}
}
Okay, next test. It’s hard to think of something degenerate at this point, so we
might as well try for a spare. We’ll keep it simple though: one spare, with
one extra bonus ball, and all the rest gutter balls.
@Test
public void oneSpare() throws Exception {
rollMany(2, 5); // spare
g.roll(7);
rollMany(17, 0);
assertEquals(24, g.score());
}
Let’s check my logic: This game has two balls in each frame. The first two
balls are the spare. The next ball is the ball after the spare, and the 17 gutter
balls complete the game.
The score in the first frame is 17, which is 10 plus the 7 rolled in the next
frame. The score for the whole game, therefore, is 24 because the 7 is
counted twice. Convince yourself that this is correct.
This test fails, of course. So how do we get it to pass? Let’s look at the code:
public class Game {
private int score;
public void roll(int pins) {
score += pins;
}
The score is being calculated in the roll function, so we need to modify that
function to account for the spare. But that will force us to do something really
ugly, like this:
public void roll(int pins) {
if (pins + lastPins == 10) { // horrors!
//God knows what…
}
score += pins;
}
That lastPins variable must be a field of the Game class that remembers what
the last roll was. And if the last roll and this roll add up to 10, then that’s a
spare. Right? Ew!
You should feel your sphincter muscles tightening. You should feel your
gorge rise and a tension headache beginning to build. The angst of the
software craftsman should be raising your blood pressure.
This is just wrong!
We’ve all had that feeling before, haven’t we? The question is, what do you
do about it?
Whenever you get that feeling that something is wrong, trust it! So, what is it
that’s wrong?
There’s a design flaw. You might rightly wonder how there could be a design
flaw in two executable lines of code. But the design flaw is there; it’s blatant,
and it’s deeply harmful. As soon as I tell you what it is you’ll recognize it
and agree with me. Can you find it?
Jeopardy song interlude.
I told you what the design flaw was at the very start. Which of the two
functions in this class claims, by its name, to calculate the score? The score
function, of course. Which function actually does calculate the score? The
roll function. That’s misplaced responsibility.
Misplaced responsibility: A design flaw in which the function that claims to perform a
computation does not actually perform the computation. The computation is performed
elsewhere.
How many times have you gone to the function that claims to do some task,
only to find that function does not do that task? And now you have no idea
where, in the system, that task is actually done. Why does this happen?
Clever programmers. Or rather, programmers who think they are clever.
It was very clever of us to sum up the pins in the roll function, wasn’t it? We
knew that function was going to be called once per roll, and we knew that all
we had to do was sum up the rolls, so we just put that addition right there.
Clever, clever, clever. And that cleverness leads us to Rule 6.
Rule 6: When the code feels wrong, fix the design before proceeding.
So how do we fix this design flaw? The calculation of the score is in the
wrong place, so we’re going to have to move it. By moving it, perhaps we’ll
be able to figure out how to pass the spare test.
Moving the calculation means that the roll function is going to have to
remember all the rolls in something like an array. Then the score function can
sum up the array.
public class Game {
private int rolls[] = new int[21];
private int currentRoll = 0;
public void roll(int pins) {
rolls[currentRoll++] = pins;
}
This fails the spare test, but it passes the other two tests. What’s more, the
spare test fails for the same reason as before. So, although we’ve completely
changed the structure of the code, the behavior remains unchanged. By the
way, that is the definition of refactoring.
Refactoring: A change to the structure of the code that has no effect upon the behavior.5
5. Martin Fowler, Refactoring: Improving the Design of Existing Code, 2nd ed. (Addison-Wesley,
2019).
Can we pass the spare case now? Well, maybe, but it’s still icky:
public int score() {
int score = 0;
for (int i = 0; i < rolls.length; i++) {
if (rolls[i] + rolls[i+1] == 10) { // icky
// What now?
}
score += rolls[i];
}
return score;
}
Is that right? No, it’s wrong, isn’t it? It only works if i is even. To make that if statement actually detect a spare, it would also have to check that i is even, something like if (i % 2 == 0 && rolls[i] + rolls[i + 1] == 10). Ick.
So, we’re back to Rule 6—there’s another design problem. What could it be?
Look back to the UML diagram earlier in this chapter. That diagram shows
that the Game class should have ten Frame instances. Is there any wisdom in
that? Look at our loop. At the moment, it’s going to loop 21 times! Does that
make any sense?
Let me put it this way. If you were about to review the code for scoring
bowling—code that you’d never seen before—what number would you
expect to see in that code? Would it be 21? Or would it be 10?
I hope you said 10, because there are 10 frames in a game of bowling. Where
is the number 10 in our scoring algorithm? Nowhere!
How can we get the number 10 into our algorithm? We need to loop through
the array one frame at a time. How do we do that?
Well, we could loop through the array two balls at a time, couldn’t we? I
mean like this:
public int score() {
int score = 0;
int i = 0;
for (int frame = 0; frame<10; frame++) {
score += rolls[i] + rolls[i+1];
i += 2;
}
return score;
}
Once again, this passes the first two tests and fails the spare test for the same
reason as before. So, no behavior was changed. This was a true refactoring.
You might be ready to tear this book up now because you know that looping
through the array two balls at a time is just plain wrong. Strikes have only
one ball in their frame, and the tenth frame could have three.
True enough. However, so far none of our tests has used a strike, or the tenth
frame. So, for the moment, two balls per frame works fine.
Can we pass the spare case now? Yes. It’s trivial:
public int score() {
int score = 0;
int i = 0;
for (int frame = 0; frame < 10; frame++) {
if (rolls[i] + rolls[i + 1] == 10) { // spare
score += 10 + rolls[i + 2];
i += 2;
} else {
score += rolls[i] + rolls[i + 1];
i += 2;
}
}
return score;
}
This passes the spare test. Nice. But the code is pretty ugly. We can rename i
to frameIndex and get rid of that ugly comment by extracting a nice little
method:
public int score() {
int score = 0;
int frameIndex = 0;
for (int frame = 0; frame < 10; frame++) {
if (isSpare(frameIndex)) {
score += 10 + rolls[frameIndex + 2];
frameIndex += 2;
} else {
score += rolls[frameIndex] + rolls[frameIndex + 1];
frameIndex += 2;
}
}
return score;
}
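The extracted isSpare method is not shown in the listing, but given the expression it replaced, it is presumably just this:

private boolean isSpare(int frameIndex) {
  return rolls[frameIndex] + rolls[frameIndex + 1] == 10;
}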
That’s better. We can also clean up the ugly comment in the spare test by
doing the same:
private void rollSpare() {
rollMany(2, 5);
}
@Test
public void oneSpare() throws Exception {
rollSpare();
g.roll(7);
rollMany(17, 0);
assertEquals(24, g.score());
}
Replacing comments with pleasant little functions like this is almost always a
good idea. The folks who read your code later will thank you.
So, what’s the next test? I suppose we should try a strike:
@Test
public void oneStrike() throws Exception {
g.roll(10); // strike
g.roll(2);
g.roll(3);
rollMany(16, 0);
assertEquals(20, g.score());
}
Convince yourself that this is correct. There’s the strike, the 2 bonus balls,
and 16 gutter balls to fill up the remaining eight frames. The score is 15 in
the first frame and 5 in the second. All the rest are 0 for a total of 20.
This test fails, of course. What do we have to do to make it pass?
public int score() {
int score = 0;
int frameIndex = 0;
for (int frame = 0; frame < 10; frame++) {
if (rolls[frameIndex] == 10) { // strike
score += 10 + rolls[frameIndex+1] +
rolls[frameIndex+2];
frameIndex++;
}
else if (isSpare(frameIndex)) {
score += 10 + rolls[frameIndex + 2];
frameIndex += 2;
} else {
score += rolls[frameIndex] + rolls[frameIndex + 1];
frameIndex += 2;
}
}
return score;
}
This passes. Note that we increment frameIndex only by one. That’s because
a strike has only one ball in a frame—and you were so worried about that,
weren’t you?
This is a very good example of what happens when you get the design right.
The rest of the code just starts trivially falling into place. Pay special
attention to Rule 6, boys and girls, and get the design right early. It will save
you immense amounts of time.
We can clean this up quite a bit. That ugly comment can be fixed by extracting
an isStrike method. We can extract some of that ugly math into some
pleasantly named functions too. When we’re done, it looks like this:
public int score() {
int score = 0;
int frameIndex = 0;
for (int frame = 0; frame < 10; frame++) {
if (isStrike(frameIndex)) {
score += 10 + strikeBonus(frameIndex);
frameIndex++;
} else if (isSpare(frameIndex)) {
score += 10 + spareBonus(frameIndex);
frameIndex += 2;
} else {
score += twoBallsInFrame(frameIndex);
frameIndex += 2;
}
}
return score;
}
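As with isSpare, the new helper methods are not shown, but each one simply wraps the arithmetic it replaced. Presumably they look something like this:

private boolean isStrike(int frameIndex) {
  return rolls[frameIndex] == 10;
}

private int strikeBonus(int frameIndex) {
  return rolls[frameIndex + 1] + rolls[frameIndex + 2];
}

private int spareBonus(int frameIndex) {
  return rolls[frameIndex + 2];
}

private int twoBallsInFrame(int frameIndex) {
  return rolls[frameIndex] + rolls[frameIndex + 1];
}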
We can also clean up the ugly comment in the test by extracting a rollStrike
method:
@Test
public void oneStrike() throws Exception {
rollStrike();
g.roll(2);
g.roll(3);
rollMany(16, 0);
assertEquals(20, g.score());
}
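The extracted rollStrike helper mirrors rollSpare and is one line (again, a presumed sketch, since it is not shown):

private void rollStrike() {
  g.roll(10);
}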
What’s the next test? We haven’t tested the tenth frame yet. But I’m starting to
feel pretty good about this code. I think it’s time to break Rule 3 and go for
the gold. Let’s test a perfect game!
@Test
public void perfectGame() throws Exception {
rollMany(12, 10);
assertEquals(300, g.score());
}
We roll a strike in the first nine frames, and then a strike and two 10s in the
tenth frame. The score, of course, is 300—everybody knows that.
What’s going to happen when I run this test? It should fail, right? But, no! It
passes! It passes because we are done! The score function is the solution.
You can prove that to yourself by reading it. Here, you follow along while I
read it to you:
For each of the ten frames
If that frame is a strike,
Then the score is 10 plus the strike bonus
(the next two balls).
If that frame is a spare,
Then the score is 10 plus the spare bonus
(the next ball).
Otherwise,
The score is the two balls in the frame.
The code reads like the rules for scoring bowling. Go to the beginning of
this chapter and read those rules again. Compare them to the code. And then
ask yourself if you’ve ever seen requirements and code that are so closely
aligned.
Some of you may be confused about why this works. You look at the tenth
frame on that scorecard and you see that it doesn’t look like any of the other
frames; and yet there is no code in our solution that makes the tenth frame a
special case. How can that be?
The answer is that the tenth frame is not special at all. It is drawn differently
on the score card, but it is not scored differently. There is no special case for
the tenth frame.
And we were going to make a subclass out of it!
Look back at that UML diagram. We could have divvied out the tasks to three
or four programmers and integrated a day or two later. And the tragedy is that
we’d have gotten it working. We would have celebrated the 400 lines of working code,6 never knowing that the algorithm was a for loop and two if statements that fit into 14 lines of code.
6. I know it’s 400 lines of code because I’ve written it.
Did you see the solution early? Did you see the for loop and two if
statements? Or did you expect one of the tests to eventually force me to write
the Frame class? Were you holding out for the tenth frame? Is that where you
thought all the complexity would be found?
Did you know we were done before we ran the tenth frame test? Or did you
think there was a lot more to do? Isn’t it fascinating that we can write a test
fully expecting more work ahead, only to find, to our surprise, that we are
done?
Some folks have complained that if we’d followed the UML diagram shown
earlier, we’d have wound up with code that was easier to change and
maintain. That’s pure bollocks! What would you rather maintain, 400 lines of
code in four classes or 14 lines with a for loop and two if statements?
Conclusion
In this chapter, we studied the motivations and basics of TDD. If you’ve
gotten this far, your head may be spinning. We covered a lot of ground. But
not nearly enough. The next chapter goes significantly deeper into the topic of
TDD, so you may wish to take a little rest before turning the page.
3 Advanced TDD
Hold on to your hats. The ride is about to get fast and bumpy. To quote Dr.
Morbius as he led a tour of the Krell machine: “Prepare your minds for a
new scale of scientific values.”
Sort 1
The last two examples in Chapter 2, “Test-Driven Development,” begged an
interesting question. Where does the algorithm that we derive using test-
driven development (TDD) come from? Clearly it comes from our brain, but
not in the way we are used to. Somehow, the sequence of failing tests coaxes
the algorithm out of our brain without the need to think it all through
beforehand.
This raises the possibility that TDD may be a step-by-step, incremental
procedure for deriving any algorithm for any problem. Think of it like
solving a mathematical or geometric proof. You start from basic postulates—
the degenerate failing tests. Then, one step at a time, you build up the solution
to the problem.
At each step, the tests get more and more constraining and specific, but the
production code gets more and more generic. This process continues until the
production code is so general that you can’t think of any more tests that will
fail. That solves the entire problem.
Let’s try this out. Let’s use the approach to derive an algorithm for sorting an
array of integers.
If you can watch the video, this would be a good time, but in any case, please
continue on to read the text that follows.
Watch related video: SORT 1
Access video by registering at informit.com/register
We begin, as usual, with a test that does nothing:
public class SortTest {
@Test
public void nothing() throws Exception {
}
}
The first failing test will be the degenerate case of an empty array:
public class SortTest {
@Test
public void sorted() throws Exception {
assertEquals(asList(), sort(asList()));
}
private List<Integer> sort(List<Integer> list) {
return null;
}
}
We make this pass by returning the most degenerate answer: an empty list. Stepping up one level in degeneracy, we try a list with one integer in it:
assertEquals(asList(1), sort(asList(1)));
This clearly fails, but we can make it pass by making the production code a
bit more general:
private List<Integer> sort(List<Integer> list) {
return list;
}
Cute, right? And we saw that trick before, in the prime factors example in
Chapter 2. It seems relatively common that the first two tests in a given
problem are solved by returning the most degenerate answer, followed by the
input argument.
The next test is trivial because it already passes: two elements in order. It
could be argued that we shouldn’t even have written it because it’s not a
failing test. But it’s nice to see these tests pass.
assertEquals(asList(1, 2), sort(asList(1, 2)));
If we reverse the order of the input array, the test will fail: two elements out
of order.
assertEquals(asList(1, 2), sort(asList(2, 1)));
We make it pass by comparing the two elements and, since they are out of order, swapping them in place within the list. Maybe you can see where this is headed. If so, don’t spoil the surprise for
everyone else. Also, remember this moment—we’ll be returning to it in the
next section.
The next two tests pass already. In the first test, the input array is already in
order. In the second test, the first two elements are out of order, and our
current solution swaps them.
assertEquals(asList(1, 2, 3), sort(asList(1, 2, 3)));
assertEquals(asList(1, 2, 3), sort(asList(2, 1, 3)));
The next failing test is three elements with the second two out of order:
assertEquals(asList(1, 2, 3), sort(asList(2, 3, 1)));
We get this one to pass by putting our compare and swap algorithm into a
loop that walks down the length of the list:
private List<Integer> sort(List<Integer> list) {
if (list.size() > 1) {
for (int firstIndex=0; firstIndex < list.size()-1;
firstIndex++) {
int secondIndex = firstIndex + 1;
if (list.get(firstIndex) > list.get(secondIndex)) {
int first = list.get(firstIndex);
int second = list.get(secondIndex);
list.set(firstIndex, second);
list.set(secondIndex, first);
}
}
}
return list;
}
Can you tell where this is going yet? Most of you probably can. Anyway, the next
failing test case is three elements in reverse order:
assertEquals(asList(1, 2, 3), sort(asList(3, 2, 1)));
The failure results are telling. The sort function returns [2, 1, 3]. Note that
the 3 got moved all the way to the end of the list. That’s good! But the first
two elements are still out of order. It’s not hard to see why. The 3 got
swapped with the 2, and then the 3 got swapped with the 1. But that left the 2
and 1 still out of order. They need to be swapped again.
So, the way to get this test to pass is to put the compare and swap loop into
another loop that incrementally reduces the length of the comparing and
swapping. Maybe that’s easier to read in code:
private List<Integer> sort(List<Integer> list) {
if (list.size() > 1) {
for (int limit = list.size() - 1; limit > 0; limit--) {
for (int firstIndex = 0; firstIndex < limit; firstIndex++) {
int secondIndex = firstIndex + 1;
if (list.get(firstIndex) > list.get(secondIndex)) {
int first = list.get(firstIndex);
int second = list.get(secondIndex);
list.set(firstIndex, second);
list.set(secondIndex, first);
}
}
}
}
return list;
}
Sort 2
Let’s try this again. This time we’ll choose a slightly different pathway. And
again, view the video if possible, but then continue reading from this point.
Watch related video: SORT 2
Access video by registering at informit.com/register
We begin as we did before with the most degenerate tests possible and the
code that makes them pass:
public class SortTest {
@Test
public void testSort() throws Exception {
assertEquals(asList(), sort(asList()));
assertEquals(asList(1), sort(asList(1)));
assertEquals(asList(1, 2), sort(asList(1, 2)));
}
private List<Integer> sort(List<Integer> list) {
return list;
}
}
The next failing test is, as before, two elements out of order: sort(asList(2, 1)) should return asList(1, 2). But now, instead of comparing and swapping them within the input list, we compare them and create an entirely new list with the elements in the right order:
private List<Integer> sort(List<Integer> list) {
if (list.size() <= 1)
return list;
else {
int first = list.get(0);
int second = list.get(1);
if (first > second)
return asList(second, first);
else
return asList(first, second);
}
}
This is a good moment to pause and reflect. When we faced this test in the
previous section, we blithely wrote the compare and swap solution as though
it were the only possible way to pass the test. But we were wrong. In this
example, we see another way.
This tells us that from time to time we may encounter failing tests that have
more than one possible solution. Think of these as forks in the road. Which
fork should we take?
Let’s watch how this fork proceeds.
The obvious next test is, as it was before, three elements in order:
assertEquals(asList(1, 2, 3), sort(asList(1, 2, 3)));
But unlike in the previous example, this test fails. It fails because none of the
pathways through the code can return a list with more than two elements.
However, making it pass is trivial:
private List<Integer> sort(List<Integer> list) {
if (list.size() <= 1)
return list;
else if (list.size() == 2){
int first = list.get(0);
int second = list.get(1);
if (first > second)
return asList(second, first);
else
return asList(first, second);
}
else {
return list;
}
}
Of course, this is silly, but the next test—three elements with the first two out
of order—eliminates the silliness. This obviously fails:
assertEquals(asList(1, 2, 3), sort(asList(2, 1, 3)));
How the devil are we going to get this to pass? There are only two
possibilities for a list with two elements, and our solution exhausts both of
those possibilities. But with three elements, there are six possibilities. Are
we really going to decode and construct all six possible combinations?
No, that would be absurd. We need a simpler approach. What if we use the
law of trichotomy?
The law of trichotomy says that given two numbers A and B, there are only
three possible relationships between them: A < B, A = B, or A > B. Okay, so
let’s arbitrarily pick one of the elements of the list and then decide which of
those relationships it has with the others.
The relevant code looks like this:
else {
int first = list.get(0);
int middle = list.get(1);
int last = list.get(2);
List<Integer> lessers = new ArrayList<>();
List<Integer> greaters = new ArrayList<>();
if (first < middle) lessers.add(first);
if (first > middle) greaters.add(first);
if (last < middle) lessers.add(last);
if (last > middle) greaters.add(last);
List<Integer> result = new ArrayList<>();
result.addAll(lessers);
result.add(middle);
result.addAll(greaters);
return result;
}
This passes, as do tests for [1,3,2] and [3,2,1]. So far, we’ve tried four of the six possible cases for a list of three unique elements. If we had tried the other two, [2,3,1] and [3,1,2], as we should have, both of them would have failed.
But due either to impatience or oversight, we move on to test a list with four
elements:
assertEquals(asList(1, 2, 3, 4), sort(asList(1, 2, 3, 4)));
This fails, of course, because the current solution assumes that the list has no
more than three elements. And, of course, our simplification of first,
middle, and last breaks down with four elements. This may make you
wonder why we chose the middle as element 1. Why couldn’t it be element
0?
So, let’s comment out that last test and change middle to element 0:
int first = list.get(1);
int middle = list.get(0);
int last = list.get(2);
Surprise—the [1,3,2] test fails. Can you see why? If middle is 1, then the 3
and 2 get added to the greaters list in the wrong order.
Now, it just so happens that our solution already knows how to sort a list
with two elements in it. And greaters is such a list, so we can make this
pass by calling sort on the greaters list:
List<Integer> result = new ArrayList<>();
result.addAll(lessers);
result.add(middle);
result.addAll(sort(greaters));
return result;
That caused the [1,3,2] test to pass but failed the [3,2,1] test because the
lessers list was out of order. But that’s a pretty easy fix: sort the lessers list too, with result.addAll(sort(lessers)).
So, yeah, we should have tried the two remaining cases of three elements
before going on to the four-element list.
Rule 7: Exhaust the current simpler case before testing the next more complex case.
It should come as no surprise that this passes and also passes the next two
tests:
assertEquals(asList(1, 2, 3, 4), sort(asList(2, 1, 3, 4)));
assertEquals(asList(1, 2, 3, 4), sort(asList(4, 3, 2, 1)));
But now you may be wondering about that middle. What if the middle element
was not unique in the list? Let’s try that:
assertEquals(asList(1, 1, 2, 3), sort(asList(1, 3, 1, 2)));
Yeah, that fails. That just means we should stop treating the middle as
something special:
else {
int middle = list.get(0);
List<Integer> middles =
list.stream().filter(x -> x == middle).collect(toList());
List<Integer> lessers =
list.stream().filter(x -> x<middle).collect(toList());
List<Integer> greaters =
list.stream().filter(x -> x>middle).collect(toList());
This passes. However, look up at that else. Remember what’s above it?
Here, I’ll show you:
if (list.size() <= 1)
return list;
else if (list.size() == 2){
int first = list.get(0);
int second = list.get(1);
if (first > second)
return asList(second, first);
else
return asList(first, second);
}
Is that ==2 case really necessary anymore? No. Removing it still passes all
tests.
Okay, so what about that first if statement? Is that still necessary? Actually, it
can be changed to something better. And, in fact, let me just show you the
final algorithm:
private List<Integer> sort(List<Integer> list) {
List<Integer> result = new ArrayList<>();
if (list.size() == 0)
return result;
else {
int middle = list.get(0);
List<Integer> middles =
list.stream().filter(x -> x == middle).collect(toList());
List<Integer> lessers =
list.stream().filter(x -> x < middle).collect(toList());
List<Integer> greaters =
list.stream().filter(x -> x > middle).collect(toList());
result.addAll(sort(lessers));
result.addAll(middles);
result.addAll(sort(greaters));
return result;
}
}
This algorithm has a name. It’s called quick sort. It is one of the best sorting
algorithms known.
How much better is it? This algorithm, on my laptop, can sort an array of 1
million random integers between zero and a million in 1.5 seconds. The
bubble sort from the previous section will sort the same list in about six
months. So … better.
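If you want to try that comparison on your own machine (the absolute numbers will of course differ), a rough timing harness looks like this. It assumes the derived sort function has been made available as a static method on a class I am calling Sorter, which is my own name for it:

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class SortTiming {
  public static void main(String[] args) {
    Random random = new Random();
    List<Integer> list = new ArrayList<>();
    // A million random integers between zero and a million.
    for (int i = 0; i < 1_000_000; i++)
      list.add(random.nextInt(1_000_000));

    long start = System.currentTimeMillis();
    List<Integer> sorted = Sorter.sort(list); // the quick sort derived above
    long elapsed = System.currentTimeMillis() - start;

    System.out.println("Sorted " + sorted.size() + " integers in " + elapsed + " ms");
  }
}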
And this leads us to a disturbing observation. There were two different
solutions to sorting a list with two elements out of order. One solution led us
directly to a bubble sort, the other solution led us directly to a quick sort.
This means that identifying forks in the road, and choosing the right path, can
sometimes be pretty important. In this case, one path led us to a pretty poor
algorithm, and the other led us to a very good algorithm.
Can we identify these forks and determine which path to choose? Perhaps.
But that’s a topic for a more advanced chapter.
Getting Stuck
At this point, I think you’ve seen enough videos to get a good idea about the
rhythm of TDD. From now on, we’ll forego the videos and just rely on the
text of the chapter.
It often happens to TDD novices that they find themselves in a pickle. They
write a perfectly good test and then discover that the only way to make that
test pass is to implement the entire algorithm at once. I call this “getting
stuck.”
The solution to getting stuck is to delete the last test you wrote and find a
simpler test to pass.
Rule 8: If you must implement too much to get the current test to pass, delete that test and
write a simpler test that you can more easily pass.
Consider, for example, the word-wrap problem: given some text, say the Gettysburg Address, and a width, break the text into lines no longer than that width, preferably at the spaces between words. How would you write this algorithm test-first? We might begin as usual with this failing test:
public class WrapTest {
@Test
public void testWrap() throws Exception {
assertEquals("Four", wrap("Four", 7));
}
private String wrap(String s, int w) {
return null;
}
}
How many TDD rules did we break with this test? Can you name them? Let’s
proceed anyway. It’s easy to make this test pass:
private String wrap(String s, int w) {
return "Four";
}
The next test, following the Gettysburg Address along, is assertEquals("Four\nscore", wrap("Four score", 7)). And the code that makes it pass is pretty obvious too:
private String wrap(String s, int w) {
return s.replace(" ", "\n");
}
Just replace all spaces with line ends. Perfect. Before we continue, let’s
clean this up a bit:
private void assertWrapped(String s, int width, String expected) {
assertEquals(expected, wrap(s, width));
}
@Test
public void testWrap() throws Exception {
assertWrapped("Four", 7, "Four");
assertWrapped("Four score", 7, "Four\nscore");
}
That’s better. Now, the next failing test. If we simply follow the Gettysburg
Address along, the next failure would be
assertWrapped("Four score and seven years ago our", 7,
"Four\nscore\nand\nseven\nyears\nago our");
That does, in fact, fail. We can tighten that failure up a bit like so:
assertWrapped("ago our", 7, "ago our");
Okay, now how do we make that pass? It looks like we need to not replace
all spaces with line ends. So, which ones do we replace? Or, should we go
ahead and replace all spaces with line ends, and then figure out which line
ends to put back?
I’ll let you ponder this for a while. I don’t think you’ll find an easy solution.
And that means we’re stuck. The only way to get this test to pass is to invent
a very large part of the word-wrap algorithm at once.
The solution to being stuck is to delete one or more tests and replace them
with simpler tests that you can pass incrementally. Let’s try that:
@Test
public void testWrap() throws Exception {
assertWrapped("", 1, "");
}
private String wrap(String s, int w) {
return "";
}
Ah, yes, that’s a truly degenerate test, isn’t it? That’s one of the rules we
forgot earlier.
Okay, what’s the next most degenerate test? How about this?
assertWrapped("x", 1, "x");
There’s that pattern again. We get the first test to pass by returning the
degenerate constant. We get the second test to pass by returning the input.
Interesting. Okay, so what’s the next most degenerate test?
assertWrapped("xx", 1, "x\nx");
That fails because it returns "xx". It’s not hard to get it to pass, though:
private String wrap(String s, int w) {
if (w >= s.length())
return s;
else
return s.substring(0, w) + "\n" + s.substring(w);
}
That was easy. So, what’s the next most degenerate test?
assertWrapped("xx", 2, "xx");
This fails. And it implies some kind of a loop. But wait. There’s an easier
way:
private String wrap(String s, int w) {
if (w >= s.length())
return s;
else
return s.substring(0, w) + "\n" + wrap(s.substring(w), w);
}
There’s probably no point in continuing with this pattern. It’s time for some
spaces:
assertWrapped("x x", 1, "x\nx");
This one fails because it returns "x\n \nx". We can fix that by eliminating
any prefix spaces before we do the recursive call to wrap.
return s.substring(0, w) + "\n" + wrap(s.substring(w).trim(), w);
This passes. And now we have a new test pattern to follow. So, the next test
is
assertWrapped("x x", 2, "x\nx");
This fails because the first substring has a trailing space. We can get rid of
that with another call to trim:
return s.substring(0, w).trim() + "\n" +
wrap(s.substring(w).trim(), w);
And this passes. The next test in the sequence passes too:
assertWrapped("x x", 3, "x x");
They all pass. There’s probably not much point in adding the fourth x.
Let’s try this:
assertWrapped("xx xx", 1, "x\nx\nx\nx");
That one passes. And so do the next two tests in the sequence:
assertWrapped("xx xx", 2, "xx\nxx");
assertWrapped("xx xx", 3, "xx\nxx");
They do. I have a feeling that we are done. But let’s try a few more test
cases:
assertWrapped("xx xx", 5, "xx xx");
assertWrapped("xx xx xx", 1, "x\nx\nx\nx\nx\nx");
assertWrapped("xx xx xx", 2, "xx\nxx\nxx");
assertWrapped("xx xx xx", 3, "xx\nxx\nxx");
assertWrapped("xx xx xx", 4, "xx\nxx\nxx");
assertWrapped("xx xx xx", 5, "xx xx\nxx");
assertWrapped("xx xx xx", 6, "xx xx\nxx");
assertWrapped("xx xx xx", 7, "xx xx\nxx");
assertWrapped("xx xx xx", 8, "xx xx xx");
They all pass. I think we’re done. Let’s try the Gettysburg Address, with a
length of 15:
Four score and
seven years ago
our fathers
brought forth
upon this
continent a new
nation
conceived in
liberty and
dedicated to
the proposition
that all men
are created
equal
Arrange, Act, Assert
Long ago, Bill Wake observed that every test has three parts: Arrange, Act, Assert. Consider the gutterGame test from the last chapter. The arrange portion of that test is the creation of the Game in the setUp function and the rollMany(20, 0) that sets up the scoring of a gutter game.
The act portion of the test is the call to g.score().
The assert portion of the test is the assertEquals.
In the two and a half decades since I started practicing TDD, I have never
found a test that did not follow this pattern.
Enter BDD
In 2003, while practicing and teaching TDD, Dan North, in collaboration
with Chris Stevenson and Chris Matts, made the same discovery that Bill
Wake had made. However, they used a different vocabulary: Given-When-
Then (GWT).
This was the beginning of behavior-driven development (BDD).
At first, BDD was viewed as an improved way of writing tests. Dan and
other proponents liked the vocabulary better and built that vocabulary into testing tools such as JBehave and RSpec.
As an example, I can rephrase the gutterGame test in BDD terms as follows:
Given that the player rolled a game of 20 gutter balls,
When I request the score of that game,
Then the score will be zero.
It should be clear that some parsing would have to take place in order to
translate that statement into an executable test. JBehave and RSpec provided
affordances for that kind of parsing. It also ought to be clear that the TDD test
and the BDD test are synonymous.
Over time, the vocabulary of BDD drew it away from testing and toward the
problem of system specification. BDD proponents realized that even if the
GWT statements were never executed as tests, they remained valuable as
specifications of behavior.
In 2013, Liz Keogh said of BDD:
It’s using examples to talk through how an application behaves. … And having
conversations about those examples.
Still, it is hard to separate BDD entirely from testing if only because the
vocabulary of GWT and AAA are so obviously synonymous. If you have any
doubts about that, consider the following:
Given that the test data has been Arranged
When I execute the tested Act
Then the expected result is Asserted
Finite State Machines
Consider a simple turnstile, described as a finite state machine. The turnstile starts in the locked state. A coin will send it to the unlocked
state. When someone passes through, the turnstile returns to the locked state.
If someone passes without paying, the turnstile alarms. If someone drops two
coins, the extra coin is refunded.
This behavior can be turned into a state transition table.
Each row in the table is a transition from the current state to the next state
triggered by the event. Each row is a triplet, just like GWT or AAA. More
important, each one of those transition triplets is synonymous with a
corresponding GWT or AAA triplet, as follows:
Given it is in the Locked state
When it gets a Coin event
Then it transitions to the Unlocked state.
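Written as code, that triplet is just an ordinary test. Here is a sketch against a hypothetical Turnstile class; the class and its API are assumptions for illustration:

import org.junit.Test;
import static org.junit.Assert.assertEquals;

public class TurnstileTest {
  @Test
  public void givenLocked_whenCoin_thenUnlocked() throws Exception {
    // Given it is in the Locked state
    Turnstile turnstile = new Turnstile();
    // When it gets a Coin event
    turnstile.coin();
    // Then it transitions to the Unlocked state
    assertEquals(Turnstile.State.UNLOCKED, turnstile.getState());
  }
}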
What we can deduce from this is that every test you write is a transition of
the finite state machine that describes the behavior of the system.
Repeat that to yourself several times. Every test is a transition of the finite
state machine you are trying to create in your program.
Did you know that the program you are writing is a finite state machine? Of
course it is. Every program is a finite state machine. That’s because
computers are nothing more than finite state machine processors. The
computer itself transitions from one finite state to the next with every
instruction it executes.
Thus, the tests you write when practicing TDD and the behaviors you
describe while practicing BDD are simply transitions of the finite state
machine that you are attempting to create. Your test suite, if it is complete, is
that finite state machine.
The obvious question, therefore, is how do you ensure that all the transitions
you desire your state machine to handle are encoded as tests? How do you
ensure that the state machine that your tests are describing is the complete
state machine that your program should be implementing?
What better way than to write the transitions first, as tests, and then write the
production code that implements those transitions?
BDD Again
And don’t you think it’s fascinating, and perhaps even a little ironic, that the
BDD folks, perhaps without realizing it, have come to the conclusion that the
best way to describe the behavior of a system is by specifying it as a finite
state machine?
Test Doubles
In 2000, Steve Freeman, Tim Mackinnon, and Philip Craig presented a paper1
called “Endo-Testing: Unit Testing with Mock Objects.” The influence this
paper had on the software community is evidenced by the pervasiveness of
the term they coined: mock. The term has since become a verb. Nowadays,
we use mocking frameworks to mock things out.
1. Steve Freeman, Tim Mackinnon, and Philip Craig, “Endo-Testing: Unit Testing with Mock Objects,”
paper presented at eXtreme Programming and Flexible Processes in Software Engineering
(XP2000), https://www2.ccs.neu.edu/research/demeter/related-work/extreme-
programming/MockObjectsFinal.PDF.
In those early days, the notion of TDD was just beginning to permeate the
software community. Most of us had never applied object-oriented design to
test code. Most of us had never applied any kind of design at all to test code.
This led to all kinds of problems for test authors.
Oh, we could test the simple things like the examples you saw in the previous
chapters. But there was another class of problems that we just didn’t know
how to test. For example, how do you test the code that reacts to an
input/output (IO) failure? You can’t really force an IO device to fail in a unit
test. Or how do you test the code that interacts with an external service? Do
you have to have the external service connected for your tests? And how do
you test the code that handles external service failure?
The original TDDers were Smalltalk programmers. For them, objects were
what the universe was made of. So, although they were almost certainly using
mock objects, they likely thought nothing of it. Indeed, when I presented the
idea of a mock object in Java to one expert Smalltalker and TDDer in 1999,
his response to me was: “Too much mechanism.”
Nevertheless, the technique caught hold and has become a mainstay of TDD
practitioners.
But before we delve too much further into the technique itself, we need to
clear up a vocabulary issue. Almost all of us use the term mock object
incorrectly—at least in a formal sense. The mock objects we talk about
nowadays are very different from the mock objects that were presented in the
endo-testing paper back in 2000. So different, in fact, that a different
vocabulary has been adopted to clarify the separate meanings.
In 2007, Gerard Meszaros published xUnit Test Patterns: Refactoring Test
Code.2 In it, he adopted the formal vocabulary that we use today. Informally,
we still talk about mocks and mocking, but when we want to be precise, we
use Meszaros’s formal vocabulary.
2. Gerard Meszaros, xUnit Test Patterns: Refactoring Test Code (Addison-Wesley, 2007).
Meszaros identified five kinds of objects that fall under the informal mock
category: dummies, stubs, spies, mocks, and fakes. He called them all test
doubles.
That’s actually a very good name. In the movies, a stunt double stands in for
an actor, and a hand double stands in for closeups of an actor’s hands. A
body double stands in for shots when the body, and not the face, of the actor
is on screen. And that’s just what a test double does. A test double stands in
for another object while a test is being run.
Test doubles form a type hierarchy of sorts (Figure 3.2). Dummies are the
simplest. Stubs are dummies, spies are stubs, and mocks are spies. Fakes
stand alone.
Figure 3.2 Test doubles
The mechanism that all test doubles use (and which my Smalltalker friend
thought was “too much”) is simply polymorphism. For example, if you want
to test the code that manages an external service, then you isolate that
external service behind a polymorphic interface, and then you create an
implementation of that interface that stands in for the service. That
implementation is the test double.
But perhaps the best way to explain all this is to demonstrate it.
Dummy
Test doubles generally start with an interface—an abstract class with no
implemented methods. For example, we could start with the Authenticator
interface:
public interface Authenticator {
public Boolean authenticate(String username, String password);
}
The dummy is an implementation of that interface whose methods do nothing and return nothing useful. In fact, that is precisely the implementation my IDE creates when I invoke the Implement Interface command.
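A sketch of such a dummy, using the class name that the test below expects:

public class AuthenticatorDummy implements Authenticator {
    public Boolean authenticate(String username, String password) {
        return null;   // a dummy does nothing; it exists only to satisfy the dependency
    }
}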
Now the test can be written without all that cruft and all those nasty
dependencies:
@Test
public void whenClosed_loginIsCancelled() throws Exception {
Authenticator authenticator = new AuthenticatorDummy();
LoginDialog dialog = new LoginDialog(authenticator);
dialog.show();
boolean success = dialog.sendEvent(Event.CLOSE);
assertTrue(success);
}
Stub
As Figure 3.4 shows, a stub is a dummy: like the dummy, it implements no real behavior.
However, instead of returning zero or null, the functions of a stub return
values that drive the function being tested through the pathways that the test
wants executed.
Figure 3.4 Stub
Let’s imagine the following test that ensures that a login attempt will fail if
the Authenticator rejects the username and password:
@Test
public void whenAuthorizerRejects_loginFails() throws Exception {
Authenticator authenticator = new ?;
LoginDialog dialog = new LoginDialog(authenticator);
dialog.show();
boolean success = dialog.submit("bad username", "bad password");
assertFalse(success);
}
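The question mark wants an Authenticator that always rejects the login; in other words, a stub. A minimal sketch (the class name is mine, not the book's) would be:

public class RejectingAuthenticatorStub implements Authenticator {
    public Boolean authenticate(String username, String password) {
        return false;   // always reject, driving the dialog down the failed-login pathway
    }
}

With that class in place, the ? simply becomes new RejectingAuthenticatorStub().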
So, a stub is a dummy that returns test-specific values in order to drive the
system under test through the pathways being tested.
Spy
A spy (Figure 3.5) is a stub. It returns test-specific values in order to drive
the system under test through desired pathways. However, a spy remembers
what was done to it and allows the test to ask about it.
Figure 3.5 Spy
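Reconstructed from the accessors that the test below calls, the spy might look like this sketch:

public class AuthenticatorSpy implements Authenticator {
    private int count = 0;
    private boolean result = false;
    private String lastUsername = "";
    private String lastPassword = "";

    public Boolean authenticate(String username, String password) {
        count++;                   // remember how many times we were called
        lastUsername = username;   // and what we were called with
        lastPassword = password;
        return result;             // programmable stub behavior
    }

    public void setResult(boolean result) { this.result = result; }
    public int getCount() { return count; }
    public String getLastUsername() { return lastUsername; }
    public String getLastPassword() { return lastPassword; }
}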
Note that the authenticate method keeps track of the number of times it was
called and the last username and password that it was called with. Notice
also that it provides accessors for these values. It is that behavior and those
accessors that make this class a spy.
Notice also that the authenticate method returns result, which can be set by
the setResult method. That makes this spy a programmable stub.
Here’s a test that might use that spy:
@Test
public void loginDialog_correctlyInvokesAuthenticator() throws
Exception {
AuthenticatorSpy spy = new AuthenticatorSpy();
LoginDialog dialog = new LoginDialog(spy);
spy.setResult(true);
dialog.show();
boolean success = dialog.submit("user", "pw");
assertTrue(success);
assertEquals(1, spy.getCount());
assertEquals("user", spy.getLastUsername());
assertEquals("pw", spy.getLastPassword());
}
The name of the test tells us a lot. This test makes sure that the LoginDialog
correctly invokes the Authenticator. It does this by making sure that the
authenticate method is called only once and that the arguments were the
arguments that were passed into submit.
A spy can be as simple as a single Boolean that is set when a particular
method is called. Or a spy can be a relatively complex object that maintains
a history of every call and every argument passed to every call.
Spies are useful as a way to ensure that the algorithm being tested behaves
correctly. Spies are dangerous because they couple the tests to the
implementation of the function being tested. We’ll have more to say about
this later.
Mock
Now, at last, we come to the true mock object (Figure 3.6). This is the mock
that Mackinnon, Freeman, and Craig described in their endo-testing paper.
Figure 3.6 The mock object
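Reconstructed from the description and the test that follows, such a mock might be sketched like this:

public class AuthenticatorMock extends AuthenticatorSpy {
    private String expectedUsername;
    private String expectedPassword;
    private int expectedCount;

    public AuthenticatorMock(String username, String password, int count) {
        expectedUsername = username;
        expectedPassword = password;
        expectedCount = count;
    }

    public boolean validate() {
        // compare what the spy recorded against the expectations set in the constructor
        return getCount() == expectedCount
            && getLastUsername().equals(expectedUsername)
            && getLastPassword().equals(expectedPassword);
    }
}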
As you can see, the mock has three expectation fields that are set by the
constructor. This makes this mock a programmable mock. Notice also that the
AuthenticatorMock derives from the AuthenticatorSpy. We reuse all that spy
code in the mock.
The validate function of the mock does the final comparison. If the count,
lastPassword, and lastUsername collected by the spy match the expectations
set into the mock, then validate returns true.
Now the test that uses this mock should make perfect sense:
@Test
public void loginDialogCallToAuthenticator_validated() throws
Exception {
AuthenticatorMock mock = new AuthenticatorMock("Bob", "xyzzy",
1);
LoginDialog dialog = new LoginDialog(mock);
mock.setResult(true);
dialog.show();
boolean success = dialog.submit("Bob", "xyzzy");
assertTrue(success);
assertTrue(mock.validate());
}
We create the mock with the appropriate expectations. The username should
be "Bob", the password should be "xyzzy", and the number of times the
authenticate method is called should be 1.
Fake
Finally, we can deal with the last of the test doubles: the fake (Figure 3.7). A
fake is not a dummy, not a stub, not a spy, and not a mock. A fake is a
different kind of test double entirely. A fake is a simulator.
Figure 3.7 The fake
Long ago, in the late 1970s, I worked for a company that built a system that
was installed into telephone company facilities. This system tested telephone
lines. There was a central computer at the service center that communicated
over modem links to computers we installed in the switching offices. The
computer in the service center was called the SAC (service area computer),
and the computer in the switching office was called the COLT (central office
line tester).
The COLT interfaced to the switching hardware and could create an
electrical connection between any of the phone lines emanating from that
switching office and the measurement hardware that it controlled. The COLT
would then measure the electronic characteristics of the phone line and
report the raw results back to the SAC.
The SAC did all the analysis on those raw results in order to determine
whether there was a fault and, if so, where that fault was located.
How did we test that system?
We built a fake. The fake we built was a COLT whose switching interface
was replaced with a simulator. That simulator would pretend to dial up
phone lines and pretend to measure them. Then it would report back canned
raw results based on the phone number it was asked to test.
The fake allowed us to test the SAC communication, control, and analysis
software without having to install an actual COLT in a real telephone
company switching office or even having to install real switching hardware
and “real” phone lines.
Today, a fake is a test double that implements some kind of rudimentary
business rules so that the tests that use that fake can select how the fake
behaves. But perhaps an example would be the best explanation:
@Test
public void badPasswordAttempt_loginFails() throws Exception {
Authenticator authenticator = new FakeAuthenticator();
LoginDialog dialog = new LoginDialog(authenticator);
dialog.show();
boolean success = dialog.submit("user", "bad password");
assertFalse(success);
}
@Test
public void goodPasswordAttempt_loginSucceeds() throws Exception {
Authenticator authenticator = new FakeAuthenticator();
LoginDialog dialog = new LoginDialog(authenticator);
dialog.show();
boolean success = dialog.submit("user", "good password");
assertTrue(success);
}
These two tests use the same FakeAuthenticator but pass it a different
password. The tests expect that the bad password will fail the login attempt and
that the good password will succeed.
The code for FakeAuthenticator should be easy to envision:
public class FakeAuthenticator implements Authenticator {
public Boolean authenticate(String username, String password)
{
return (username.equals("user") &&
password.equals("good password"));
}
}
The problem with fakes is that as the application grows, there will always be
more conditions to test. Consequently, the fakes tend to grow with each new
tested condition. Eventually, they can get so large and complex that they need
tests of their own.
I seldom write fakes because I don’t trust them not to grow.
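The discussion that follows works through test-driving a trigonometric sin(radians) function. Its first test, reconstructed here to be consistent with the later listings, asserts only that the sine of zero is zero:

public class SineTest {
    private static final double EPSILON = 0.0001;

    @Test
    public void sines() throws Exception {
        assertEquals(0, sin(0), EPSILON);
    }

    double sin(double radians) {
        return 0.0;   // degenerate solution: passes the only assertion so far
    }
}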
Now, if you are thinking ahead, this should already bother you. This test does
not constrain anything other than the value of sin(0).
What do I mean by that? I mean that most of the functions we write using
TDD are so constrained by the growing set of tests that there comes a point
where the function will pass any other test we can pose. We saw that in the
prime factors example and the bowling game example. Each test narrowed
the possible solution until finally the solution was known.
But here, the sin(r) function does not seem to behave that way. The test for
sin(0) == 0 is correct, but it does not appear to constrain the solution
beyond that one point.
This becomes much more evident when we try the next test. What should that
test be? Why not try sin(π)?
public class SineTest {
private static final double EPSILON = 0.0001;
@Test
public void sines() throws Exception {
assertEquals(0, sin(0), EPSILON);
assertEquals(0, sin(Math.PI), EPSILON);
}
double sin(double radians) {
return 0.0;
}
}
Once again, we have that feeling of being unconstrained. This test doesn’t
seem to add anything to the solution. It gives us no hint of how to solve the
problem, so let’s try π/2:
public class SineTest {
private static final double EPSILON = 0.0001;
@Test
public void sines() throws Exception {
assertEquals(0, sin(0), EPSILON);
assertEquals(0, sin(Math.PI), EPSILON);
assertEquals(1, sin(Math.PI/2), EPSILON);
}
double sin(double radians) {
return 0.0;
}
}
This fails. How can we make it pass? Again, the test gives us no hint about
how to pass it. We could try to put some horrible if statement in, but that will
just lead to more and more if statements.
At this point, you might think that the best approach would be to look up the
Taylor series for sine and just implement that.
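One possible shape of that ugly first cut, simply unrolling the Taylor series sin(r) = r - r^3/3! + r^5/5! - ... by hand (this listing is my sketch, not the book's), is:

double sin(double radians) {
    double r = radians;
    double r2 = r * r;
    return r
        - r * r2 / 6
        + r * r2 * r2 / 120
        - r * r2 * r2 * r2 / 5040
        + r * r2 * r2 * r2 * r2 / 362880
        - r * r2 * r2 * r2 * r2 * r2 / 39916800
        + r * r2 * r2 * r2 * r2 * r2 * r2 / 6227020800.0;
}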
This passes, but it’s pretty ugly. Still, we ought to be able to calculate a few
other sines this way:
@Test
public void sines() throws Exception {
assertEquals(0, sin(0), EPSILON);
assertEquals(0, sin(Math.PI), EPSILON);
assertEquals(1, sin(Math.PI/2), EPSILON);
assertEquals(0.8660, sin(Math.PI/3), EPSILON);
assertEquals(0.7071, sin(Math.PI/4), EPSILON);
assertEquals(0.5877, sin(Math.PI/5), EPSILON);
}
Yes, this passes. But this solution is ugly because it is limited in precision.
We should take the Taylor series out to enough terms that it converges to the
limit of our precision. (Note the change to the EPSILON constant.)
public class SineTest {
private static final double EPSILON = 0.000000001;
@Test
public void sines() throws Exception {
assertEquals(0, sin(0), EPSILON);
assertEquals(0, sin(Math.PI), EPSILON);
assertEquals(1, sin(Math.PI/2), EPSILON);
assertEquals(0.8660254038, sin(Math.PI/3), EPSILON);
assertEquals(0.7071067812, sin(Math.PI/4), EPSILON);
assertEquals(0.5877852523, sin(Math.PI/5), EPSILON);
}
double sin(double radians) {
// (reconstructed sketch of the missing body) add Taylor terms,
// enough of them to converge to the new precision limit near r = π
double result = radians;
double term = radians;
for (int n = 1; n < 19; n++) {
term *= -radians * radians / ((2 * n) * (2 * n + 1));
result += term;
}
return result;
}
}
Okay, now we’re cooking with gas. But wait? What happened to the TDD?
And how do we know that this algorithm is really working right? I mean,
that’s a lot of code. How can we tell if that code is right?
I suppose we could test a few more values. And, yikes, these tests are getting
unwieldy. Let’s refactor a bit too:
private void checkSin(double radians, double sin) {
assertEquals(sin, sin(radians), EPSILON);
}
@Test
public void sines() throws Exception {
checkSin(0, 0);
checkSin(PI, 0);
checkSin(PI/2, 1);
checkSin(PI/3, 0.8660254038);
checkSin(PI/4, 0.7071067812);
checkSin(PI/5, 0.5877852523);
checkSin(3* PI/2, -1);
}
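The next probes, reconstructed here from the narrative that follows, would be the full-circle multiples:

checkSin(2*PI, 0);
checkSin(3*PI, 0);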
Ah, 2π works, but 3π does not. It's close, though; the error is about 4.6130E-9. We could
probably fix that by bumping up the tolerance in the checkSin() comparison,
but that seems like cheating and probably wouldn't work for 100π
or 1,000π. A better solution would be to reduce the angle to keep it between
0 and 2π.
double sin(double radians) {
radians %= 2*PI;
double result = radians;
Yeah, they all work. Okay, what about big numbers that aren’t perfect
multiples of 2π?
checkSin(1000*PI + PI/3, sin(PI/3));
Sigh. That works too. Is there anything else to try? Are there any values that
might fail?
Ouch! I don’t know.
How does this help us? Well, if we had a spy that told us how the terms
of the Taylor series were being calculated, it would let us write a test like
this:
@Test
public void taylorTerms() throws Exception {
SineTaylorCalculatorSpy c = new SineTaylorCalculatorSpy();
double r = Math.random() * PI;
for (int n = 1; n <= 10; n++) {
c.calculateTerm(r, n);
assertEquals(n - 1, c.getSignPower());
assertEquals(r, c.getR(), EPSILON);
assertEquals(2 * n - 1, c.getRPower());
assertEquals(2 * n - 1, c.getFac());
}
}
Using a random number for r and all reasonable values for n allows us to
avoid specific values. Our interest here is that given some r and some n, the
right numbers are fed into the right functions. If this test passes, we will know
that the sign, the power, and the factorial calculators have been given the
right inputs.
We can make this pass with the following simple code:
public class SineTaylorCalculator {
public double calculateTerm(double r, int n) {
int sign = calcSign(n-1);
double power = calcPower(r, 2*n-1);
double factorial = calcFactorial(2*n-1);
return sign*power/factorial;
}
Note that we are not testing the actual calculation functions. They are pretty
simple and probably don’t need testing. This is especially true in light of the
other tests we are about to write.
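Those helpers, inside SineTaylorCalculator, presumably amount to no more than the following sketch (their bodies are my assumption; the signatures come from the calls above):

protected int calcSign(int power) {
    return power % 2 == 0 ? 1 : -1;   // (-1)^power
}

protected double calcPower(double r, int power) {
    return Math.pow(r, power);        // r^power
}

protected double calcFactorial(int n) {
    double factorial = 1.0;
    for (int i = 2; i <= n; i++)
        factorial *= i;
    return factorial;                 // n!
}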
Here’s the spy:
package London_sine;
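// (sketch) The rest of the spy listing; it records the arguments handed to the three
// calculators, which is exactly what the taylorTerms test above interrogates.
public class SineTaylorCalculatorSpy extends SineTaylorCalculator {
    private int signPower;
    private double r;
    private int rPower;
    private int fac;

    public int getSignPower() { return signPower; }
    public double getR() { return r; }
    public int getRPower() { return rPower; }
    public int getFac() { return fac; }

    protected int calcSign(int power) {
        signPower = power;
        return super.calcSign(power);
    }

    protected double calcPower(double r, int power) {
        this.r = r;
        rPower = power;
        return super.calcPower(r, power);
    }

    protected double calcFactorial(int n) {
        fac = n;
        return super.calcFactorial(n);
    }
}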
Given that the test passes, how hard is it to write the summing algorithm?
public double sin(double r) {
double sin=0;
for (int n=1; n<10; n++)
sin += calculateTerm(r, n);
return sin;
}
You can complain about the efficiency of the whole thing, but do you believe
that it works? Does the calculateTerm function properly calculate the right
Taylor term? Does the sin function properly add them together? Are 10
iterations enough? How can we test this without falling back on all those
original value tests?
Here’s an interesting test. All values of sin(r) should be between –1 and 1
(open).
@Test
public void testSineInRange() throws Exception {
SineTaylorCalculator c = new SineTaylorCalculator();
for (int i=0; i<100; i++) {
double r = (Math.random() * 4 * PI) - (2 * PI) ;
double sinr = c.sin(r);
assertTrue(sinr < 1 && sinr > -1);
}
}
Hmmm. That actually failed until we raised the number of terms to 20, which
is, of course, an absurdly high number. But, like I said, this is an extreme
exercise.
Given these tests, how confident are we that we are calculating the sine
properly? I don’t know about you, but I’m pretty confident. I know the terms
are being fed the right numbers. I can eyeball the simple calculators, and the
sin function seems to have the properties of a sine.
Oh, bother, let’s just do some value tests for the hell of it:
@Test
public void sineValues() throws Exception {
checkSin(0, 0);
checkSin(PI, 0);
checkSin(PI/2, 1);
checkSin(PI/3, 0.8660254038);
checkSin(PI/4, 0.7071067812);
checkSin(PI/5, 0.5877852523);
}
London
The London school of TDD gets its name from Steve Freeman and Nat Pryce,
who live in London and who wrote the book3 on the topic. This is the school
that prefers certainty over flexibility.
3. Steve Freeman and Nat Pryce, Growing Object-Oriented Software, Guided by Tests (Addison-
Wesley, 2010).
Note the use of the term over. London schoolers do not abandon flexibility.
Indeed, they still value it highly. It’s just that they are willing to tolerate a
certain level of rigidity in order to gain more certainty.
Therefore, if you look over the tests written by London schoolers, you’ll see
a consistent, and relatively unconstrained, use of mocks and spies.
This attitude focuses more on algorithm than on results. To a Londoner, the
results are important, but the way in which the results are obtained is more
important. This leads to a fascinating design approach. Londoners practice
outside-in design.
Programmers who follow the outside-in design approach start at the user
interface and design their way toward the business rules, one use case at a
time. They use mocks and spies at every boundary in order to prove that the
algorithm they are using to communicate inward is working. Eventually, they
get to a business rule, implement it, connect it to the database, and then turn
around and test their way, with mocks and spies, back out to the user
interface.
Again, this full outside-in round trip is done one use case at a time.
This is an extremely disciplined and orderly approach that can work very
well indeed.
Chicago
The Chicago school of TDD gets its name from ThoughtWorks, which was, at
the time, based in Chicago. Martin Fowler was (and at the time of this
writing still is) the chief scientist there. Actually, the name Chicago is a bit
more mysterious than that. At one time, this school was called the Detroit
school.
The Chicago school focuses on flexibility over certainty. Again, note the
word over. Chicagoans know the value of certainty but, given the choice,
prefer to make their tests more flexible. Consequently, they focus much more
on results than on interactions and algorithms.
This, of course, leads to a very different design philosophy. Chicagoans tend
to start with business rules and then move outward towards the user
interface. This approach is often called inside-out design.
The Chicago design process is just as disciplined as the London process but
attacks the problem in a very different order. A Chicagoan will not drive an
entire use case from end to end before starting on the next. Rather, a
Chicagoan might use value and property tests to implement several business
rules without any user interface at all. The user interface, and the layers
between it and the business rules, are implemented as and when necessary.
Chicagoans may not take the business rules all the way out to the database
right away either. Rather than the one-use-case-at-a-time round trip, they are
looking for synergies and duplications within the layers. Rather than sewing
a thread from the inputs of a use case all the way to the output of that use
case, they work in broader stripes, within the layers, starting with the
business rules and moving gradually out to the user interface and database.
As they explore each layer, they are hunting for design patterns and
opportunities for abstraction and generality.
This approach is less ordered than that of the London school, but it is more
holistic. It tends to keep the big picture more clearly in view—in my humble
opinion.
Synthesis
Although these two schools of thought exist, and although there are
practitioners who tend toward one side or the other, London versus Chicago
is not a war. It’s not really even much of a disagreement. It is a minor point of
emphasis, and little more.
Indeed, all practitioners, whether Chicagoans or Londoners, use both
techniques in their work. It’s just that some do a little more of one, and some
do a little more of the other.
Which is right? Neither, of course. I tend more toward the Chicago side, but
you may look at the London school and decide you are more comfortable
there. I have no disagreement with you. Indeed, I will happily pair program
with you to create a lovely synthesis.
That synthesis becomes very important when we start to consider
architecture.
Architecture
The trade-off I make between the London and Chicago strategies is
architectural. If you read Clean Architecture,4 you know that I like to
partition systems into components. I call the divisions between those
components boundaries. My rule for boundaries is that source code
dependencies must always cross a boundary pointing toward high-level
policy.
4. Robert C. Martin, Clean Architecture: A Craftsman’s Guide to Software Structure and Design
(Addison-Wesley, 2018).
This means that components that contain low-level details, like graphical
user interfaces (GUIs) and databases, depend on higher-level components
like business rules. High-level components do not depend on lower-level
components. This is an example of the dependency inversion principle,
which is the D in SOLID.
When writing the lowest-level programmer tests, I will use spies (and rarely
mocks) when testing across an architectural boundary. Or, to say that a
different way, when testing a component, I use spies to mock out any
collaborating components and to ensure that the component I am testing
invokes the collaborators correctly. So, if my test crosses an architectural
boundary, I’m a Londoner.
However, when a test does not cross such a boundary, I tend to be a
Chicagoan. Within a component, I depend much more on state and property
testing in order to keep the coupling, and therefore the fragility, of my tests as
low as possible.
Let’s look at an example. The UML diagram in Figure 3.8 shows a set of
classes and the four components that contain them.
Figure 3.8 A set of classes and the four components that contain them
Note that the arrows all point from lower-level components to higher-level
components. This is the Dependency Rule that is discussed in Clean
Architecture. The highest-level component contains the business objects. The
next level down contains the interactors and the communications interfaces.
At the lowest level is the GUI and the database.
We might use some stubs when testing the business objects, but we won’t
need any spies or mocks because the business objects don’t know about any
other components.
The interactors, on the other hand, manipulate the business objects, the
database, and the GUI. Our tests will likely use spies to make sure that the
database and GUI are being manipulated properly. However, we will
probably not use as many spies, or even stubs, between the interactors and
the business objects because the functions of the business objects are
probably not expensive.
When testing the controller, we will almost certainly use a spy to represent
the interactor because we don’t want the calls to propagate to the database or
presenter.
The presenter is interesting. We think of it as part of the GUI component, but
in fact we’re going to need a spy to test it. We don’t want to test it with the
real view, so we probably need a fifth component to hold the view apart
from the controller and presenter.
That last little complication is common. We often modify our component
boundaries because the tests demand it.
Conclusion
In this chapter, we looked at some of the more advanced aspects of TDD:
the incremental development of algorithms, the problem of getting
stuck, the finite state machine nature of tests, test doubles, and the
TDD uncertainty principle. But we're not done. There's more. So, get a nice
hot cup of tea and turn the improbability generator up to infinity.
4 Test Design
Testing Databases
The first rule of testing databases is: Don’t test databases. You don’t need to
test the database. You can assume that the database works. You’ll find out
soon enough if it doesn’t.
What you really want to test are queries. Or, rather, you want to test that the
commands that you send to the database are properly formed. If you write
SQL directly, you are going to want to test that your SQL statements work as
intended. If you use an object-relational mapping (ORM) framework, such as
Hibernate, you are going to want to test that Hibernate operates the database
the way you intended. If you use a NoSQL database, you are going to want to
test that all your database accesses behave as you intended.
None of these tests require you to test business rules; they are only about the
queries themselves. So, the second rule of testing databases is: Decouple the
database from the business rules.
We decouple them by creating an interface, which I have called Gateway1 in
the diagram in Figure 4.1. Within the Gateway interface, we create one
method for every kind of query we wish to perform. Those methods can take
arguments that modify the query. For example, in order to fetch all the
Employees from the database whose hiring date was after 2001, we might call
the Gateway method getEmployeesHiredAfter(2001).
1. Martin Fowler, Patterns of Enterprise Application Architecture (Addison-Wesley, 2003), 466.
Figure 4.1 Testing the database
Every query, update, delete, or add we want to perform in the database will
have a corresponding method in a Gateway interface. There can, of course, be
many Gateways, depending on how we want to partition the database.
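As a sketch, with illustrative names that are not from the book, such a Gateway might be nothing more than:

public interface EmployeeGateway {
    List<Employee> getEmployeesHiredAfter(int year);
    void addEmployee(Employee employee);
    void deleteEmployee(String employeeId);
}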
The GatewayImpl class implements the gateway and directs the actual
database to perform the functions required. If this is a SQL database, then all
the SQL is created within the GatewayImpl class. If you are using ORM, the
ORM framework is manipulated by the GatewayImpl class. Neither SQL nor
the ORM framework nor the database API is known above the architectural
boundary that separates the Gateway from the GatewayImpl.
Indeed, we don’t even want the schema of the database known above that
boundary. The GatewayImpl should unpack the rows, or data elements,
retrieved from the database and use that data to construct appropriate
business objects to pass across the boundary to the business rules.
And now testing the database is trivial. You create a suitably simple test
database, and then you call each query function of the GatewayImpl from your
tests and ensure that it has the desired effect on that test database. Make sure
that each query function returns a properly loaded set of business objects.
Make sure that each update, add, and delete changes the database
appropriately.
Do not use a production database for these tests. Create a test database with
just enough rows to prove that the tests work and then make a backup of that
database. Prior to running the tests, restore that backup so that the tests are
always run against the same test data.
When testing the business rules, use stubs and spies to replace the
GatewayImpl classes. Do not test business rules with the real database
connected. This is slow and error prone. Instead, test that your business rules
and interactors manipulate the Gateway interfaces correctly.
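For example, here is a sketch of such a test, using a hand-rolled spy in place of GatewayImpl; the interactor class and its method are illustrative names, and the Gateway is the one sketched above:

public class PayrollInteractorTest {
    // A spy standing in for GatewayImpl: no SQL, no ORM, no database.
    private static class EmployeeGatewaySpy implements EmployeeGateway {
        public int queriedYear = -1;

        public List<Employee> getEmployeesHiredAfter(int year) {
            queriedYear = year;                 // remember how the interactor called us
            return Collections.emptyList();     // canned, database-free result
        }

        public void addEmployee(Employee employee) {}
        public void deleteEmployee(String employeeId) {}
    }

    @Test
    public void interactorAsksForTheRightHiringYear() throws Exception {
        EmployeeGatewaySpy spy = new EmployeeGatewaySpy();
        PayrollInteractor interactor = new PayrollInteractor(spy);
        interactor.listEmployeesHiredAfter(2001);
        assertEquals(2001, spy.queriedYear);
    }
}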
Testing GUIs
The rules for testing GUIs are as follows:
1. Don’t test GUIs.
2. Test everything but the GUI.
3. The GUI is smaller than you think it is.
Let’s tackle the third rule first. The GUI is a lot smaller than you think it is.
The GUI is just one very small element of the software that presents
information on the screen. It is likely the smallest part of that software. It is
the software that builds the commands that are sent to the engine that actually
paints the pixels on the screen.
For a Web-based system, the GUI is the software that builds the HTML. For
a desktop system, the GUI is the software that invokes the API of the graphic
control software. Your job, as a software designer, is to make that GUI
software as small as possible.
For example, does that software need to know how to format a date, or
currency, or general numbers? No. Some other module can do that. All the
GUI needs are the appropriate strings that represent the formatted dates,
currencies, or numbers.
We call that other module a presenter. The presenter is responsible for
formatting and arranging the data that is to appear on the screen, or in a
window. The presenter does as much as possible toward that end, allowing
us to make the GUI absurdly small.
So, for example, the presenter is the module that determines the state of every
button and menu item. It specifies their names and whether or not they should
be grayed out. If the name of a button changes based on the state of a window,
it is the presenter that knows that state and changes that name. If a grid of
numbers should appear on the screen, it is the presenter that creates a table of
strings, all properly formatted and arranged. If there are fields that should
have special colors or fonts, it is the presenter that determines those colors
and fonts.
The presenter takes care of all of that detailed formatting and arrangement
and produces a simple data structure full of strings and flags that the GUI can
use to build the commands that get sent to the screen. And, of course, that
makes the GUI very small indeed.
The data structure created by the presenter is often called a view model.
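A sketch of such a view model, with illustrative field names, shows just how humble it is: nothing but strings and flags.

public class ScreenViewModel {
    public String saveButtonLabel;
    public boolean saveButtonGrayedOut;
    public List<String> menuItemLabels;
    public String[][] numberGrid;     // every cell already formatted for display
    public String highlightColor;     // chosen by the presenter, not by the GUI
}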
In the diagram in Figure 4.2, the interactor is responsible for telling the
presenter what data should be presented to the screen. This communication
will be in the form of one or more data structures passed to the presenter
through a set of functions. The actual presenter is shielded from the interactor
by the presenter interface. This prevents the high-level interactor from
depending on the implementation of the lower-level presenter.
Figure 4.2 The interactor is responsible for telling the presenter what
data should be presented to the screen.
The presenter builds the view model data structure, which the GUI then
translates into the commands that control the screen.
Clearly, the interactor can be tested by using a spy for the presenter. Just as
clearly, the presenter can be tested by sending it commands and inspecting
the result in the view model.
The only thing that cannot be easily tested (with automated unit tests) is the
GUI itself, and so we make it very small.
Of course, the GUI can still be tested; you just have to use your eyes to do it.
But that turns out to be quite simple because you can simply pass a canned set
of view models to the GUI and visually ensure that those view models are
rendered appropriately.
There are tools that you can use to automate even that last step, but I
generally advise against them. They tend to be slow and fragile. What’s
more, the GUI is most likely a very volatile module. Any time someone
wants to change the look and appearance of something on the screen, it is
bound to affect that GUI code. Thus, writing automated tests for that last little
bit is often a waste of time because that part of the code changes so
frequently, and for such evanescent reasons, any tests are seldom valid for
long.
GUI Input
Testing GUI input follows the same rules: We drive the GUI to be as
insignificant as possible. In the diagram in Figure 4.3, the GUI framework is
the code that sits at the boundary of the system. It might be the Web container,
or it might be something like Swing2 or Processing3 for controlling a desktop.
2. https://docs.oracle.com/javase/8/docs/technotes/guides/swing/
3. https://processing.org/
Figure 4.3 Testing the GUI
It is not my intention to try to document all of the test patterns and recipes that have been cataloged. I
just want to mention the three that I have found most useful over the years.
Test-Specific Subclass
This pattern is primarily used as a safety mechanism. For example, let’s say
that you want to test the align method of the XRay class. However, the align
method invokes the turnOn method. You probably don’t want x-rays turned on
every time you run the tests.
The solution, as shown in Figure 4.4, is to create a test-specific subclass of
the XRay class that overrides the turnOn method to do nothing. The test
creates an instance of the SafeXRay class and then calls the align method,
without having to worry that the x-ray machine will actually be turned on.
Figure 4.4 Test-Specific-Subclass pattern
It is often helpful to make the test-specific subclass a spy so that the test can
interrogate the safe object about whether the unsafe method was actually
called or not.
In the example, if SafeXRay were a spy, then the turnOn method would record
its invocation, and the test method in the XRayTest class could interrogate that
record to ensure that turnOn was actually called.
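A sketch of that arrangement, assuming XRay exposes align() and an overridable turnOn():

public class SafeXRay extends XRay {
    private boolean turnOnInvoked = false;

    @Override
    protected void turnOn() {
        turnOnInvoked = true;   // spy behavior: record the call, never energize the device
    }

    public boolean wasTurnOnInvoked() {
        return turnOnInvoked;
    }
}

@Test
public void alignTurnsTheXRayOn() throws Exception {
    SafeXRay safeXRay = new SafeXRay();
    safeXRay.align();
    assertTrue(safeXRay.wasTurnOnInvoked());
}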
Sometimes the Test-Specific Subclass pattern is used for convenience and
throughput rather than safety. For example, you may not wish the method
being tested to start a new process or perform an expensive computation.
It is not at all uncommon for the dangerous, inconvenient, or slow operations
to be extracted into new methods for the express purpose of overriding them
in a test-specific subclass. This is just one of the ways that tests impact the
design of the code.
Self-Shunt
A variation on that theme is the Self-Shunt pattern. Because the test class is a
class, it is often convenient for the test class to become the test-specific
subclass, as shown in Figure 4.5.
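Under the same assumptions as above, a self-shunt version makes the test class itself the safe subclass:

public class XRayAlignmentTest extends XRay {
    private boolean turnOnInvoked = false;

    @Override
    protected void turnOn() {
        turnOnInvoked = true;   // the test object records the call on itself
    }

    @Test
    public void alignTurnsTheXRayOn() throws Exception {
        align();                // 'this' is both the test and the safe XRay
        assertTrue(turnOnInvoked);
    }
}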
Humble Object
We like to think that every bit of code in the system can be tested using the
three laws of TDD, but this is not entirely true. The parts of the code that
communicate across a hardware boundary are perniciously difficult to test.
It is difficult, for example, to test what is displayed on the screen or what
was sent out a network interface or what was sent out a parallel or serial I/O
port. Without some specially designed hardware mechanisms that the tests
can communicate with, such tests are impossible.
What’s more, such hardware mechanisms may well be slow and/or
unreliable. Imagine, for example, a video camera staring at the screen and
your test code trying desperately to determine if the image coming back from
the camera is the image you sent to the screen. Or imagine a loopback
network cable that connects the output port of the network adapter to the input
port. Your tests would have to read the stream of data coming in on that input
port and look for the specific data you sent out on the output port.
In most cases, this kind of specialized hardware is inconvenient, if not
entirely impractical.
The Humble Object pattern is a compromise solution. This pattern
acknowledges that there is code that cannot be practicably tested. The goal of
the pattern, therefore, is to humiliate that code by making it too simple to
bother testing. We saw a simple example of this earlier, in the “Testing
GUIs” section, but now let’s take a deeper look.
The general strategy is shown in Figure 4.6. The code that communicates
across the boundary is separated into two elements: the presenter and the
Humble Object (denoted here as the HumbleView). The communication
between the two is a data structure named the Presentation.
Let’s assume that our application (not shown) wants to display something on
the screen. It sends the appropriate data to the presenter. The presenter then
unpacks that data into the simplest possible form and loads it into the
Presentation. The goal of this unpacking is to eliminate all but the simplest
processing steps from the HumbleView. The job of the HumbleView is simply to
transport the unpacked data in the Presentation across the boundary.
To make this concrete, let’s say that the application wants to put up a dialog
box that has Post and Cancel buttons, a selection menu of order IDs, and a
grid of dates and currency items. The data that the application sends to the
presenter consists of that data grid, in the form of Date and Money objects. It
also sends the list of selectable Order objects for the menu.
The presenter’s job is to turn everything into strings and flags and load them
into the Presentation. The Money and Date objects are converted into locale-
specific strings. The Order objects are converted into ID strings. The names
of the two buttons are loaded as strings. If one or more of the buttons should
be grayed out, an appropriate flag is set in the Presentation.
The end result is that the HumbleView has nothing more to do than transport
those strings across the boundary along with the metadata implied by the
flags. Again, the goal is to make the HumbleView too simple to need testing.
It should be clear that this strategy will work for any kind of boundary
crossing, not just displays.
To demonstrate, let’s say we are coding the control software for a self-
driving car. Let’s also say that the steering wheel is controlled by a stepper
motor that moves the wheel one degree per step. Our software controls the
stepper motor by issuing the following command:
out(0x3ff9, d);
where 0x3ff9 is the IO address of the stepper motor controller, and d is 1 for
right and 0 for left.
At the high level, our self-driving AI issues commands of this form to the
SteeringPresenter:
turn(RIGHT, 30, 2300);
This means that the car (not the steering wheel!) should be gradually turned
30 degrees to the right over the next 2,300ms. To accomplish this, the wheel
must be turned to the right a certain number of steps, at a certain rate, and
then turned back to the left at a certain rate so that, after 2,300ms, the car is
heading 30 degrees to the right of its previous course.
How can we test that the steering wheel is being properly controlled by the
AI? We need to humiliate the low-level steering-wheel control software. We
can do this by passing it a presentation, which is an array of data structures
that looks like this:
struct SteeringPresentationElement{
int steps;
bool direction;
int stepTime;
int delay;
};
The low-level controller walks through the array and issues the appropriate
number of steps to the stepper motor, in the specified direction, waiting
stepTime milliseconds between each step and waiting delay milliseconds
before moving to the next element in the array.
The SteeringPresenter has the task of translating the commands from the AI
into the array of SteeringPresentationElements. In order to accomplish this,
the SteeringPresenter needs to know the speed of the car and the ratio of the
angle of the steering wheel to the angle of the wheels of the car.
It should be clear that the SteeringPresenter is easy to test. The test simply
sends the appropriate Turn commands to the SteeringPresenter and then
inspects the results in the resulting array of SteeringPresentationElements.
Finally, note the ViewInterface in the diagram. If we think of the
ViewInterface, the presenter, and the Presentation as belonging together in
a single component, then the HumbleView depends on that component. This is
an architectural strategy for keeping the higher-level presenter from
depending on the detailed implementation of the HumbleView.
Test Design
We are all familiar with the need to design our production code well. But
have you ever thought about the design of your tests? Many programmers
have not. Indeed, many programmers just throw tests at the code without any
thought to how those tests should be designed. This always leads to
problems.
On the right side of the diagram, we see five production code modules, α, β,
γ, δ, and ε. The α and ε modules stand alone, but β is coupled to γ, which is
coupled to δ. On the left, we see the test modules. Note that each of the test
modules is coupled to the corresponding production code module. However,
because β is coupled to γ and δ, βTest may also be coupled to γ and δ.
This coupling may not be obvious. The reason that βTest is likely to have
couplings to γ and δ is that β may need to be constructed with γ and δ, or the
methods of β may take γ and δ as arguments.
This powerful coupling between βTest and so much of the production code
means that a minor change to δ could affect βTest, γTest, and δTest. Thus,
the one-to-one correspondence between the tests and production code can
lead to very tight coupling and fragility.
Rule 12: Decouple the structure of your tests from the structure of the production code.
Breaking the Correspondence
To break, or avoid creating, the correspondence between tests and
production code, we need to think of the test modules the way we think of all
the other modules in a software system: as independent and decoupled from
each other.
At first, this may seem absurd. You might argue that tests must be coupled to
production code, because the tests exercise the production code. The last
clause is true, but the predicate is false. Exercising code does not imply
strong coupling. Indeed, good designers consistently strive to break strong
couplings between modules while allowing those modules to interact with
and exercise each other.
How is this accomplished? By creating interface layers.
In the diagram in Figure 4.8, we see αTest coupled to α. Behind α we see a
family of modules that support α but of which αTest is ignorant. The α
module is the interface to that family. A good programmer is very careful to
ensure that none of the details of the α family leak out of that interface.
Figure 4.9 Interposing a polymorphic interface between the test and the α
family
Again, this may seem absurd to the newcomer to TDD. How, you might ask,
can we write tests against α5 when we cannot access that module from
αTest? The answer to that question is simply that you do not need access to
α5 in order to test the functionality of α5.
If α5 performs an important function for α, then that functionality must be
testable through the α interface. That is not an arbitrary rule—it is a statement
of mathematical certainty. If a behavior is important, it must also be visible
through the interface. That visibility can be direct or indirect, but it must
exist.
Perhaps an example would be beneficial to drive this point home.
In this example, we use TDD to create the program from scratch. You will
learn the requirements by reading the tests as we go along.
Requirement 1: Regular movies rent, on the first day, for $1.50 and earn 1
renter point per day rented.
Red: We write a test class for the customer named CustomerTest and add the
first test method.
public class CustomerTest {
@Test
public void RegularMovie_OneDay() throws Exception {
Customer c = new Customer();
c.addRental("RegularMovie", 1);
assertEquals(1.5, c.getRentalFee(), 0.001);
assertEquals(1, c.getRenterPoints());
}
}
Refactor: We create the Customer in a setUp method and extract an assertFeeAndPoints helper:
@Before
public void setUp() throws Exception {
customer = new Customer();
}
@Test
public void RegularMovie_OneDay() throws Exception {
customer.addRental("RegularMovie", 1);
assertFeeAndPoints(1.5, 1);
}
}
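The assertFeeAndPoints helper is presumably just the two assertions folded together (its fee parameter becomes an int after the change described below):

private void assertFeeAndPoints(double fee, int points) {
    assertEquals(fee, customer.getRentalFee(), 0.001);
    assertEquals(points, customer.getRenterPoints());
}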
Requirement 2: The second and third days’ rentals of regular movies are
free, and no points are earned for them.
Green: The following test passes with no change to the production code:
@Test
public void RegularMovie_SecondAndThirdDayFree() throws Exception {
customer.addRental("RegularMovie", 2);
assertFeeAndPoints(1.5, 1);
customer.addRental("RegularMovie", 3);
assertFeeAndPoints(1.5, 1);
}
Requirement 3: All subsequent days rent for $1.50 and earn 1 renter point.
Red: The test is simple.
@Test
public void RegularMovie_FourDays() throws Exception {
customer.addRental("RegularMovie", 4);
assertFeeAndPoints(3.0, 2);
}
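The production code that makes this pass is not hard to guess from the calls applyGracePeriod(150, 3) and applyGracePeriod(1, 3) that appear later: charge the base amount once, plus once more for every rental day beyond the grace period. A sketch:

private int applyGracePeriod(int amount, int grace) {
    if (days > grace)
        return amount + amount * (days - grace);
    return amount;
}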
Red: We want to use applyGracePeriod for both the renter points and the fee,
but the fee is a double, and the points are an int. Money should never be a
double! So, we changed the fee into an int, and all the tests broke. We need
to go back and fix all our tests.
public class CustomerTest {
private Customer customer;
@Before
public void setUp() throws Exception {
customer = new Customer();
}
@Test
public void RegularMovie_OneDay() throws Exception {
customer.addRental("RegularMovie", 1);
assertFeeAndPoints(150, 1);
}
@Test
public void RegularMovie_SecondAndThirdDayFree() throws Exception
{
customer.addRental("RegularMovie", 2);
assertFeeAndPoints(150, 1);
customer.addRental("RegularMovie", 3);
assertFeeAndPoints(150, 1);
}
@Test
public void RegularMovie_FourDays() throws Exception {
customer.addRental("RegularMovie", 4);
assertFeeAndPoints(300, 2);
}
}
Requirement 4: Children’s movies rent for $1.00 per day and earn 1 point.
Red: The first day business rule is simple:
@Test
public void ChildrensMovie_OneDay() throws Exception {
customer.addRental("ChildrensMovie", 1);
assertFeeAndPoints(100, 1);
}
Green: It’s not hard to make this pass with some very ugly code.
public int getRentalFee() {
if (title.equals("RegularMovie"))
return applyGracePeriod(150, 3);
else
return 100;
}
Refactor: But now we have to clean up that ugliness. There’s no way the type
of the video should be coupled to the title, so let’s make a registry.
public class Customer {
private String title;
private int days;
private Map<String, VideoType> movieRegistry = new HashMap<>();
public Customer() {
movieRegistry.put("RegularMovie", REGULAR);
movieRegistry.put("ChildrensMovie", CHILDRENS);
}
That’s better, but it violates the single responsibility principle7 because the
Customer class should not be responsible for initializing the registry. The
registry should be initialized during early configuration of the system. Let’s
separate that registry from Customer:
7. Robert C. Martin, Clean Architecture: A Craftsman’s Guide to Software Structure and Design
(Addison-Wesley, 2018), 61ff.
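VideoRegistry itself, judging from the call VideoRegistry.getType(title) below and the note that the test must initialize it with test data, is roughly this sketch (the addMovie name is my assumption):

public enum VideoType {REGULAR, CHILDRENS}

public class VideoRegistry {
    private static Map<String, VideoType> registry = new HashMap<>();

    public static VideoType getType(String title) {
        return registry.get(title);
    }

    public static void addMovie(String title, VideoType type) {
        registry.put(title, type);
    }
}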
Red: Note that requirement 4 said that customers earn 1 point for a children’s
movie, not 1 point per day. So, the next test looks like this:
@Test
public void ChildrensMovie_FourDays() throws Exception {
customer.addRental("ChildrensMovie", 4);
assertFeeAndPoints(400, 1);
}
I chose four days because of the 3 currently sitting as the second argument of
the call to applyGracePeriod within the getRenterPoints method of the
Customer. (Though we sometimes feign naïveté while doing TDD, we do
actually know what’s going on.)
Green: With the registry in place, this is easily repaired.
public int getRenterPoints() {
if (VideoRegistry.getType(title) == REGULAR)
return applyGracePeriod(1, 3);
else
return 1;
}
At this point, I want you to notice that there are no tests for the
VideoRegistry class. Or, rather, no direct tests. VideoRegistry is, in fact,
being tested indirectly because none of the passing tests would pass if
VideoRegistry were not functioning properly.
Red: So far, our Customer class can handle only a single movie. Let’s make
sure it can handle more than one:
@Test
public void OneRegularOneChildrens_FourDays() throws Exception {
customer.addRental("RegularMovie", 4); //$3+2p
customer.addRental("ChildrensMovie", 4); //$4+1p
assertFeeAndPoints(700, 3);
}
Green: That’s just a nice little list and a couple of loops. It’s also nice to
move the registry stuff into the new Rental class:
public class Customer {
private List<Rental> rentals = new ArrayList<>();
This actually fails the old test because Customer now sums up the two
rentals:
@Test
public void RegularMovie_SecondAndThirdDayFree() throws Exception
{
customer.addRental("RegularMovie", 2);
assertFeeAndPoints(150, 1);
customer.addRental("RegularMovie", 3);
assertFeeAndPoints(150, 1);
}
So we rewrite it to add only a single rental:
@Test
public void RegularMovie_ThirdDayFree() throws Exception {
customer.addRental("RegularMovie", 3);
assertFeeAndPoints(150, 1);
}
Refactor: There’s an awful lot I don’t like about the Customer class now.
Those two ugly loops with the strange if statements inside them are pretty
awful. We can extract a few nicer methods from those loops.
public int getRentalFee() {
int fee = 0;
for (Rental rental : rentals)
fee += feeFor(rental);
return fee;
}
Those two private functions seem to play more with the Rental than with the
Customer. Let’s move them along with their utility function
applyGracePeriod. This makes the Customer class much cleaner.
public class Customer {
private List<Rental> rentals = new ArrayList<>();
The Rental class has grown a lot and is much uglier now:
public class Rental {
private String title;
private int days;
private VideoType type;
Those ugly if statements need to be gotten rid of. Every new type of video is
going to mean another clause in those statements. Let’s head that off with
some subclasses and polymorphism.
First, there’s the abstract Movie class. It’s got the applyGracePeriod utility
and two abstract functions to get the fee and the points.
public abstract class Movie {
private String title;
And Customer? Well, it just had the wrong name all this time. It is really the
RentalCalculator class. It is the class that protects our tests from the family
of classes that serves it.
public class RentalCalculator {
private List<Rental> rentals = new ArrayList<>();
As the code evolved, all those classes to the right of RentalCalculator were
created by various refactorings. Yet RentalCalculatorTest knows nothing of
them other than VideoRegistry, which it must initialize with test data.
Moreover, no other test module exercises those classes.
RentalCalculatorTest tests all those other classes indirectly. The one-to-one
correspondence is broken.
This is the way that good programmers protect and decouple the structure of
the production code from the structure of the tests, thereby avoiding the
fragile test problem.
In large systems, of course, this pattern will repeat over and over. There will
be many families of modules, each protected from the test modules that
exercise them by their own particular facades or interface modules.
Some might suggest that tests that operate a family of modules through a
facade are integration tests. We talk about integration tests later in this book.
For now, I’ll simply point out that the purpose of integration tests is very
different from the purpose of the tests shown here. These are programmer
tests, tests written by programmers for programmers for the purpose of
specifying behavior.
The family of production code modules grows as the tests grow. However,
they evolve in very different directions.
As each new test case is added, the suite of tests becomes increasingly
specific. However, programmers should drive the family of modules being
tested in the opposite direction. That family should become increasingly
general (Figure 4.11).
Figure 4.11 The suite of tests becomes more specific, while the family of
modules being tested becomes more general.
This is one of the goals of the refactoring step. You saw it happening in the
video store example. First a test case was added. Then some ugly production
code was added to get the test to pass. That production code was not general.
Often, in fact, it was deeply specific. Then, in the refactoring step, that
specific code was massaged into a more general form.
This divergent evolution of the tests and the production code means that the
shapes of the two will be remarkably different. The tests will grow into a
linear list of constraints and specifications. The production code, on the other
hand, will grow into a rich family of logic and behavior organized to address
the underlying abstraction that drives the application.
This divergent style further decouples the tests from the production code,
protecting the two from changes in the other.
Of course, the coupling can never be completely broken. There will be
changes in one that force changes in the other. The goal is not to eliminate
such changes but to minimize them. And the techniques described are
effective toward that end.
We made the failing test pass by transforming the null into new ArrayList<>
(), as follows:
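private List<Integer> factorsOf(int n) {
    return new ArrayList<>();   // the null has become an empty, and more general, list
}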
That transformation changed the behavior of the solution but also generalized
it. That null was extremely specific. ArrayList is more general than null.
The next failing test case also resulted in generalizing transformations:
assertThat(factorsOf(2), contains(2));
{} → Nil
This is usually the very first transformation employed at the start of a TDD
session. We begin with no code at all. We write the most degenerate test we
can think of. Then, to get it to compile and fail, we make the function we are
testing return null,9 as we did in the prime factors kata.
9. Or the most degenerate allowed return value.
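// In the prime factors kata, {} -> Nil turned no code at all into this:
private List<Integer> factorsOf(int n) {
    return null;
}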
This code transforms nothing into a function that returns nothing. Doing so
seldom makes the failing test pass, so the next transformation usually follows
immediately.
Nil → Constant
Again, we see this in the prime factors kata. The null we returned is
transformed into an empty list of integers.
private List<Integer> factorsOf(int n) {
return new ArrayList<>();
}
We also saw this in the bowling game kata in Chapter 2, though in that case,
we skipped the {} → Nil transformation and went straight to the constant.
public int score() {
return 0;
}
Constant → Variable
This transformation changes a constant into a variable. We saw this in the
stack kata (Chapter 2) when we created the empty variable to hold the true
value that isEmpty had been returning.
public class Stack {
private boolean empty = true;
We saw this again in prime factors when, in order to pass the case for
factoring 3, we replaced the constant 2 with the argument n.
private List<Integer> factorsOf(int n) {
ArrayList<Integer> factors = new ArrayList<>();
if (n>1)
factors.add(n);
return factors;
}
Unconditional → Selection
This transformation adds an if statement, or the equivalent. This is not
always a generalization. Programmers must take care not to make the
predicate of the selection specific to the currently failing test.
We saw this transformation in the prime factors kata when we needed to
factor the number 2. Note that the predicate of the if statement in that kata
was not (n==2); that would have been too specific. The (n>1) inequality was
an attempt to keep the if statement more general.
private List<Integer> factorsOf(int n) {
ArrayList<Integer> factors = new ArrayList<>();
if (n>1)
factors.add(2);
return factors;
}
Value → List
This generalizing transformation changes a variable that holds a single value
into a list of values. The list could be an array or a more complex container.
We saw this transformation in the stack kata when we changed the element
variable into the elements array.
public class Stack {
    private int size = 0;
    private int[] elements = new int[2];
Statement → Recursion
This generalizing transformation changes a single statement into a recursive
statement, in lieu of a loop. These kinds of transformations are very common
in languages that support recursion, especially those like Lisp and Logo that
have no looping facilities other than recursion. The transformation changes
an expression that is evaluated once into an expression that is evaluated in
terms of itself. We saw this transformation in the word-wrap kata in Chapter
3, “Advanced TDD.”
private String wrap(String s, int w) {
    if (w >= s.length())
        return s;
    else
        return s.substring(0, w) + "\n" + wrap(s.substring(w), w);
}
Selection → Iteration
We saw this several times in the prime factors kata when we converted those
if statements into while statements. This is clearly a generalization because
iteration is the general form of selection, and selection is merely degenerate
iteration.
private List<Integer> factorsOf(int n) {
    ArrayList<Integer> factors = new ArrayList<>();
    if (n > 1) {
        while (n % 2 == 0) {
            factors.add(2);
            n /= 2;
        }
    }
    if (n > 1)
        factors.add(n);
    return factors;
}
Example: Fibonacci
Let’s try a simple kata and keep track of the transformations. We’ll do the
tried and true Fibonacci kata. Remember that fib(0) = 1, fib(1) = 1, and
fib(n) = fib(n-1) + fib(n-2).
We begin, as always, with a failing test. If you are wondering why I’m using
BigInteger, it is because Fibonacci numbers get big very quickly.
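A sketch of that first failing test, and the degenerate fib that eventually
makes it pass, might look like this (the exact listing is an assumption
reconstructed here; note the String-based constructor of BigInteger):
@Test
public void testFibs() throws Exception {
    assertThat(fib(0), equalTo(BigInteger.ONE));
}

// Degenerate solution (assumed form, per the remark that follows):
private BigInteger fib(int n) {
    return new BigInteger("1");
}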
Yes, I thought the use of the String argument was odd too; but that’s the Java
library for you.
The next test passes right out of the box:
@Test
public void testFibs() throws Exception {
    assertThat(fib(0), equalTo(BigInteger.ONE));
    assertThat(fib(1), equalTo(BigInteger.ONE));
}
You might think that last transformation was just a different application of
Statement → Recursion, but it wasn’t. It was actually Selection → Iteration.
In fact, if the Java compiler would deign to offer us tail-call optimization,12
it would translate almost exactly to the following code. Note the implied
if → while.
12. Java, Java, wherefore art thou Java?
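Had it offered that optimization, the result might look something like the
sketch below, assuming the kata ended with a tail-recursive fib carrying two
BigInteger accumulators (the variable names here are assumptions):
private BigInteger fib(int n) {
    BigInteger fMinus2 = BigInteger.ONE;
    BigInteger fMinus1 = BigInteger.ONE;
    while (n > 1) {                      // the recursive if became a while
        BigInteger f = fMinus1.add(fMinus2);
        fMinus2 = fMinus1;
        fMinus1 = f;
        n--;
    }
    return fMinus1;
}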
At the time that I was reading the first edition of Martin Fowler’s Refactoring, my twelve-year-old son,
Justin, was on a hockey team. For those of you who are not hockey parents,
the games involve five minutes of your child playing on the ice and ten to
fifteen minutes off the ice so that they can cool down.
While my son was off the ice, I read Martin’s wonderful book. It was the first
book I had ever read that presented code as something malleable. Most other
books of the period, and before, presented code in final form. But this book
showed how you could take bad code and clean it.
As I read it, I would hear the crowd cheer for the kids on the ice, and I would
cheer along with them—but I was not cheering for the game. I was cheering
for what I was reading in that book. In many ways, it was the book that put
me on the path to writing Clean Code.3
3. Robert C. Martin, Clean Code (Addison-Wesley, 2009).
This chapter presents the art of refactoring from my personal point of view. It
is not intended as a replacement for Martin’s book.
What Is Refactoring?
This time, I paraphrase Martin with my own quote:
Refactoring is a sequence of small changes that improve the structure of the software
without changing its behavior—as proven by passing a comprehensive suite of tests after
each change in the sequence.
The purpose of refactoring is to clean the code. The process for refactoring
is the red → green → refactor cycle. Refactoring is a constant activity, not a
scheduled and planned activity. You keep the code clean by refactoring it
every time around the red → green → refactor loop.
There will be times when larger refactorings are necessary. You will
inevitably find that the design for your system needs updating, and you’ll
want to make that design change throughout the body of the code. You do not
schedule this. You do not stop adding features and fixing bugs to do it. You
simply add a bit of extra refactoring effort to the red → green → refactor
cycle and gradually make the desired changes as you also continuously
deliver business value.
Rename
Naming things is hard. Finding the right name for something is often a
process of successive, iterative improvements. Do not be afraid to pursue the
right name. Improve names as often as you can when the project is young.
As the project ages, changing names becomes increasingly difficult.
Increasing numbers of programmers will have committed the names to
memory and will not react well if those names are changed without warning.
As time goes on, renaming important classes and functions will require
meetings and consensus.
So, as you write new code, and while that code is not too widely known,
experiment with names. Rename your classes and methods frequently. As you
do, you’ll find that you’ll want to group them differently. You’ll move
methods from one class to another to remain consistent with your new names.
You’ll change the partitioning of functions and classes to correspond to the
new naming scheme.
In short, the practice of searching for the best names will likely have a
profoundly positive effect on the way you partition the code into classes and
modules.
So, learn to use the Rename refactoring frequently and well.
Extract Method
The Extract Method refactoring may be the most important of all the
refactorings. Indeed, this refactoring may be the most important mechanism
for keeping your code clean and well organized.
My advice is to follow the extract ’til you drop discipline.
This discipline pursues two goals. First, every function should do one thing.5
Second, your code should read like well-written prose.6
5. Martin, Clean Code, p. 7.
6. Martin, p. 8.
A function does one thing when no other function can be extracted from it.
Therefore, in order that your functions all do one thing, you should extract
and extract and extract until you cannot extract any more.
This will, of course, lead to a plethora of little tiny functions. And this may
disturb you. You may feel that so many little tiny functions will obscure the
intent of your code. You may worry that it would be easy to get lost within
such a huge swarm of functions.
But the opposite happens. The intent of your code becomes much more
obvious. The levels of abstraction become crisp and the lines between them
clear.
Remember that languages nowadays are rich with modules, classes, and
namespaces. This allows you to build a hierarchy of names within which to
place your functions. Namespaces hold classes. Classes hold functions.
Public functions refer to private functions. Classes hold inner and nested
classes. And so on. Take advantage of these tools to create a structure that
makes it easy for other programmers to locate the functions you have written.
And then choose good names. Remember that the length of a function’s name
should be inversely proportional to the scope that contains it. The names of
public functions should be relatively short. The names of private functions
should be longer.
As you extract and extract, the names of the functions will get longer and
longer because the purpose of the function will become less and less general.
Most of these extracted functions will be called from only one place, so their
purpose will be extremely specialized and precise. The names of such
specialized and precise functions must be long. They will likely be full
clauses or even sentences.
These functions will be called from within the parentheses of while loops
and if statements. They will be called from within the bodies of those
statements as well, leading to code that looks like this:
if (employeeShouldHaveFullBenefits())
    addFullBenefitsToEmployee();
Extract Variable
If Extract Method is the most important of refactorings, Extract Variable is
its ready assistant. It turns out that in order to extract methods, you often must
extract variables first.
For example, consider this refactoring from the bowling game in Chapter 2,
“Test-Driven Development.” We started with this:
@Test
public void allOnes() throws Exception {
    for (int i = 0; i < 20; i++)
        g.roll(1);
    assertEquals(20, g.score());
}
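The refactoring then proceeds by extracting variables for the two magic
numbers, which makes the subsequent Extract Method nearly automatic. A sketch
of the result (rollMany is the name used in the kata; the intermediate
variable names are assumptions):
@Test
public void allOnes() throws Exception {
    int n = 20;
    int pins = 1;
    rollMany(n, pins);
    assertEquals(20, g.score());
}

private void rollMany(int n, int pins) {
    for (int i = 0; i < n; i++)
        g.roll(pins);
}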
Extract Field
Fortunately for us, the author was kind enough to have written some tests.
These tests aren’t great, but they’ll do.
public class NewCasesReporterTest {
    private final double DELTA = 0.0001;
    private NewCasesReporter reporter;

    @Before
    public void setUp() throws Exception {
        reporter = new NewCasesReporter();
    }

    @Test
    public void countyReport() throws Exception {
        String report = reporter.makeReport("" +
            "c1, s1, 1, 1, 1, 1, 1, 1, 1, 7\n" +
            "c2, s2, 2, 2, 2, 2, 2, 2, 2, 7");
        assertEquals("" +
            "County State Avg New Cases\n" +
            "====== ===== =============\n" +
            "c1 s1 1.86\n" +
            "c2 s2 2.71\n\n" +
            "s1 cases: 14\n" +
            "s2 cases: 21\n" +
            "Total Cases: 35\n",
            report);
    }

    @Test
    public void stateWithTwoCounties() throws Exception {
        String report = reporter.makeReport("" +
            "c1, s1, 1, 1, 1, 1, 1, 1, 1, 7\n" +
            "c2, s1, 2, 2, 2, 2, 2, 2, 2, 7");
        assertEquals("" +
            "County State Avg New Cases\n" +
            "====== ===== =============\n" +
            "c1 s1 1.86\n" +
            "c2 s1 2.71\n\n" +
            "s1 cases: 35\n" +
            "Total Cases: 35\n",
            report);
    }

    @Test
    public void statesWithShortLines() throws Exception {
        String report = reporter.makeReport("" +
            "c1, s1, 1, 1, 1, 1, 7\n" +
            "c2, s2, 7\n");
        assertEquals("" +
            "County State Avg New Cases\n" +
            "====== ===== =============\n" +
            "c1 s1 2.20\n" +
            "c2 s2 7.00\n\n" +
            "s1 cases: 11\n" +
            "s2 cases: 7\n" +
            "Total Cases: 18\n",
            report);
    }
}
The tests give us a good idea of what the program is doing. The input is a
CSV string. Each line represents a county and has a list of the number of new
COVID cases per day. The output is a report that shows the seven-day rolling
average of new cases per county and provides some totals for each state,
along with a grand total.
Clearly, we want to start extracting methods from this big horrible function.
Let’s begin with that loop up at the top. That loop does all the math for all the
counties, so we should probably call it something like calculateCounties.
However, selecting that loop and trying to extract a method produces the
dialog shown in Figure 5.1.
Figure 5.1 Extract Method dialog
The IDE wants to name the function getTotalCases. You’ve got to hand it to
the IDE authors—they worked pretty hard to try to suggest names. The IDE
decided on that name because the code after the loop needs the number of
new cases and has no way to get it if this new function doesn’t return it.
But we don’t want to call the function getTotalCases. That’s not our intent
for this function. We want to call it calculateCounties. Moreover, we don’t
want to pass in those four arguments either. All we really want to pass in to
the extracted function is the lines array.
So let’s hit Cancel and look again.
To refactor this properly, we need to extract some of the local variables
within that loop into fields of the surrounding class. We use the Extract Field
refactoring to do this:
public class NewCasesReporter {
    private int totalCases;
    private final Map<String, Integer> stateCounts = new HashMap<>();
    private final List<County> counties = new ArrayList<>();
Note that we initialize the values of those variables at the top of the
makeReport function. This preserves the original behavior.
Now we can extract out the loop without passing in any more variables than
we want and without returning the totalCases:
public class NewCasesReporter {
    private int totalCases;
    private final Map<String, Integer> stateCounts = new HashMap<>();
    private final List<County> counties = new ArrayList<>();
Now, with those variables as fields, we can continue to extract and rename to
our heart’s delight.
public class NewCasesReporter {
    private int totalCases;
    private final Map<String, Integer> stateCounts = new HashMap<>();
    private final List<County> counties = new ArrayList<>();
    ...
        county.rollingAverage = calculateRollingAverage(tokens);
        return county;
    }
This partitioning separates things out very nicely. Reporting and calculation
are accomplished in separate modules. And all because of that initial
Extract Field.
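The essence of the move can be restated in miniature: promote the locals that
the extracted code shares with the rest of the function into fields, and the
extraction no longer needs extra parameters or return values. The following
toy class is hypothetical, not part of the kata:
public class InvoiceReporter {
    private int total;                    // was a local variable in makeReport

    public String makeReport(int[] amounts) {
        total = 0;                        // initialize the field to preserve behavior
        sumAmounts(amounts);              // extracted loop: no argument for, and
        return "Total: " + total;         // no return value of, the running total
    }

    private void sumAmounts(int[] amounts) {
        for (int amount : amounts)
            total += amount;
    }
}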
Rubik’s Cube
So far, I’ve tried to show you how powerful a small set of refactorings can
be. In my normal work, I seldom use more than the ones I’ve shown you. The
trick is to learn them well and understand all the details of the IDE and the
tricks for using them.
I have often compared refactoring to solving a Rubik’s cube. If you’ve never
solved one of these puzzles, it would be worth your time to learn how. Once
you know the trick, it’s relatively easy.
It turns out that there is a set of “operations” that you can apply to the cube
that preserve most of the cube’s positions but change certain positions in
predictable ways. Once you know three or four of those operations, you can
incrementally manipulate the cube into a solvable position.
The more operations you know and the more adept you are at performing
them, the faster and more directly you can solve the cube. But you’d better
learn those operations well. One missed step and the cube melts down into a
random distribution of cubies, and you have to start all over.
Refactoring code is a lot like this. The more refactorings you know and the
more adept you are at using them, the easier it is to push, pull, and stretch the
code in any direction you desire.
Oh, and you’d better have tests too. Without them, meltdowns are a near
certainty.
The Disciplines
Refactoring is safe, easy, and powerful if you approach it in a regular and
disciplined manner. If, on the other hand, you approach it as an ad hoc,
temporary, and sporadic activity, that safety and power can quickly
evaporate.
Tests
The first of the disciplines, of course, is tests. Tests, tests, tests, tests, and
more tests. To safely and reliably refactor your code, you need a test suite
that you trust with your life. You need tests.
Quick Tests
The tests also need to be quick. Refactoring just doesn’t work well if your
tests take hours (or even minutes) to run.
In large systems, no matter how hard you try to reduce test time, it’s hard to
reduce it to less than a few minutes. For this reason, I like to organize my test
suite such that I can quickly and easily run the relevant subset of tests that
check the part of the code I am refactoring at the moment. This usually allows
me to reduce the test time from minutes to sub-seconds. I run the whole suite
once every hour or so just to make sure no bugs leaked out.
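One possible way to carve out such a subset is JUnit 4’s categories; this is
an illustration only, not a mechanism the book prescribes:
import org.junit.Test;
import org.junit.experimental.categories.Categories;
import org.junit.experimental.categories.Categories.IncludeCategory;
import org.junit.experimental.categories.Category;
import org.junit.runner.RunWith;
import org.junit.runners.Suite.SuiteClasses;
import static org.junit.Assert.assertEquals;

interface QuickTests {}                   // marker interface for the fast subset

class ArithmeticTest {
    @Category(QuickTests.class)
    @Test
    public void twoTimesTwo() throws Exception {
        assertEquals(4, 2 * 2);
    }
}

@RunWith(Categories.class)
@IncludeCategory(QuickTests.class)
@SuiteClasses({ArithmeticTest.class})
public class QuickSuite {}                // running this suite runs only QuickTests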
Refactor Continuously
When I cook a meal, I make it a rule to clean the preparation dishes as I
proceed.9 I do not let them pile up in the sink. There’s always enough time to
clean the used utensils and pans while the food is cooking.
9. My wife disputes this claim.
Refactoring is like that too. Don’t wait to refactor. Refactor as you go. Keep
the red → green → refactor loop in your mind, and spin around that loop
every few minutes. That way, you will prevent the mess from building so
large that it starts to intimidate you.
Refactor Mercilessly
Merciless refactoring was one of Kent Beck’s sound bites for Extreme
Programming. It was a good one. The discipline is simply to be courageous
when you refactor. Don’t be afraid to try things. Don’t be reluctant to make
changes. Manipulate the code as though it is clay and you are the sculptor.
Fear of the code is the mind-killer, the dark path. Once you start down the
dark path, forever will it dominate your destiny. Consume you it will.
Conclusion
I kept this chapter intentionally brief because there were only a few ideas
that I wanted to add to Martin Fowler’s Refactoring. Again, I urge you to
refer to that book for an in-depth understanding.
The best approach to refactoring is to develop a comfortable repertoire of
refactorings that you use frequently and to have a good working knowledge of
many others. If you use an IDE that provides refactoring operations, make
sure you understand them in detail.
Refactoring makes no sense without tests. Without tests, the opportunities for
error are just too common. Even the automated refactorings that your IDE
provides can sometimes make mistakes. So always back up your refactoring
efforts with a comprehensive suite of tests.
Finally, be disciplined. Refactor frequently. Refactor mercilessly. And
refactor without apology. Never, ever ask permission to refactor.
6 Simple Design
Design. The Holy Grail and ultimate objective of the software craft. We all
seek to create a design so perfect that features can be added without effort
and without fuss. A design so robust that, despite months and years of
constant maintenance, the system remains facile and flexible. Design, in the
end, is what it’s all about.
I have written a great deal about design. I have written books about design
principles, design patterns, and architecture. And I am far from the only
author to focus on this topic. The amount of literature on software design is
enormous.
But that’s not what this chapter is about. You would be well advised to
research the topic of design, read those authors, and understand the
principles, the patterns, and the overall gestalt of software design and
architecture.
But the key to it all, the aspect of design that imbues it with all the
characteristics that we desire, is—in a word—simplicity. As Chet
Hendrickson1 once said, “Uncle Bob wrote thousands of pages on clean
code. Kent Beck wrote four lines.” It is those four lines we focus on here.
1. As cited by Martin Fowler in a tweet quoting Chet Hendrickson at AATC2017. I was in attendance
when Chet said it, and I completely agreed with him.
It should be obvious, on the face of it, that the best design for a system is the
simplest design that supports all the required features of that system while
simultaneously affording the greatest flexibility for change. However, that
leaves us to ponder the meaning of simplicity.2 Simple does not mean easy.
Simple means untangled, and untangling things is hard.
2. In 2012, Rich Hickey gave a wonderful talk, Simple Made Easy. I encourage you to listen to it.
https://www.youtube.com/watch?v=oytL881p-nQ.
What things get tangled in software systems? The most expensive and
significant entanglements are those that convolve high-level policies with
low-level details. You create terrible complexities when you conjoin SQL
with HTML, or frameworks with core values, or the format of a report with
the business rules that calculate the reported values. These entanglements are
easy to write, but they make it hard to add new features, hard to fix bugs, and
hard to improve and clean the design.
A simple design is a design in which high-level policies are ignorant of low-
level details. Those high-level policies are sequestered and isolated from
low-level details such that changes to the low-level details have no impact
on the high-level policies.3
3. I write a great deal about this in Clean Architecture: A Craftsman’s Guide to Software Structure
and Design (Addison-Wesley, 2018).
The primary means for creating this separation and isolation is abstraction.
Abstraction is the amplification of the essential and the elimination of the
irrelevant. High-level policies are essential, so they are amplified. Low-
level details are irrelevant, so they are isolated and sequestered.
The physical means we employ for this abstraction is polymorphism. We
arrange high-level policies to use polymorphic interfaces to manage the low-
level details. Then we arrange the low-level details as implementations of
those polymorphic interfaces. This practice keeps all source code
dependencies pointing from low-level details to high-level policies and
keeps high-level policies ignorant of the implementations of the low-level
details. Low-level details can be changed without affecting the high-level
policies (Figure 6.1).
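A minimal sketch of that arrangement, with hypothetical names rather than the
book’s example:
// The high-level policy owns this interface and knows nothing of the detail.
interface ReportPresenter {
    void present(String reportText);
}

class PayrollPolicy {                     // high-level policy
    private final ReportPresenter presenter;

    PayrollPolicy(ReportPresenter presenter) {
        this.presenter = presenter;
    }

    void reportTotalPay(int totalPayInCents) {
        presenter.present("Total pay: " + totalPayInCents);  // no HTML here
    }
}

// The low-level detail depends on the abstraction, not the other way around.
class HtmlReportPresenter implements ReportPresenter {
    public void present(String reportText) {
        System.out.println("<p>" + reportText + "</p>");
    }
}
The source code dependency runs from HtmlReportPresenter up to
ReportPresenter, so the payroll policy can be compiled, tested, and changed
with no knowledge of HTML.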
If the best design of a system is the simplest design that supports the features,
then we can say that such a design must have the fewest abstractions that
manage to isolate high-level policy from low-level detail.
And yet, this was precisely the opposite of the strategy we employed
throughout the 1980s and 1990s. In those days, we were obsessed with
future-proofing our code by putting the hooks in for the changes we
anticipated in the future.
We took this path because software, in those days, was hard to change—even
if the design was simple.
Why was software hard to change? Because build times were long and test
times were longer.
In the 1980s, a small system might require an hour or more to build and many
hours to test. Tests, of course, were manual and were therefore woefully
inadequate. As a system grew larger and more complicated, the programmers
became ever-more afraid of making changes. This led to a mentality of
overdesign, which drove us to create systems that were far more complicated
than necessary for the features they had.
We reversed course in the late 1990s with the advent of Extreme
Programming and then Agile. By then, our machines had become so
massively powerful that build times could be reduced to minutes or even
seconds, and we found that we could afford to automate tests that could run
very quickly.
Driven by this technological leap, the discipline of YAGNI and the four
principles of simple design, described by Kent Beck, became practicable.
YAGNI
What if you aren’t gonna need it?
In 1999, I was teaching an Extreme Programming course with Martin Fowler,
Kent Beck, Ron Jeffries, and a host of others. The topic turned to the dangers
of overdesign and premature generalization. Someone wrote YAGNI on the
whiteboard and said, “You aren’t gonna need it.” Beck interrupted and said
something to the effect maybe you are gonna need it, but you should ask
yourself, “What if you aren’t?”
That was the original question that YAGNI asked. Every time you thought to
yourself, I’m going to need this hook, you then asked yourself what would
happen if you didn’t put the hook in. If the cost of leaving the hook out was
tolerable, then you probably shouldn’t put it in. If the cost of carrying the
hook in the design, year after year, would be high but the odds that you’d
eventually need that hook were low, you probably shouldn’t put that hook in.
It is hard to imagine the furor this new perspective raised in the late 1990s.
Designers everywhere had gotten very used to putting all the hooks in. In
those days, putting the hooks in was considered conventional wisdom and a
best practice.
So, when the YAGNI discipline of Extreme Programming surfaced, it was
roundly criticized and panned as heresy and claptrap.
Nowadays, ironically, it is one of the most important disciplines of good
software design. If you have a good suite of tests and you are skilled at the
discipline of refactoring, then the cost of adding a new feature and updating
the design to support that new feature will almost certainly be smaller than
the cost of implementing and maintaining all the hooks you might need one
day.
Hooks are problematic in any case. We seldom get them right. That’s because
we are not particularly good at predicting what customers will actually do.
Consequently, we tend to put in far more hooks than we need and base those
hooks on assumptions that rarely pan out.
The bottom line is that the effect that gigahertz clock rates and terabyte
memories had on the process of software design and architecture took us all
by surprise. We did not realize until the late 1990s that those advances would
allow us to vastly simplify our designs.
It is one of the great ironies of our industry that the exponential increase of
Moore’s law that drove us to build ever-more complex software systems
also made it possible to simplify the designs of those systems.
YAGNI, it turns out, is the unintended consequence of the virtually limitless
computer power now at our disposal. Because our build times have shrunk
down into the seconds and because we can afford to write and execute
comprehensive test suites that execute in seconds, we can afford to not put
the hooks in and instead refactor the designs as the requirements change.
Does this mean that we never put the hooks in? Do we always design our
systems only for the features we need today? Do we never think ahead and
plan for the future?
No, that’s not what YAGNI means. There are times when putting a particular
hook in is a good idea. Future-proofing the code is not dead, and it is always
wise to think of the future.
It’s just that the trade-offs have changed so dramatically in the last few
decades that it is now usually better to leave the majority of the hooks out.
And that’s why we ask the question:
What if you aren’t gonna need it?
Covered by Tests
The first time I ran across Beck’s rules of simple design was in the first
edition of Extreme Programming Explained.4 At that time, the four rules
were as follows:
4. Kent Beck, Extreme Programming Explained (Addison-Wesley, 1999).
1. The system (code and tests) must communicate everything you want to
communicate.
2. The system must contain no duplicate code.
3. The system should have the fewest possible classes.
4. The system should have the fewest possible methods.
By 2011, they had evolved to these:
1. Tests pass.
2. Reveal intent.
3. No duplication.
4. Small.
By 2014, Corey Haines had written a book5 about those four rules.
5. Corey Haines, Understanding the Four Rules of Simple Design (Leanpub, 2014).
Notice how the emphasis of that first rule has changed over the years. The
first rule split in two and the last two rules merged into one. Notice also that,
as the years went by, tests grew in importance from communication to
coverage.
Coverage
The concept of test coverage is an old one. The first mention I was able to
find goes all the way back to 1963.7 The article begins with two paragraphs
that I think you’ll find interesting, if not evocative.
7. Joan Miller and Clifford J Maloney, “Systematic Mistake Analysis of Digital Computer Programs,”
Communications of the ACM 6, no. 2 (1963): 58–63.
Effective program checkout is imperative to any complex computer program. One or
more test cases are always run for a program before it is considered ready for
application to the actual problem. Each test case checks that portion of the program
actually used in its computation. Too often, however, mistakes show up as late as several
months (or even years) after a program has been put into operation. This is an indication
that the portions of the program called upon only by rarely occurring input conditions
have not been properly tested during the checkout stage.
In order to rely with confidence upon any particular program it is not sufficient to know
that the program works most of the time or even that it has never made a mistake so far.
The real question is whether it can be counted upon to fulfill its functional specifications
successfully every single time. This means that, after a program has passed the checkout
stage, there should be no possibility that an unusual combination of input data or
conditions may bring to light an unexpected mistake in the program. Every portion of the
program must be utilized during checkout in order that its correctness be confirmed.
Nineteen sixty-three was only seventeen years after the very first program ran
on the very first electronic computer,8 and already we knew that the only way
to effectively mitigate the threat of software errors is to test every single line
of code.
8. Presuming that the first computer was the Automated Computing Engine and that the first program
executed in 1946.
Code coverage tools have been around for decades. I don’t remember when I
first encountered them. I think it was in the late 1980s or early 1990s. At the
time, I was working on Sun Microsystems Sparc Stations, and Sun had a tool
called tcov.
I don’t remember when I first heard the question, What’s your code
coverage? It was probably in the very early 2000s. But thereafter the notion
that code coverage was a number became pretty much universal.
Since then, it has become relatively commonplace for software teams to run a
code coverage tool as part of their continuous build process and to publish
the code coverage number for each build.
What is a good code coverage number? Eighty percent? Ninety percent?
Many teams are more than happy to report such numbers. But six decades
before the publication of this book, Miller and Maloney answered the
question very differently: Their answer was 100 percent.
What other number could possibly make sense? If you are happy with 80
percent coverage, it means you don’t know if 20 percent of your code works.
How could you possibly be happy with that? How could your customers be
happy with that?
So, when I use the term covered in the first rule of simple design, I mean
covered. I mean 100 percent line coverage and 100 percent branch coverage.
An Asymptotic Goal
You might complain that 100 percent is an unreachable goal. I might even
agree with you. Achieving 100 percent line and branch coverage is no mean
feat. It may, in fact, be impractical depending on the situation. But that does
not mean that your coverage cannot be improved.
Think of the number 100 percent as an asymptotic goal. You may never reach
it, but that’s no excuse for not trying to get closer and closer with every
check-in.
I have personally participated in projects that grew to many tens of thousands
of lines of code while constantly keeping the code coverage in the very high
nineties.
Design?
But what does high code coverage have to do with simple design? Why is
coverage the first rule?
Testable code is decoupled code.
In order to achieve high line and branch coverage of each individual part of
the code, each of those parts must be made accessible to the test code. That
means those parts must be so well decoupled from the rest of the code that
they can be isolated and invoked from an individual test. Therefore, those
tests not only are tests of behavior but also are tests of decoupling. The act of
writing isolated tests is an act of design, because the code being tested must
be designed to be tested.
In Chapter 4, “Testing Design,” we talked about how the test code and the
production code evolve in different directions in order to keep the tests from
coupling too strongly to the production code. This prevents the problem of
fragile tests. But the problem of fragile tests is no different from the problem
of fragile modules, and the cure for both is the same. If the design of your
system keeps your tests from being fragile, it will also keep the other
elements of your system from being fragile.
Maximize Expression
In the early decades of programming, the code we wrote could not reveal
intent. Indeed, the very name “code” suggests that intent is obscured. Back in
those days, code looked like what is shown in Figure 6.2.
Figure 6.2 An example of an early program
As an example of such code, consider this little bit of Java from the video
store example in Chapter 4:
public class RentalCalculator {
    private List<Rental> rentals = new ArrayList<>();
If you were not a programmer on this project, you might not understand
everything that is going on in this code. However, after even the most cursory
glance, the basic intent of the designer is easy to identify. The names of the
variables, functions, and types are deeply descriptive. The structure of the
algorithm is easy to see. This code is expressive. This code is simple.
Minimize Duplication
In the early days of software, we had no source code editors at all. We wrote
our code, using #2 pencils, on preprinted coding forms. The best editing tool
we had was an eraser. We had no practical means to copy and paste.
Because of that, we did not duplicate code. It was easier for us to create a
single instance of a code snippet and put it into a subroutine.
But then came the source code editors, and with those editors came
copy/paste operations. Suddenly it was much easier to copy a snippet of
code and paste it into a new location and then fiddle with it until it worked.
Thus, as the years went by, more and more systems exhibited massive
amounts of duplication in the code.
Duplication is usually problematic. Two or more similar stretches of code
will often need to be modified together. Finding those similar stretches is
hard. Properly modifying them is even harder because they exist in different
contexts. Thus, duplication leads to fragility.
In general, it is best to reduce similar stretches of code into a single instance
by abstracting the code into a new function and providing it with appropriate
arguments that communicate any differences in context.
Sometimes that strategy is not workable. For example, sometimes the
duplication is in code that traverses a complex data structure. Many different
parts of the system may wish to traverse that structure and will use the same
looping and traversal code only to then operate on the data structure in the
body of that code.
As the structure of the data changes over time, programmers will have to find
all the duplications of the traversal code and properly update them. The more
the traversal code is duplicated, the higher the risk of fragility.
The duplications of the traversal code can be eliminated by encapsulating it
in one place and using lambdas, Command objects, the Strategy pattern, or
even the Template Method pattern10 to pass the necessary operations into the
traversal.
10. Erich Gamma, Richard Helm, Ralph Johnson, and John M Vlissides, Design Patterns: Elements of
Reusable Object-Oriented Software (Addison-Wesley, 1995).
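For instance, a single copy of the traversal can accept the per-element
operation as a lambda. The classes and names below are hypothetical, not
taken from the book:
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class Employee {
    String name;
    Employee(String name) { this.name = name; }
}

class Department {
    private final Employee head;
    private final List<Department> subDepartments = new ArrayList<>();

    Department(Employee head) { this.head = head; }

    void add(Department subDepartment) { subDepartments.add(subDepartment); }

    // The only copy of the traversal; callers inject the operation.
    void forEachEmployee(Consumer<Employee> operation) {
        operation.accept(head);
        for (Department d : subDepartments)
            d.forEachEmployee(operation);
    }
}

// Callers no longer duplicate the traversal loop:
//   company.forEachEmployee(e -> payStubPrinter.print(e));
//   company.forEachEmployee(e -> roster.add(e.name));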
Accidental Duplication
Not all duplication should be eliminated. There are instances in which two
stretches of code may be very similar, even identical, but will change for
very different reasons.11 I call this accidental duplication. Accidental
duplicates should not be eliminated. The duplication should be allowed to
persist. As the requirements change, the duplicates will evolve separately
and the accidental duplication will dissolve.
11. See the single responsibility principle. Robert C. Martin, Agile Software Development: Principles,
Patterns, and Practices (Pearson, 2003).
Minimize Size
A simple design is composed of simple elements. Simple elements are small.
The last rule of simple design states that for each function you write, after
you’ve gotten all the tests to pass, and after you have made the code as
expressive as possible, and after you have minimized duplication, then you
should work to decrease the size of the code within each function without
violating the other three principles.
How do you do that? Mostly by extracting more functions. As we discussed
in Chapter 5, “Refactoring,” you extract functions until you cannot extract any
more.
This practice leaves you with nice small functions that have nice long names,
names that help to make those functions very expressive.
Simple Design
Many years back, Kent Beck and I were having a discussion on the principles
of design. He said something that has always stuck with me. He said that if
you followed these four things as diligently as possible, all other design
principles would be satisfied—the principles of design can be reduced to
coverage, expression, singularization, and reduction.
I don’t know if this is true or not. I don’t know if a perfectly covered,
expressed, singularized, and reduced program necessarily conforms to the
open-closed principle or the single responsibility principle. What I am very
sure of, however, is that knowing and studying the principles of good design
and good architecture (e.g., the SOLID principles) makes it much easier to
create well-partitioned and simple designs.
This is not a book about those principles. I have written about them many
times before,12 as have others. I encourage you to read those works and study
those principles as part of maturing in your craft.
12. See Martin, Clean Code; Clean Architecture; and Agile Software Development: Principles,
Patterns, and Practices.
7 Collaborative Programming
The discipline involves two or more people working together at the same
time, on the same code. Nowadays, this is typically done by using screen-
sharing software. Both programmers see the same code on their screens. Both
can use their mouse and keyboard to manipulate that code. Their
workstations are slaved to each other either locally or remotely.
Collaboration like this is not something that should generally be done 100
percent of the time. Rather, collaboration sessions are generally brief,
informal, and intermittent. The total time a team should work collaboratively
depends on the maturity, skill, geography, and demographics of the team and
should be somewhere in the range of 20 to 70 percent.3
3. There are teams who pair 100 percent of the time. They seem to enjoy it, and more power to them.
Those last two statistics imply that the structure of the code being produced
is significantly better than the code that might have been produced by the
programmers working alone.
I haven’t seen any studies on mobbing, but the anecdotal evidence6 is
encouraging.
6. Agile Alliance, “Mob Programming: A Whole Team Approach,” AATC2017,
https://www.agilealliance.org/resources/sessions/mob-programming-aatc2017/.
Seniors can collaborate with juniors. When they do, the seniors are slowed
down by the juniors for the duration of the session. The juniors, on the other
hand, are sped up for the rest of their lives—so it’s a good trade-off.
Seniors can collaborate with seniors; just make sure there are no weapons in
the room.
Juniors can collaborate with juniors, though seniors should watch such
sessions carefully. Juniors are likely to prefer to work with other juniors. If
that happens too frequently, a senior should step in.
Some folks simply do not like to participate in collaborations like this. Some
people work better alone. They should not be forced into collaboration
sessions beyond reasonable peer pressure. Nor should they be disparaged for
their preference. Often, they will be happier in a mob than in a pair.
Collaboration is a skill that takes time and patience to acquire. Don’t expect
to be good at it until you’ve practiced it for many hours. However, it is a
skill that is very beneficial to the team as a whole, and to each programmer
who engages in it.
8 Acceptance Tests
All tests begin by arranging the input data for the test; then the test causes the
tested action to be performed. Finally, the test asserts that the output data
from that action matches the expectation.
These three elements can be specified in a variety of different ways, but the
most easily approachable is a simple tabular format.
Figure 8.1, for example, is a portion of one of the acceptance tests within the
FitNesse tool. FitNesse is a wiki, and this test checks that the various markup
gestures are properly translated into HTML. The action to be performed is
widget should render, the input data is the wiki text, and the output is the
html text.
Figure 8.1 A portion of the results of one of the acceptance tests from the
FitNesse tool
Another common format is Given-When-Then:
Given a page with the wiki text: !1 header
When that page is rendered.
Then the page will contain: <h1>header</h1>
The Discipline
In the strictest form of the discipline, the acceptance tests are written by BA
and QA. BA focuses on the happy path scenarios, whereas QA focuses on
exploring the myriad of ways that the system can fail.
These tests are written at the same time as, or just before, the features they
test are developed. In an Agile project, divided up into sprints or iterations,
the tests are written during the first few days of the sprint. They should all
pass by the end of the sprint.
BA and QA provide the programmers with these tests, and the programmers
automate them in a manner that keeps BA and QA engaged.
These tests become the definition of done. A feature is not complete until all
its acceptance tests pass. And when all the acceptance tests pass, the feature
is done.
This, of course, puts a huge responsibility on BA and QA. The tests that they
write must be full specifications of the features being tested. The suite of
acceptance tests is the requirements document for the entire system. By
writing those tests, BA and QA are certifying that when they pass, the
specified features are done and working.
Some BA and QA teams may be unaccustomed to writing such formal and
detailed documents. In these cases, the programmers may wish to write the
acceptance tests with guidance from BA and QA. The intermediate goal is to
create tests that BA and QA can read and bless. The ultimate goal is to get
BA and QA comfortable enough to write the tests.
The Continuous Build
Once an acceptance test passes, it goes into the suite of tests that is run during
the continuous build.
The continuous build is an automated procedure that runs every time3 a
programmer checks code into the source code control system. This procedure
builds the system from source and then runs the suites of automated
programmer unit tests and automated acceptance tests. The results of that run
are visibly posted, often in an email to every programmer and interested
party. The state of the continuous build is something everyone should be
continuously aware of.
3. Within a few minutes.
The continuous running of all these tests ensures that subsequent changes to
the system do not break working features. If a previously passing acceptance
test fails during the continuous build, the team must immediately respond and
repair it before making any other changes. Allowing failures to accumulate in
the continuous build is suicidal.
II The Standards
Standards are baseline expectations. They are the lines in the sand that we
decide we cannot cross. They are the parameters we set as the minimum we
can accept. Standards can be exceeded, but we must never fail to meet them.
Stable Productivity
Software projects often experience a decrease in productivity with time. This
is a symptom of serious dysfunction. It is caused by the neglect of testing and
refactoring disciplines. That neglect leads to the ever-increasing impediment
of tangled, fragile, and rigid code.
This impediment is a runaway effect. The more fragile and rigid the code
within a system becomes, the more difficult that code is to keep clean. As the
fragility of the code increases, so does the fear of change. Developers
become ever-more reluctant to clean up messy code because they fear that
any such effort will lead to more defects.
This process leads, in a matter of months, to an extreme and accelerating loss
of productivity. Each succeeding month sees the team’s productivity seeming
to approach zero asymptotically.
Managers often try to combat this decline in productivity by adding
manpower to the project. But this strategy often fails because the new
programmers brought onto the team are no less subject to the fear of change
than the programmers who have been there all along. They quickly learn to
emulate the behavior of those team members and thereby perpetuate the
problem.
When pressed about the loss of productivity, the developers often complain
about the awful nature of the code. They may even begin to militate for a
redesign of the system. Once begun, this complaint grows in volume until
managers cannot ignore it.
The argument posed by the developers is that they can increase productivity
if they redesign the system from scratch. They argue that they know the
mistakes that were made and will not repeat them. Managers, of course, do
not trust this argument. But managers are desperate for anything that will
increase productivity. In the end, many managers accede to the demands of
the programmers despite the costs and risks.
I do not expect this to happen. I expect development teams to keep their
productivity consistently high. I expect that development teams will reliably
employ the disciplines that keep the structure of the software from degrading.
I expect stable productivity.
10 Quality
Continuous Improvement
I expect continuous improvement.
Human beings improve things with time. Human beings impose order upon
chaos. Human beings make things better.
Our computers are better than they used to be. Our cars are better than they
used to be. Our airplanes, our roads, our telephones, our TV service, our
communications services are all better than they used to be. Our medical
technology is better than it used to be. Our space technology is better than it
used to be. Our civilization is massively better than it used to be.
Why, then, does software degrade with time? I do not expect our software to
degrade.
I expect that, as time goes by, the design and architecture of our systems will
improve. I expect that the software will get cleaner and more flexible with
every passing week. I expect that the cost of change will decrease as the
software ages. I expect everything to get better with time.
What does it take to make software better with time? It takes will. It takes
attitude. It takes a commitment to the disciplines that we know work.
I expect that every time any programmer checks in code, they check it in
cleaner than they checked it out. I expect that every programmer improves the
code that they touch, regardless of why they are touching it. If they fix a bug,
they should also make the code better. If they are adding a feature, they
should also make the code better. I expect every manipulation of the code to
result in better code, better designs, and better architecture.
I expect continuous improvement.
Fearless Competence
I expect fearless competence.
As the internal structure of a system degrades, the complexity of the system
can rapidly become intractable. This causes the developers to naturally
become more and more fearful of making changes. Even simple
improvements become fraught with risk. Reluctance to make changes and
improvements can drastically lower the programmers’ competence to manage
and maintain the system.
This loss of competence is not intrinsic. The programmers have not become
less competent. Rather, the growing intractable complexity of the system
begins to exceed the natural competence of the programmers.
As the system becomes increasingly difficult for the programmers to handle,
they begin to fear working on it. That fear exacerbates the problem because
programmers who are afraid to change the system will only make changes
that they feel are safest to make. Such changes are seldom those that improve
the system. Indeed, often the so-called safest changes are those that degrade
the system even more.
If this trepidation and timidity are allowed to continue, estimates will
naturally grow, defect rates will increase, deadlines will become harder and
harder to achieve, productivity will plummet, and morale will slide into the
pit.
The solution is to eliminate the fear that accelerates the degradation. We
eliminate the fear by employing the disciplines that create suites of tests that
the programmers trust with their lives.
With such tests in place and with the skill to refactor and drive toward
simple design, the programmers will not fear to clean a system that has
degraded in one place or another. They will have the confidence and the
competence to quickly repair that degradation and keep the software on an
ever-improving trajectory.
I expect the team to always exhibit fearless competence.
Extreme Quality
I expect extreme quality.
When did we first begin to accept that bugs are just a natural part of
software? When did it become acceptable to ship software with a certain
level of defects? When did we decide that beta tests were appropriate for
general distribution?
I do not accept that bugs are unavoidable. I do not accept the attitude that
expects defects. I expect every programmer to deliver software that is defect
free.
And I am not simply referring to behavior defects. I expect every
programmer to deliver software that is free of defects in behavior and
structure.
Is this an achievable goal? Can this expectation be met? Whether it can or
can’t, I expect every programmer to accept it as the standard and
continuously work toward achieving it.
I expect extreme quality coming out of the programming team.
The QA Disease
How do you know if QA is doing a good job? On what basis do you give
them raises and promotions? Is it defect discovery? Are the best QA people
the ones who find the most defects?
If so, then QA views defects as positive things. The more the better! And that
is, of course, sick.
But QA may not be the only folks who view defects in a positive light.
There’s an old saying1 in software: “I can meet any schedule you like, so
long as the software doesn’t need to work.”
1. I first heard it from Kent Beck.
That may sound funny, but it is also a strategy that developers can use to meet
individual deadlines. If it’s QA’s job to find the bugs, why not deliver on
time and let them find some?
No word has to be spoken. No deal has to be made. No handshakes ever
occur. And yet everyone knows that there is an economy of bugs flowing
back and forth between the developers and QA. And that is a very deep
sickness.
I expect that we will not dump on QA.
To avoid this situation, developers should isolate the business rules from the
user interface with a function call API, as shown in Figure 10.2. Tests that
use this API are completely independent of the user interface and are not
subject to interface changes.
Figure 10.2 Tests through the API are independent of the user interface.
Doing so ensures that the tests of the user interface will be fast and
unambiguous. If the user interface is significantly large and complex, then an
automated user interface testing framework may be appropriate. The use of
the business rule stub will make those tests much more reliable.
If the user interface is small and simple, it may be more expeditious to fall
back on manual testing, especially if there are aesthetics to be evaluated.
Again, the use of the business rule stub will make these manual tests much
easier to conduct.
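A minimal sketch of such a function-call API, and a stub for use in user
interface tests (all names here are hypothetical, not the book’s):
// The function-call API that both the user interface and the tests call.
interface DepositUseCase {
    long deposit(String accountId, long amountInCents);  // returns the new balance
}

// Stub used when testing the user interface: fast, canned, unambiguous.
class DepositUseCaseStub implements DepositUseCase {
    public long deposit(String accountId, long amountInCents) {
        return 4200;                                      // canned new balance
    }
}

// A business-rule test goes straight through the API; no user interface involved:
//   long newBalance = realUseCase.deposit("acct-1", 500);
//   assertEquals(10_500, newBalance);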
I expect that every test that can practicably be automated will be automated,
that the tests will execute quickly, and that the tests will not be fragile.
11 Courage
Honest Estimates
What is that 5? It’s an arbitrary number of points that is relative to some task
of known size. For example, let’s say that the Login story was arbitrarily
given 3 points. When you estimate the Deposit story, you decide that Deposit
is not quite twice as hard as Login, so you give it a 5. That’s really all there
is to it.
Story points already have the probability distribution embedded within them.
First, the points are not dates or times; they are just points. Second, the points
are not promises; they are guesses. At the end of each Agile iteration (usually
a week or two), we total the points completed. We use that number to
estimate how many points we might complete in the next iteration.
I expect honest estimates that describe your uncertainty. I do not expect a
promise of a date.
How and when do you do this learning? If your employer provides you the
time and space to do this kind of learning, then take as much advantage of it
as you can. If your employer is not so helpful, then you’ll have to learn on
your own time. Be prepared to spend several hours per month on it. Make
sure you have the personal time set aside for it.
Yes, I know, you have family obligations, there are bills to pay and planes to
catch, and you’ve got a life. Okay, but you also have a profession. And
professions need care and maintenance.
I expect us all to be continuous aggressive learners.
Mentoring
We seem to have an unending need for more and more programmers. The
number of programmers in the world is increasing at a furious and
exponential pace. Universities can only teach so much, and unfortunately,
many of them fail to teach much at all.
Therefore, the job of teaching new programmers falls to us. We, the
programmers who have been working for a few years, must pick up the
burden of teaching those who have just started.
Perhaps you think this is hard. It is. But it comes with a huge benefit. The best
way to learn is to teach. Nothing else even comes close. So, if you want to
learn something, teach it.
If you have been a programmer for 5 years, or 10 years, or 15 years, you
have an immense amount of experience and life lessons to teach to the new
programmers who have just started. Take one or two of them under your wing
and guide them through their first 6 months.
Sit down with them at their workstations and help them write code. Tell them
stories about your past failures and successes. Teach them about disciplines,
standards, and ethics. Teach them the craft.
I expect all programmers to become mentors. I expect you to get involved in
helping others to learn.
III The Ethics
The First Programmer
The profession of software began, inauspiciously, in the summer of 1935
when Alan Turing began work on his paper. His goal was to resolve a
mathematical dilemma that had perplexed mathematicians for a decade or
more—the Entscheidungsproblem. The decision problem.
In that goal he was successful, but he had no idea, at the time, that his paper
would spawn a globe-spanning industry upon which we would all depend
and that now forms the life’s blood of our entire civilization.
Many people think of Lord Byron’s daughter, Ada, the countess of Lovelace,
as the first programmer, and with good reason. She was the first person we
know of who understood that the numbers manipulated by a computing
machine could represent nonnumeric concepts. Symbols instead of numbers.
And, to be fair, Ada did write some algorithms for Charles Babbage’s
Analytical Engine, which unfortunately was never built.
But it was Alan Turing who wrote the first1 programs to execute in an
electronic computer. And it was Alan Turing who first defined the profession
of software.
In 1945, Turing wrote code for the Automated Computing Engine (ACE). He
wrote this code in binary machine language, using base-32 numbers. Code
like this had never been written before, so he had to invent and implement
concepts such as subroutines, stacks, and floating-point numbers.
After several months of inventing the basics and using them to solve
mathematical problems, he wrote a report that stated the following
conclusion:
We shall need a great number of mathematicians of ability, because there will probably
be a good deal of work of this kind to be done.
1. Some folks point out that Konrad Zuse wrote algorithms for his electromechanical computer before
Turing programmed the ACE.
“A great number.” How did he know? In reality, he had no idea just how
prescient that statement was—we certainly have a great number now.
But what was that other thing he said? “Mathematicians of ability.” Do you
consider yourself to be a mathematician of ability?
In the same report, he went on to write:
One of our difficulties will be the maintenance of an appropriate discipline, so that we
do not lose track of what we are doing.
“Discipline!” How did he know? How could he look forward 70 years and
know that our problem would be discipline?
Seventy years ago, Alan Turing laid the first stone in the framework of
software professionalism. He said that we should be mathematicians of
ability who maintain an appropriate discipline.
Is that who we are? Is that who you are?
Seventy-Five Years
One person’s lifetime. That’s how old our profession is as of this writing.
Just 75 years. And what has happened in those three score and fifteen years?
Let’s more carefully revisit the history that I presented in Chapter 1,
“Craftsmanship.”
In 1945, there was one computer in the world and one programmer. Alan
Turing. These numbers grew rapidly in those first years. But let’s use this as
our origin.
1945. Computers: O(1). Programmers: O(1).
In the decade that followed, the reliability, consistency, and power usage of
vacuum tubes improved dramatically. This made it possible for larger and
more powerful computers to be built.
By 1960, IBM had sold 140 of its 700 series computers. These were huge,
expensive behemoths that could only be afforded by military, government,
and very large corporations. They were also slow, resource limited, and
fragile.
It was during this period that Grace Hopper invented the concept of a higher-
level language and coined the term compiler. By 1960, her work led to
COBOL.
In 1954, John Backus submitted the FORTRAN specification. This was
followed rapidly by the development of ALGOL. By 1958, John McCarthy
had developed LISP. The proliferation of the language zoo had begun.
In those days, there were no operating systems, no frameworks, no subroutine
libraries. If something executed on your computer, it was because you wrote
it. Consequently, in those days, it took a staff of a dozen or more
programmers just to keep one computer running.
By 1960, 15 years after Turing, there were O(100) computers in the world.
The number of programmers was an order of magnitude greater: O(1,000).
Who were these programmers? They were people like Grace Hopper, Edsger
Dijkstra, John von Neumann, John Backus, and Jean Jennings. They were
scientists and mathematicians and engineers. Most were people who already
had careers and already understood the businesses and disciplines they were
employed by. Many, if not most, were in their 30s, 40s, and 50s.
The 1960s was the decade of the transistor. Bit by bit, these small, simple,
inexpensive, and reliable devices replaced the vacuum tube. And the effect
on computers was a game changer.
By 1965, IBM had produced more than ten thousand 1401 transistor-based
computers. They rented for about $2,500 per month, putting them within
reach of thousands of medium-sized businesses.
These machines were programmed in Assembler, Fortran, COBOL, and
RPG. And all those companies who rented those machines needed staffs of
programmers to write their applications.
IBM wasn’t the only company making computers at the time, so we’ll just say
that by 1965, there were O(10,000) computers in the world. And if each
computer needed ten programmers to keep it running, there must have been
O(100,000) programmers.
Twenty years after Turing, there must have been several hundred thousand
programmers in the world. Where did these programmers come from? There
weren’t enough mathematicians, scientists, and engineers to cover the need.
And there weren’t any computer science graduates coming out of the
universities because there weren’t any computer science degree programs—
anywhere.
Companies thus drew from the best and brightest of their accountants, clerks,
planners, and so on—anyone with some proven technical aptitude. And they
found lots.
And, again, these were people who were already professionals in another
field. They were in their 30s and 40s. They already understood deadlines,
and commitments, what to leave in, and what to leave out.2 Although these
people were not mathematicians per se, they were disciplined professionals.
Turing would likely have approved.
But the crank kept on turning. By 1966, IBM was producing a thousand 360s
every month. These computers were popping up everywhere. They were
immensely powerful for the day. The model 30 could address 64K bytes of
memory and execute 35,000 instructions per second.
It was during this period, in the mid-1960s, that Ole-Johan Dahl and
Kristen Nygaard invented Simula 67—the first object-oriented language. It
was also during this period that Edsger Dijkstra invented structured
programming. And it was in the years that immediately followed that Ken
Thompson and Dennis Ritchie created UNIX and C.
2. Apologies to Bob Seger.
Still the crank turned. In the early 1970s, the integrated circuit came into
regular use. These little chips could hold dozens, hundreds, even thousands
of transistors. They allowed electronic circuits to be massively miniaturized.
And thus, the minicomputer was born.
In the late 1960s and into the 1970s, Digital Equipment Corporation sold fifty
thousand PDP-8 systems and hundreds of thousands of PDP-11 systems.
And they weren’t alone! The minicomputer market exploded. By the mid-
1970s, there were dozens and dozens of companies selling minicomputers, so
by 1975, 30 years after Turing, there were about 1 million computers in the
world. And how many programmers were there? The ratio was starting to
change. The number of computers was approaching the number of
programmers, so by 1975, there were O(1,000,000) programmers.
Where did these millions of programmers come from? Who were they?
They were me. Me and my buddies. Me and my cohort of young, energetic,
geeky boys.
Tens of thousands of new electronic engineering and computer science grads:
We were all young. We were all smart. We, in the United States, were all
concerned about the draft. And we were almost all male.
Oh, it’s not that women were leaving the field in any number—yet. That
didn’t start until the mid-1980s. No, it’s just that many more boys (and we
were boys) were entering the field.
In my first job as a programmer, in 1969, there were a couple of dozen
programmers. They were all in their 30s or 40s, and a third to a half were
women.
Ten years later, I was working at a company with about 50 programmers, and
perhaps three were women.
So, 30 years after Turing, the demographics of programming had shifted
dramatically toward very young men. Hundreds of thousands of twenty-
something males. We were typically not what Turing would have described
as disciplined mathematicians.
But businesses had to have programmers. The demand was through the roof.
And what very young men lack in discipline, they make up for with energy.
We were also cheap. Despite the high starting salaries of programmers today,
back then companies could pick up programmers pretty inexpensively. My
starting salary in 1969 was $7,200 per year.
This has been the trend ever since. Young men have been pouring out of
computer science programs every year, and industry seems to have an
insatiable appetite for them.
In the 30 years between 1945 and 1975, the number of programmers grew by
at least a factor of a million. In the 45 years since then, that growth rate has
slowed a bit but is still very high.
How many programmers do you think were in the world by 2020? If you
include the VBA3 programmers, I think there must be hundreds of millions of
programmers in the world today.
This is clearly exponential growth. Exponential growth curves have a
doubling rate. You can do the math. Hey, Albert, what’s the doubling rate if
we go from 1 to 100 million in 75 years?
The log to the base 2 of 100 million is approximately 27—divide that into
75, and you get about 2.8, so perhaps the number of programmers doubled
roughly every three-ish years.
3. Visual Basic for Applications.
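If you want to check that arithmetic yourself, here is a quick back-of-the-envelope
sketch in Java. The numbers are rough, just like the estimate itself.

public class DoublingTime {
    public static void main(String[] args) {
        double years = 75.0;   // 1945 to 2020
        double growth = 1e8;   // from one programmer to roughly 100 million
        double doublings = Math.log(growth) / Math.log(2); // log base 2, about 26.6
        double doublingTime = years / doublings;           // about 2.8 years
        System.out.printf("doublings: %.1f, doubling time: %.1f years%n",
                doublings, doublingTime);
    }
}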
Actually, as we saw earlier, the rate was higher in the first decades and has
slowed down a bit now. My guess is that the doubling rate is about five
years. Every five years, the number of programmers in the world doubles.
The implications of that fact are staggering. If the number of programmers in
the world doubles every five years, it means that half the programmers in the
world have less than five years’ experience, and this will always be true so
long as that doubling rate continues. This leaves the programming industry in
the precarious position of—perpetual inexperience.
Catastrophes
We’ve seen plenty of software catastrophes over the years. Some have been
pretty spectacular.
For example, in 2016, we lost the Schiaparelli Mars lander because of a
software issue that caused it to believe it had already landed when it was
actually nearly 4 kilometers above the surface.
In 1999, we lost the Mars Climate Orbiter because of a ground-based
software error that transmitted data to the orbiter using English units (pound-
seconds) rather than metric units (newton-seconds). This error caused the
orbiter to descend too far into the Martian atmosphere, where it was torn to
pieces.
In 1996, the Ariane 5 launch vehicle and payload was destroyed 37 seconds
after launch because of an integer overflow exception when a 64-bit floating-
point number underwent an unchecked conversion to a 16-bit integer. The
exception crashed the onboard computers, and the vehicle self-destructed.
Should we talk about the Therac-25 radiation therapy machine that, because
of a race condition, killed three people and injured three others by blasting
them with a high-powered electron beam?
Or maybe we should talk about Knight Capital Group, which lost $460
million in 45 minutes because it reused a flag that activated dead code left in
the system.
Or perhaps we should talk about the Toyota stack overflow bug that could
cause cars to accelerate out of control—killing perhaps as many as 89
people.
Or maybe we should talk about HealthCare.gov, the software failure that
nearly overturned a new and controversial American healthcare law.
These disasters have cost billions of dollars and many lives. And they were
caused by programmers.
We, programmers, through the code that we write, are killing people.
Now, I know you didn’t get into this business in order to kill people.
Probably, you became a programmer because you once wrote an infinite loop
that printed your name over and over, and you experienced that joyous feeling
of power.
But facts are facts. We are now in a position in our society where our actions
can destroy fortunes, livelihoods, and lives.
One day, probably not too long from now, some poor programmer is going to
do something just a little dumb, and tens of thousands of people will die.
This isn’t wild speculation—it’s just a matter of time.
And when this happens, the politicians of the world will demand an
accounting. They will demand that we show them how we will prevent such
errors in the future.
And if we show up without a statement of ethics, if we show up without any
standards or defined disciplines, if we show up and whine about how our
bosses set unreasonable schedules and deadlines—then we will be found
GUILTY.
The Oath
To begin the discussion of our ethics as software developers, I offer the
following oath.
In order to defend and preserve the honor of the profession of computer programmers, I
promise that, to the best of my ability and judgment:
First, Do No Harm
Promise 1. I will not produce harmful code.
No Harm to Society
First, you will do no harm to the society in which you live.
This is the rule that the VW programmers broke. Their software might have
benefited their employer—Volkswagen. However, it harmed society in
general. And we, programmers, must never do that.
But how do you know whether or not you are harming society? For example,
is building software that controls weapon systems harmful to society? What
about gambling software? What about violent or sexist video games? What
about pornography?
If your software is within the law, might it still be harmful to society?
Frankly, that’s a matter for your own judgment. You’ll just have to make the
best call you can. Your conscience is going to have to be your guide.
Another example of harm to society was the failed launch of HealthCare.gov,
although in this case the harm was unintentional. The Affordable Care Act
was passed by the US Congress and signed into law by the president in 2010.
Among its many directives was the demand that a Web site be created and
activated on October 1, 2013.
Never mind the insanity of specifying, by law, a date certain by which a
whole new massive software system must be activated. The real problem
was that on October 1, 2013, they actually turned it on.
Do you think, maybe, there were some programmers hiding under their desks
that day?
Oh man, I think they turned it on.
Yeah, yeah, they really shouldn’t be doing that.
Oh, my poor mother. Whatever will she do?
This is a case where a technical screwup put a huge new public policy at
risk. The law was very nearly overturned because of these failures. And no
matter what you think about the politics of the situation, that was harmful to
society.
Who was responsible for that harm? Every programmer, team lead, manager,
and director who knew that system wasn’t ready and who nevertheless
remained silent.
That harm to society was perpetrated by every software developer who
maintained a passive-aggressive attitude toward their management, everyone
who said, “I’m just doing my job—it’s their problem.” Every software
developer who knew something was wrong and yet did nothing to stop the
deployment of that system shares part of the blame.
Because here’s the thing. One of the prime reasons you were hired as a
programmer was because you know when things are going wrong. You have
the knowledge to identify trouble before it happens. Therefore, you have the
responsibility to speak up before something terrible happens.
Harm to Function
You must KNOW that your code works. You must KNOW that the functioning
of your code will not do harm to your company, your users, or your fellow
programmers.
On August 1, 2012, some technicians at Knight Capital Group loaded their
servers with new software. Unfortunately, they loaded only seven of the eight
servers, leaving the eighth running with the older version.
Why they made this mistake is anybody’s guess. Somebody got sloppy.
Knight Capital ran a trading system. It traded stocks on the New York Stock
Exchange. Part of its operation was to take large parent trades and break
them up into many smaller child trades in order to prevent other traders from
seeing the size of the initial parent trade and adjusting prices accordingly.
Eight years earlier, a simple version of this parent–child algorithm, named
Power Peg, was disabled and replaced with something much better called
Smart Market Access Routing System (SMARS). Oddly, however, the old
Power Peg code was not removed from the system. It was simply disabled
with a flag.
That flag had been used to regulate the parent–child process. When the flag
was on, child trades were made. When enough child trades had been made to
satisfy the parent trade, the flag was turned off.
They disabled the Power Peg code by simply leaving the flag off.
Unfortunately, the new software update that made it on to only seven of the
eight servers repurposed that flag. The update turned the flag on, and that
eighth server started making child trades in a high-speed infinite loop.
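To make the mechanism concrete, here is a deliberately simplified sketch of that
kind of trap. The names and structure are invented for illustration; this is not
Knight Capital's actual code.

public class FlagReuseDemo {
    // Originally this flag controlled the retired Power Peg loop. The new routing
    // code repurposed it, but the old loop was never deleted.
    private static boolean parentOrderActive = false;

    // New behavior: mark the parent order active and route it with the new code.
    static void onParentOrder(int shares) {
        parentOrderActive = true;
        System.out.println("SMARS routes " + shares + " shares");
    }

    // Old, supposedly dead behavior: still present, still guarded by the same flag.
    static void powerPegTick() {
        if (parentOrderActive) {
            System.out.println("Power Peg child trade!"); // fires on every tick
        }
    }

    public static void main(String[] args) {
        onParentOrder(1000);
        for (int tick = 0; tick < 3; tick++) {
            powerPegTick(); // nothing in the old loop ever turns the flag off again
        }
    }
}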
The programmers knew something had gone wrong, but they didn’t know
exactly what. It took them 45 minutes to shut down that errant server—45
minutes during which it was making bad trades in an infinite loop.
The bottom line is that, in that first 45 minutes of trading, Knight Capital had
unintentionally purchased more than $7 billion worth of stock that it did not
want and had to sell it off at a $460 million loss. Worse, the company only
had $360 million in cash. Knight Capital was bankrupt.
Forty-five minutes. One dumb mistake. Four hundred sixty million dollars.
And what was that mistake? The mistake was that the programmers did not
KNOW what their system was going to do.
At this point, you may be worried that I’m demanding that programmers have
perfect knowledge about the behavior of their code. Of course, perfect
knowledge is not achievable; there will always be a knowledge deficit of
some kind.
The issue isn’t the perfection of knowledge. Rather, the issue is to KNOW
that there will be no harm.
Those poor guys at Knight Capital had a knowledge deficit that was horribly
harmful—and given what was at stake, it was one they should not have
allowed to exist.
Another example is the case of Toyota and the software system that made
cars accelerate uncontrollably.
As many as 89 people were killed by that software, and more were injured.
Imagine that you are driving in a crowded downtown business district.
Imagine that your car suddenly begins to accelerate, and your brakes stop
working. Within seconds, you are rocketing through stoplights and
crosswalks with no way to stop.
That’s what investigators found that the Toyota software could do—and quite
possibly had done.
The software was killing people.
The programmers who wrote that code did not KNOW that their code would
not kill—notice the double negative. They did not KNOW that their code
would NOT kill. And they should have known that. They should have known
that their code would NOT kill.
Now again, this is all about risk. When the stakes are high, you want to drive
your knowledge as near to perfection as you can get it. If lives are at stake,
you have to KNOW that your code won’t kill anyone. If fortunes are at stake,
you have to KNOW that your code won’t lose them.
On the other hand, if you are writing a chat application or a simple shopping
cart Web site, neither fortunes nor lives are at stake …
… or are they?
What if someone using your chat application has a medical emergency and
types “HELP ME. CALL 911” on your app? What if your app malfunctions
and drops that message?
What if your Web site leaks personal information to hackers who use it to
steal identities?
And what if the poor functioning of your code drives customers away from
your employer and to a competitor?
The point is that it’s easy to underestimate the harm that can be done by
software. It’s comforting to think that your software can’t harm anyone
because it’s not important enough. But you forget that software is very
expensive to write, and at the very least, the money spent on its development is
at stake—not to mention whatever the users put at stake by depending on it.
The bottom line is, there’s almost always more at stake than you think.
No Harm to Structure
You must not harm the structure of the code. You must keep the code clean
and well organized.
Ask yourself why the programmers at Knight Capital did not KNOW that
their code could be harmful.
I think the answer is pretty obvious. They had forgotten that the Power Peg
software was still in the system. They had forgotten that the flag they
repurposed would activate it. And they assumed that all servers would
always have the same software running.
They did not know their system could exhibit harmful behavior because of
the harm they had done to the structure of the system by leaving that dead
code in place. And that’s one of the big reasons why code structure and
cleanliness are so important. The more tangled the structure, the more
difficult it is to know what the code will do. The more the mess, the more the
uncertainty.
Take the Toyota case, for example. Why didn’t the programmers know their
software could kill people? Do you think the fact that they had more than ten
thousand global variables might have been a factor?
Making a mess in the software undermines your ability to know what the
software does and therefore your ability to prevent harm.
Messy software is harmful software.
Some of you might object by saying that sometimes a quick and dirty patch is
necessary to fix a nasty production bug.
Sure. Of course. If you can fix a production crisis with a quick and dirty
patch, then you should do it. No question.
A stupid idea that works is not a stupid idea.
However, you can’t leave that quick and dirty patch in place without causing
harm. The longer that patch remains in the code, the more harm it can do.
Remember that the Knight Capital debacle would not have happened if the
old Power Peg code had been removed from the code base. It was that old
defunct code that actually made the bad trades.
What do we mean by “harm to structure”? Clearly, having thousands of
global variables is a structural flaw. So is dead code left in the code base.
Structural harm is harm to the organization and content of the source code. It
is anything that makes that source code hard to read, hard to understand, hard
to change, or hard to reuse.
It is the responsibility of every professional software developer to know the
disciplines and standards of good software structure. They should know how
to refactor, how to write tests, how to recognize bad code, how to decouple
designs and create appropriate architectural boundaries. They should know
and apply the principles of low- and high-level design. And it is the
responsibility of every senior developer to make sure that younger
developers learn these things and satisfy them in the code that they write.
Soft
The first word in software is soft. Software is supposed to be SOFT. It’s
supposed to be easy to change. If we didn’t want it to be easy to change,
we’d have called it hardware.
It’s important to remember why software exists at all. We invented it as a
means to make the behavior of machines easy to change. To the extent that our
software is hard to change, we have thwarted the very reason that software
exists.
Remember that software has two values. There’s the value of its behavior,
and then there’s the value of its “softness.” Our customers and users expect
us to be able to change that behavior easily and without high cost.
Which of these two values is the greater? Which value should we prioritize
above the other? We can answer that question with a simple thought
experiment.
Imagine two programs. One works perfectly but is impossible to change. The
other does nothing correctly but is easy to change. Which is the more
valuable?
I hate to be the one to tell you this but, in case you hadn’t noticed, software
requirements tend to change, and when they change, that first software will
become useless—forever.
On the other hand, that second software can be made to work, because it’s
easy to change. It may take some time and money to get it to work initially,
but after that, it’ll continue to work forever with minimal effort.
Therefore, it is the second of the two values that should be given priority in
all but the most urgent of situations.
What do I mean by urgent? I mean a production disaster that’s losing the
company $10 million per minute. That’s urgent.
I do not mean a software startup. A startup is not an urgent situation requiring
you to create inflexible software. Indeed, the opposite is true. The one thing
that is absolutely certain in a startup is that you are creating the wrong
product.
No product survives contact with the user. As soon as you start to put the
product into users’ hands, you will discover that the product you’ve built is
wrong in a hundred different ways. And if you can’t change it without making
a mess, you’re doomed.
This is one of the biggest problems with software startups. Startup
entrepreneurs believe they are in an urgent situation requiring them to throw
out all the rules and dash to the finish line, leaving a huge mess in their wake.
Most of the time, that huge mess starts to slow them down long before they do
their first deployment. They’ll go faster and better and will survive a lot
longer if they keep the structure of the software from harm.
When it comes to software, it never pays to rush.
—Brian Marick
Tests
Tests come first. You write them first, and you clean them first. You will
know that every line of code works because you will have written tests that
prove they work.
How can you prevent harm to the behavior of your code if you don’t have the
tests that prove that it works?
How can you prevent harm to the structure of your code if you don’t have the
tests that allow you to clean it?
And how can you guarantee that your test suite is complete if you don’t
follow the three laws of test-driven development (TDD)?
Is TDD really a prerequisite to professionalism? Am I really suggesting that
you can’t be a professional software developer unless you practice TDD?
Yes, I think that’s true. Or rather, it is becoming true. It is true for some of us,
and with time, it is becoming true for more and more of us. I think the time
will come, and relatively soon, when the majority of programmers will agree
that practicing TDD is part of the minimum set of disciplines and behaviors
that mark a professional developer.
Why do I believe that?
Because, as I said earlier, we rule the world! We write the rules that make
the whole world work.
In our society, nothing gets bought or sold without software. Nearly all
correspondence is through software. Nearly all documents are written with
software. Laws don’t get passed or enforced without it. There is virtually no
activity of daily life that does not involve software.
Without software, our society does not function. Software has become the
most important component in the infrastructure of our civilization.
Society does not understand this yet. We programmers don’t really
understand it yet either. But the realization is dawning that the software we
write is critical. The realization is dawning that many lives and fortunes
depend on our software. And the realization is dawning that software is
being written by people who do not profess a minimum level of discipline.
So, yes, I think TDD, or some discipline very much like it, will eventually be
considered a minimum standard behavior for professional software
developers. I think our customers and our users will insist on it.
Best Work
Promise 2. The code that I produce will always be my best work. I will not knowingly
allow code that is defective in either behavior or structure to accumulate.
Kent Beck once said, “First make it work. Then make it right.”
Getting the program to work is just the first—and easiest—step. The second
—and harder—step is to clean the code.
Unfortunately, too many programmers think they are done once they get a
program to work. Once it works, they move on to the next program, and then
the next, and the next.
In their wake, they leave behind a history of tangled, unreadable code that
slows down the whole development team. They do this because they think
their value is in speed. They know they are paid a lot, and so they feel that
they must deliver a lot of functionality in a short amount of time.
But software is hard and takes a lot of time, and so they feel like they are
going too slowly. They feel like they are failing, which creates a pressure
that causes them to try to go faster. It causes them to rush. They rush to get the
program working, and then they declare themselves to be done—because, in
their minds, it’s already taken them too long. The constant tapping of the
project manager’s foot doesn’t help, but that’s not the real driver.
I teach lots of classes. In many of them, I give the programmers small
projects to code. My goal is to give them a coding experience in which to try
out new techniques and new disciplines. I don’t care if they actually finish
the project. Indeed, all the code is going to be thrown away.
Yet still I see people rushing. Some stay past the end of the class just
hammering away at getting something utterly meaningless to work.
So, although the boss’s pressure doesn’t help, the real pressure comes from
inside us. We consider speed of development to be a matter of our own self-
worth.
Making It Right
As we saw earlier in this chapter, there are two values to software. There is
the value of its behavior and the value of its structure. I also made the point
that the value of structure is more important than the value of behavior. This
is because to have any long-term value, software systems must be able to
react to changes in requirements.
Software that is hard to change also is hard to keep up to date with the
requirements. Software with a bad structure can quickly get out of date.
In order for you to keep up with requirements, the structure of the software
has to be clean enough to allow, and even encourage, change. Software that is
easy to change can be kept up to date with changing requirements, allowing it
to remain valuable with the least amount of effort. But if you’ve got a
software system that’s hard to change, then you’re going to have a devil of a
time keeping that system working when the requirements change.
When are requirements most likely to change? Requirements are most
volatile at the start of a project, just after the users have seen the first few
features work. That’s because they’re getting their first look at what the
system actually does as opposed to what they thought it was going to do.
Consequently, the structure of the system needs to be clean at the very start if
early development is to proceed quickly. If you make a mess at the start, then
even the very first release will be slowed down by that mess.
Good structure enables good behavior and bad structure impedes good
behavior. The better the structure, the better the behavior. The worse the
structure, the worse the behavior. The value of the behavior depends
critically on the structure. Therefore, the value of the structure is the more
critical of the two, which means that professional developers put a
higher priority on the structure of the code than on the behavior.
Yes, first you make it work; but then you make very sure that you continue to
make it right. You keep the system structure as clean as possible throughout
the lifetime of the project. From its very beginning to its very end, it must be
clean.
What Is Good Structure?
Good structure makes the system easy to test, easy to change, and easy to
reuse. Changes to one part of the code do not break other parts of the code.
Changes to one module do not force massive recompiles and redeployments.
High-level policies are kept separate and independent from low-level
details.
Poor structure makes a system rigid, fragile, and immobile. These are the
traditional design smells.
Rigidity is when relatively minor changes to the system cause large portions
of the system to be recompiled, rebuilt, and redeployed. A system is rigid
when the effort to integrate a change is much greater than the change itself.
Fragility is when minor changes to the behavior of a system force many
corresponding changes in a large number of modules. This creates a high risk
that a small change in behavior will break some other behavior in the system.
When this happens, your managers and customers will come to believe that
you have lost control over the software and don’t know what you are doing.
Immobility is when a module in an existing system contains a behavior you
want to use in a new system but is so tangled up within the existing system
that you can’t extract it for use in the new system.
These problems are all problems of structure, not behavior. The system may
pass all its tests and meet all its functional requirements, yet such a system
may be close to worthless because it is too difficult to manipulate.
There’s a certain irony in the fact that so many systems that correctly
implement valuable behaviors end up with a structure that is so poor that it
negates that value and makes the system worthless.
And worthless is not too strong a word. Have you ever participated in the
Grand Redesign in the Sky? This is when developers tell management that
the only way to make progress is to redesign the whole system from scratch;
those developers have assessed the current system to be worthless.
When managers agree to let the developers redesign the system, it simply
means that they have agreed with the developers’ assessment that the current
system is worthless.
What is it that causes these design smells that lead to worthless systems?
Source code dependencies! How do we fix those dependencies? Dependency
management!
How do we manage dependencies? We use the SOLID5 principles of object-
oriented design to keep the structure of our systems free of the design smells
that would make them worthless.
5. These principles are described in Robert C. Martin, Clean Code (Prentice Hall, 2008), and Agile
Software Development: Principles, Patterns, and Practices (Pearson, 2003).
Because the value of structure is greater than the value of behavior and
depends on good dependency management, and because good dependency
management derives from the SOLID principles, it follows that the overall value
of the system depends on proper application of the SOLID principles.
That’s quite a claim, isn’t it? Perhaps it’s a bit hard to believe. The value of
the system depends on design principles. But we’ve gone through the logic,
and many of you have had the experiences to back it up. So, that conclusion
bears some serious consideration.
Eisenhower’s Matrix
General Dwight D. Eisenhower once said, “I have two kinds of problems,
the urgent and the important. The urgent are not important, and the important
are never urgent.”
There is a deep truth to this statement—a deep truth about engineering. We
might even call this the engineer’s motto:
The greater the urgency, the less the relevance.
Now let’s arrange these in order of priority. The two obvious cases are
important and urgent at the top and neither important nor urgent at the bottom.
The question is, how do you sort the two in the middle: urgent but
unimportant and important but not urgent? Which should you address first?
Clearly, things that are important should be prioritized over things that are
unimportant. I would argue, furthermore, that if something is unimportant, it
shouldn’t be done at all. Doing unimportant things is a waste.
If we eliminate all the unimportant things, we have two left. We do the
important and urgent things first. Then we do the important but not urgent
things second.
My point is that urgency is about time. Importance is not. Things that are
important are long-term. Things that are urgent are short-term. Structure is
long-term. Therefore, it is important. Behavior is short-term. Therefore, it is
merely urgent.
So, structure, the important stuff, comes first. Behavior is secondary.
Your boss might not agree with that priority, but that’s because it’s not your
boss’s job to worry about structure. It’s yours. Your boss simply expects that
you will keep the structure clean while you implement the urgent behaviors.
Earlier in this chapter, I quoted Kent Beck: “First make it work, then make it
right.” Now I’m saying that structure is a higher priority than behavior. It’s a
chicken-and-egg dilemma, isn’t it?
The reason we make it work first is that structure has to support behavior;
consequently, we implement behavior first, then we give it the right structure.
But structure is more important than behavior. We give it a higher priority.
We deal with structural issues before we deal with behavioral issues.
We can square this circle by breaking the problem down into tiny units. Let’s
start with user stories. You get a story to work, and then you get its structure
right. You don’t work on the next story until that structure is right. The
structure of the current story is a higher priority than the behavior of the next
story.
Except, stories are too big. We need to go smaller. Not stories, then—tests.
Tests are the perfect size.
First you get a test to pass, then you fix the structure of the code that passes
that test before you get the next test to pass.
This entire discussion has been the moral foundation for the red → green →
refactor cycle of TDD.
It is that cycle that helps us to prevent harm to behavior and harm to structure.
It is that cycle that allows us to prioritize structure over behavior. And that’s
why we consider TDD to be a design technique as opposed to a testing
technique.
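To make that cycle concrete, here is a minimal sketch of a single red → green →
refactor micro-cycle. It assumes JUnit 5 on the classpath, and the class and test
names are invented for illustration.

import static org.junit.jupiter.api.Assertions.assertEquals;
import org.junit.jupiter.api.Test;

class StackTest {
    // RED: this test is written first and fails until push and size exist and work.
    @Test
    void pushIncrementsSize() {
        IntStack stack = new IntStack();
        stack.push(42);
        assertEquals(1, stack.size());
    }
}

// GREEN: the simplest code that makes the failing test pass.
class IntStack {
    private int size = 0;

    void push(int value) {
        size++;          // just enough to pass; real storage comes with the next test
    }

    int size() {
        return size;
    }
}
// REFACTOR: with the test passing, clean up names and structure before writing
// the next failing test, keeping the bar green the whole time.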
Repeatable Proof
Promise 3. I will produce, with each release, a quick, sure, and repeatable proof that
every element of the code works as it should.
Dijkstra
Edsger Wybe Dijkstra was born in Rotterdam in 1930. He survived the
bombing of Rotterdam and the German occupation of the Netherlands, and in
1948, he graduated high school with the highest possible marks in math,
physics, chemistry, and biology.
In March 1952, at the age of 21, and just 9 months before I would be born, he
took a job with the Mathematical Center in Amsterdam as the Netherlands’
very first programmer.
In 1957, he married Maria Debets. In the Netherlands, at the time, you had to
state your profession as part of the marriage rites. The authorities were
unwilling to accept “programmer” as his profession. They’d never heard of
such a profession. Dijkstra settled for “theoretical physicist.”
In 1955, having been a programmer for 3 years, and while still a student, he
concluded that the intellectual challenge of programming was greater than the
intellectual challenge of theoretical physics, and as a result, he chose
programming as his long-term career.
In making this decision, he conferred with his boss, Adriaan van
Wijngaarden. Dijkstra was concerned that no one had identified a discipline
or science of programming and that he would therefore not be taken
seriously. His boss replied that Dijkstra might very well be one of the people
who would make it a science.
In pursuit of that goal, Dijkstra was compelled by the idea that software was
a formal system, a kind of mathematics. He reasoned that software could
become a mathematical structure rather like Euclid’s Elements—a system of
postulates, proofs, theorems, and lemmas. He therefore set about creating the
language and discipline of software proofs.
Proving Correctness
Dijkstra realized that there were only three techniques we could use to prove
the correctness of an algorithm: enumeration, induction, and abstraction.
Enumeration is used to prove that two statements in sequence or two
statements selected by a Boolean expression are correct. Induction is used to
prove that a loop is correct. Abstraction is used to break groups of statements
into smaller provable chunks.
If this sounds hard, it is.
As an example of just how hard this is, I have included a simple Java
program for calculating the remainder after dividing one integer by another
(Figure 12.2), and the handwritten proof of that algorithm (Figure 12.3).6
6. This is a translation into Java of a demonstration from Dijkstra’s work.
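Those figures are not reproduced here, but to give a sense of the program being
proved, here is an illustrative Java sketch of the same remainder-by-repeated-
doubling algorithm (the Fortran version appears later in this chapter). It is a
reconstruction, not the exact listing of Figure 12.2, and the handwritten proof of
even this tiny function is considerably longer than the code itself.

public class Remainder {
    static int remainder(int numerator, int denominator) {
        int r = numerator;
        int dd = denominator;
        while (dd <= r)          // double dd until it exceeds the running remainder
            dd *= 2;
        while (dd != denominator) {
            dd /= 2;             // halve dd back down ...
            if (dd <= r)
                r -= dd;         // ... subtracting it whenever it still fits
        }
        return r;
    }

    public static void main(String[] args) {
        System.out.println(remainder(14, 3)); // prints 2
    }
}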
I think you can see the problem with this approach. Indeed, this is something
that Dijkstra complained bitterly about:
Of course I would not dare to suggest (at least at present!) that it is the programmer’s
duty to supply such a proof whenever he writes a simple loop in his program. If so, he
could never write a program of any size at all.
Dijkstra’s hope was that such proofs would become more practical through
the creation of a library of theorems, again similar to Euclid’s Elements.
But Dijkstra did not understand just how prevalent and pervasive software
would become. He did not foresee, in those early days, that computers would
outnumber people and that vast quantities of software would be running in the
walls of our homes, in our pockets, and on our wrists. Had he known, he
would have realized that the library of theorems he envisioned would be far
too vast for any mere human to grasp.
So, Dijkstra’s dream of explicit mathematical proofs for programs has faded
into near oblivion. Oh, there are some holdouts who hope against hope for a
resurgence of formal proofs, but their vision has not penetrated very far into
the software industry at large.
Although the dream may have passed, it drew something deeply profound in
its wake. Something that we use today, almost without thinking about it.
Structured Programming
In the early days of programming, the 1950s and 1960s, we used languages
such as Fortran. Have you ever seen Fortran? Here, let me show you what it
was like.
WRITE(4,99)
99 FORMAT(" NUMERATOR:")
READ(4,100)NN
WRITE(4,98)
98 FORMAT(" DENOMINATOR:")
READ(4,100)ND
100 FORMAT(I6)
NR=NN
NDD=ND
1 IF(NDD-NR)2,2,3
2 NDD=NDD*2
GOTO 1
3 IF(NDD-ND)4,10,4
4 NDD=NDD/2
IF(NDD-NR)5,5,6
5 NR=NR-NDD
6 GOTO 3
10 WRITE(4,20)NR
20 FORMAT(" REMAINDER:",I6)
END
This little Fortran program implements the same remainder algorithm as the
earlier Java program.
Now I’d like to draw your attention to those GOTO statements. You probably
haven’t seen statements like that very often. The reason you haven’t seen
statements like that very often is that nowadays we look on them with
disfavor. In fact, most modern languages don’t even have GOTO statements like
that anymore.
Why don’t we favor GOTO statements? Why don’t our languages support them
anymore? Because in 1968, Dijkstra wrote a letter to the editor of
Communications of the ACM, titled “Go To Statement Considered
Harmful.”7
7. Edsger W. Dijkstra, “Go To Statement Considered Harmful,” Communications of the ACM 11, no.
3 (1968), 147–148.
Why did Dijkstra consider the GOTO statement to be harmful? It all comes
back to the three strategies for proving a function correct: enumeration,
induction, and abstraction.
Enumeration depends on the fact that each statement in sequence can be
analyzed independently and that the result of one statement feeds into the
next. It should be clear to you that in order for enumeration to be an effective
technique for proving the correctness of a function, every statement that is
enumerated must have a single entry point and a single exit point. Otherwise,
we could not be sure of either the inputs or the outputs of a statement.
What’s more, induction is simply a special form of enumeration, where we
assume the enumerated statement is true for some x and then prove by
enumeration that it is true for x + 1.
Thus, the body of a loop must be enumerable. It must have a single entry and
a single exit.
GOTO is considered harmful because a GOTO statement can jump into or out of
the middle of an enumerated sequence. GOTOs make enumeration intractable,
making it impossible to prove an algorithm correct by enumeration or
induction.
Dijkstra recommended that, in order to keep code provable, it be constructed
of three standard building blocks.
Sequence, which we depict as two or more statements ordered in time.
This simply represents nonbranching lines of code.
Selection, depicted as two or more statements selected by a predicate.
This simply represents if/else and switch/case statements.
Iteration, depicted as a statement repeated under the control of a
predicate. This represents a while or for loop.
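Here is a small illustrative Java example, invented for this discussion, that uses
nothing but those three building blocks, each with a single entry and a single exit,
and no goto anywhere.

public class SumOfPositives {
    static int sumOfPositives(int[] values) {
        int sum = 0;                  // sequence: statements ordered in time
        for (int v : values) {        // iteration: a statement repeated under a predicate
            if (v > 0) {              // selection: a statement chosen by a predicate
                sum += v;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumOfPositives(new int[] {3, -1, 4})); // prints 7
    }
}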
Dijkstra showed that any program, no matter how complicated, can be
composed of nothing more than these three structures and that programs
structured in that manner are provable.
He called the technique structured programming.
Why is this important if we aren’t going to write those proofs? If something
is provable, it means you can reason about it. If something is unprovable, it
means you cannot reason about it. And if you can’t reason about it, you can’t
properly test it.
Functional Decomposition
In 1968, Dijkstra’s ideas were not immediately popular. Most of us were
using languages that depended on GOTO, so the idea of abandoning GOTO or
imposing discipline on GOTO was abhorrent.
The debate over Dijkstra’s ideas raged for several years. We didn’t have an
Internet in those days, so we didn’t use Facebook memes or flame wars. But
we did write letters to the editors of the major software journals of the day.
And those letters raged. Some claimed Dijkstra to be a god. Others claimed
him to be a fool. Just like social media today, except slower.
But in time, the debate slowed, and Dijkstra’s position gained increasing
support until, nowadays, most of the languages we use simply don’t have a
GOTO.
Test-Driven Development
TDD, the red → green → refactor cycle, is functional decomposition. After
all, you have to write tests against small bits of the problem. That means that
you must functionally decompose the problem into testable elements.
The result is that every system built with TDD is built from functionally
decomposed elements that conform to structured programming. And that
means the system they compose is provable.
And the tests are the proof.
Or, rather, the tests are the theory.
The tests created by TDD are not a formal, mathematical proof, as Dijkstra
wanted. In fact, Dijkstra is famous for saying that tests can only prove a
program wrong; they can never prove a program right.
This is where Dijkstra missed it, in my opinion. Dijkstra thought of software
as a kind of mathematics. He wanted us to build a superstructure of
postulates, theorems, corollaries, and lemmas.
Instead, what we have realized is that software is a kind of science. We
validate that science with experiments. We build a superstructure of theories
based on passing tests, just as all other sciences do.
Have we proven the theory of evolution, or the theory of relativity, or the Big
Bang theory, or any of the major theories of science? No. We can’t prove
them in any mathematical sense.
But we believe them, within limits, nonetheless. Indeed, every time you get
into a car or an airplane, you are betting your life that Newton’s laws of
motion are correct. Every time you use a GPS system, you are betting that
Einstein’s theory of relativity is correct.
The fact that we have not mathematically proven these theories correct does
not mean that we don’t have sufficient proof to depend on them, even with
our lives.
That’s the kind of proof that TDD gives us. Not formal mathematical proof
but experimental empirical proof. The kind of proof we depend on every day.
And that brings us back to the third promise in the Programmer’s Oath:
I will produce, with each release, a quick, sure, and repeatable proof that every element
of the code works as it should.
Quick, sure, and repeatable. Quick means that the test suite should run in a
very short amount of time. Minutes instead of hours.
Sure means that when the test suite passes, you know you can ship.
Repeatable means that those tests can be run by anybody at any time to
ensure that the system is working properly. Indeed, we want the tests run
many times per day.
Some may think that it is too much to ask that programmers supply this level
of proof. Some may think that programmers should not be held to this high a
standard. I, on the other hand, can imagine no other standard that makes any
sense.
When a customer pays us to develop software for them, aren’t we honor
bound to prove, to the best of our ability, that the software we’ve created
does what that customer has paid us for?
Of course we are. We owe this promise to our customers, and our employers,
and our teammates. We owe it to our business analysts, our testers, and our
project managers. But mostly we owe this promise to ourselves. For how can
we consider ourselves professionals if we cannot prove that the work we
have done is the work we have been paid to do?
What you owe, when you make that promise, is not the formal mathematical
proof that Dijkstra dreamed of; rather, it is the scientific suite of tests that
covers all the required behavior, runs in seconds or minutes, and produces
the same clear pass/fail result every time it is run.
13 Integrity
Several promises in the oath involve integrity.
Small Cycles
Promise 4. I will make frequent, small releases so that I do not impede the progress of
others.
Making small releases just means changing a small amount of code for each
release. The system may be large, but the incremental changes to that system
are small.
The bulletin board had labels on it for each of the modules in the system. We
programmers each had our own color thumbtack. I was blue. Ken was red.
My buddy CK was yellow, and so on.
If I wanted to edit the Trunk Manager module, I’d look on the bulletin board
to see if there was a pin in that module. If not, I put a blue pin in it. Then I
took the master tape from the rack and copied it onto a separate tape.
I would edit the Trunk Manager module, and only the Trunk Manager module,
generating a new tape with my changes. I’d compile, test, wash and repeat
until I got my changes to work. Then I’d go get the master tape from the rack,
and I would create a new master by copying the current master but replacing
the Trunk Manager module with my changes. I’d put the new master back in
the rack.
Finally, I’d remove my blue pin from the bulletin board.
This worked, but it only worked because we all knew each other, we all
worked in the same office together, we each knew what the others were
doing. And we talked to each other all the time.
I’d holler across the lab: “Ken, I’m going to change the Trunk Manager
module.” He’d say: “Put a pin in it.” I’d say: “I already did.”
The pins were just reminders. We all knew the status of the code and who
was working on what. And that’s why the system worked.
Indeed, it worked very well. Knowing what each of the other programmers
was doing meant that we could help each other. We could make suggestions.
We could warn each other about problems we’d recently encountered. And
we could avoid merges.
Back then, merges were NOT fun.
Then, in the 1980s, came the disks. Disks got big, and they got permanent. By
big, I mean hundreds of megabytes. By permanent, I mean that they were
permanently mounted—always on line.
The other thing that happened is that we got machines like PDP-11s and
VAXes. We got screen editors, real operating systems, and multiple
terminals. More than one person could be editing at exactly the same time.
The era of thumbtacks and bulletin boards had to come to an end.
First of all, by then, we had twenty or thirty programmers. There weren’t
enough thumbtack colors. Second, we had hundreds and hundreds of
modules. We didn’t have enough space on the bulletin board.
Fortunately, there was a solution.
In 1972, Marc Rochkind wrote the first source code control program. It was
called the Source Code Control System (SCCS), and he wrote it in
SNOBOL.1
1. A lovely little string processing language from the 1960s that had many of the pattern-matching
facilities of our modern languages.
Short Cycles
Consider again how we began. How long was the cycle when source code
was controlled by physically possessing a deck of cards?
You checked the source code out by taking those cards out of the cabinet. You
held on to those cards until you were done with your project. Then you
committed your changes by putting the changed deck of cards back in the
cabinet. The cycle time was the whole project.
Then, when we used thumbtacks on the bulletin board, the same rule applied.
You kept your thumbtacks in the modules you were changing until you were
done with the project you were working on.
Even in the late 1970s and into the 1980s when we were using SCCS and
RCS, we continued to use this pessimistic locking strategy, keeping others
from touching the modules we were changing until we were done.
But CVS changed things—at least for some of us. Optimistic locking meant
that one programmer could not lock others out of a module. We still
committed only when we were done with a project, but others could be
working concurrently on the same modules. Consequently, the average time
between commits on a project shrank drastically. The cost of concurrency, of
course, was merges.
And how we hated doing merges. Merges are awful. Especially without unit
tests! They are tedious, time-consuming, and dangerous.
Our distaste for merges drove us to a new strategy.
Continuous Integration
By 2000, even while we were using tools such as Subversion, we had begun
teaching the discipline of committing every few minutes.
The rationale was simple. The more frequently you commit, the less likely
you are to face a merge. And if you do have to merge, the merge will be
trivial.
We called this discipline continuous integration.
Of course, continuous integration depends critically on having a very reliable
suite of unit tests. I mean, without good unit tests, it would be easy to make a
merge error and break someone else’s code. So, continuous integration goes
hand in hand with test-driven development (TDD).
With tools like git, there is almost no limit to how small we can shrink the
cycle. And that raises the question: Why are we so concerned about shrinking
the cycle?
Because long cycles impede the progress of the team.
The longer the time between commits, the greater the chance that someone
else on the team—perhaps the whole team—will have to wait for you. And
that violates the promise.
Perhaps you think this is only about production releases. No, it’s actually
about every other cycle. It’s about iterations and sprints. It’s about the
edit/compile/test cycle. It’s about the time between commits. It’s about
everything.
And remember the rationale: so that you do not impede the progress of
others.
Continuous Deployment
What if we could eliminate the delays between production releases? What if
we could release to production several times per day? After all, delaying
production releases impedes others.
I want you to be able to release your code to production several times per
day. I want you to feel comfortable enough with your work that you could
release your code to production on every push.
This, of course, depends on automated testing: automated tests written by
programmers to cover every line of code and automated tests written by
business analysts and QA testers to cover every desired behavior.
Remember our discussion of tests in Chapter 12, “Harm.” Those tests are the
scientific proof that everything works as it is supposed to. And if everything
works as it is supposed to, the next step is to deploy to production.
And by the way, that’s how you know if your tests are good enough. Your
tests are good enough if, when they pass, you feel comfortable deploying. If
passing tests don’t allow you to deploy, your tests are deficient.
Perhaps you think that deploying every day or even several times per day
would lead to chaos. However, the fact that you are ready to deploy does not
mean that the business is ready to deploy. As part of a development team,
your standard is to always be ready.
What’s more, we want to help the business remove all the impediments to
deployment so that the business deployment cycle can be shortened as much
as possible. After all, the more ceremony and ritual the business uses for
deployment, the more expensive deployment becomes. Any business would
like to shed that expense.
The ultimate goal for any business is continuous, safe, and ceremony-free
deployment. Deployment should be as close as possible to a nonevent.
Because deployment is often a lot of work, with servers to configure and
databases to load, you need to automate the deployment procedure. And
because deployment scripts are part of the system, you write tests for them.
For many of you, the idea of continuous deployment may be so far from your
current process that it’s inconceivable. But that doesn’t mean there aren’t
things you can do to shorten the cycle.
And who knows? If you keep shortening the cycle, month after month, year
after year, perhaps one day you’ll find that you are deploying continuously.
Continuous Build
Clearly, if you are going to deploy in short cycles, you have to be able to
build in short cycles. If you are going to deploy continuously, you have to be
able to build continuously.
Perhaps some of you have slow build times. If you do, speed them up.
Seriously, with the memory and speed of modern systems, there is no excuse
for slow builds. None. Speed them up. Consider it a design challenge.
And then get yourself a continuous build tool, such as Jenkins, Buildbot, or
Travis, and use it. Make sure that you kick off a build at every push and do
what it takes to ensure that the build never fails.
A build failure is a red alert. It’s an emergency. If the build fails, I want
emails and text messages sent to every team member. I want sirens going off.
I want a flashing red light on the CEO’s desk. I want everybody to stop
whatever they are doing and deal with the emergency.
Keeping the build from failing is not rocket science. You simply run the
build, along with all the tests, in your local environment before you push.
You push the code only when all the tests pass.
If the build fails after that, you’ve uncovered some environmental issue that
needs to be resolved posthaste.
You never allow the build to go on failing, because if you allow the build to
fail, you’ll get used to it failing. And if you get used to the failures, you’ll
start ignoring them. The more you ignore those failures, the more annoying
the failure alerts become. And the more tempted you are to turn the failing
tests off until you can fix them—later. You know. Later?
And that’s when the tests become lies.
With the failing tests removed, the build passes again. And that makes
everyone feel good again. But it’s a lie.
So, build continuously. And never let the build fail.
Relentless Improvement
Promise 5. I will fearlessly and relentlessly improve my creations at every opportunity. I
will never degrade them.
Robert Baden-Powell, the father of the Boy Scouts, left a posthumous
message exhorting the scouts to leave the world a better place than they found
it. It was from this statement that I derived my Boy Scout rule: Always check
the code in cleaner than you checked it out.
How? By performing random acts of kindness upon the code every time you
check it in.
One of those random acts of kindness is to increase test coverage.
Test Coverage
Do you measure how much of your code is covered by your tests? Do you
know what percentage of lines are covered? Do you know what percentage
of branches are covered?
There are plenty of tools that can measure coverage for you. For most of us,
those tools come as part of our IDE and are trivial to run, so there’s really no
excuse for not knowing what your coverage numbers are.
What should you do with those numbers? First, let me tell you what not to do.
Don’t turn them into management metrics. Don’t fail the build if your test
coverage is too low. Test coverage is a complicated concept that should not
be used so naively.
Such naive usage sets up perverse incentives to cheat. And it is very easy to
cheat test coverage. Remember that coverage tools only measure the amount
of code that was executed, not the code that was actually tested. This means
that you can drive the coverage number very high by pulling the assertions
out of your failing tests. And, of course, that makes the metric useless.
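Here is what that kind of cheating looks like. This is a deliberately bad, hypothetical example, assuming JUnit: the test executes plenty of code, so coverage climbs, but it asserts nothing.

import java.util.ArrayList;
import java.util.List;

import org.junit.Test;

class Invoice {
  private final List<Double> amounts = new ArrayList<>();
  private double total;

  void addLineItem(String name, int quantity, double unitPrice) {
    amounts.add(quantity * unitPrice);
  }

  void computeTotal() {
    total = 0;
    for (double amount : amounts)
      total += amount;
  }

  double getTotal() {
    return total;
  }
}

class InvoiceTest {
  @Test
  public void totalIsComputed() {
    Invoice invoice = new Invoice();
    invoice.addLineItem("widget", 3, 19.95);
    invoice.addLineItem("gadget", 1, 4.50);
    invoice.computeTotal();
    // assertEquals(64.35, invoice.getTotal(), 0.001);
    // The assertion that made this a test has been commented out. Every line
    // of Invoice still executes, so the coverage number stays high. It is a lie.
  }
}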
The best policy is to use the coverage numbers as a developer tool to help
you improve the code. You should work to meaningfully drive the coverage
toward 100 percent by writing actual tests.
One hundred percent test coverage is always the goal, but it is also an
asymptotic goal. Most systems never reach 100 percent, but that should not
deter you from constantly increasing the coverage.
That’s what you use the coverage numbers for. You use them as a
measurement to help you improve, not as a bludgeon with which to punish the
team and fail the build.
Mutation Testing
One hundred percent test coverage implies that any semantic change to the
code should cause a test to fail. TDD is a good discipline to approximate that
goal because, if you follow the discipline ruthlessly, every line of code is
written to make a failing test pass.
Such ruthlessness, however, is often impractical. Programmers are human,
and disciplines are always subject to pragmatics. So, the reality is that even
the most assiduous test-driven developer will leave gaps in the test
coverage.
Mutation testing is a way to find those gaps, and there are mutation testing
tools that can help. A mutation tester runs your test suite and measures the
coverage. Then it goes into a loop, mutating your code in some semantic way
and then running the test suite with coverage again. The semantic changes are
things like changing > to < or == to != or x=<something> to x=null. Each such
semantic change is called a mutation.
The tool expects each mutation to fail the tests. Mutations that do not fail the
tests are called surviving mutations. Clearly, the goal is to ensure that there
are no surviving mutations.
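Here is the kind of gap a mutation tester exposes, in a small hypothetical example assuming JUnit:

import org.junit.Test;
import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertTrue;

class Access {
  // Production rule: adults are 18 and older.
  static boolean isAdult(int age) {
    return age >= 18;
  }
}

class AccessTest {
  @Test
  public void twentyOneIsAdult() {
    assertTrue(Access.isAdult(21));
  }

  @Test
  public void twelveIsNotAdult() {
    assertFalse(Access.isAdult(12));
  }
  // A mutation tester will change >= to >. Both tests still pass, because
  // neither one checks the boundary at 18. That surviving mutation tells you
  // exactly which test to add: assertTrue(Access.isAdult(18)).
}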
Running a mutation test can be a big investment of time. Even relatively small
systems can require hours of runtime, so these kinds of tests are best run over
weekends or at month’s end. However, I have yet to run a mutation testing tool without being impressed by what it finds. It is definitely worth the occasional effort to run them.
Semantic Stability
The goal of test coverage and mutation testing is to create a test suite that
ensures semantic stability. The semantics of a system are the required
behaviors of that system. A test suite that ensures semantic stability is one
that fails whenever a required behavior is broken. We use such test suites to
eliminate the fear of refactoring and cleaning the code. Without a
semantically stable test suite, the fear of change is often too great.
TDD gives us a good start on a semantically stable test suite, but it is not
sufficient for full semantic stability. Coverage, mutation testing, and
acceptance testing should be used to improve the semantic stability toward
completeness.
Cleaning
Perhaps the most effective of the random acts of kindness that will improve
the code is simple cleaning—refactoring with the goal of improvement.
What kinds of improvements can be made? There is, of course, the obvious
elimination of code smells. But I often clean code even when it isn’t smelly.
I make tiny little improvements in the names, in the structure, in the
organization. These changes might not be noticed by anyone else. Some folks
might even think they make the code less clean. But my goal is not simply the
state of the code. By doing little minor cleanings, I learn about the code. I
become more familiar and more comfortable with it. Perhaps my cleaning
did not actually improve the code in any objective sense, but it improved my
understanding of and my facility with that code. The cleaning improved me as
a developer of that code.
The cleaning provides another benefit that should not be understated. By
cleaning the code, even in minor ways, I am flexing that code. And one of the
best ways to ensure that code stays flexible is to regularly flex it. Every little
bit of cleaning I do is actually a test of the code’s flexibility. If I find a small
cleanup to be a bit difficult, I have detected an area of inflexibility that I can
now correct.
Remember, software is supposed to be soft. How do you know that it is soft?
By testing that softness on a regular basis. By doing little cleanups and little
improvements and feeling how easy or difficult those changes are to make.
Creations
The word used in Promise 5 is creations. In this chapter, I have focused
mostly on code, but code is not the only thing that programmers create. We
create designs and documents and schedules and plans. All of these artifacts
are creations that should be continuously improved.
We are human beings. Human beings make things better with time. We
constantly improve everything we work on.
Productivity. That’s quite a topic, isn’t it? How often do you feel that it’s the
only thing that matters at your job? If you think about it, productivity is what
this book and all my books on software are about.
They’re about how to go fast.
And what we’ve learned over the last seven decades of software is that the
way you go fast is to go well.
The only way to go fast is to go well.
So, you keep your code clean. You keep your designs clean. You write
semantically stable tests and keep your coverage high. You know and use
appropriate design patterns. You keep your methods small and your names
precise.
But those are all indirect methods of achieving productivity. Here, we’re
going to talk about much more direct ways to keep productivity high.
1. Viscosity—keeping your development environment efficient
2. Distractions—dealing with everyday business and personal life
3. Time management—effectively separating productive time from all the
other junk you have to do
Viscosity
Programmers are often very myopic when it comes to productivity. They
view the primary component to productivity as their ability to write code
quickly.
But the writing of code is a very small part of the overall process. If you
made the writing of code infinitely fast, it would increase overall
productivity only by a small amount.
That’s because there’s a lot more to the software process than just writing
code. There’s at least
Building
Testing
Debugging
Deploying
And that doesn’t count the requirements, the analysis, the design, the
meetings, the research, the infrastructure, the tooling, and all the other stuff
that goes into a software project.
So, although it is important to be able to write code efficiently, it’s not even
close to the biggest part of the problem.
Let’s tackle some of the other issues one at a time.
Building
If it takes you 30 minutes to build after a 5-minute edit, then you can’t be very
productive, can you?
There is no reason, in the second and subsequent decades of the twenty-first
century, that builds should take more than a minute or two.
Before you object to this, think about it. How could you speed the build? Are
you utterly certain, in this age of cloud computing, that there is no way to
dramatically speed up your build? Find whatever is causing the build to be
slow, and fix it. Consider it a design challenge.
Testing
Is it your tests that are slowing the build? Same answer. Speed up your tests.
Here, look at it this way. My poor little laptop has four cores running at a clock rate of 2.8 GHz. Four cores times 2.8 billion cycles per second is better than 11 billion cycles every second, which means the machine can execute something like 10 billion instructions per second.
Do you even have 10 billion instructions in your whole system? If not, then
you should be able to test your whole system in less than a second.
Unless, of course, you are executing some of those instructions more than
once. For example, how many times do you have to test login to know that it
works? Generally speaking, once should be sufficient. How many of your
tests go through the login process? Any more than one would be a waste!
If login is required before each test, then during tests, you should short-
circuit the login process. Use one of the mocking patterns. Or, if you must,
remove the login process from systems built for testing.
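For example, here is a minimal sketch, with invented names, of the kind of seam that makes the short circuit possible:

// The seam: production code asks an Authenticator, never the login machinery directly.
interface Authenticator {
  boolean authenticate(String userName, String password);
}

// The slow, real thing: credential stores, password hashing, lockout rules, and so on.
class RealAuthenticator implements Authenticator {
  public boolean authenticate(String userName, String password) {
    // Imagine the full, slow login process here.
    return false;
  }
}

// The stub that most tests plug in: it logs anyone in instantly.
class AcceptingAuthenticatorStub implements Authenticator {
  public boolean authenticate(String userName, String password) {
    return true;
  }
}

One or two tests exercise the real authenticator to prove that login works. Every other test plugs in the stub and skips the expense.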
The point is, don’t tolerate repetition like that in your tests. It can make them
horrifically slow.
As another example, how many times do your tests walk through the
navigation and menu structure of the user interface? How many tests start at
the top and then walk through a long chain of links to finally get the system
into the state where the test can be run?
Any more than once per navigation pathway would be a waste! So, build a
special testing API that allows the tests to quickly force the system into the
state you need, without logging in and without navigating.
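Such a testing API does not need to be elaborate. Here is a minimal sketch; every name in it is invented, and in a real system these methods would call straight into the application rather than into the little maps used here to keep the sketch self-contained:

import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// A back door used only by tests. It forces the system into the state a test
// needs, without logging in and without navigating the user interface.
class TestApi {
  private final Set<String> loggedInUsers = new HashSet<>();
  private final Map<String, Integer> cartQuantities = new HashMap<>();

  void loginAs(String userName) {
    loggedInUsers.add(userName); // skip the real login dance entirely
  }

  void putItemInCart(String userName, String sku, int quantity) {
    loginAs(userName);
    cartQuantities.put(userName + ":" + sku, quantity);
  }

  boolean isLoggedIn(String userName) {
    return loggedInUsers.contains(userName);
  }

  int quantityInCart(String userName, String sku) {
    return cartQuantities.getOrDefault(userName + ":" + sku, 0);
  }
}

A test that needs a logged-in user with three widgets in the cart becomes two calls to this API instead of a long walk through menus and links.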
How many times do you have to execute a query to know that it works?
Once! So, mock out your databases for the majority of your tests. Don’t allow
the same queries to be executed over and over and over again.
Peripheral devices are slow. Disks are slow. Web sockets are slow. UI
screens are slow. Don’t let slow things slow down your tests. Mock them
out. Bypass them. Get them out of the critical path of your tests.
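Mocking out a database or any other slow device is usually just a matter of putting an interface between the business rules and the slow thing, and then substituting a fast in-memory fake. A minimal sketch, with invented names:

import java.util.HashMap;
import java.util.Map;

class Employee {
  private final String id;
  private final String name;

  Employee(String id, String name) {
    this.id = id;
    this.name = name;
  }

  String getId() {
    return id;
  }
}

// The seam: business rules talk to this interface, never to the database directly.
interface EmployeeGateway {
  Employee findById(String id);
  void save(Employee employee);
}

// The fast, in-memory fake used by most tests. No disk, no network, no SQL.
class InMemoryEmployeeGateway implements EmployeeGateway {
  private final Map<String, Employee> employees = new HashMap<>();

  public Employee findById(String id) {
    return employees.get(id);
  }

  public void save(Employee employee) {
    employees.put(employee.getId(), employee);
  }
}

A single slow integration test proves that the real, database-backed gateway honors the same interface. The thousands of business-rule tests run against the fake in milliseconds.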
Don’t tolerate slow tests. Keep your tests running fast!
Debugging
Does it take a long time to debug things? Why? Why is debugging slow?
You are using TDD to write unit tests, aren’t you? You are writing
acceptance tests too, right? And you are measuring test coverage with a good
coverage analysis tool, right? And you are periodically proving that your
tests are semantically stable by using a mutation tester, right?
If you are doing all those things or even just some of those things, your debug
time can be reduced to insignificance.
Deployment
Does deployment take forever? Why? I mean, you are using deployment
scripts, right? You aren’t deploying manually, are you?
Remember, you are a programmer. Deployment is a procedure—automate it!
And write tests for that procedure too!
You should be able to deploy your system, every time, with a single click.
Managing Distractions
One of the most pernicious destroyers of productivity is distraction from the
job. There are many different kinds of distractions. It is important for you to
know how to recognize them and defend against them.
Meetings
Are you slowed down by meetings?
I have a very simple rule for dealing with meetings. It goes like this:
When the meeting gets boring, leave.
You should be polite about it. Wait a few minutes for a lull in the
conversation, and then tell the participants that you believe your input is no
longer required and ask them if they would mind if you returned to the rather
large amount of work you have to do.
Never be afraid of leaving a meeting. If you don’t figure out how to leave,
then some meetings will keep you forever.
You would also be wise to decline most meeting invitations. The best way to
avoid getting caught in long, boring meetings is to politely refuse the
invitation in the first place. Don’t be seduced by the fear of missing out. If
you are truly needed, they’ll come get you.
When someone invites you to a meeting, make sure they’ve convinced you
that you really need to go. Make sure they understand that you can only afford
a few minutes and that you are likely to leave before the meeting is over.
And make sure you sit close to the door.
If you are a group leader or a manager, remember that one of your primary
duties is to defend your team’s productivity by keeping them out of meetings.
Music
I used to code to music long, long ago. But I found that listening to music
impedes my concentration. Over time, I realized that listening to music only
feels like it helps me concentrate, but actually, it divides my attention.
One day, while looking over some year-old code, I realized that my code
was suffering under the lash of the music. There, scattered through the code
in a series of comments, were the lyrics to the song I had been listening to.
Since then, I’ve stopped listening to music while I code, and I’ve found I am
much happier with the code I write and with the attention to detail that I can
give it.
Programming is the act of arranging elements of procedure through sequence,
selection, and iteration. Music is composed of tonal and rhythmic elements
arranged through sequence, selection, and iteration. Could it be that listening
to music uses the same parts of your brain that programming uses, thereby
consuming part of your programming ability? That’s my theory, and I’m
sticking to it.
You will have to work this out for yourself. Maybe the music really does
help you. But maybe it doesn’t. I’d advise you to try coding without music for
a week, and see if you don’t end up producing more and better code.
Mood
It’s important to realize that being productive requires that you become
skilled at managing your emotional state. Emotional stress can kill your
ability to code. It can break your concentration and keep you in a perpetually
distracted state of mind.
For example, have you ever noticed that you can’t code after a huge fight
with your significant other? Oh, maybe you type a few random characters in
your IDE, but they don’t amount to much. Perhaps you pretend to be
productive by hanging out in some boring meeting that you don’t have to pay
much attention to.
Here’s what I’ve found works best to restore your productivity.
Act. Act on the root of the emotion. Don’t try to code. Don’t try to cover the
feelings with music or meetings. It won’t work. Act to resolve the emotion.
If you find yourself at work, too sad or depressed to code because of a fight
with your significant other, then call them to try to resolve the issue. Even if
you don’t actually get the issue resolved, you’ll find that the action to attempt
a resolution will sometimes clear your mind well enough to code.
You don’t actually have to solve the problem. All you have to do is convince
yourself that you’ve taken enough appropriate action. I usually find that’s
enough to let me redirect my thoughts to the code I have to write.
The Flow
There’s an altered state of mind that many programmers enjoy. It’s that
hyperfocused, tunnel-vision state in which the code seems to pour out of
every orifice of your body. It can make you feel superhuman.
Despite the euphoric sensation, I’ve found, over the years, that the code I
produce in that altered state tends to be pretty bad. The code is not nearly as
well considered as code I write in a normal state of attention and focus. So,
nowadays, I resist getting into the flow. Pairing is a very good way to stay
out of the flow. The very fact that you must communicate and collaborate
with someone else seems to interfere with the flow.
Avoiding music also helps me stay out of the flow because it allows the
actual environment to keep me grounded in the real world.
If I find that I’m starting to hyperfocus, I break away and do something else
for a while.
Time Management
One of the most important ways to manage distraction is to employ a time
management discipline. The one I like best is the Pomodoro Technique.2
2. Francesco Cirillo, The Pomodoro Technique: The Life-Changing Time-Management System
(Virgin Books, 2018).
Pomodoro is Italian for “tomato.” English-speaking teams tend to use the word tomato
instead. But you’ll have better luck with Google if you search for the
Pomodoro Technique.
The aim of the technique is to help you manage your time and focus during a
regular workday. It doesn’t concern itself with anything beyond that.
At its core, the idea is quite simple. Before you begin to work, you set a
timer (traditionally a kitchen timer in the shape of a tomato) for 25 minutes.
Next, you work. And you work until the timer rings.
Then you break for 5 minutes, clearing your mind and body.
Then you start again. Set the timer for 25 minutes, work until the timer rings,
and then break for 5 minutes. You do this over and over.
There’s nothing magical about 25 minutes. I’d say anything between 15 and
45 minutes is reasonable. But once you choose a time, stick with that time.
Don’t change the size of your tomatoes!
Of course, if I were 30 seconds away from getting a test to pass when the
timer rang, I’d finish the test. On the other hand, maintaining the discipline is
important. I wouldn’t go more than a minute beyond.
So far, this sounds mundane, but handling interruptions, such as phone calls,
is where this technique shines. The rule is to defend the tomato!
Tell whoever is trying to interrupt you that you’ll get back to them within 25
minutes—or whatever the length of your tomato is. Dispatch the interruption
as quickly as possible, and then return to work.
Then, after your break, handle the interruption.
This means that the time between tomatoes will sometimes get pretty long,
because people who interrupt you often require a lot of time.
Again, that’s the beauty of this technique. At the end of the day, you count the
number of tomatoes you completed, and that gives you a measure of your
productivity.
Once you’ve gotten good at breaking your day up into tomatoes like this and
defending the tomatoes from interruptions, then you can start planning your
day by allocating tomatoes to it. You may even begin to estimate your tasks in
terms of tomatoes and plan your meetings and lunches around them.
Chapter 14 Teamwork
Open/Virtual Office
It is also important that the members of the team see and interact with each
other very frequently. The best way to achieve this is to put them into a room
together.
In the early 2000s, I owned a company that helped organizations adopt Agile
development. We would send a group of instructors and coaches to those
companies and guide them through the change. Before each engagement
began, we told the managers to rearrange the office space so that the teams
we would be coaching worked in their own team rooms. It happened more
than once that, before we arrived to begin coaching, the managers told us that
the teams were already much more productive just because they were
working together in the same room.
I’m writing this paragraph in the first quarter of 2021. The COVID-19
pandemic is beginning to wane, vaccines are being rapidly distributed (I will
get my second shot today), and we are all looking forward to a return to
normal life. But the pandemic will leave, in its wake, a large number of
software teams who work remotely.
Working remotely can never be quite as productive as working together in the
same room. Even with the best electronic help, seeing each other on screens
is just not as good as seeing each other in person. Still, the electronic systems
for collaboration are very good nowadays. So, if you are working remotely,
use them.
Create a virtual team room. Keep everyone’s face in view. Keep everyone’s
audio channel as open as feasible. Your goal is to create the illusion of a
team room, with everyone in it working together.
Pairing and mobbing enjoy a lot of electronic support nowadays. It is
relatively easy to share screens and program together across distances.
While you do that, keep the faces and the audio up and running. You want to
be able to see each other while you are collaborating on code.
Remote teams should try, as hard as they can, to maintain the same working
hours. This is very difficult when there is a huge East–West distribution of
programmers. Try to keep the number of time zones in each team as small as
possible, and try very hard to have at least six contiguous hours per day when
everyone can be together in the virtual team room.
Have you ever noticed how easy it is to yell at another driver while driving
your car? This is the windshield effect. When you are sitting behind a
windshield, it’s easy to see other people as fools, imbeciles, and even
enemies. It is easy to dehumanize them. This effect happens, to a lesser
degree, behind computer screens.
To avoid this effect, teams should get together in the same physical room
several times per year. I recommend one week each quarter. This will help
the team congeal and maintain itself as a team. It is very hard to fall into the
windshield trap with someone you ate lunch with and physically collaborated
with two weeks ago.
In this section, we’re going to talk about estimating projects and large tasks
—things that take many days or weeks to accomplish. The estimation of small
tasks and stories is described in Clean Agile.1
1. Robert C. Martin, Clean Agile: Back to Basics (Pearson, 2020).
Lies
Most estimates are lies because most estimates are constructed backward
from a known end date.
Consider HealthCare.gov, for example. The president of the United States
signed a bill into law that mandated a specific date when that software
system was to be turned on.
The depth of that illogic is nausea inducing. I mean, how absurd. Nobody
was asked to estimate the end date; they were just told what the end date had
to be—by law!
So, of course, all estimates associated with that mandated date were lies.
How could they be anything else?
It reminds me of a team I was consulting for about twenty years ago. I
remember being in the project room with them when the project manager
walked in. He was a young fellow, perhaps twenty-five. He’d just returned
from a meeting with his boss. He was visibly agitated. He told the team how
important the end date was. He said, “We really have to make that date. I
mean, we really have to make that date.”
Of course, the rest of the team just rolled their eyes and shook their heads.
The need to make the date was not a solution for making the date. The young
manager offered no solution.
Estimates, in an environment like that, are just lies that support the plan.
And that reminds me of another client of mine who had a huge software
production plan on the wall—full of circles and arrows and labels and tasks.
The programmers referred to it as the laugh track.
In this section, we’re going to talk about real, worthwhile, honest, accurate,
and precise estimates. The kind of estimates that professionals create.
Story 1: Vectors
The year was 1978. I was working at a company named Teradyne, in
Deerfield, Illinois. We built automated test equipment for the telephone
company.
I was a young programmer, 26 years old. And I was working on the firmware
for an embedded measurement device that bolted into racks in telephone
central offices. This device was called a COLT—central office line tester.
The processor in the COLT was an Intel 8085—an early 8-bit
microprocessor. We had 32K of solid-state RAM and another 32K of ROM.
The ROM was based on the Intel 2708 chip, which stored 1K × 8, so we
used 32 of those chips.
Those chips were plugged into sockets on our memory boards. Each board
could hold 12 chips, so we used three boards.
The software was written in 8085 assembler. The source code was held in a
set of source files that were compiled as a single unit. The output of the
compiler was a single binary file somewhat less than 32K in length.
We took that file and cut it up into 32 chunks of 1K each. Each 1K chunk was
then burned onto one of the ROM chips, which were then inserted into the
sockets on the ROM boards.
As you can imagine, you had to get the right chip into the right socket on the
right board. We were very careful to label them.
We sold hundreds of these devices. They were installed in telephone central
offices all over the country and—indeed—the world.
What do you think happened when we changed that program? Just a one-line
change?
If we added or removed a line, then all the addresses of all the subroutines
after that point changed. And because those subroutines were called by other
routines earlier in the code, every chip was affected. We had to reburn all 32
chips even for a one-line change!
It was a nightmare. We had to burn hundreds of sets of chips and ship them to
all the field service reps all around the world. Then those reps would have to
drive hundreds of miles to get to all the central offices in their district.
They’d have to open our units, pull out all the memory boards, remove all 32
old chips, insert the 32 new chips, and reinsert the boards.
Now, I don’t know if you know this, but the act of removing and inserting a
chip into a socket is not entirely reliable. The little pins on the chips tend to
bend and break in frustratingly silent ways. Therefore, the poor field service
folks had to have lots of spares of each of the 32 chips and suffer through the
inevitable debugging by removing and reinserting chips until they could get a
unit to work.
My boss came to me one day and told me that we had to solve this problem
by making each chip independently deployable. He didn’t use those words,
of course, but that was the intent. Each chip needed to be turned into an
independently compilable and deployable unit. This would allow us to make
changes to the program without forcing all 32 chips to be reburned. Indeed,
in most cases, we could simply redeploy a single chip—the chip that was
changed.
I won’t bore you with the details of the implementation. Suffice it to say that
it involved vector tables, indirect calls, and the partitioning of the program
into independent chunks of less than 1K each.2
2. That is, each chip was turned into a polymorphic object.
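In today's terms, and this is a loose Java analogy rather than the original 8085 assembler, the trick was ordinary polymorphism: callers reach each chip through a stable interface, the software equivalent of a vector table entry, so one implementation can be replaced without disturbing the others.

// Each "chip" exposes a stable interface: the modern analog of a vector table entry.
interface MeasurementChip {
  double measureLineVoltage();
}

interface ReportingChip {
  void report(double value);
}

// Callers depend only on the interfaces, so replacing the implementation of one
// chip does not disturb, or force the redeployment of, any of the others.
class Colt {
  private final MeasurementChip measurement;
  private final ReportingChip reporting;

  Colt(MeasurementChip measurement, ReportingChip reporting) {
    this.measurement = measurement;
    this.reporting = reporting;
  }

  void runLineTest() {
    reporting.report(measurement.measureLineVoltage());
  }
}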
My boss and I talked through the strategy, and then he asked me how long it
would take me to get this done.
I told him two weeks.
But I wasn’t done in two weeks. I wasn’t done in four weeks. Nor was I done
in six, eight, or ten weeks. The job took me twelve weeks to complete—it
was a lot more complicated than I had anticipated.
Consequently, I was off by a factor of 6. Six!
Fortunately, my boss didn’t get mad. He saw me working on it every day. He
got regular status updates from me. He understood the complexities I was
dealing with.
But still. Six? How could I have been so wrong?
Story 2: pCCU
Then there was that time, in the early 1980s, when I had to work a miracle.
You see, we had promised a new product to our customer. It was called
CCU/CMU.
Copper is a precious metal. It’s rare and expensive. The phone company
decided to harvest the huge network of copper wires that it had installed all
over the country during the last century. The strategy was to replace those
wires with a much cheaper high-bandwidth network of coaxial cable and
fiber carrying digital signals. This was known as digital switching.
The CCU/CMU was a complete re-architecture of our measurement
technology that fit within the new digital switching architecture of the phone
company.
Now, we had promised the CCU/CMU to the phone company a couple of
years before. We knew it was going to take us a man-year or so to build the
software. But then we just never quite got around to building it.
You know how it goes. The phone company delayed their deployment, so we
delayed our development. There were always lots of other, more urgent
issues to deal with.
So, one day, my boss calls me into his office and says that they had forgotten
about one small customer who had already installed an early digital switch.
That customer was now expecting a CCU/CMU within the next month—as
promised.
Now I had to create a man-year of software in less than a month.
I told my boss that this was impossible. There was no way that I could get a
fully functioning CCU/CMU built in one month.
He looked at me with a sneaky grin and said that there was a way to cheat.
You see, this was a very small customer. Their installation was literally the
smallest possible configuration for a digital switch. What’s more, the
configuration of their equipment just happened—just happened—to eliminate
virtually all of the complexity that the CCU/CMU solved.
Long story short—I got a special-purpose, one-of-a-kind unit up and running
for the customer in two weeks. We called it the pCCU.
The Lesson
Those two stories are examples of the huge range that estimates can have. On
the one hand, I underestimated the vectoring of the chips by a factor of six.
On the other, we found a solution to the CCU/CMU in one-twentieth the
expected time.
This is where honesty comes in. Because, honestly, when things go wrong,
they can go very, very wrong. And when things go right, they can sometimes
go very, very right.
This makes estimating one hell of a challenge.
Accuracy
It should be clear by now that an estimate for a project cannot be a date. A
single date is far too precise for a process that can be off by as much as a
factor of 6, or even 20.
Estimates are not dates. Estimates are ranges. Estimates are probability
distributions.
Probability distributions have a mean and a width—sometimes called the
standard deviation or the sigma. We need to be able to express our estimates
as both the mean and the sigma.
Let’s first look at the mean.
Finding the expected mean completion time for a complex task is a matter of
adding up all the mean completion times for all of the subtasks. And, of
course, this is recursive. The subtasks can be estimated by adding up all the
times for the sub-subtasks. This creates a tree of tasks that is often called a
work breakdown structure (WBS).
Now, this is all well and good. The problem, however, is that we are not
very good at identifying all the subtasks and sub-subtasks, and sub-sub-
subtasks. Generally, we miss a few. Like, maybe, half.
We compensate for this by multiplying the sum by two. Or sometimes three.
Or maybe even more.
Kirk: How much refit time until we can take her out again?
Scotty: Eight weeks, sir. But you don’t have eight weeks, so I’ll do it for you
in two.
Kirk: Mr. Scott, have you always multiplied your repair estimates by a factor
of four?
Scotty: Certainly, sir! How else can I keep my reputation as a miracle
worker?3
3. Star Trek II: The Wrath of Khan, directed by Nicholas Meyer (Paramount Pictures, 1982).
Precision
Every estimate is wrong. That’s why we call it an estimate. An estimate that
is correct is not an estimate at all—it’s a fact.
But even though the estimate is wrong, it may not be all that wrong. Part of
the job of estimation, then, is to estimate how wrong the estimate is.
My favorite technique for estimating how wrong an estimate is, is to estimate
three numbers: the best case, the worst case, and the normal case.
The normal case is how long you think the task would take you if the average
number of things go wrong—if things go the way they usually do. Think of it
like a gut-level call. The normal case is the estimate you would give if you
were being realistic.
The strict definition of a normal estimate is one that has a 50 percent chance
of being too short or too long. In other words, you should miss half of your
normal estimates.
The worst-case estimate is the Murphy’s law estimate. It assumes that
anything that can go wrong will go wrong. It is deeply pessimistic. The
worst-case estimate has a 95 percent chance of being too long. In other
words, you would only miss this estimate 1 in 20 times.
The best-case estimate is when everything possible goes right. You eat the
right breakfast cereal every morning. When you get into work, your
coworkers are all polite and friendly. There are no disasters in the field, no
meetings, no telephone calls, no distractions.
Your chances of hitting the best-case estimate are 5 percent: 1 in 20.
Okay, so now we have three numbers: the best case, which has a 5 percent
chance of success; the normal case, which has a 50 percent chance of
success; and the worst case, which has a 95 percent chance of success. This
represents a normal curve—a probability distribution. It is this probability
distribution that is your actual estimate.
Notice that this is not a date. We don’t know the date. We don’t know when
we are really going to be done. All we really have is a crude idea of the
probabilities.
Without certain knowledge, probabilities are the only logical way to
estimate.
If your estimate is a date, you are really making a commitment, not an
estimate. And if you make a commitment, you must succeed.
Sometimes you have to make commitments. But the thing about commitments
is that you absolutely must succeed. You must never promise to make a date
that you aren’t sure you can make. To do so would be deeply dishonest.
So, if you don’t know—and I mean know—that you can make a certain date,
then you don’t offer that date as an estimate. You offer a range of dates
instead. Offering a range of dates with probabilities is much more honest.
Aggregation
Okay, let’s say that we’ve got a whole project full of tasks that have been
described in terms of Best (B), Normal (N), and Worst (W) case estimates.
How do we aggregate them all into a single estimate for the whole project?
We simply represent the probability of each task and then accumulate those
probabilities using standard statistical methods.
The first thing we want to do is represent each task in terms of its expected
completion time and the standard deviation.
Now remember, 6 standard deviations (3 on each side of the mean) corresponds to a probability of better than 99 percent. So, we’re going to set our standard deviation, our sigma, to (W - B)/6: Worst minus Best, divided by 6.
The expected completion time (mu) is a bit trickier. Notice that N is probably not equal to (B + W)/2, the midpoint between Best and Worst. Indeed, the midpoint is probably well past N, because it is much more likely for a project to take more time than we think than less time. So, on average, when will this task be done? What is the expected completion time?
It’s probably best to use a weighted average like this: mu = (2N + (B + W)/2)/3, which works out to the simpler form (B + 4N + W)/6.
Now we have calculated the mu and sigma for a set of tasks. The expected
completion time for the whole project is just the sum of all the mus. The
sigma for the project is the square root of the sum of the squares of all the
sigmas.
This is just basic statistical mathematics.
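Here is the whole calculation as a small sketch, using hypothetical numbers and plain Java:

// mu = (2N + (B + W)/2)/3, which is the same as (B + 4N + W)/6.
// sigma = (W - B)/6.
class ThreePointEstimate {
  static double mu(double best, double normal, double worst) {
    return (2 * normal + (best + worst) / 2) / 3;
  }

  static double sigma(double best, double worst) {
    return (worst - best) / 6;
  }

  public static void main(String[] args) {
    // Three tasks, each estimated as {best, normal, worst}, in days.
    double[][] tasks = {{1, 3, 8}, {2, 5, 14}, {1, 2, 5}};

    double projectMu = 0;
    double sumOfSquaredSigmas = 0;
    for (double[] t : tasks) {
      projectMu += mu(t[0], t[1], t[2]);
      sumOfSquaredSigmas += Math.pow(sigma(t[0], t[2]), 2);
    }
    double projectSigma = Math.sqrt(sumOfSquaredSigmas);

    // Prints roughly: mu = 11.8 days, sigma = 2.4 days.
    System.out.printf("mu = %.1f days, sigma = %.1f days%n", projectMu, projectSigma);
  }
}

Reading the result: this little project is expected to take about twelve days, and the range of mu plus or minus two sigma, roughly 7 to 17 days, covers about 95 percent of the likely outcomes.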
What I’ve just described is the estimation procedure invented back in the late
1950s to manage the Polaris Fleet Ballistic Missile program. It has been
used successfully on many thousands of projects since.
It is called PERT—the program evaluation and review technique.
Honesty
We started with honesty. Then we talked about accuracy, and we talked about
precision. Now it’s time to come back to honesty.
The kind of estimating we are talking about here is intrinsically honest. It is a
way of communicating, to those who need to know, the level of your
uncertainty.
This is honest because you truly are uncertain. And those with the
responsibility to manage the project must be aware of the risks that they are
taking so that they can manage those risks.
But uncertainty is something that people don’t like. Your customers and
managers will almost certainly press you to be more certain.
We’ve already talked about the cost of increasing certainty. The only way to
truly increase certainty is to do parts of the project. You can only get perfect
certainty if you do the entire project. Therefore, part of what you have to tell
your customers and managers is the cost of increasing certainty.
Sometimes, however, your superiors may ask you to increase certainty using
a different tactic. They may ask you to commit.
You need to recognize this for what it is. They are trying to manage their risk
by putting it onto you. By asking you to commit, they are asking you to take on
the risk that it is their job to manage.
Now, there’s nothing wrong with this. Managers have a perfect right to do it.
And there are many situations in which you should comply. But—and I stress
this—only if you are reasonably certain you can comply.
If your boss comes to you and asks if you can get something done by Friday,
you should think very hard about whether that is reasonable. And if it is
reasonable and probable, then, by all means, say yes!
But under no circumstances should you say yes if you are not sure.
If you are not sure, then you must say NO and then describe your uncertainty
as we’ve described. It is perfectly okay to say, “I can’t promise Friday. It
might take as long as the following Wednesday.”
In fact, it’s absolutely critical that you say no to commitments that you are not
sure of, because if you say yes, you set up a long domino chain of failures for
you, your boss, and many others. They’ll be counting on you, and you’ll let
them down.
So, when you are asked to commit and you can, then say yes. But if you can’t,
then say no and describe your uncertainty.
Be willing to discuss options and workarounds. Be willing to hunt for ways
to say yes. Never be eager to say no. But also, never be afraid to say no.
You see, you were hired for your ability to say no. Anybody can say yes. But
only people with skill and knowledge know when and how to say no.
One of the prime values you bring to the organization is your ability to know
when the answer must be no. By saying no at those times, you will save your
company untold grief and money.
One last thing. Often, managers will try to cajole you into committing—into
saying yes. Watch out for this.
They might tell you that you aren’t being a team player or that other people
have more commitment than you do. Don’t be fooled by those games.
Be willing to work with them to find solutions, but don’t let them bully you
into saying yes when you know you shouldn’t.
And be very careful with the word try. Your boss might say something
reasonable, like “Well, will you at least try?”
The answer to this question is
NO! I am already trying. How dare you suggest that I am not? I am trying as hard as I
can, and there is no way I can try harder. There are no magic beans in my pocket with
which I can work miracles.
You might not want to use those exact words, but that’s exactly what you
should be thinking.
And remember this. If you say, “Yes, I’ll try,” then you are lying. Because you
have no idea how you are going to succeed. You don’t have any plan to
change your behavior. You said yes just to get rid of the manager. And that is
fundamentally dishonest.
Respect
Promise 9. I will respect my fellow programmers for their ethics, standards, disciplines,
and skill. No other attribute or characteristic will be a factor in my regard for my fellow
programmers.
We, software professionals, accept the weighty burden of our craft. We brave
folks are men and women, straight and gay, black, brown, yellow, and white,
republicans, democrats, religious believers, and atheists. We are humans, in
all the many forms and varieties that humans come in. We are a community of
mutual respect.
The only qualifications for entry into our community and for receiving the
acceptance and the respect of each and every member of that community are
the skills, disciplines, standards, and ethics of our profession. No other
human attribute is worthy of consideration. No discrimination on any other
basis can be tolerated.
’Nuff said.
Never Stop Learning
Promise 10. I will never stop learning and improving my craft.
A
Abstraction
clarifying levels of, 201
Dijkstra’s development of, 317–319, 320
polymorphism for, 225
stepdown rule for, 202
underlying, 235–236
Acceptance tests, 14, 18, 246–249
Accidental duplication, 237–238
Accounting, 20–22, 23
Accurate estimates, 360–361, 365–367
Ada, Countess of Lovelace, 5, 280. See also N345TS
Adaptability, inexpensive, 256–257
Ader, Clément, 2
Aeronautics, 1–4
Aggregate estimates, 369–370
Agile methods
readiness via, 258, 259
story points, 275
test automation via, 226
XP as, 14
ALGOL, 6, 7, 282
Algorithms
bowling game scoring, 62–79
bubble sorting, 82–87
to find prime factors, 52–62
incremental derivation of, 61, 82, 86
integer stacking, 36–52
proving correctness of, 316–323
quick sorting, 94–95
sine calculation, 127–138
text wrapping, 95–103
video store rentals, 164–183
API function, 269
Architecture, 143–145
Arrange/Act/Assert test pattern, 103–104, 247, 267
Authenticator interface, 111
Automated tests, 246, 267–269, 340
Automatic Computing Engine (ACE), 6, 280
B
Babbage, Charles, 4–5, 280
Backus, John, 6, 282, 333
Balance, 20
Beck, Kent
design principles of, 226, 239
Extreme Programming, 14
on making it right, 306
on refactoring, 219
on simple design, 224, 228
Behavior
anticipating, 300
refactoring to preserve, 199
vs. structure, 72, 303–304, 307, 311
Behavior-Driven Development (BDD), 104–105, 107
Bergensten, Jens, 289
Best work, 306–315
Boeing, 4
Boundaries, 143–144
Bowling score computation, 62–79, 104, 203
Branches, toggles vs., 338–340
Bubble sort algorithms, 87, 164
Bugs. See also Debugging
accumulation of, 314
as found by QA, 266, 267
quick patches for, 302
quick tests for, 218
as unacceptable, 264
Builds
continuous, 249, 341–342
speeding up, 347–348
Business analysis (BA) teams, 246, 248
Business rules
decoupling databases from, 148
isolated from tests, 268–269
testing the, 150
user interface vs., 141, 142
C
C language, 284
Cascading Style Sheets (CSS) code, 255–256
Catastrophes, 291–293
Cayley, Sir George, 2
CCU-CMU, 364
Certainty
cost of, 358, 370
fragile tests and, 139, 140
London school of, 141
Changes
fear of making, 31–32, 219, 263
inexpensive, 256–257, 262
keeping up with, 373–374
learning to prepare for, 277
as software necessity, 303–304, 307
to user interface, 268
via git, 335
via good structure, 308–310
Chicago school of TDD, 142
Chief Technical Officer (CTO), 252
Church, Alonzo, 5, 62
Circle of Life, 14–15
Clean Agile (Martin), 358
Clean Architecture (Martin), 143, 145
Clean Code (Martin), 198
Clean Craftsmanship (Martin), xxv–xxvi
COBOL, 282, 283
Code
building blocks of, 321
cleaning of, 32–34, 306, 342, 345
decoupled, 30–31, 232
decoupling tests from, 161–184, 218, 232
designing testable, 30
do no harm to, 302–303
early example of, 233
expressive, 233–235
fear of broken, 31–32, 219
fragile tests and, 160–161
by the inexperienced, 9–10
proving correctness of, 316–323
structural vs behavioral changes, 72
test and code communication, 236
test coverage of, 230
two schools of TDD, 140–142
Coding
history of, 280–286
specific tests for generic code, 59, 82, 183–184
test suite holes, 28–29
and three laws of TDD, 23–26
transformations, 185
Collaborating components, 144–145
Collaborative programming
discipline of, 242–244
improvement via, 356
remotely, 357
to replace yourself, 272, 273
as XP practice, 14, 17–18
COLT (Central Office Line Tester), 361
Commitments, 368–369, 370–372
Competence, fearless, 263–264
Compilation tests, 36–52
Complexity, incremental, 92
Components, 144–145
Computers
history of, 4–9, 280–286
in pop culture, 287
Concurrent Versions System (CVS), 333, 334, 336
Constant to variable transformation, 187–188
Continuous build, 249, 341–342
Continuous integration, 337
Continuous learning, 277–278, 373–374
Controllers
in component hierarchy, 144–145
and GUI input tests, 153–154
in Humble Object test pattern, 159
testing, 153
Countess. See Ada
Coupling. See also Decoupling
fragile test problem and, 139
high- and low-level details, 152, 218, 224–225
minimizing, 184
one-to-one, 161–162
Courage, 271–278
Coverage, test, 229–231, 239, 343–344
Craftsmanship
in the aeronautics industry, 4
defined, xxi–xxii
as developer responsibility, 303
foundation disciplines of, 14–18, 293
history of computer, 4–9
by programmers, 9–10
provable code as, 324–325
Craig, Philip, 108
D
Dahl, Ole-Johan, 283
Database testing, 148–150
DaVinci, Leonardo, 2
Debugging
avoiding, 199
commitment to total, 264, 266
minimizing, 26–27
by QA, 265–267
speeding up, 349
Decidability problem, 62
Decision matrix, Eisenhower’s, 310–312
Decision problem, 280
Decomposition, functional, 322–323
Decoupling
code, for testability, 232
databases, 148
high- and low-level details, 236, 308
importance of, 224–225
production code/test, 161–184, 218
Degenerate tests
solutions via, 83–84, 98–99
stairstep tests as, 66
as TDD rule, 53–54
Dependencies
dependency inversion principle, 143
eliminating unnecessary, 112–113, 114
management of, via SOLID, 309–310
with production data, 116
Dependency Rule, 144–145
Deployment
and best work, 307
continuous, 340–341
readiness for, 258
speeding up, 349
for startups, 304–305
and structural changes, 308
Derivation, incremental. See Incremental derivation
Design
changes to, 232
four principles of, 239
fragility as problematic, 160–161
high test coverage, 230–232
holes, 28–29
minimal size in, 239
outside-in vs. inside-out, 141–142
prompt fixing of flaws, 70–71
simple, 17, 224–226, 228–229
smells, 308, 309
test design, 160–184
YAGNI, 226–228
Development, test-driven. See Test-driven development (TDD)
Digital Equipment Corporation, 284
Digital switching, 364
Dijkstra, Edsger, 7, 282, 316–323
Diophantine equation, 62
Disciplines
acceptance tests, 18, 248–249
Alan Turing on, 281
in Clean Craftsmanship, xxv
collaborative programming, 17–18
failing at, 255, 256
focus and, 13
refactoring, 16, 217–221
self- vs. other-imposed, xxv
simple design, 17
test-driven development (TDD), 15–16
YAGNI, 227
Distractions, managing, 349–354
Double-entry bookkeeping, 20–22
Dummies, 111–114
Duplication
accidental, 238
duplicate tests, 66
minimizing code, 237–238
E
Eisenhower’s decision matrix, 310–312
Elements variable, 47, 51
Emotional stress, 351
“Endo-Testing: Unit Testing with Mock Objects” (Freeman, MacKinnon,
Craig), 108
Engineering practices, 14
Enigma machine, 5
Entangled software, 224
Enumeration, 317, 321
Errors, 70, 71
Estimates, honest and fair, 274–275, 358–372
Ethics
in Clean Craftsmanship, xxvi
oath of, 293
single vs. multiple, xxii
Expectations, 252
Explanatory variables, 204
Exploratory testing, 267
Expressivity
code language for, 233–234
as design principle, 239
and real vs. accidental duplicates, 238
role of code and tests in, 236
External services, 108, 110
Extract Field refactoring, 204–216
Extract Method refactoring, 68, 201–202, 208
Extract Superclass refactoring, 214
Extract Variable refactoring, 202–204
Extreme Programming Explained (Beck), 228
Extreme Programming (XP), 13–14, 219, 226, 258
F
Factors variable, 54–55
Failure tests
algorithms via, 82
input/output (IO), 108
for integer sort algorithm, 83
in integer stack example, 36–52
multiple solutions to, 88
in prime factors example, 52–62
Fakes, 124–126
Fear
of broken code, 31–32, 219
fearless competence, 263–264
Fibonacci kata, 191–194
Finite state machines, 105–107
First-in-last-out (FILO) behavior test, 50, 52
FitNesse, 247, 337
Flaws, design, 70–71
Flexibility, 141, 142, 345
Flight, history of, 1–4
Flow, the, 352
Fortran, 282, 319–320
Fowler, Martin
Chicago school of TDD, 142
on refactoring, 164, 198, 221
on simple design, 229
Fragile tests
certainty and, 139, 140
as design problem, 160–161
mocks and, 139–140
refactoring to prevent, 218
test specificity to prevent, 59, 232
user interface links and, 268
Fragility, software, 308
Freeman, Steve, 108, 141
Fun, 30
Functional decomposition, 322–323
Functional programming, 62, 196
Functions
extract refactoring of, 201
function to nil transformation, 186–187
harm to, 299–301
minimizing size of, 239
misplaced responsibility of, 71
replacing comments with, 75
testing query, 148–150
tests as constraining, 127
tests tied to implementation of, 120
uncertainty principle in, 132
and underlying abstraction, 235
vs. cleaning code, 306
G
Gateway interface, 149
Generalization, 59, 183–184
Generic code, 59
Git, 334–336
Git reset —hard, 220
Given-When-Then (GWT), 104, 105, 248
Goethe, Johann Wolfgang von, 21
GOTO statements, 320, 321, 322
Green step
as transformative, 185
in video rental algorithm, 165, 166, 167, 169, 173
GUI (graphic user interface)
inputs, 153–154
testing, 150–153
H
Haines, Corey, 229
Handwashing, 12–13
Harm, do no
to function, 299–301
to society, 296–297
to structure, 302–303
via easy changeability, 303–304
via tests, 305–306
HealthCare.gov, 297–298
Hendrickson, Chet, 224
History of flight, 1–4
Honest estimates, 274–275, 360–364, 370–372
Hooks, 226, 227–228
Hopper, Grace, 6, 282
Horn, Michael, 289
Humble Object pattern, 157–160
I
IBM, 6, 7, 282, 283
If statements, 58, 60, 202
Immobility, 309
Implement Interface command, 114
Implementation, function, 120
Improvement, continuous, 262, 342–346
Incremental derivation
as solution to getting stuck, 95, 98
solving failing tests as, 86
TDD as technique for, 61, 82
writing tests via, 103
Induction, 317, 321
Inflexible software, 256–257
Inside-out design, 142
Integration tests, 182–183
Integrity
via high productivity, 346–354
via relentless improvement, 342–346
via small cycles, 328–342
Interactors
in component hierarchy, 144–145
GUI, 152, 154
Iteration, 321
J
JBehave, 104
Jennings, Jean, 282
JUnit Recipes (Rainsberger and Stirling), 154
K
Keogh, Liz, 105
Knight Capital Group, 299–300, 302
Koss, Bob, 62–79
L
Languages
continuous learning of, 277, 373–374
expressive code, 233–234
first programming, 7, 282–284
Law of trichotomy, 89
Learning
continuous, 277–278, 373–374
from mentors, 278, 286
via cleaning code, 345
Lies, estimates as, 359
Login test, 348
London school of TDD, 141
Loops, unwound, 57
Lovelace, Countess of, 5. See also Ada
M
MacKinnon, Tim, 108, 120
Managers
commitments to, 370–372
emotional confrontation by, 276
information for, xxvi
perspective of, 252
Manual testing, 267, 270
Marick, Brian, 305
Matz, Chris, 104
McCarthy, John, 282
Mean and sigma, 365–367
Meetings, 350
Memory, core, 6
Mentoring, 278, 286
Merges, 332, 334, 336, 337
Meszaros, Gerard, 110, 154
Minimizing size, 237, 239
Misplaced responsibility, 71
Mob programming, 242, 244, 357
Mock objects
defined, 109
fragility of testing with, 139
in test doubles, 108–110, 121–123
when to use, 126
Mojang, 289
Mood, 351
Music, 350–351
Mutation testing, 344
N
Nil to constant transformation, 187
“No,” saying, 276, 371
North, Dan, 104
N345TS. See Ada
Nygaard, Kristen, 283
O
Object-oriented models, 64
One-to-one correspondences, 161–162, 218
Open offices, 356–358
Optimistic locking, 333
Outside-in design, 141
P
Pair programming, 242, 244, 352, 357
Parent-child algorithms, 299
Patterns, test, 155–160
Perspective, 252
Pessimistic locking, 334
Polymorphism, 110, 225
Pomodoro technique, 242, 352–354
Power Peg code, 299, 302
Precise estimates, 360–361, 367–369
Presenters
in component hierarchy, 144–145
defined, 151
GUI, 151–152
in Humble Object test pattern, 157–159
Pride, 33–34
Prime factors algorithm, 52–62, 185
Probability distributions, 365
Procedural Programming, 62
Production data, 116
Production releases, 340
Productivity
high, 346–354
inexpensive adaptability in, 256–257
perpetual readiness in, 258–259
quality standard for, 254–256
stable, 259–260
Program Evaluation and Review Technique (PERT), 369–370
Programmer tests, 182
Programmers
continuous improvement by, 262
as craftsmen, xxi
demographics of, 284–286
Edsger Dijkstra, 316–323
as heroes and villains, 289–290
history of, 5–9, 280–286
inexperience of, 9–10
as nerds and saviors, 287–288
one path for, xxii
perspective of, 252
pop culture depictions of, 287–288
responsibility of. See Responsibility
senior, 297
senior/junior partnerships, 244
as stakeholders, 312–313
teaching new, 278
undisciplined, 256
Programming
collaborative, 17–18, 242–244, 356
decidability problem in, 62
as finite state transitions, 107
functional, 196
history of, 280–286
London school of, 141
modular, 330–332
procedural and functional, 62
structured, 319–322
Proof, repeatable, 316
Property testing, 140
Pryce, Nat, 141
Public class stack, 36–52
Public trust, xxiv–xxv
Push function, 41
Q
Quality
continuously improved, 262
definition of poor, 255
extreme, 264
fearless competence on, 263–264
readiness for QA review, 265–267
test automation, 267–269
testing the user interface, 269–270
Quality assurance (QA) department
acceptance test readability by, 246
readiness for review by, 265–267
tests by, 248
Queries, testing, 148–150
Quick sort algorithm, 94–95
Quick tests, 218, 324
R
Rainsberger, J.B., 154
Readiness, 258–259
RealAuthenticator, 111–112, 116
Recursion, 100
Red/Green/Refactor loop
cleanup via, 39
as continuous, 219
as fourth TDD law, 34–35
as functional decomposition, 323
moral foundation for, 312
as refactor cycle, 199
transformative priorities and, 185
Red step
as additive, 185
in video rental algorithm, 165, 166, 168–169, 172
Refactor step
as restorative, 185
in video rental algorithm, 165–166
Refactoring
basic toolkit for, 200–217
basis of, 232–233
cleaning as, 345
code generalization via, 184
consistent, 219
defined, 72, 199
disciplines of, 16, 217–221
as fourth TDD law, 34–35
and fragile tests, 139
IDE’s Extract Method, 68
Red/Green/Refactor loop, 34–35, 39, 185
restorative, 185
as Rubik’s cube solutions, 215
in video rental algorithm, 164–183
vs. placing hooks, 227
as XP practice, 14
Refactoring (Fowler), 198, 221
Remote work, 357
Rename refactoring, 200–201
Respect, 372–373
Responsibility
and collaborative programming, 243
of continuous learning, 374
to do no harm, 296–303
misplaced, 71
of programmers, 290–293, xxiv
recognition of, 306
to replace yourself, 273
single responsibility principle, 201, 214
of software developers, xxiv
as a stakeholder, 312–313
Revision Control System, 333
Rigidity, 308
Ritchie, Dennis, 284
Rochkind, Marc, 333
Role Models, 289
RSpec, 104
Rubik’s cube, 215
S
Science, software, 323–324
Scrum, 258
Selection, 321
Selection to iteration transformation, 190
Self-Shunt pattern, 156
Semantic stability, 344–345
Semmelweis, Ignaz, 13
Sequence, 321
Setup method, 103
Short cycles, 334–336
Sieve of Eratosthenes, 53, 61
Sigma, 366, 369
Simplicity
design, 14, 17, 224–226
fundamental aspect of, 235
incremental derivation, 61, 82, 86
rule of test, 95–96, 103
via expressive language, 233–234
via technology advances, 227
Simula 67, 7, 283
Sine calculation, 127–138
Single responsibility principle, 201, 214, 239, 246
Small cycles, 328–342
Smalltalk, 108
Society
dependence on software, 9, 291, 306
harm to, 297–298
responsibility to, 292
views on programmers, 286–288
Software
catastrophes, 291–293
changes to, 224, 226, 303–304
failed estimates on, 358
GUI, 150–151
invention of, 5
messy, harmful, 302
pervasiveness of, 290–291
proofs, 316–323
public trust in, xxiv–xxv
rigid, fragile, immobile, 256, 308–309, 345
as a science, 323–324
similarity to accounting, 23
tests as documentation of, 26–28
untangled, 224
SOLID principles, 17, 143, 309–310
Sorting algorithms
bubble sort, 82–87, 95
quick sort, 94–95
Source code
continuous build of, 249
control, 328–334
dependencies, 143, 153, 225
editors, 237
harm to, 303
Source Code Control System (SCCS), 333
Specificity, 183–184, 202
Spies
fragility of testing with, 138, 139
mocks as, 121
in sine calculation algorithm, 133–135
in test doubles, 118–120
for testing across boundaries, 144
Stabilization sprints, 258
Stack of integers, writing a, 36–52
StackTest, 37
Stairstep tests, 66
Stakeholders, 312–313
Standards
in Clean Craftsmanship, xxvi
courage, 271–278
defined, 252
differing perspectives on, 252
productivity, 253–260
provable code, 324–325
quality, 261–270
responsibility and, xix
single vs. multiple, xxii
Startups, 304
State/Transition diagram, 105–106
Statement to recursion transformation, 189–190
Stepdown rule, 202
Stevenson, Chris, 104
Stirling, Scott, 154
Story points, 275
Structure
behavior vs., 303–304, 307, 311
and collaborative programming, 244
duplication and, 237
good, 308–310
harm to, 302–303
Structured programming, 7, 319–323
Stubs
and business rule isolation, 269–270
spies as, 118–119
in test double process, 115–117
Subversion, 334
Subway turnstiles, 105–106
Sullenberger, Chesley, 4
Surviving mutations, 344
T
Taylor series, 128, 130, 132–133
Teamwork
collaborative programming, 242–244
continuous integration and, 337
covering for each other, 272
honest/fair estimates as, 358–372
knowledge sharing via, 356
open/virtual offices, 356–358
respect, 372–373
Technological advances, 227–228
Teradyne, 361
Test doubles
boundary crossing and, 144
dummies, 111–114
fakes, 124–126
mock objects, 108–110, 121–123
spies, 118–120
structure of, 109–110
stubs, 115–117
Test-driven development (TDD)
algorithm simplicity via, 61–62, 78–79
Arrange/Act/Assert pattern in, 104
BDD as, 105
benefits of, 26–34
continuous integration and, 337
as design technique, 312
documentation in, 26–28
as double-entry bookkeeping, 23
examples, 36–79
as foundational discipline, 14, 15–16
four laws of, 23–35
fragile test problem and, 160–161
getting stuck with, 95–103
incremental complexity in, 92
long-term mastery of, 148
necessity for, 305–306
proofs of, 323–325
test suite holes, 28–29
two schools of, 140–143
uncertainty principle, 126–139
Test-Specific Subclass pattern, 155–156
Tests
acceptance, 18, 246–249
Arrange/Act/Assert pattern, 103–104
assertions and spy behavior, 123
automated, 246, 267–269, 340
code and test communication, 236
continuous running of, 249
coverage by, 229–231, 239, 343–344
for databases, 148–150
decoupling production code from, 161–184, 218
degenerate, 53–54, 57, 66
deleting stairstep, 66
design of, 160–184
excluding production data from, 116
as finite state transitions, 107
fragile, 59, 139–140, 160–161
functions constrained by, 127
for GUIs, 150–154
historic, 226
integration, 182–183
mutation testing, 344
patterns, 100, 155–160
to prevent harm, 305–306
programmer, 182–183
as refactoring discipline, 218, 219–220, 221
rule of simplicity for, 95–96
semantic stability of, 344–345
specificity of, for generic code, 59, 82, 183–184
speeding up, 348–349
streamlining, 113
as theories, 323
transformations for passing, 184–196
vs. placing hooks, 227
written by/for BA and QA, 248–249
Thompson, Ken, 284
ThoughtWorks, 142
Tichy, Walter, 333
Time management, 352–354
Toggles, branches vs., 338–340
Toyota, 300–301, 302
Transformation priority premise, 195–196
Transformations, 184–196
Transistors, 6
Transition triplets, 106
Traversal code, 237–238
Trim call, 101
Turing, Alan, 5–6, 62, 280–281
U
UML diagrams, 64, 73, 144
Uncertainty principle, 126–139, 140
Unconditional to selection transformation, 188–189
Underflow exceptions, 47
UNIX, 284
Untangling software, 224
Unwound loops, 57, 60
User interface
and the Chicago school, 142
and the London school, 141
product as changed via, 304, 308
and test automation, 268–269
testing the, 269–270, 348
User perspective, 252
V
Vacuum tube computer, 6
Values
testing, 140
value to list transformation, 189
value to mutated value transformation, 190–191
Variables, 202–204, 235
View model, 151
Virtual teamwork, 356–358
Viscardi, Stacia, xix
Viscosity, 347–349
Volkswagen, 289, 296, 297
Von Neumann, John, 282
W
Wake, Bill, 103
While statements, 58, 60, 202
Whitehead, Gustave, 3
Work, best, 306–315
Work breakdown structure (WBS), 366–367
World War II, 3, 5
Wrap call, 100
Wright brothers, 3
X
XRay class, 155
XUnit Test Patterns (Meszaros), 110, 154
Y
YAGNI, 226–228
Z
Zuse, Konrad, 5