
Don't Play Developer Testing Roulette: How to Use Test Coverage


ROBERT V. BINDER

OCTOBER 14, 2019

Suppose someone asked you to play Russian Roulette. Although your odds of surviving are 5 to 1 (83
percent), it is hard to imagine how anyone would take that risk. But taking comparable risk owing to
incomplete software testing is a common practice. Releasing systems whose tests achieve only
partial code coverage--the percentage of certain elements of a software item that have been
exercised during its testing--is like spinning the barrel and hoping for the best, or worse, believing
there is no risk. This post is partly a response to questions I'm frequently asked when working with
development teams looking for a definitive answer to what adequate testing means: Is code coverage
really useful? Is 80 percent code coverage enough? Does it even matter?

A software testing coverage report identifies untested code and, more importantly, is essential for
designing compact and effective test suites. Coverage should never be used as the primary criterion
of test completeness, but it should always be checked to reveal test design misunderstandings and
omissions. Although this idea is neither new nor controversial--see Brian Marick's "How to Misuse
Code Coverage" published in the 1999 proceedings of the 16th International Conference on Testing
Computer Software--the extent to which it is unknown, misunderstood, or ignored continues to
surprise me. This post, the first of two, explains how code coverage is computed, what it means, and
why partial coverage is an unnecessary risk. In the second post, I offer a definition of done that uses
adequate coverage and six best practices to routinely achieve high consistency and effectiveness in
developer-conducted testing. Together, they explain in practical terms how to achieve an effective
software testing strategy using coverage. Links are provided to results from research and practice that
support key elements of the case for testing coverage.

What Is Code Coverage?

Code coverage is the percent of certain elements of a software item that have been exercised during
its testing. There are many ideas about which code elements are important to test and therefore many
kinds of code coverage. Some are code-based (white-box) and some are behavior-based (black-
box). Open source and commercial coverage tools are available for all popular and many specialized
programming languages. A coverage tool typically adds trace statements to the software item under
test (SIUT) before it is tested. This instrumented SIUT is built, a suite of test cases is run against it, and
a trace of the elements executed is produced. Next, the coverage tool analyzes this trace to report which
elements were and were not exercised. A coverage report is specific to both the tests used and the
tested version of the software item. The instrumented code is typically discarded.
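As a concrete sketch of this cycle, assuming a Python project and the open source coverage.py tool: coverage.py traces execution at runtime rather than inserting trace statements into the source, but it produces the same kind of report described above. The "tests" directory below is a hypothetical placeholder.

```python
import coverage
import unittest

cov = coverage.Coverage()
cov.start()                  # begin tracing which lines execute

# Discover and run the test suite against the traced code
# ("tests" is a hypothetical directory of unit tests).
suite = unittest.defaultTestLoader.discover("tests")
unittest.TextTestRunner().run(suite)

cov.stop()                   # end tracing
cov.save()                   # persist the trace to a .coverage data file
cov.report()                 # per-file summary of statements run vs. total
```

The same tool also offers an equivalent command-line workflow (coverage run followed by coverage report) that requires no changes to the test code.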

For example, if a certain test suite causes 400 out of 500 SIUT source code statements to execute at
least once, we say this test suite achieves 80 percent statement coverage. A test suite that causes
each code block contingent on a conditional expression to be executed at least once is said to
achieve decision or branch coverage. For certification of aircraft software systems, the FAA
requires modified condition/decision coverage (MC/DC), an advanced form of decision coverage.
Coverage is never the number of tests run, nor the percentage of tests that pass or fail. When
coverage is used without qualification, statement coverage of code is usually assumed. When used
without giving a percentage, 100 percent is usually assumed. Measuring code coverage is useful for
developer testing, but much less so for integration or system-scope testing.
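To make the statement-versus-decision distinction concrete, consider this minimal hypothetical Python function. A single test achieves 100 percent statement coverage yet exercises only one of the two branch outcomes:

```python
def apply_discount(price, is_member):
    total = price
    if is_member:              # decision point: True and False outcomes
        total = price * 0.9    # the only statement inside the branch
    return total

# This one test executes every statement (100 percent statement coverage),
# but the implicit is_member=False branch is never taken, so decision
# (branch) coverage is only 50 percent.
assert apply_discount(100, True) == 90.0
```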

Why Should We Care about Code Coverage?

Designing practical and effective software tests is a fascinating and often frustrating puzzle. Think of
code coverage as a checklist for places to look for bugs. Just like looking for your misplaced keys,
you'll probably try (usually without success) the most obvious places first: the pockets of the coat you
last wore, the kitchen counter, etc. You wouldn't skip those places, but neither would you conclude
your keys are irretrievably lost if they are not there.

We need focus when we test software because truly exhaustive software testing of any kind is
impossible. There are an astronomically large number of execution conditions where bugs and
vulnerabilities can hide, so exhaustive testing would require an astronomically large number of tests.
Even very extensive test suites can reach only a very tiny subset of all execution conditions.
Moreover, unless SIUT code is truly atrocious, a very high proportion of its execution conditions will
perform correctly, even when bugs are present. Simply executing a buggy statement is not
sufficient. The data it uses must be such that the bug is triggered and then produces an observable
failure. The Y2K software pandemic is just one example of the interplay of these criteria: code
that had worked without trouble for decades crashed or produced incorrect results when it tried to
process the century-less dates of the new millennium. So why, you may ask, should we give
code coverage any credit?
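A small hypothetical example of this interplay: the buggy statement below executes in every test run, but only particular data triggers the bug and produces an observable failure.

```python
def is_leap_year(year):
    # Buggy: ignores the century rule (1900 is not a leap year, 2000 is).
    return year % 4 == 0

# Both tests execute 100 percent of the statements and pass, yet neither
# triggers the bug; only an input like 1900 would reveal it.
assert is_leap_year(2020) is True
assert is_leap_year(2019) is False
```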

That's because we have zero chance of revealing bugs in code that isn't tested. Some might
ask, "But doesn't testing exercise a code unit as a whole?" Yes, but it is very easy to run lots of tests
and still not exercise all the elements of a code unit. Recall that 100 percent statement coverage means
that every line of code has been executed in a test run at least once. It is, however, the barest
minimum adequacy criterion--the dirt floor of white-box testing strategies--because a buggy statement
that is never executed has no chance of revealing its bug.
This is why a test suite must at least execute (cover) every statement to have a slim chance (but not a
guarantee) of revealing latent bugs.

The essential task of test design is to wisely select from a practically infinite number of possible tests,
knowing that bugs are present but not knowing exactly which test cases will reveal them. To choose
wisely, therefore, test design tries to identify test cases that have a good chance of revealing bugs
within available time and resources. Coverage helps focus that effort and confirms that we have
exercised each code element at least once. Although even exhaustive code-based testing cannot reveal
the omission of a necessary feature, testing evaluated with coverage can often lead to insights that
reveal such omissions.

The coverage reports from a comprehensive test suite can also reveal vulnerabilities and malicious
code. Uncovered code may indicate an intentionally created logic bomb or "Easter egg" exploit. This
situation is most likely to occur with software of unknown provenance (SOUP). Likewise, a coverage
report can reveal dead or unused code that could be overwritten with malicious code, a vulnerability
that should be addressed.

Code Coverage Roulette

Many developers see 100 percent coverage (of any kind) as impractical and therefore claim that
some lesser number is acceptable. I've heard and seen partial coverage thresholds touted many
times over 30 years: 70 percent, 80 percent, or 85 percent. The exact number is not important.
What is important is that a partial coverage threshold provides a ready excuse for arbitrary and weak
testing. The justification for a partial coverage threshold usually goes something like this:
Execution of code elements often depends on conditions that the SIUT does not control, such as
runtime exceptions, messages from external systems, or shared data state. If the conditions that
cause such a blockage cannot be produced through the SIUT's public interface, the blocked code
cannot easily be executed, hence covered. While it may be possible to get blocked code to execute
using additional test code, stubs, mocks, or drivers, the incremental coverage is not worth the
additional work.

Therefore, many developers are adamant that requiring 100 percent statement coverage for all test
suites is a goal that only Dilbert's pointy-haired boss would insist on. Thus 80-ish percent statement
coverage is touted as a practical compromise.

I have never seen any evidence to support a particular partial coverage threshold. I have, however,
seen many cases where such a threshold has become the acceptable test completion criterion,
regardless of whether there are actual technical obstacles. Using partial coverage as a
criterion for test completeness usually results in a superficial exercise of easily reachable elements,
without exercising interaction of the covered and blocked code. Worse, it is an open invitation to
scrimp on testing even when there is no blockage.

What can we do about blockages? When a blockage results from a hard-to-control dependency, it is
good evidence of a latent bug or a code smell (i.e., dead code, hard-coded "fixes," poor assumptions,
lack of robustness) that would benefit from refactoring. For benign blockages, readily available tools
for test mocking can achieve conditions that trigger exceptions, control return values from third-party
libraries, or stand in for unavailable software or hardware (see, for example, this exchange on Stack
Overflow).
Certain features of object-oriented languages can also obstruct coverage. For example, methods of
abstract base classes and private methods cannot be directly invoked. Test design patterns to
address this kind of blockage are well-established.
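As an illustrative sketch, assuming Python's standard unittest.mock (the fetch_quote function and its network dependency are hypothetical): a mock can force an otherwise-blocked exception branch to execute and thus be covered.

```python
import urllib.request
from urllib.error import URLError
from unittest import mock

def fetch_quote(url):
    try:
        with urllib.request.urlopen(url) as resp:
            return resp.read().decode("utf-8")
    except URLError:
        # Blocked branch: unreachable in ordinary tests because we
        # cannot make a real network call fail on demand.
        return "service unavailable"

def test_fetch_quote_handles_network_failure():
    # side_effect raises the exception when the mocked call is made,
    # driving execution into the except branch.
    with mock.patch("urllib.request.urlopen", side_effect=URLError("down")):
        assert fetch_quote("https://example.com/quote") == "service unavailable"
```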

Hard blockages that cannot be resolved do occur, especially in legacy systems. Code that cannot be
tested owing to a true blockage is arguably more likely to be buggy because the SIUT does not
completely control the behavior of the blocked code. In this case, skipping verification of blocked code
is playing coverage roulette. So, while you may not be able to test it, you certainly should verify it by
other means. Team inspection of the blocked code is often simple and effective. Pay special attention
to imagining all the conditions under which it may be executed and how it will respond. This analysis
is like pondering a chess move--consider how the runtime environment might respond to each
execution alternative.

Looking Ahead

So, now you know why accepting partial coverage is playing code coverage roulette. In the second
part of this post, I'll describe specific testing practices that result in full coverage and consistently
effective testing.
