
SE320
Software Verification & Validation
Week 3: Design by Contract and Property-Based Testing

Fall 2024
Quick survey
• Please grab a paper survey, fill it out quickly, and return it to the front
of the room
• There are pencils for those who don’t have a writing implement.
Design by Contract
Open-Box Testing
Review: What is open-box testing?

Open-box Testing
Testing with awareness and knowledge of the software’s internals.
Why Open-box Testing
If we can already do closed-box testing, why should we do open-box
testing?
• Why aren’t we done after we fix bugs from closed-box testing?
• Our earlier discussions made appeals like:
• If we haven’t run some part of the program (i.e., not full code coverage), we
don’t know what it does
• It might reveal missing equivalence partitions or boundary conditions
• For the whole program to work, the pieces should work, too
• These justifications are a little handwavy
• Can we make them more precise?
Modular Software Correctness
When we reason about software, we implicitly treat it like Legos:
1. We assume the pieces work (i.e., aren’t cracked)
2. We assume if we put together correct pieces in the correct way, we
should get the desired result
Building on Shaky Foundations
• Can you build Legos using (some) broken pieces?
• Yes, but it’s harder
• Can you build software from broken pieces?
• Yes, but it’s harder
Building on Shaky Foundations
How do we know this is correct?

public class Foo {
    private int x = 0;

    // Sets value to y
    public void set(int y) {
        x = y;
    }

    // Adds 1 to value
    public void inc() {
        x++;
    }

    // Retrieves value
    public int get() {
        return x;
    }
}

Foo f = new Foo();
...
f.set(3);
// Now f's value is 3
f.inc();
// Now f's value is 3+1=4
assert f.get() == 4;
Design by Contract
Designing software so each method has its assumptions
(preconditions) and guarantees (postconditions) explicitly specified,
formalizing the responsibilities of each component
• If a dynamic precondition check fails, the caller has done something
wrong
• If a dynamic postcondition check fails, the method has done
something wrong
• Invariants are properties that should hold at specific times/places in
the code
Preconditions
A precondition is a condition that is required to be true before some
code executes, in order for that code to work properly
Examples of preconditions include:
• An argument is non-null
• A collection is non-empty
• An integer is within some range
• If one boolean argument is true, then another argument must be…
Preconditions are almost input validation
Using Preconditions
• Preconditions are usually checked at the start of a method
• If a precondition check fails, then:
• The immediate caller passed invalid arguments (direct input)
• You may conclude there is a bug in the caller (or its caller…)
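As a concrete illustration, here is a minimal sketch of a dynamically checked precondition in plain Java; the transfer method and Account type are made up for this example, not part of any library:

public void transfer(Account from, Account to, int amount) {
    // Preconditions: the caller must supply non-null accounts and a positive amount.
    if (from == null || to == null)
        throw new IllegalArgumentException("accounts must be non-null");
    if (amount <= 0)
        throw new IllegalArgumentException("amount must be positive: " + amount);
    // ... the actual transfer logic runs only if the preconditions hold ...
}

A failure here means some other code in the system called transfer incorrectly.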
Preconditions vs. Input Validation
• Both preconditions and general input validation check the input
• Both “fail” when the input is unacceptable
• Key distinction: purpose and consequence:
• Preconditions check that the current software system is passing valid inputs
• General input validation checks that external sources provided valid inputs
• Precondition failures → Bug in your program, end the program!
• General input validation failure → External source gave invalid input, but this is an expected possibility, so produce some kind of recoverable error
• Common to use exceptions for both validations, which obscures this
key difference!
Postconditions
A postcondition is a condition that should be true after some code
completes
Examples of postconditions include:
• The return value is non-null
• The return value is within some range
• Particular state was updated
• The value of the counter is now the old value plus one
Postconditions are after-the-fact checks that your code did what it
should
Using Postconditions
• Postconditions are checked at the end of a method, just before
returning/finishing
• Some systems encourage writing them next to preconditions, but handle
them differently
• If a postcondition check fails, then:
• Maybe the preconditions were too weak, and should have checked something
else
• Otherwise the method’s implementation does not work as intended!
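A minimal sketch of a hand-written postcondition check in Java, using the counter from the earlier Foo example (the check does nothing unless assertions are enabled):

public void inc() {
    int old = x;
    x++;
    // Postcondition: the counter is now the old value plus one.
    assert x == old + 1 : "inc() postcondition violated";
}

If this assertion ever fires, the bug is in inc() itself (or the postcondition is wrong), not in the caller.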
Invariants
Invariants are facts that should be true at specific points in the code
• A class invariant is something that should be true whenever no
method of the class (instance) is executing
• A field is non-null
• An index field points to a valid array offset
• A loop invariant is a condition that should be true before and after
every iteration of a loop
• An assertion is a condition that should be true at some other point in
the code
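For example, a class invariant can be written as a private helper that public methods call on entry and exit. This is only a sketch of the idea (the Buffer class is hypothetical), not a framework feature:

public class Buffer {
    private int[] items = new int[16];
    private int next = 0;

    // Class invariant: items is non-null and next is a valid insertion index.
    private void checkInvariant() {
        assert items != null;
        assert 0 <= next && next <= items.length;
    }

    public void add(int value) {
        checkInvariant();   // should hold before the method runs
        if (next == items.length)
            items = java.util.Arrays.copyOf(items, items.length * 2);
        items[next++] = value;
        checkInvariant();   // and again just before returning
    }
}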
Building on Shaky Foundations
The method below is written in Dafny, which uses Design by Contract ideas to prove programs correct:
• requires specifies a precondition
• ensures specifies a postcondition
• invariant specifies a loop invariant
• assert checks an additional property

method Remainder(x: int, y: int) returns (q: int, r: int)
  requires y > 0
  ensures q*y + r == x
{
  r := x;
  q := 0;
  while y <= r
    invariant q*y + r == x
  {
    r := r - y;
    q := 1 + q;
  }
  assert r < y;
  return q, r;
}
Modular Reasoning with Design by Contract
Premise of Design by Contract
If a unit of code satisfies its postconditions for every input satisfying its preconditions, it is correct.

Code can be written assuming that if it provides input matching preconditions, operations it calls will return satisfying their postconditions.

Applied at the level of a whole program, you can reason about your program's correctness one piece at a time.
Design by Contract vs. Verification
Design by Contract is derived in part from formal verification methods
• DbC does dynamic checking
• Lets you only specify what you care to
• Side-steps tricky issues with specifying contracts on private state
• Only get checking on the inputs you try
• Verification checks statically
• Requires full specification of everything
• This is why it’s not yet practical for most software
• But provides guarantees about behavior on all valid inputs (when successful)
Design by Contract in Common Programming
Languages
• While Dafny checks all contracts statically, most implementations
need to check dynamically
• Dynamic checks slow down code
• So in practice, few contracts are checked in Release builds, but most
are turned on in Debug builds
• The mechanisms vary widely
Contracts in C#
• .NET has a very robust implementation of contracts: CodeContracts
• Provides the System.Diagnostics.Contracts namespace
• Contract.Requires for preconditions
• Contract.Ensures for postconditions
• Contract.Assert for assertions
• Each is present only if certain compile-time constants are defined
• Roughly, think of C/C++ #define plus #ifdef conditional compilation
• Special forms for exceptions, collections
• Special support for writing object invariants in a [ContractInvariantMethod] method
• A special tool rewrites Contract.Ensures into actual checks, and inserts calls to the invariant method
Rewriting Postconditions
A tool rewrites:

Pair<Integer,Integer> remainder(Integer x, Integer y) {
    Contract.Requires(y > 0);
    Contract.Ensures(return.left*y + return.right == x);
    ...
}

into

Pair<Integer,Integer> remainder(Integer x, Integer y) {
    Contract.Requires(y > 0);
    Pair<Integer,Integer> retval = ...;
    ...
    Contract.Ensures(retval.left*y + retval.right == x);
    return retval;
}
Rewriting Continued
• The tool also has to make choices about handling multiple return
points
• The tool also has to add code to catch and re-throw exceptions for
exceptional postconditions
• e.g., Contract.EnsuresOnThrow<IllegalArgument...>(x.foo > 0);
• These are the same issues you face when manually checking
postconditions
• This is why preconditions are the most commonly used
Contracts in Java
• Java lacks conditional compilation, leaving 3 options for each contract
1. Don’t check it at all
• Sensible for very expensive checks
• Risks out-of-date status
• Can still help w/ design and documentation
2. Always check, e.g., with Google Guava's Preconditions class
• Still works for validation you already need, like non-null checks
• Also works without special support: throwing IllegalArgumentException, etc.
3. Use Java’s native assert, which can be toggled at JVM launch
Java Assertions
• Java’s assert statement can be used to implement DbC, or on its
own
• Assertions must be enabled by passing -ea or -enableassertions to the JVM (the java command)
• Can also enable on per-package or per-class basis:
• -ea:edu.drexel.se320
• Two forms:
• assert e; fails if e evaluates to false
• assert e : "message"; fails as above, but gives the specified error message
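For instance, using a hypothetical Java port of the earlier remainder example (returning a {quotient, remainder} array), both forms might appear as follows; remember these checks do nothing unless the JVM is started with -ea:

int[] qr = remainder(17, 5);   // hypothetical method returning {quotient, remainder}
assert qr[1] >= 0;                                              // simple form
assert qr[1] < 5 : "remainder must be less than the divisor";   // form with a message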
Google Guava Preconditions
• Google’s Guava libraries include
com.google.common.base.Preconditions
• The method assortment resembles JUnit’s, including:
• checkArgument(boolean)
• checkNotNull(T)
• checkState(boolean)
• checkPositionIndex(int, int)
• Each has various overloads for messages or format strings
• No postcondition checks, because postconditions are more complicated
• They didn’t want to require rewriters
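A small usage sketch: the setCapacity method and frozen field are made up for illustration, while checkArgument and checkState are real Guava Preconditions methods:

import static com.google.common.base.Preconditions.checkArgument;
import static com.google.common.base.Preconditions.checkState;

public void setCapacity(int capacity) {
    checkArgument(capacity > 0, "capacity must be positive, was %s", capacity);
    checkState(!frozen, "cannot change capacity after freeze()");
    this.capacity = capacity;
}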
Exploiting Contracts for Testing
• A key part of every test is the oracle, which tells us whether the
observed behavior is correct.
• Design by Contract (with dynamic checking) extends the oracle for
every test
• In addition to checking the per-test oracle, also checks if the software
functions as expected along the way
• Were all preconditions satisfied?
• Were postconditions?
• Do all public methods establish object invariants?
• Only after those checks pass is the per-test oracle consulted
Exploiting Contracts for Open-box Testing
• Where do the specifications for open-box testing come from?
• Memory
• Maybe some Javadoc comments
• Talking to a colleague
• Never the top-level spec
• Contracts offer detailed specifications for internal components
When a Contract Check Fails, and You Don’t
Know Why
If you’re convinced the failure isn’t a bug in your implementation:
• Maybe you wrote the wrong assertion
• Ever find yourself debugging some code, wondering why the property you
were thinking about didn’t hold, when you realized it shouldn’t?
• Maybe you lost track of which code you should have written
• i.e., maybe you wrote the right contracts, and working code, but the code you
wrote is for a different or overly narrow specification
Contracts vs. Assertions
• Assertions alone are a subset of Design by Contract
• Possibly an implementation detail
• The primary distinction between Design by Contract and “using
assertions” is how they are used
• Using arbitrary assertions here and there is still helpful, but diagnosing
failures is hard
• Design by Contract is a systematic approach
• Design by Contract makes it easier to diagnose failures
Benefits of Design by Contract
Like most interesting things in life, Design by Contract is a trade-off.
Advantages include:
• More thorough testing of invariants
• Amplifies the bug-finding power of every test
• Guides thinking about how a program works
• Encourages you to write precise (partial) specifications
• Studies have shown simply writing down semi-formal specifications like this reduces
bugs
• Guides open-box testing
• Internal components aren’t in the top-level spec
• Contracts serve a similar role
Costs of Design by Contract
Disadvantages include:
• More code to write
• You can get the contracts themselves wrong
• Tooling becomes more complex
• Might need a rewriter for postconditions
• Need to ensure assertions are turned on for Java
• Runtime costs
• Slower if enabled in Release
• Presence in Debug builds can still lead to Heisenbugs
A Caveat with Public APIs
Let’s say you’re implementing a JSON message service: requests enter
as JSON.
Your main dispatch method uses an assertion to enforce the
precondition that the argument is valid JSON.
What happens?
Any time your service receives an invalid input, the program crashes
because that precondition fails!
Preconditions are intended for internal use. Public APIs (web service,
command line flags, etc.) should use standard validation checks and
give good error messages.
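A sketch of the distinction in Java (the Response type, dispatch method, and the Jackson-style objectMapper parser are hypothetical stand-ins): external input gets validated and produces a recoverable error, while internal calls can rely on preconditions.

public Response handle(String requestBody) {
    JsonNode request;
    try {
        request = objectMapper.readTree(requestBody);   // validate external input
    } catch (JsonProcessingException e) {
        // Expected possibility: reply with an error instead of crashing the service.
        return Response.clientError("request body is not valid JSON");
    }
    // Internal code below may assume (as a precondition) that request is valid JSON.
    return dispatch(request);
}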
A Parallel with Exceptions
• Uncaught exceptions terminate your program
• If a user interface or public API (e.g., website) gives your program bad
input, you don’t throw an exception
• Instead, you present a nice error message, visually or by sending back
appropriate error codes, error pages, etc.
• If an internal call provides bad input, that indicates a bug in your
program
• This is a time it’s often reasonable to crash the program
• Preconditions are similar: they are for internal calls, not validation of
external input
Property-Based Testing
Random Testing
• Writing test cases manually, however you do it, can be effort-intensive
• Experience has shown that throwing completely random nonsense is often surprisingly effective at revealing faults
• Natural consideration: why not just test everything by generating
inputs randomly until something breaks?
Issues with Naïve Random Testing
• What’s the input space?
• Requires either precise specification or coarse approximation (e.g., parameter
type)
• How do we choose things from the input space?
• What’s the probability distribution? Don’t want to test mostly invalid inputs
• How do we construct inputs?
• Integers are easy. Pointer data structures?
• How do we know if the behavior was correct?
• Still requires oracles, which have to be matched to the random inputs!
So Why Care?
• Generating random tests is hard, so why bother?
• Manual testing is highly effective but time consuming
• Computers can, in principle, generate many many many … many tests quickly
• If the computers generate good tests with minimal developer involvement,
developer time can be spent elsewhere (testing more things, new features)
• Even if generation is expensive, can save and reuse tests
• Specifying properties instead of examples is nice
30-Second History of Random Testing
• In the 1980s, probabilistic analysis and simple testing showed that
random testing could find more bugs than manual (partition-based)
testing
• e.g., Duran & Ntafos IEEE TSE 1984, Hamlet & Taylor, IEEE STVA 1988
• This didn’t translate into practical success
• Made strong, arguably unrealistic assumptions about the partitions
• Test programs consumed numbers, or arrays of numbers; no solutions for
inputs with more structure
The Core Challenge of Random Testing
• Early theoretical results were skewed because the assumptions about
the distribution of bugs among some notion of partition and
likelihood of hitting a given partition were mismatched
• This can be mitigated by controlling the sampling more carefully
• But we still need a good way to impose this control
The Secondary Challenge of Random Testing
• If the computer is generating all inputs, having a human generate all
oracles… defeats the purpose of the exercise
• Need to either:
• Have the computer generate the test oracle, or
• Have a way to automatically reuse the same oracle across many inputs
• The first option is fundamentally hard for theoretical reasons
• The second option is already easy to use
Solving Challenge 2: Theories
• Theories – a.k.a. parameterized unit tests – are a widely used feature
of modern testing frameworks
• Standard unit tests include a specific input and an oracle
• Note that standard unit test methods have 0 arguments
• But it’s common to notice that many tests differ only in the choice of
input
• Theories / PUTs are a test framework feature for running the same
test logic – including oracle – with many inputs
Parameterized Tests in JUnit 5
• JUnit 5 includes support for parameterized tests
• JUnit 4 called them theories
• Instead of marking a test with @Test, use @ParameterizedTest
• The test does not hard-code inputs; instead it takes arguments
• The test is matched with a source of inputs, and is executed on all
inputs from that source
Parameterized Tests in JUnit 5

@ParameterizedTest
@ValueSource(strings = { "racecar", "radar", "able was I ere I saw elba" })
void palindromes(String candidate) {
    assertTrue(StringUtils.isPalindrome(candidate));
}
Parameterized Test Sources
• @ValueSource (as in last slide) takes an array of constant primitive
values
• @NullSource passes null
• @EmptySource passes an empty collection as appropriate
• @EnumSource yields all values of the argument enum type
• @MethodSource gets inputs from a named method of the same class
that returns a stream
• @ArgumentsSource does the same with a class
• @CsvFileSource reads inputs from a CSV file…
Method Sources

@ParameterizedTest
@MethodSource("range")
void testFoo(int candidate) {
    assertTrue(foo(candidate));
}

static IntStream range() {
    return IntStream.range(0, 20).skip(10);
}
Method Sources
• Return a Java Stream containing inputs
• As with value sources, test method is invoked with every value in the
stream
• Unlike canned values, can write code to produce every value tested,
rather than listing them explicitly
• Potentially much more concise
Demo: To the terminal!
• https://github.com/Drexel-se320/examples
• If you’ve already cloned, “git pull origin main” will get latest version
• ./gradlew test --tests TestMinTheory
Solving Challenge 1&2: Property-Based
Testing
• Parameterized unit tests were prefigured, by about 5 years, by a solution to both challenges: property-based testing
• PBT uses general-purpose oracles called properties
• Similar to parameterized tests
• So-named because general oracles generally check properties of outputs
rather than exact output values
• PBT also uses a highly-stylized combination of built-in random
generators for primitive types, plus rich libraries for building custom
generators from scratch or by combining and transforming existing
generators.
Basic Property-Based Test
@Property
public boolean checkMin(@ForAll int x, @ForAll int y) {
    return min(x, y) <= y && min(x, y) <= x;
}
• @Property marks a property
• @ForAll controls generation of inputs
• x and y inputs generated independently, here with default int generators
• Property returns a boolean indicating if the check succeeded
• Can also return void and just assert
• Checks a general property true of all x and y
Demo: To the terminal!
• https://github.com/Drexel-se320/examples
• If you’ve already cloned, “git pull origin main” will get latest version
• ./gradlew test --tests PropertyTesting
PBT vs. Theory
• The same test could be written as a @ParameterizedTest with a value source; what's the difference?
• Theory runs the same inputs every time (highly repeatable)
• PBT generates fresh inputs every time (variable)
• But: over time tests more inputs than a theory
• For integer inputs: not a huge difference vs. a theory source produced
by good equivalence partitioning.
• For more interesting inputs, equivalence partitioning becomes harder,
and random generation *can* be more important
PBT vs. Theory (Part 2)
• Major difference is perspective
• Parameterized tests still focus on human choice of input
• Easy to write tests that only make sense for a couple inputs
• Easy to understand what’s happening in the tests
• Built-in to many standard testing frameworks
• Property-based tests want you to avoid thinking about specific inputs
• Emphasize describing general properties or classes of behavior
• Discourages tunnel vision on specific values
• Easier to map to equivalence partitions
• Can be harder to think about, has a learning curve and some pitfalls, which
we’ll discuss
Writing Your Own Generators
@Provide
public Arbitrary<BinaryTree<Integer>> binarytrees() {
    return Arbitraries.integers().list().ofMaxSize(10).map(BinaryTree::fromIntList);
}

• Arbitrary<T> is the type of things that generate random choices of T
• @Provide marks a method tests can ask for (next slide)
• Arbitraries.integers().list() is a built-in generator of lists, with a fluent API to add constraints
• map applies a function to every generated element
  • Here, changing each list into a binary tree
Using Your Own Generators
@Property
public boolean rebalancePreservesKeys(@ForAll("binarytrees") BinaryTree<Integer> b) {
    Set<Integer> originalKeys = b.keys();
    b.rebalance();
    return b.keys().equals(originalKeys);
}

• @ForAll takes arguments to use custom generators
• Similar to @MethodSource, but again generating fresh inputs every time
Implementing Generators
• Implementing random test generation from scratch is hard
• Another reason the early experiments never caught on
• PBT emphasizes libraries and combinators for building Arbitrary
generators by combining and transforming existing Arbitraries
• Result: Exceptionally rare to touch a random number directly!
• Heavy use of functional programming idioms
• map takes a Function<T,U> to turn Arbitrary<T> into Arbitrary<U>
• E.g., transform lists into trees
• filter takes a predicate on the generated type, and drops unsatisfactory values
• E.g., filter to only positive integers, trees of a particular size
Probability of Generating Good Inputs
• Generation and filtering don’t always work well together
• Arbitraries.integers().filter(x -> isPrime(x) && x > 200) will rarely
succeed
• Instead, almost certainly better to write an Arbitrary that emits only (or at
least mostly) prime numbers
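For instance, a sketch of a generator that only ever produces suitable values; the hard-coded prime list and the largePrimes name are just for illustration:

@Provide
Arbitrary<Integer> largePrimes() {
    // Draw directly from known primes above 200 rather than filtering random ints.
    return Arbitraries.of(211, 223, 227, 229, 233, 239, 241, 251, 257, 263);
}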
Assumptions
• Jqwik API also includes Assume.that(boolean):
• Used for one-off filtering: simple filters for a specific test
• Included at the start of a property, throws a special exception when
the boolean is false, which tells the framework to ignore that input
• If it throws out too many values, the framework will assume it failed
• If the property only really checks 3 values when you asked it to check 1000, is
it really giving the confidence you’d expect?
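For example, a property that only makes sense for non-zero divisors might look like this sketch:

@Property
boolean divisionRoundTrips(@ForAll int a, @ForAll int b) {
    Assume.that(b != 0);   // discard generated pairs where the property does not apply
    return (a / b) * b + (a % b) == a;
}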
maxDiscardRatio and Statistics.collect
• The throw-away rate is controlled by a flag in @Property
• @Property(maxDiscardRatio = n) means the property can pass as long
as at least 1 valid (succeeding) input is generated for every n
discarded inputs
• Good for fine-tuning, over-use just makes your tests fail slower
• Statistics.collect can help you see how broadly your inputs are being
generated
• Important double-check for any custom generator
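A small sketch combining both features (the property body and labels are illustrative only):

@Property(maxDiscardRatio = 10)
boolean worksForEvenNumbers(@ForAll int n) {
    Assume.that(n % 2 == 0);
    // Record how the accepted inputs are distributed, to spot skewed generation.
    Statistics.collect(n < 0 ? "negative even" : "non-negative even");
    return n % 2 == 0;   // stand-in for the real check on even inputs
}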
Shrinking
• A secondary feature of property-based testing, present in most, but
not all, implementations, is input shrinking
• Essentially, the framework has systematic approaches to take an input
that fails the property, and try to produce a smaller/simpler input that
also fails
• Usually produces much easier-to-understand failures
• Occasionally the shrunken input fails for a different reason than the original did
@Example
• A natural question: what if we want to make sure a couple known-
critical values are really tested every single time?
• @Example marks a method to treat as a traditional unit test
So What Properties Do We Write?
• Validity
• Check that an operation produces a valid result
• E.g., BST insertion preserves sortedness
• Postconditions
• Check that an operation did the right thing
• Sometimes requires taking a generated input and tweaking it just enough that
we can predict an exact output
• E.g., generate a BST, key, and value, and check that inserting that key-value pair gives the expected result
• Metamorphic properties
• Check that two equivalent ways of doing the same thing yield the same
results
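For instance, a metamorphic-style jqwik property might compare two independent ways of computing the same answer. A minimal sketch (the @Size constraint keeps the generated list non-empty):

@Property
boolean minAgreesWithSortedHead(@ForAll @Size(min = 1) List<Integer> xs) {
    List<Integer> sorted = new ArrayList<>(xs);
    Collections.sort(sorted);
    // Two equivalent computations of the minimum should agree.
    return Collections.min(xs).equals(sorted.get(0));
}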
So What Properties Do We Write?
• Model-based properties
• Implement an abstract model of the system under test (e.g., a finite state
machine)
• Generate sequences of operations
• Check that the abstract model predicts the behavior of the code you’re
testing – that running the operations on the real code and model gives same
results
• These and other great insights are in the paper "How to Specify It!" by John Hughes (coauthor of the original QuickCheck paper)
• The Jqwik author has translated it from Haskell to Java:
https://johanneslink.net/how-to-specify-it/
PBT in the Real World
• PBT was originally popularized for Haskell, following the original
QuickCheck paper
• Quickly became a dominant testing methodology for Erlang
• Used primarily for telecommunications systems and distributed systems
• Now available for virtually every mature PL
• jqwik for Java (see homework 2…)
• ScalaCheck, FsCheck (F#), fast-check & JSVerify (JS), Hypothesis (Python)…
PBT in Industry
• PBT has been heavily used for testing systems that demand high reliability
• Automobile software (AUTOSAR specs)
• Not radios… but things like anti-lock brakes
• File synchronizers (e.g., Dropbox)
• Distributed databases
• Smart contracts
• Robotics software
• And more
