Topic 42 Property-Based Testing

Доверяй, но проверяй (Trust, but verify)
Russian proverb

We recommend writing unit tests for your functions. You do that by
thinking about typical things that might be a problem, based on your
knowledge of the thing you're testing.
There's a small but potentially significant problem lurking in that
paragraph, though. If you write the original code and you write the tests, is it
possible that an incorrect assumption could be expressed in both? The
code passes the tests, because it does what it is supposed to based on your
understanding.
One way around this is to have different people write tests and the code
under test, but we don't like this: as we said in Topic 41, Test to Code, one
of the biggest benefits of thinking about tests is the way it informs the
code you write. You lose that when the work of testing is split from the
coding.
Instead, we favor an alternative, where the computer, which doesn't
share your preconceptions, does some testing for you.
Contracts, Invariants, and Properties
In Topic 23, Design by Contract, we talked about the idea that code has
contracts that it meets: you meet the conditions when you feed it input,
and it will make certain guarantees about the outputs it produces.
There are also code invariants, things that remain true about some piece
of state when it's passed through a function. For example, if you sort a
list, the result will have the same number of elements as the original—the
length is invariant.
Once we work out our contracts and invariants (which we're going to
lump together and call properties) we can use them to automate our
testing. What we end up doing is called property-based testing.
Tip 71 Use Property-Based Tests to Validate Your Assumptions
As an artificial example, we can build some tests for our sorted list. We've
already established one property: the sorted list is the same size as the
original. We can also state that no element in the result can be greater
than the one that follows it.
We can now express that in code. Most languages have some kind of
property-based testing framework. This example is in Python, and uses
the Hypothesis tool and pytest, but the principles are pretty universal.
Here is the full source of the tests:
proptest/sort.py
    from hypothesis import given
    import hypothesis.strategies as some

    @given(some.lists(some.integers()))
    def test_list_size_is_invariant_across_sorting(a_list):
        original_length = len(a_list)
        a_list.sort()
        assert len(a_list) == original_length

    @given(some.lists(some.text()))
    def test_sorted_result_is_ordered(a_list):
        a_list.sort()
        for i in range(len(a_list) - 1):
            assert a_list[i] <= a_list[i + 1]
Here's what happens when we run it:
    $ pytest sort.py
    ======================= test session starts ========================
    ...
    plugins: hypothesis-4.14.0
    sort.py ..                                                    [100%]
    ===================== 2 passed in 0.95 seconds =====================
Not much drama there. But, behind the scenes, Hypothesis ran both of
our tests one hundred times, passing in a different list each time. The lists
will have varying lengths, and will have different contents. It's as if we'd
cooked up 200 individual tests with 200 random lists.
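To make "behind the scenes" a little more concrete, here is a rough hand-rolled sketch of the same idea using only Python's random module. It is not how Hypothesis is implemented: a real framework chooses inputs far more cleverly and reduces failing inputs to simpler ones. The helper names random_int_list and check_sort_properties, the size cap of 20, and the value range are our own assumptions, chosen only for illustration.

```python
import random


def random_int_list(rng, max_size=20):
    """Hypothetical stand-in for a data generator: a list of random integers."""
    size = rng.randint(0, max_size)
    return [rng.randint(-1000, 1000) for _ in range(size)]


def check_sort_properties(runs=100, seed=None):
    """Check both sort properties against `runs` freshly generated lists."""
    rng = random.Random(seed)
    for _ in range(runs):
        a_list = random_int_list(rng)
        original_length = len(a_list)
        a_list.sort()
        # Property 1: sorting does not change the length.
        assert len(a_list) == original_length
        # Property 2: no element is greater than the one that follows it.
        for i in range(len(a_list) - 1):
            assert a_list[i] <= a_list[i + 1]
    return runs


check_sort_properties()
```

Passing a seed makes a run repeatable, which is one simple way to replay a failure in a sketch like this.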
Test Data Generation
Like most property-based testing libraries, Hypothesis gives you a
mini-language for describing the data it should generate. The language is based
around calls to functions in the hypothesis.strategies module, which we
aliased as some, just because it reads better.
If we wrote:
    @given(some.integers())
Our test function would run multiple times. Each time, it would be passed
a different integer. If instead we wrote the following:
    @given(some.integers(min_value=5, max_value=10).map(lambda x: x * 2))
then we'd get the even numbers between 10 and 20.
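The arithmetic is easy to check by hand in plain Python, without Hypothesis. This small sketch just enumerates the integers from 5 to 10 (both bounds included) and applies the same doubling; the variable names are ours.

```python
# Every integer the strategy could draw before mapping: 5 through 10 inclusive.
candidates = list(range(5, 11))

# The same transformation that .map(lambda x: x * 2) applies to each value.
doubled = [x * 2 for x in candidates]

print(doubled)  # [10, 12, 14, 16, 18, 20]
```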
You can also compose types, so that
    @given(some.lists(some.integers(min_value=1), max_size=100))
will be lists of natural numbers that are at most 100 elements long.
This isn't supposed to be a tutorial on any particular framework, so we'll
skip a bunch of cool details and instead look at a real-world example.
Finding Bad Assumptions
We're writing a simple order processing and stock control system
(because there's always room for one more). It models the stock levels with a
Warehouse object. We can query a warehouse to see if something is in
stock, remove things from stock, and get the current stock levels.
Here's the code:
proptest/stock.py
    class Warehouse:
        def __init__(self, stock):
            self.stock = stock

        def in_stock(self, item_name):
            return (item_name in self.stock) and (self.stock[item_name] > 0)

        def take_from_stock(self, item_name, quantity):
            if quantity <= self.stock[item_name]:
                self.stock[item_name] -= quantity
            else:
                raise Exception("Oversold {}".format(item_name))

        def stock_count(self, item_name):
            return self.stock[item_name]
We wrote a basic unit test, which passes:
proptest/stock.py
    def test_warehouse():
        wh = Warehouse({"shoes": 10, "hats": 2, "umbrellas": 0})
        assert wh.in_stock("shoes")
        assert wh.in_stock("hats")
        assert not wh.in_stock("umbrellas")

        wh.take_from_stock("shoes", 2)
        assert wh.in_stock("shoes")

        wh.take_from_stock("hats", 2)
        assert not wh.in_stock("hats")
Then we wrote a function that processes a request to order items from
the warehouse. It returns a tuple where the first element is either "ok" or
"not available", followed by the item and requested quantity. We also
wrote some tests, and they pass:
proptest/stock.py
    def order(warehouse, item, quantity):
        if warehouse.in_stock(item):
            warehouse.take_from_stock(item, quantity)
            return ("ok", item, quantity)
        else:
            return ("not available", item, quantity)
proptest/stock.py
    def test_order_in_stock():
        wh = Warehouse({"shoes": 10, "hats": 2, "umbrellas": 0})
        status, item, quantity = order(wh, "hats", 1)
        assert status == "ok"
        assert item == "hats"
        assert quantity == 1
        assert wh.stock_count("hats") == 1

    def test_order_not_in_stock():
        wh = Warehouse({"shoes": 10, "hats": 2, "umbrellas": 0})
        status, item, quantity = order(wh, "umbrellas", 1)
        assert status == "not available"
        assert item == "umbrellas"
        assert quantity == 1
        assert wh.stock_count("umbrellas") == 0

    def test_order_unknown_item():
        wh = Warehouse({"shoes": 10, "hats": 2, "umbrellas": 0})
        status, item, quantity = order(wh, "bagel", 1)
        assert status == "not available"
        assert item == "bagel"
        assert quantity == 1
On the surface, everything looks fine. But before we ship the code, let's
add some property tests.
One thing we know is that stock cannot appear and disappear across our
transaction. This means that if we take some items from the warehouse,
the number we took plus the number currently in the warehouse should
be the same as the number originally in the warehouse. In the following
test, we run our test with the item parameter chosen randomly from "hats"
or "shoes" and the quantity chosen from 1 to 4:
proptest/stock.py
    # Needs `given` and `some`, imported as in sort.py.
    @given(item     = some.sampled_from(["shoes", "hats"]),
           quantity = some.integers(min_value=1, max_value=4))
    def test_stock_level_plus_quantity_equals_original_stock_level(item, quantity):
        wh = Warehouse({"shoes": 10, "hats": 2, "umbrellas": 0})
        initial_stock_level = wh.stock_count(item)
        (status, item, quantity) = order(wh, item, quantity)
        if status == "ok":
            assert wh.stock_count(item) + quantity == initial_stock_level
Let's run it:
    $ pytest stock.py
    . . .
    stock.py:72:
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
    stock.py:76: in test_stock_level_plus_quantity_equals_original_stock_level
        (status, item, quantity) = order(wh, item, quantity)
    stock.py:40: in order
        warehouse.take_from_stock(item, quantity)
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
    self = <stock.Warehouse object at 0x10cf97cf8>, item_name = 'hats'
    quantity = 3

        def take_from_stock(self, item_name, quantity):
            if quantity <= self.stock[item_name]:
                self.stock[item_name] -= quantity
            else:
    >           raise Exception("Oversold {}".format(item_name))
    E           Exception: Oversold hats

    stock.py:16: Exception
    ---------------------------- Hypothesis ----------------------------
    Falsifying example: test_stock_level_plus_quantity_equals_original_stock_level(
        item='hats', quantity=3)
It blew up in warehouse.take_from_stock: we tried to remove three hats
from the warehouse, but it only has two in stock.
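Because this particular test only draws from two items and four quantities, the whole input space is just eight combinations, so we can also enumerate it by hand to see every input that triggers the failure. The sketch below copies the original Warehouse and order code from above (minus stock_count, which it doesn't need) and adds a helper, failing_inputs, which is our own hypothetical name and not part of the example project.

```python
from itertools import product


class Warehouse:
    """The original Warehouse from the text, before any fix."""

    def __init__(self, stock):
        self.stock = stock

    def in_stock(self, item_name):
        return (item_name in self.stock) and (self.stock[item_name] > 0)

    def take_from_stock(self, item_name, quantity):
        if quantity <= self.stock[item_name]:
            self.stock[item_name] -= quantity
        else:
            raise Exception("Oversold {}".format(item_name))


def order(warehouse, item, quantity):
    if warehouse.in_stock(item):
        warehouse.take_from_stock(item, quantity)
        return ("ok", item, quantity)
    else:
        return ("not available", item, quantity)


def failing_inputs():
    """Hypothetical helper: try every (item, quantity) pair the test can draw."""
    failures = []
    for item, quantity in product(["shoes", "hats"], range(1, 5)):
        # A fresh warehouse each time, just as the property test builds one per run.
        wh = Warehouse({"shoes": 10, "hats": 2, "umbrellas": 0})
        try:
            order(wh, item, quantity)
        except Exception:
            failures.append((item, quantity))
    return failures


print(failing_inputs())  # [('hats', 3), ('hats', 4)]
```

Exhaustive enumeration only works when the input space is this small; with larger ranges or free-form data, generated inputs are the practical option.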
Our property testing found a faulty assumption: our in_stock function
only checks that there's at least one of the given item in stock. Instead we
need to make sure we have enough to fill the order:
proptest/stock1.py
    def in_stock(self, item_name, quantity):
        # Changed: require enough stock to fill the whole order.
        return (item_name in self.stock) and (self.stock[item_name] >= quantity)
And we change the order function, too:
proptest/stock1.py
    def order(warehouse, item, quantity):
        # Changed: pass the requested quantity through to in_stock.
        if warehouse.in_stock(item, quantity):
            warehouse.take_from_stock(item, quantity)
            return ("ok", item, quantity)
        else:
            return ("not available", item, quantity)
And now our property test passes.
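One knock-on effect is worth checking. If the earlier test_warehouse unit test is kept unchanged next to the revised class, its one-argument calls such as wh.in_stock("shoes") no longer match the new signature. Here is a sketch of how that test might be updated, assuming a quantity of 1 preserves the original intent of "is there at least one?". That choice is our assumption, not something the example project specifies.

```python
class Warehouse:
    def __init__(self, stock):
        self.stock = stock

    def in_stock(self, item_name, quantity):
        # Revised check from the text: enough stock to fill the request.
        return (item_name in self.stock) and (self.stock[item_name] >= quantity)

    def take_from_stock(self, item_name, quantity):
        if quantity <= self.stock[item_name]:
            self.stock[item_name] -= quantity
        else:
            raise Exception("Oversold {}".format(item_name))


def test_warehouse():
    wh = Warehouse({"shoes": 10, "hats": 2, "umbrellas": 0})
    # Assumption: asking for 1 stands in for the old "at least one" check.
    assert wh.in_stock("shoes", 1)
    assert wh.in_stock("hats", 1)
    assert not wh.in_stock("umbrellas", 1)

    wh.take_from_stock("shoes", 2)
    assert wh.in_stock("shoes", 1)

    wh.take_from_stock("hats", 2)
    assert not wh.in_stock("hats", 1)


test_warehouse()
```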
Property-Based Tests Often Surprise You
In the previous example, we used a property-based test to check that
stock levels were adjusted properly. The test found a bug, but it wasn't to
do with stock level adjustment. Instead, it found a bug in our in_stock
function.
This is both the power and the frustration of property-based testing. It's
powerful because you set up some rules for generating inputs, set up
some assertions for validating output, and then just let it rip. You never
quite know what will happen. The test may pass. An assertion may fail. Or
the code may fail totally because it couldn't handle the inputs it was
given.
The frustration is that it can be tricky to pin down what failed.
Our suggestion is that when a property-based test fails, find out what
parameters it was passing to the test function, and then use those values to
create a separate, regular, unit test. That unit test does two things for you.
First, it lets you focus in on the problem without all the additional calls
being made into your code by the property-based testing framework.
Second, that unit test acts as a regression test. Because property-based
tests generate random values that get passed to your test, there's no
guarantee that the same values will be used the next time you run tests.
Having a unit test that forces those values to be used ensures that this bug
won't slip through.
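For the warehouse failure, such a regression test could look like the sketch below. The values item='hats' and quantity=3 are copied from the falsifying example in the test output, and the Warehouse and order code is the corrected version from the text. The test name and the expectation that a rejected order leaves the stock level untouched are our own assumptions.

```python
class Warehouse:
    def __init__(self, stock):
        self.stock = stock

    def in_stock(self, item_name, quantity):
        return (item_name in self.stock) and (self.stock[item_name] >= quantity)

    def take_from_stock(self, item_name, quantity):
        if quantity <= self.stock[item_name]:
            self.stock[item_name] -= quantity
        else:
            raise Exception("Oversold {}".format(item_name))

    def stock_count(self, item_name):
        return self.stock[item_name]


def order(warehouse, item, quantity):
    if warehouse.in_stock(item, quantity):
        warehouse.take_from_stock(item, quantity)
        return ("ok", item, quantity)
    else:
        return ("not available", item, quantity)


def test_order_more_hats_than_in_stock():
    # Pinned inputs from the falsifying example: item='hats', quantity=3.
    wh = Warehouse({"shoes": 10, "hats": 2, "umbrellas": 0})
    status, item, quantity = order(wh, "hats", 3)
    assert status == "not available"
    assert item == "hats"
    assert quantity == 3
    # Assumption: a rejected order takes nothing from stock.
    assert wh.stock_count("hats") == 2


test_order_more_hats_than_in_stock()
```

Against the original one-argument in_stock, the same order call raises the "Oversold hats" exception instead, which is exactly the behavior this test guards against.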
Property-Based Tests Also Help Your Design
When we talked about unit testing, we said that one of the major benefits
was the way it made you think about your code: a unit test is the first
client of your API.
The same is true of property-based tests, but in a slightly different way.
They make you think about your code in terms of invariants and
contracts; you think about what must not change, and what must be true.
This extra insight has a magical effect on your code, removing edge cases
and highlighting functions that leave data in an inconsistent state.
We believe that property-based testing is complementary to unit testing:
they address different concerns, and each brings its own benefits. If
you're not currently using them, give them a go.
Related Sections Include
Topic 23, Design by Contract
Topic 25, Assertive Programming
Topic 45, The Requirements Pit
Exercises
Exercise 31 (possible answer)
Look back at the warehouse example. Are there any other properties that
you can test?
Exercise 32 (possible answer)
Your company ships machinery. Each machine comes in a crate, and each
crate is rectangular. The crates vary in size. Your job is to write some code
to pack as many crates as possible in a single layer that fits in the delivery
truck. The output of your code is a list of all the crates. For each crate, the
list gives the location in the truck, along with the width and height. What
properties of the output could be tested?
Challenges
Think about the code you're currently working on. What are the
properties: the contracts and invariants? Can you use a property-based testing
framework to verify these automatically?