A Formal Analysis of Iterated TDD: Abstract
We demonstrate that outside this context iterated TDD will exhibit chaotic
behavior, implying an unpredictable and messy amount of code churn. We argue that
the “ineffective” iterated TDD reported by earlier research is
due to missing this context, while the findings of “effective” iterated TDD are due
to accidentally falling into the context, or are simply placebo.
1.2. Narrative. The previous definition does not talk about any formal goals for
iterative TDD. Hence, we formalize the objective of TDD as follows:
2010 Mathematics Subject Classification. Primary 68N30; Secondary 37B99, 68Q99, 93C99,
93D99, 68Q30.
Key words and phrases. Software; Testing; Test Driven Development; Formal Specification;
Equivalence Class Partitioning; Dynamical Systems; Chaotic Dynamics; System Stability;
Lyapunov Exponent.
Hemil Ruparel: Dedicated to my parents and family without their presence we are nothing.
To ensure that we end up with formally verifiable software at each step, and
at the end, when all the “scenarios” are exhausted. Another, “optional”, objective is
given: to “Improve the implementation design”.
Note that nowhere is it defined what counts as an “improvement” from one
implementation to another.
This is not a good starting point for formally analyzing the methodology, as success
metrics cannot be built on top of it. It is very imprecise and open
to interpretation.
In this paper we propose a formal methodology and demonstrate how
“provably correct software” can emerge, with a clear metric for the “amount of code churn”
needed to attain it over the iterations - albeit in a very narrow context.
This practice we shall call formal Iterated TDD. We call the “canon”
practice followed in the industry “Iterated TDD”, for reasons which will become
apparent shortly.
2. Definitions
We need some definitions to formalize the (Iterated) TDD pseudo-algorithm.
2.1.2. Completeness. For functions which are well behaved this makes some sense.
But even for well behaved functions this is not a good enough approximation.
Take a nice function like $f(x) = x$, the identity function: one cannot define
this function by continuing to add pairs of specification values.
A much more interesting function like $f(x) = \sin(x)$ is much harder to describe.
Although we can always define it pointwise, and that would ensure the resulting
“sampling” looks more and more like the target function, one must understand that
infinitely many pairs would be required to specify $\sin(x)$. Even with $\aleph_0$ points specified,
there would be an infinite family of functions which are not $\sin(x)$ but give
exactly the same value at all those specific points. This is related to the notion of
pointwise convergence [4].
Outside that fixed set of points the family of functions can take arbitrary values,
and thus specification via point pairs arguably poses a problem.
Luckily, for software we can do much better, which is the topic of the next
section.
\[ S : \hat{I} \to \hat{O} \]
where $\hat{I} := \langle x_i \rangle$ is the input vector while $\hat{O} := \langle y_j \rangle$ is the output vector.
These vectors are defined not in the physics sense, but in the pure mathematical sense. The
only difference between a pointwise defined function and specified software is that the latter must be
“Computable” [5].
2.3. Software Test. A “Software” test is defined as a higher order function [6]:
\[ T : t\langle \hat{I}_t, S_t, \hat{O}_e \rangle \to (S_t(\hat{I}_t) := \hat{O}_t) = \hat{O}_e \]
In plain English, a test comprises the input vector $\hat{I}_t$, the software under test
$S_t$, and the expected output vector $\hat{O}_e$. It runs $S_t$ with the input to obtain
the actual output $S_t(\hat{I}_t) := \hat{O}_t$, and checks
whether or not $\hat{O}_t = \hat{O}_e$; hence the
range of the test is Boolean.
A software test, then, contains a single point specification for the desired
Software; this is the test vector [7].
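The definition of a test as a higher order function can be sketched in code. This is only an illustrative sketch: the names `run_test` and `add_software`, and the choice of tuples to stand in for the input and output vectors, are assumptions made here and do not come from the paper.

```python
from typing import Callable, Tuple

Vector = Tuple[int, ...]               # stands in for the input/output vectors
Software = Callable[[Vector], Vector]  # S : I -> O


def run_test(inputs: Vector, software: Software, expected: Vector) -> bool:
    """T<I_t, S_t, O_e>: run S_t on I_t and compare the result with O_e."""
    actual = software(inputs)   # O_t := S_t(I_t)
    return actual == expected   # the range of the test is Boolean


def add_software(v: Vector) -> Vector:
    # Hypothetical software under test: sums its inputs into a 1-element vector.
    return (sum(v),)


if __name__ == "__main__":
    print(run_test((2, 3), add_software, (5,)))
    print(run_test((2, 3), add_software, (6,)))
```

Each call fixes exactly one input/output pair, which is why a single test amounts to a single point specification.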
A software test does not need to be computable in general. Unfortunately, any
automated test, by definition, needs to be computable. This also poses a problem
for testing in general. An example of a test that is not computable [8] [9] is a
human reporting that the software has hung or gone into an infinite loop. This is impossible to
do algorithmically unless we bound the time. Scenarios of this sort come under
Oracles in computation [10].
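A minimal sketch of what "bounding the time" can look like follows. It assumes, purely for illustration, that the program under test is modelled as a Python generator that yields once per step; `run_bounded`, `halts_quickly` and `loops_forever` are hypothetical names. The check can only answer "halted within the budget" or "unknown", never "hangs forever".

```python
from typing import Callable, Iterator, Optional


def run_bounded(program: Callable[[], Iterator[None]], max_steps: int) -> Optional[bool]:
    """Return True if the program halts within max_steps steps, else None (unknown)."""
    steps = program()
    for _ in range(max_steps):
        try:
            next(steps)          # advance the program by one step
        except StopIteration:
            return True          # the program finished inside the budget
    return None                  # budget exhausted: we cannot tell whether it halts


def halts_quickly() -> Iterator[None]:
    # Hypothetical program that finishes after three steps.
    for _ in range(3):
        yield


def loops_forever() -> Iterator[None]:
    # Hypothetical program that never finishes.
    while True:
        yield


if __name__ == "__main__":
    print(run_bounded(halts_quickly, 10))
    print(run_bounded(loops_forever, 10))
```

The `None` result is the point of the sketch: an automated test with a budget cannot distinguish a hang from a program that merely needs more steps.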
Treating multiple iterations of the same cycle as a single cycle, we can say that,
given the number of nodes in the graph is finite, there is a finite (but incredibly
high) number of flow paths in the graph.
Given that individual EQCPs depict unique paths in the control flow graph
(CFG), coupling is said to exist between EQCPs $E_x$ with path $P_x$ and $E_y$ with
path $P_y$ if and only if $P_x \cap P_y \neq \emptyset$.
That is, if paths [13] $P_x$, $P_y$ have some common nodes, then $E_x$, $E_y$ are coupled.
In fact we can now define the amount of coupling using similarity measures; the
simplest would be a Jaccard index [20] like measure:
\[ C(E_x, E_y) = \frac{|P_x \cap P_y|}{|P_x \cup P_y|} \tag{2.1} \]
This essentially says: “The measure of the coupling between two equivalence classes is
the amount of code shared between them relative to all the unique code paths they
have together”. We need to understand that even code shared for good reason, like
applying DRY [21], creates coupling under this definition, whether or not the sharing
is done methodically. Any shared function between two EQCPs means coupling
exists. As we shall see, coupling becomes a key phenomenon when analyzing the
stability of software under Iterative TDD.
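The coupling measure (2.1) is straightforward to compute once each EQCP is represented by the set of CFG nodes on its path. The sketch below assumes that representation; the function name `coupling`, the node names, and the convention of returning 0 for two empty paths are illustrative choices, not part of the paper.

```python
from fractions import Fraction
from typing import Set


def coupling(path_x: Set[str], path_y: Set[str]) -> Fraction:
    """C(E_x, E_y) = |P_x intersect P_y| / |P_x union P_y|, as in (2.1)."""
    union = path_x | path_y
    if not union:
        # Assumed convention: two empty paths share nothing.
        return Fraction(0)
    return Fraction(len(path_x & path_y), len(union))


if __name__ == "__main__":
    # Hypothetical CFG nodes visited by two equivalence classes.
    p_x = {"entry", "parse", "validate", "save"}
    p_y = {"entry", "parse", "render"}
    print(coupling(p_x, p_y))                 # 2 shared nodes out of 5 distinct ones
    print(coupling({"a", "b"}, {"c", "d"}))   # disjoint paths
```

Exact rationals are used so that the result stays in $\mathbb{Q}$; the shared `entry` and `parse` nodes show how ordinary code reuse alone produces non-zero coupling.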
formal non-iterated model is a single-shot transformation of a bunch of specification
points into code by transforming them into EQCP.
2.8. Practical Correctness of TDD. The correctness of TDD for a practical
application hinges on the following:
(1) Is the specification complete enough (to take care of all the equivalence
classes)?
(2) Is the specification non-contradictory?
That it is impossible to achieve (1) and (2) together follows from Gödel’s Incompleteness
theorems [22], but that applies to any specification, not only Software.
Thus this argument should not be admissible as a failure of TDD in itself.
We now ignore the notion of contradiction and focus on completeness and stability
when tests get added one at a time (iterated or incrementally changed
specification TDD).
2.9. Practical Completeness of TDD Spec. The business specification should
be such that the formal specification of all possible equivalence classes can be
drawn from it. As the number of classes is bounded by $O(2^B)$, this itself is not remotely possible. To
understand how this bound works: a simple program, unix cat, has more than 60
branches [23]. The equivalence class specification of this program is bounded by $2^{60}$
(about $1.15 \times 10^{18}$), and the total number of stars in the universe is estimated to be $2 \times 10^{24}$ for comparison.
These huge numbers do not disprove the crux of TDD; they only point to the
fact that formal EQCP is a practical challenge, to be handled pragmatically,
probably by reducing the specification scope further and further.
along with:
At most it can impact every equivalence class, and at least it adds $O(2^K)$ classes
and hence tests. So, in the best case scenario, the total would become
$O(2^B + 2^K) = O(2^B)$ given $B \gg K$. The complexity increases, but not drastically,
unless $B = O(K)$.
3.5. Impact of Coupling. What happens when there is coupling? Instead of
being added, the terms now get multiplied because of the dependency. Thus,
with coupling the resulting complexity becomes $O(2^B \times 2^K) = O(2^{B+K})$. The
delta change results in exponential growth even if $B \neq O(K)$.
This is a problem.
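The difference between additive and multiplicative growth can be made concrete with a small calculation. The sketch treats the asymptotic bounds as exact counts, which is a simplifying assumption. $B = 60$ echoes the cat example, $K = 3$ is an arbitrary illustrative value, and the function names are hypothetical.

```python
def uncoupled_classes(b: int, k: int) -> int:
    """Upper bound without coupling: the new classes are simply added, 2^B + 2^K."""
    return 2**b + 2**k


def coupled_classes(b: int, k: int) -> int:
    """Upper bound with coupling: the terms multiply, 2^B * 2^K = 2^(B+K)."""
    return 2**b * 2**k


if __name__ == "__main__":
    b, k = 60, 3  # B from the cat example; K chosen only for illustration
    base = 2**b
    print(uncoupled_classes(b, k) - base)   # absolute growth without coupling
    print(coupled_classes(b, k) // base)    # multiplicative growth with coupling
```

Without coupling the bound grows by only $2^K$ classes; with coupling the whole existing bound is multiplied by $2^K$.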
If those equivalence classes were implemented in such a way that there was
minimal coupling, then fewer classes would be impacted by this step in the iteration.
But this is not a principle of TDD in the first place, in any form, in any practical
application of software development. In fact software principles like DRY and
modular programming mandate code sharing, and hence there will always
be some coupling.
3.6. Iterated TDD as a Dynamical System. At this point we can formally
represent iterated TDD as a dynamical system [26].
As discussed, this EQCP merging culminates in a lot of those equivalence
classes being thrown out and new classes being created - a fixed point iteration on the
abstract space of the EQCP itself, which we can now formally define as follows:
\[ E_1 = \tau(\emptyset, \delta_0) \]
This is what formally iterated or incremental TDD looks like. These equations
now depict a dynamical, complex system with an initial boundary value or starting
condition.
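The iteration can be sketched as repeated application of $\tau$, starting from the empty set. Two things below are assumptions made only for illustration: the general step $E_{n+1} = \tau(E_n, \delta_n)$, and the concrete form of `tau`, which models a specification change $\delta$ as a pair (classes invalidated, classes introduced). The paper's actual $\tau$ may differ.

```python
from typing import FrozenSet, Iterable, List, Tuple

EQCPSet = FrozenSet[str]
# Hypothetical encoding of a specification change: (invalidated, introduced).
Delta = Tuple[FrozenSet[str], FrozenSet[str]]


def tau(current: EQCPSet, delta: Delta) -> EQCPSet:
    """Illustrative transition: drop invalidated classes, add the new ones."""
    invalidated, introduced = delta
    return (current - invalidated) | introduced


def trajectory(deltas: Iterable[Delta]) -> List[EQCPSet]:
    """Return E_1, E_2, ... starting from E_1 = tau(empty set, delta_0)."""
    states: List[EQCPSet] = []
    state: EQCPSet = frozenset()
    for delta in deltas:
        state = tau(state, delta)   # assumed step: E_{n+1} = tau(E_n, delta_n)
        states.append(state)
    return states


if __name__ == "__main__":
    deltas = [
        (frozenset(), frozenset({"a", "b"})),    # first spec: two classes appear
        (frozenset({"a"}), frozenset({"c"})),    # second spec: "a" is thrown out
    ]
    for n, e in enumerate(trajectory(deltas), start=1):
        print(n, sorted(e))
```

The sequence of sets is fully determined by the sequence of specification changes, which is the sense in which the process is a deterministic dynamical system.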
3.7. Stability Space. While EQCP space is a nice way to visualize what is really happening
in terms of Software Specification and Test cases, it is not descriptive
enough to translate into numbers with which we can track the trajectory of the
Dynamical System.
How much change in the EQCP space happens on each iteration of iterated
TDD? It is impossible to comprehend that in the EQCP space.
To gain this insight we need a metric that defines how stable
the system is over the iterations in terms of retaining past EQCPs - how much
code remained the same between iterations.
We define the stability metric as follows:
\[ \Sigma_{n+1} = 1 - \frac{|E_n \cap E_{n+1}|}{|E_n \cup E_{n+1}|} \; ; \quad \Sigma_n \in \mathbb{Q} \cap (0, 1) \tag{3.4} \]
The stability metric Σ also forms a metric space [27], with the distance between
two stability points $a, b \in \Sigma$ defined as $d(a, b) = |a - b|$.
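The stability metric (3.4) and the distance $d$ can be computed directly from two successive EQCP sets. In the sketch below the function names, the example sets, and the convention of returning 0 when both sets are empty are illustrative assumptions.

```python
from fractions import Fraction
from typing import FrozenSet


def stability(e_n: FrozenSet[str], e_next: FrozenSet[str]) -> Fraction:
    """Sigma_{n+1} = 1 - |E_n intersect E_{n+1}| / |E_n union E_{n+1}|, as in (3.4)."""
    union = e_n | e_next
    if not union:
        # Assumed convention: nothing existed and nothing changed.
        return Fraction(0)
    return 1 - Fraction(len(e_n & e_next), len(union))


def distance(a: Fraction, b: Fraction) -> Fraction:
    """d(a, b) = |a - b| between two stability points."""
    return abs(a - b)


if __name__ == "__main__":
    e1 = frozenset({"a", "b", "c"})
    e2 = frozenset({"a", "b", "c", "d"})   # pure addition: one new class
    e3 = frozenset({"x", "y"})             # complete rewrite: nothing retained
    s_add = stability(e1, e2)
    s_rewrite = stability(e2, e3)
    print(s_add, s_rewrite, distance(s_add, s_rewrite))
```

A pure addition keeps the value low, while a full rewrite drives it to 1.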
3.7.1. Stable Point : 0. Observe the following: if we ensure that no EQCP has any
shared code, then the only way to make a change is to simply add new code, and
thus $E_n \subset E_{n+1}$, which gives the minimum value of Σ if and only if $|E_{n+1} \setminus E_n|$ can
be minimized.
A value of Σ close to 0 shows the system has been very stable between
the last and the current iteration. This is when “very loose” coupling ensures that
we can create branches which do not interact much with existing branches.
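We present order of magnitude estimates for the “highly stable” uncoupled ${}^{U}\Sigma$ value
as follows:
\[ {}^{U}\Sigma_{n+1} \approx 1 - \frac{|E_n|}{|E_{n+1}|} \approx 1 - \frac{O(2^B)}{O(2^B + 2^K)} \approx 1 - \frac{1}{1 + 2^{K-B}} \approx 0 \; ; \quad B \gg K \tag{3.5} \]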
We note that it is impossible to reach the value 0 under any circumstances other
than when $E_n = E_{n+1}$, which means the specification $\delta_n$ did not change anything
in EQCP space, i.e. a complete dud or spurious specification.
Importantly, there can be cases even without coupling where
$|E_n| \ll |E_{n+1}|$; then, even though $E_n \subset E_{n+1}$, the stability goes
for a toss - this is driven by having $B = O(K)$.
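The order of magnitude estimate (3.5) can be checked numerically. The sketch treats the $O(\cdot)$ bounds as exact counts, which is a simplifying assumption, and the values of $B$ and $K$ are chosen only for illustration.

```python
from fractions import Fraction


def uncoupled_sigma(b: int, k: int) -> Fraction:
    """Estimate 1 - 2^B / (2^B + 2^K), i.e. 1 - 1 / (1 + 2^(K - B))."""
    return 1 - Fraction(2**b, 2**b + 2**k)


if __name__ == "__main__":
    print(float(uncoupled_sigma(20, 2)))   # B >> K: very close to 0
    print(uncoupled_sigma(5, 5))           # B = K: half of the space is new
    print(float(uncoupled_sigma(2, 20)))   # K >> B: close to 1 even without coupling
```

The last two cases illustrate the remark above: even with $E_n \subset E_{n+1}$, the estimate moves away from 0 once $K$ is comparable to or larger than $B$.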
3.7.2. Unstable Point : 1. The other side of the coin is when $E_n \cap E_{n+1} \approx \emptyset$;
in this case the value of Σ goes to 1.
A value of Σ close to 1 shows the system has been very unstable
between the last and the current iteration. This is when “strong” coupling ensures
that we need to rewrite a lot of the EQCP implementations in code.
The “reasonably coupled” ${}^{C}\Sigma$ estimate would be as follows:
3.9. Chaos in Stability space. We now proceed to demonstrate that the iteration
driven by $(\tau, \delta_n)$ in Stability Space Σ has the characteristics of a system capable
of showcasing chaotic dynamical behavior [28].
Given there is no universally agreed definition of chaos, we, like most people,
accept the following working definition [29] [30]:
Chaos is aperiodic time-asymptotic behavior in a deterministic system
which exhibits sensitive dependence on initial conditions.
These characteristics will now be demonstrated for iterated TDD.
(1) Aperiodic time-asymptotic behavior : this implies the existence of
phase-space trajectories which do not settle down to fixed points or periodic
orbits. For practical reasons, we insist that these trajectories are not too
rare. We also require the trajectories to be bounded, i.e., they should not
go off to infinity.
The sequence $\Sigma_n \in \mathbb{Q} \cap (0, 1)$ is bounded by definition. The trajectories
are not rare, and it is practically impossible for the sequence to settle down
to periodic orbits or to a convergent sequence. Note that without
coupling this sequence can be made to orbit around values approximating 0 most
of the time.
(2) Deterministic : this implies that the equations of motion of the system
possess no random inputs. In other words, the irregular behavior of the
system arises from non-linear dynamics and not from noisy driving forces.
Then, a stretch happens when $\Delta(x, y, \delta) > 1$ and a fold happens when
$\Delta(x, y, \delta) < 1$.
This is to say, a stretch increases the distance between the trajectories
starting with $(a, b)$ while a fold reduces it. We notice that the definition of
the Lyapunov exponent of Σ would be as follows:
\[ {}^{C}\Sigma(x, \delta) \approx 1 - \frac{O(2^{B_x})}{O(2^{B_x + K_x})} \approx 1 - \frac{1}{2^{K_x}} \]
This, when substituted, reduces to:
\[ \epsilon < \frac{1}{2^{K_a + K_b}} \]
Thus making the smallest bound possible for ∆ as :
4.2. Domain of Stability for Iterated TDD. Let’s imagine the worst case:
almost all of the sequences would be chaotic.
What is so problematic about chaotic dynamics appearing in the “stability” phase space
of EQCP? It means there might be an unpredictable amount of churn in terms
of the changes in the EQCP. And that means churn in the “pair points specifications”,
i.e. the tests which were to “hold the correctness of the software”, implying that an
unpredictable, possibly very large, implementation change MUST happen.
If an iteration created by a tiny change in specification impacted
50% of the test cases, forcing a thorough refactoring of source code and tests, evidently this
would become a huge problem.
The chaotic thesis suggests that this is not only possible, but also highly likely,
due to the mixing of EQCPs in terms of coupling, and as a direct result of code
refactoring that tries to apply the DRY principle.
Hence the formal idea of just fixing input output points and iterating rapidly in small steps
on the specification cannot work in general unless we keep on reducing the scope of
the specification.
It is only guaranteed to work (produce provably correct software and a predictable
amount of code churn) at the lowest abstraction level, if there is very little coupling
by definition. Unfortunately the proponents of TDD want to make it work even
at the user specification level, where it entirely loses its rigor and has no provable
applicability to improving the quality of either the product or the code itself.
contexts are in fact the unit tests with very little coupling, which are guaranteed
to be chaos free!
We can now formally define the scope for formal iterated TDD which is guaranteed
to work, i.e. to create formally verifiable correct software without ever
destabilizing source code, as follows:
Unit like tests where the implementations of such features do not share
any source code, i.e. are independent (completely decoupled), such
that in every iteration the decoupling holds true, are guaranteed to hold
to verifiable correct behavior.
And it is in this context that TDD reigns supreme. Anywhere other than that,
correctness or stability cannot be guaranteed. One can try to use a
scalpel to dig a canal, but it just won’t work. Any effort to use the scalpel to create
a canal is not only misguided, but futile, and not even wrong.
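The decoupling condition in the scope definition above can be checked mechanically at every iteration. The sketch below assumes each EQCP is represented by the set of source code units it touches. The function name `coupled_pairs` and the example data are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def coupled_pairs(paths: Dict[str, Set[str]]) -> List[Tuple[str, str]]:
    """List every pair of EQCPs whose implementations share at least one code unit."""
    return [
        (a, b)
        for a, b in combinations(sorted(paths), 2)
        if paths[a] & paths[b]          # non-empty intersection means coupling
    ]


if __name__ == "__main__":
    # Hypothetical mapping from EQCP name to the code units it exercises.
    paths = {
        "E1": {"f", "g"},
        "E2": {"h"},
        "E3": {"g", "k"},
    }
    print(coupled_pairs(paths))   # E1 and E3 share "g"
```

An empty result means the EQCPs are completely decoupled, which is the condition under which the guarantee above is claimed to hold.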
Do iterative TDD, just ensure all EQCPs are completely decoupled; this then
becomes a methodology that produces formally correct software and is stable in terms of code churn.
Now, in practice this is hard to do, even for Unit tests, so a small amount of coupling
should not really harm the effectiveness that much - but at that point Chaotic
behavior creeps in.
Principles like AHA and WET [21] come in extremely handy in this regard. Even
with very little coupling there is no absolute guarantee of code stability, due to the
emergence of chaos, but at least we are on the right track by being formally correct,
and the resulting chaos can be tamed.
However, it gives a lot of people something to talk about and mental peace,
just like Homeopathy, sans effectiveness other than placebo, as was found in
another research [38].
We can also safely say that any low level, low coupled, EQCP based formal TDD
method would be reasonably successful; if those practices were followed,
iterated TDD would definitely be very effective. There are some publications
where it has been shown to do exactly that [37].
4.8. Cargo Cult “Software” Engineering? We can therefore conclude that
iterated TDD without understanding the applicability context is like washing your
hands with water before you eat: “washing hands” is a good practice,
but if the water used is filthy, it leads to numerous problems. This
is the status of the industry with respect to TDD: for those who are in the right
context, it works, give or take. For those who are not, it does not.
We conclude by making a much starker remark: the proponents of TDD,
or of “industry best practices”, stopped asking “is this effective or provable” a long
time ago. Their newly established position is: “No evidence required for common
sense practices”. In fact, this is the verbatim response when asked about the efficacy
and provability of some of the best practices:
You want to debate seriously? Then you have to drop the ridiculous
sense that “Good Practices” require scientific evidence before they
can be realized to work - which would disprove much of the “Good
Practices” which are “successfully used” in the industry.
Even if we ignore the irony of the previous quote, one cannot but wonder whether
Software has become entirely cargo cult [39]; the above quote proves it beyond
doubt. Very few admit it openly, but it is what it has become.
5. Closing Remarks
Formal iterated TDD, as presented here, is shown to produce correct software
code. Such production, however, requires a lot more formal and practical
considerations.
When done correctly (via EQCP and reducing the coupling between them) it ensures
we can add more features to the existing software while maintaining
stability as well as correctness as we go.
If that reduction of coupling is not followed, then the addition of more equivalence
classes could, and most definitely would, modify a significant amount of the EQCP
mapping, forcing one to rewrite a very significant amount of tests as well
as implementations. This is also seen in reality. Anything at a higher
level of abstraction than Unit like tests would have an impact like placebo.
Hence we propose that iterated TDD be done at the Unit Testing level only,
where it works correctly and satisfactorily because Units should be essentially
maximally decoupled, while keeping a constant eye on the coupling generated by those
tests being constantly added. This is hard, but not impossible, to do, and shows
provable theoretical efficacy: provably correct software production along with
predictable code churn.
References
[1] Various, “Test Driven Development.” https://en.wikipedia.org/wiki/Test-driven_development, 2024. [Online; accessed 3-July-2024].
[2] K. Beck, “Canon TDD.” https://tidyfirst.substack.com/p/canon-tdd, 2023. [Online; accessed 3-July-2024].
[3] “Aleph Numbers.” https://en.wikipedia.org/wiki/Aleph_number, 2024. [Online; accessed 3-July-2024].
[4] “Pointwise Convergence.” https://en.wikipedia.org/wiki/Pointwise_convergence, 2024. [Online; accessed 3-July-2024].
[5] “Computability.” https://en.wikipedia.org/wiki/Computability, 2024. [Online; accessed 3-July-2024].
[6] “Higher Order Function.” https://en.wikipedia.org/wiki/Higher-order_function, 2024. [Online; accessed 3-July-2024].
[7] “Test Vector.” https://en.wikipedia.org/wiki/Test_vector, 2024. [Online; accessed 3-July-2024].
[8] “Decidability in Logic.” https://en.wikipedia.org/wiki/Decidability_(logic), 2024. [Online; accessed 3-July-2024].
[9] “Halting Problem.” https://en.wikipedia.org/wiki/Halting_problem, 2024. [Online; accessed 3-July-2024].
[10] “Oracle Machines.” https://en.wikipedia.org/wiki/Oracle_machine, 2024. [Online; accessed 3-July-2024].
[11] “Turing Completeness.” https://en.wikipedia.org/wiki/Turing_completeness, 2024. [Online; accessed 3-July-2024].
[12] “Control Flow Graph.” https://en.wikipedia.org/wiki/Control-flow_graph, 2024. [Online; accessed 3-July-2024].
[13] “Path in Graph Theory.” https://en.wikipedia.org/wiki/Path_(graph_theory), 2024. [Online; accessed 3-July-2024].
[14] “Boundary Value Analysis.” https://en.wikipedia.org/wiki/Boundary-value_analysis, 2024. [Online; accessed 3-July-2024].
[15] “Equivalent Classes.” https://en.wikipedia.org/wiki/Equivalence_class, 2024. [Online; accessed 3-July-2024].
[16] “Gray Box Testing.” https://en.wikipedia.org/wiki/Gray-box_testing, 2024. [Online; accessed 3-July-2024].
[17] “Equivalent Partitioning.” https://en.wikipedia.org/wiki/Equivalence_partitioning, 2024. [Online; accessed 3-July-2024].
[18] “Big Oh Notation.” https://en.wikipedia.org/wiki/Big_O_notation, 2024. [Online; accessed 3-July-2024].
[19] “Software Coupling.” https://en.wikipedia.org/wiki/Coupling_(computer_programming), 2024. [Online; accessed 3-July-2024].
[20] “Jaccard Index.” https://en.wikipedia.org/wiki/Jaccard_index, 2024. [Online; accessed 3-July-2024].
[21] “Do Not Repeat Yourself.” https://en.wikipedia.org/wiki/Don%27t_repeat_yourself, 2024. [Online; accessed 3-July-2024].
HEMIL RUPAREL AND NABARUN MONDAL
Pune
Email address: [email protected]
Hyderabad
Email address: [email protected]