From LCF to Isabelle/HOL
Abstract. Interactive theorem provers have developed dramatically over the past four decades, from prim-
itive beginnings to today’s powerful systems. Here, we focus on Isabelle/HOL and its distinctive strengths.
They include automatic proof search, borrowing techniques from the world of first-order theorem proving,
but also the automatic search for counterexamples. They include a highly readable structured language of
proofs and a unique interactive development environment for editing live proof documents. Everything rests
on the foundation conceived by Robin Milner for Edinburgh LCF: a proof kernel, using abstract types to
ensure soundness and eliminate the need to store proofs. Compared with the research prototypes of the
1970s, Isabelle is a practical and versatile tool. It is used by system designers, mathematicians and many
others.
Keywords: LCF, HOL, Isabelle, interactive theorem proving.
1. Introduction
Today’s interactive theorem provers originated in research undertaken during the 1970s on the verification
of functional programs. Two quite different tools were built: the Boyer/Moore theorem prover (now ACL2,
described elsewhere in this volume) and Edinburgh LCF [GMW79, Pau18].
Descendants of the latter include every member of the HOL family (HOL4, HOL Light, ProofPower)
[GM93] as well as Coq [BC04] and Isabelle [NPW02]. As we shall see in the sequel, the achievements of
the past 40 years lie on several dimensions, including type systems, proof languages and user interfaces.
Automation is the key to usability; this includes automated search for counterexamples as well as for proofs.
These developments are reflected in the size of the tools themselves. In 1977, the Edinburgh LCF dis-
tribution was 900 KB, including an implementation of the ML programming language in Lisp. In 1986, the
Isabelle distribution was a mere 324 KB (not counting ML, now a separate distribution). By 1991, Isabelle
exceeded LCF: 937 KB. In 2019, the Isabelle distribution had reached 133 MB! And by way of comparison,
HOL Light (which is much more closely related to LCF) was 84 MB. Much of this bulk consists of proof
libraries rather than executable code, though libraries make a crucial contribution to a system’s capabilities.
Today’s mature systems also include documentation and examples.1
Correspondence and offprint requests to: Lawrence C. Paulson, Computer Laboratory, University of Cambridge, 15 JJ Thomson
Avenue, Cambridge CB3 0FD, England. e-mail: [email protected]
1 The figures are for uncompressed distribution directories containing no binaries, but possibly PDF files.
Isabelle is a leading interactive theorem prover. It is generic, supporting a number of different formal cal-
culi, but by far the most important of these is its instantiation to higher-order logic: Isabelle/HOL. Already
during the 1990s, Isabelle/HOL was being applied with great success to the task of verifying cryptographic
protocols [Pau98]. Turning to mathematics, it played a critical role in Hales’s Flyspeck project, which verified
his proof of the Kepler conjecture [HAB+ 17]. It is the basis for the seL4 project, under which an entire oper-
ating system kernel was verified, proving full functional correctness [KAE+ 10]. It was adopted by researchers
outside the verification milieu for specifying and verifying algorithms for replicated datatypes that provide
“eventual consistency” [GKMB17]. Numerous other projects are underway around the world. Like other
proof assistants, Isabelle is not directly concerned with program verification, i.e. with verifying code written
in a programming language, but it can be used as a back end to prove verification conditions. Isabelle can
also be used as a verified programming environment, where mathematical functions can be proved correct
and then automatically translated to executable code in one of several different programming languages. The
translation process itself is currently unverified, but even this is likely to change in the near future.
This essay focuses on Isabelle/HOL and therefore has little to say about techniques common to most sys-
tems. For example, simplification by rewriting — coupled with recursive simplification to handle conditional
rewrite rules — was already realised in both the Boyer/Moore theorem prover [BM79] and Edinburgh LCF
by the end of the 1970s. Recursive datatype and function definitions, as well as inductive definitions, were
commonplace by around 1995. Linear arithmetic decision procedures were also widely available by then.
Our title echoes Mike Gordon’s paper “From LCF to HOL: a Short History” [Gor00]. Like Mike, we
begin with LCF, the source of the most fundamental ideas. But we pass over this material quickly in order
to focus on Isabelle. There is no way to surpass Mike’s account of the early years. He starts in 1969, with
Dana Scott’s Logic for Computable Functions, and covers the original LCF project at Stanford University.
He describes in detail the development of the successor LCF systems at Edinburgh and then at Cambridge.
He names all the people involved and finally outlines his own development of HOL.
We begin with LCF because of its seminal importance, continuing to HOL because it is so strongly
linked (§2). Then we focus exclusively on Isabelle. We begin with the core ideas of a generic reasoner built
around unification and backtracking (§3). Then we consider the task of supporting higher-order logic, which
required the introduction of type classes (§4). There follows an account of automatic proof search and its
dual, the search for counterexamples (§5–6). We also discuss the generation of code from logical functions
(§7). We then turn to Isabelle’s most distinctive features: its structured proof language (§8) and the powerful
user interface architecture supporting it (§9). The next section describes Isabelle’s Archive of Formal Proofs
(§10). To conclude, we discuss the synergy among these ideas, making them more powerful in combination
than individually (§11).
Coq and other systems built around constructive type theories represent a distinctive strand of develop-
ment and fall outside our scope.
type, thm, was the type of theorems and ML’s type checker was our guarantee that anything of type thm had
definitely been created exclusively by the application of inference rules. Just as a dictionary doesn’t need to
keep a record of the operations applied to it but only the dictionary itself, type thm doesn’t need to store
the proofs of theorems but only their statements. The resultant savings of space are as important now as
they were in the 1970s, for although today we have more memory, we also have vastly bigger proofs.
Edinburgh LCF also introduced proof tactics and tacticals to express backward, goal-directed reasoning.
An LCF tactic typically applies a specific inference rule, while tacticals denote control structures, e.g., THEN
(one tactic followed by another), ORELSE (which allows one of several tactics to be attempted) and REPEAT
(repeated execution of a tactic until it fails). Remarkably, even tactics lie outside the proof kernel. An LCF
tactic is a function that takes a goal (a formula with its assumptions) and returns a list of subgoals that it
claims are logically sufficient. To justify this claim, it returns a function operating on values of type thm,
but we have no guarantee that this function will deliver the promised theorem in the end. We see that the
LCF architecture makes proof procedures harder to implement, while reducing the amount of code that has
to be trusted.
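To make the idea concrete, here is a minimal sketch in Isabelle/ML notation, with simplified stand-in types of our own rather than Milner's actual signatures; it can be evaluated inside an ML block of a theory:

ML ‹
  (* a goal pairs a list of assumptions with a conclusion (simplified) *)
  type goal = term list * term;

  (* a tactic returns subgoals plus a justification that turns their
     theorems into a theorem for the original goal *)
  type tactic0 = goal -> goal list * (thm list -> thm);

  (* the ORELSE tactical: try one tactic, falling back to another on failure *)
  fun ORELSE0 (tac1 : tactic0) (tac2 : tactic0) : tactic0 =
    fn g => tac1 g handle ERROR _ => tac2 g;
›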
We see incidentally two meanings of the word proof :
1. formal deductions of theorems from axioms using the inference rules of a logical calculus;
2. executable code written using tactics or other primitives, expressing the search for such deductions.
To resolve this ambiguity, the former are sometimes called proof objects or proof terms and the latter, proof
scripts or proof texts. Thus we see that the LCF approach eliminates the need to store proof terms and
allows proof scripts to be coded in ML using tactics and tacticals. Nevertheless, proof assistants based on
constructive type theories retain proof objects, as they are intrinsic to such formalisms.
The most fundamental question in the design of a theorem prover is what calculus to support. Boyer and
Moore made the inspired choice of Pure Lisp, which was sufficient for verifying simple functional programs and
formalising elementary number theory in the 1970s, and which has grown to support advanced applications
today. For Edinburgh LCF, Milner chose the Logic for Computable Functions, which was perfect for the
domain-theoretic investigations topical at the time but proved to be too quirky for general adoption.
Mike Gordon, who was one of the designers of Edinburgh LCF, ultimately adopted higher-order logic
as the basis for his research into hardware verification [Gor86, Gor00]. Higher-order logic is most easily
understood as a typed predicate calculus including function and Boolean types, and therefore also set types.
Gordon launched his HOL system in 1986 [Gor86, GM93], presenting both the logic and its application to
hardware specification and verification. HOL turned out to be extremely versatile and it soon attracted a
global user community. New versions emerged, in particular HOL Light [Har96]. Members of the HOL family
have been used in verification projects of every description, including the formalisation of great bodies of
mathematics.
However, it wasn’t obvious in the 1980s that one or two formalisms could be sufficient for the needs of
verification. Type theories, dynamic logics and many other formalisms were being proposed. Gordon himself
had reworked LCF twice in his hardware verification research. This was the origin of the idea that a theorem
prover could be generic: supporting a spectrum of formalisms through a common framework. Today this
includes syntactic tools (parsing, pretty printing), inference tools (rewriting, unification), a common proof
language (Isar) and user interface foundation (PIDE).
and backtracking would allow the use of logic programming techniques, as in Prolog [CM87]. For example,
in Martin-Löf’s type theory we have the following (derived) rule:
c ∈ A × B
─────────
fst(c) ∈ A.
Backward chaining, using this rule to prove something of the form fst(c) ∈ A, leaves us with a subgoal of
the form c ∈ A × ?B, where the question mark indicates that ?B is a hole to be filled: in Prolog parlance,
a logical variable. Unification fills in the holes later, while backtracking can orchestrate alternative ways of
filling the holes. In this example, the form of c will determine the choice of ?B. The ability to defer choices
until they become clear is as valuable in theorem proving as it is in Prolog programming.
It may be worth mentioning that Prolog was a hot topic in the 1980s. However, standard first-order
unification was simply not applicable to Martin-Löf type theory, which is based on a form of typed λ-
calculus. Higher-order unification would be necessary. Although the problem is undecidable in its full
generality, a reasonably practical unification procedure had recently been published by Gérard Huet [Hue75].
Higher-order unification is strikingly different from first-order unification. The latter is easily implemented
and delivers a unique answer modulo variable renaming. Higher-order unification allows even functions to
be variables and can instantiate such variables with λ-abstractions generated on the fly, yielding multiple
results. For example, it is possible to unify F M with 3 in two different ways: F = λx. 3 (with no constraint
on M ) and F = λx. x, M = 3. Unifying F 3 with 3 + 3 could yield four different results: F = λx. x + x,
F = λx. x + 3, F = λx. 3 + x and F = λx. 3 + 3. The search for such unifiers works by descending through the
terms one level at a time, attempting two kinds of steps: projections (use of a bound variable) and imitations
(copying the opposite term). The examples above show how it can find different ways of abstracting one
term from another.
With such a sophisticated syntactic mechanism built in at the lowest level, it became clear that the
LCF approach to inference rules could be radically changed [Pau86]. Consider the proof rule of universal
elimination:

∀x. φ(x)
────────   (1)
φ(a)
Its LCF representation is an ML function taking two arguments: a theorem (which must have the form
∀x. φ(x)) and a term a (which must have the same type as x). It then generates the desired conclusion,
namely φ(a), raising an exception if any of the preconditions is violated. This approach is general; the
drawback is the tedium and attendant risk of error when there are dozens of rules. With our λ-calculus
framework, inference rules such as the one above can simply be written out in the form of a template
and instantiated using unification. Better still, unification could be applied either to the premise or to the
conclusion, yielding forward or backward proof through a single mechanism. This is Isabelle’s central idea
[Pau86].
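In Isabelle/HOL, for example, rule (1) is simply the theorem spec, the template ∀x. ?P x =⇒ ?P ?x, and unification instantiates the schematic variables ?P and ?x. A minimal example of our own:

lemma
  assumes all : ∀x. P x
  shows P a
  using all by (rule spec)   (* unification matches ?P ?x against P a *)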
The precise nature of these templates remained to be determined, and the best approach turned out to be
a sort of logical framework. These are specialised formalisms whose purpose is to encode other formalisms. The
Edinburgh Logical Framework [HHP93] is the best known of these. It differs from Isabelle’s by incorporating
proof objects into the calculus itself, neutralising one of the key advantages of the LCF architecture: that
proofs do not have to be stored.
So Isabelle can be seen as an instance of the traditional LCF approach, but one where type thm formalises
a logical framework or meta-logic, in which other formalisms (the object-logics) can later be encoded. Such
encodings can also be proved correct [Pau89]. Our logical framework approach is less general than the
original LCF representation, where an inference rule can undertake an arbitrary computation. Nevertheless,
it captures a variety of possibilities, and moreover, it is open-ended: for example, Isabelle defines intuitionistic
first-order logic, which in a succession of formal theories is extended with classical logic, then with Zermelo–
Fraenkel set theory, then with the axiom of choice. With the original LCF approach, once you define the
type thm, it can never be extended.
The first object-logic was constructive type theory (Isabelle/CTT). It was followed by a classical first-
order sequent calculus (Isabelle/LK) and by natural deduction calculi for intuitionistic and classical first-
order logic (Isabelle/IFOL and Isabelle/FOL) [Pau93]. Later, a substantial formalisation of Zermelo–Fraenkel
set theory was developed on top of Isabelle/FOL [PG96]. This is one of the leading tools for formal reasoning
in axiomatic set theory.
Although Isabelle took a declarative approach to defining logics, the system architecture was still based
on LCF’s philosophy that everything was done in ML. All interactions with Isabelle took place at the ML
toplevel. For example, the declaration of new types, constants with their definitions, and axioms all required
calling appropriate ML functions with any necessary data as arguments. This began to change in the early
1990s [Pau94] when logics could be defined in a theory file with types, constants and proof rules declared
using a natural syntax.2 Proofs, however, were still expressed in ML. The structured proof language Isar
(Section 8.1) came later. Of today’s LCF-based systems, the HOL family has remained the most faithful to
the original conception, with proofs coded in ML.
Paulson’s original implementation of higher-order unification [Pau86] was practical, but still slow. Yet
in Isabelle practice, many unification problems are first-order, or almost so. Dale Miller [Mil91] discovered
a subclass of λ-terms, later called (higher-order) patterns [Nip91a], which behave like first-order terms:
unification is decidable and if two terms are unifiable, they have a most-general unifier. A term (in β-
normal form) is called a higher-order pattern if every free occurrence of a variable has as arguments a list
of distinct bound variables (for example, λx y. ?F x y is a pattern, whereas λx. ?F x x and ?F (?G a) are
not). Most unification problems in Isabelle are of this form. Nipkow gave a succinct
implementation of pattern unification and added it to Isabelle [Nip93a]. Full higher-order unification is
invoked only if pattern unification encounters a non-pattern.
A special case of unification is matching where the variables of only one of the two terms are instantiated.
Isabelle’s rewrite engine (aka the simplifier ) is based on higher-order pattern matching. Thus the simplifier
can deal with many standard transformations of quantified terms, for example the following ones:
(∀x. P (x) ∧ Q(x)) = (∀x. P (x)) ∧ (∀x. Q(x))
(∀x. P ∨ Q(x)) = P ∨ (∀x. Q(x))
(∀x. x = t ∧ P (x)) = P (t)
It appears that Isabelle was the first theorem prover to support higher-order rewrite rules [NP98].
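Each equation above is a theorem of HOL whose left-hand side is a higher-order pattern, so the simplifier can apply it as a rewrite rule. As a small example of our own, the first can be stated and proved outright:

lemma (∀x. P x ∧ Q x) = ((∀x. P x) ∧ (∀x. Q x))
  by blast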
2 This represented a return to the original Edinburgh LCF, which also supported theory files. The Isabelle version was inspired
by Goguen’s OBJ system [Gog79, FGJM85], in particular concerning the declaration of new mixfix syntax.
3 Beware of terminological confusion regarding the word sort, which for first-order logic is synonymous with type. Our use of
sort below will be entirely different.
be unsuitable to a particular object-logic. In first-order logic, the type of x in ∀x. φ(x) must not involve
functions or Booleans.4 Order-sorted polymorphism solves such difficulties by introducing a hierarchy of
sorts on types [Nip91b]. A sort is a finite intersection of type classes, which are essentially collections of
types. For first-order logic, we may introduce the type class FO for all types for which quantification is
permitted. For higher order logic, we would have a different type class (say HO), containing Boolean and
function types. We write τ :: C if type τ has class C.
Now we need to express how (possibly nullary) type constructors such as nat (the type of natural num-
bers), Cartesian product, function space and list (the type of lists) act on type classes. For example, for
first-order logic we want that nat :: FO and that if τ :: FO then τ list :: FO. These assertions can be codified
in Isabelle as instance declarations:
instance nat :: FO
instance list :: (FO) FO
Thus nat list :: FO holds, but in the absence of corresponding instances, bool :: FO and bool list :: FO do not:
truth values, or lists of them, do not belong to FO. Technically, these instance declarations form the signature
of the (term) algebra of types. Under certain natural conditions on the interaction of the class hierarchy
with the instance declarations, order-sorted unification of types, like unsorted unification, is still unitary:
solvable unification problems have most-general unifiers. Therefore these conditions guarantee that we still
have principal types. This theory, combined with extensions to higher-order unification [Nip91b], allows full
support for higher-order logic. And so the “templates” of our framework contain explicit type variables, in
addition to the ordinary variables that we had before. This version was first released in 1991 [NP92]. At the
same time, the correspondence to Haskell’s type classes was worked out [NS91, NP93].
4 Quantification over Booleans (and therefore, relations) requires at least second-order logic [Bar77, page 7].
as the extended reals (R∪{+∞, −∞}) and the nonnegative extended reals; both of these belong to a number
of ordering and topological type classes.
An exposition of axiomatic type classes with fully worked out examples is available in a paper by Paulson
[Pau04]. It’s slightly outdated — we have a much more elaborate type class hierarchy now and a somewhat
different syntax — but the principles are the same.
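In current syntax, a class bundles a signature with axioms, and an instantiation supplies the operations and discharges the axioms for a given type constructor. A minimal sketch of our own (the class semigroup and its nat instance are illustrative, not library material):

class semigroup =
  fixes mult :: ′a ⇒ ′a ⇒ ′a  (infixl ⊗ 70)
  assumes assoc: (x ⊗ y) ⊗ z = x ⊗ (y ⊗ z)

instantiation nat :: semigroup
begin
definition m ⊗ n = m + (n :: nat)
instance by standard (simp add : mult_nat_def add.assoc)
end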
Wenzel [Wen97] reduced type classes to (1) overloaded constant definitions (OCDs) for the signature and
(2) predicate definitions over a single type variable for the specification. Getting the notion of definition
with type-polymorphism right is notoriously tricky. For example, in an early version of Gordon’s HOL
system, constant definitions could introduce inconsistencies [Art16]. The problem is simply to detect ill-
formed definitions, in particular circularities, automatically. Circularities can lead to inconsistencies. Type
classes link the level of types with the level of terms, opening up new opportunities to create highly obscure
circularities.
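The flavour of such a circularity can be sketched as follows, after Kunčar and Popescu, in schematic old-style syntax that current Isabelle rejects:

consts c :: ′a                               (* overloaded constant, not yet defined *)
typedef T = {True, c :: bool}                (* the type T depends on c at type bool *)
defs c :: bool ≡ ¬ (∀x y :: T . x = y)       (* c at type bool depends on T *)

If c is True then T has a single element, forcing c to be False, and vice versa; each step looks innocuous in isolation, yet together they are inconsistent.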
Wenzel analysed the impact of OCDs on the consistency of an arbitrary theory. He sketched conditions
under which they were meta-safe, which roughly means that they can be removed without affecting provabil-
ity. Meta-safety implies conservativity, which in turn implies consistency preservation. However, his proof
sketch considered an idealized version of Isabelle’s actual definition facilities, which were only partially im-
plemented and in conflict with existing application theories. In particular—in order to stay within Gordon’s
HOL and the Isabelle logical framework—he excluded the interleaving of OCDs with type definitions.
Obua [Obu06] found that Isabelle accepted OCDs that introduced inconsistencies. He gave a more rigorous
and general formulation of Wenzel’s conditions [Wen97] and implemented them using an external termination
checker, outside of the inference kernel. He also sketched a proof that these conditions ensured conservativity.
As an alternative, Wenzel and Haftmann proposed a much simpler (and stronger) check on OCDs [HW06]
inside the logical kernel, but this excluded a few application theories with ambitious overloading.
Several years later Kunčar and Popescu [KP19, KP18] rediscovered that combining overloaded constant
definitions with type definitions could introduce circularities that were missed by the previous analyses.
They gave extended checks to avoid these circularities and proved that these checks ensure consistency of
theories that extend the initial HOL theory (comprising the standard HOL axioms) with overloaded constant
definitions and type definitions. In the end [KP18], they strengthened the result to show that such definitional
extensions are in fact meta-safe over the initial HOL theory.
These particular concerns were specific to detecting circular definitions, but all proof assistants potentially
contain errors, like any other software. A notable case is PVS [Owr06], which, lacking an LCF-style proof
kernel, was particularly vulnerable to soundness errors (at least in the 1990s):
PVS still seems to contain a lot of bugs and frequently new bugs show up. . . . It would be desirable
that the bugs in PVS would only influence completeness and not soundness. Unfortunately, this is
not the case, as some recent proofs of true==false have shown [15, bug numbers 71, 82, 113 and
160]. . . . It is reasonable to assume that PVS will continue to contain soundness bugs. The obvious
question thus arises, why use a proof tool that probably contains soundness bugs? Our answer is
threefold: PVS is still a very critical reader of proofs. PVS lets fewer mistakes slip through than many
of our human colleagues. . . . Furthermore, history tells us that the fixed soundness bugs are hardly
ever unintentionally explored, we know of only a single case. Thirdly, most mistakes in a system that
is to be verified are detected in the process of making a formal specification. [GH98, pages 134–5]
Notwithstanding such forbearance, users deserve a much higher standard of correctness than this. At the
same time, it’s vital to stress that users need to take responsibility for their own definitions: we know how
to detect circularity, but no system can check whether a definition conforms to the user’s true intentions.
As mentioned at the start of this section, the purpose of the foregoing work was to make possible an Isa-
belle instantiation of higher-order logic. Isabelle/HOL emerged in 1991 [NP92] and was soon a fully capable
alternative to HOL [NPW02]. In some ways it emulated the latter, particularly in its axiom system. Never-
theless, it is thoroughly Isabelle in its treatment of theorems, tactics, etc; we cannot regard Isabelle/HOL
as a member of the HOL family the way that HOL Light is.
Both Isabelle and HOL implement Milner’s idea of a proof kernel implementing a formal calculus as an
abstract type called thm. The essential difference is that Isabelle’s formal calculus is a logical framework in
which other formalisms can be defined, while HOL’s is simply higher-order logic. Either way, a function of
type thm->thm implements an inference rule. In the case of Isabelle, this would be an operation at the level
of the logical framework, such as the resolution rule through which proofs are constructed. In the case of
HOL, this would be an inference in higher-order logic itself, like the quantifier rule (1). In HOL, a tactic must
be coded separately corresponding to each rule of inference, while in Isabelle, inference rules such as (1) are
all expressed declaratively and applied using the generic resolution rule.
Isabelle’s generic nature can be seen in how many of its capabilities are shared among its various instances.
The common libraries include a general representation of syntax, with parsing, pretty printing, type checking,
simplification and other proof procedures, as well as the proof language and user interface. They are available
in other instances of Isabelle, such as Isabelle/ZF, which has been the basis for substantial developments in
Zermelo–Fraenkel set theory [Pau03].
5. Automation
It was clear from the outset that machine proof was extremely laborious and could only be feasible if the
machine itself provided as much automation as possible. But it was also clear that fully automatic theorem
proving was not practically achievable. Significant advances in automation had already been made by the
end of the 1970s:
1. decision procedures for arithmetic, arrays, lists, etc., as well as methods for combining decision procedures
[NO80];
2. resolution for first-order logic [Rob65, Ove75], complete in principle but frequently disappointing in
practice;
3. the signature automation of the Boyer/Moore theorem prover [BM79]: conditional rewriting plus powerful
heuristics for induction.
Of these, simplifiers based on rewriting (by previously proved theorems of the form t = u) quickly found
their way into LCF, HOL, etc. All such simplifiers eventually supported conditional rewriting, for rewrites
of the form φ =⇒ t = u; in such a case, the necessary instance of φ would be proved recursively by the
simplifier itself. Arithmetic decision procedures were eventually adopted in many systems. On the other
hand, resolution had acquired a bad reputation. As late as 2002, Shankar could write
The popularity of uniform proof methods like resolution stems from the simple dogma that since first-
order logic is a generic language for expressing statements, generic first-order proof search methods
must also be adequate for finding proofs. This central dogma seems absurd on the face of it. . . . A more
sophisticated version of the dogma is that a uniform proof method can serve as the basic structure
for introducing domain-specific automation. There is little empirical evidence that even this dogma
has any validity. . . . Automated reasoning has for too long been identified with uniform proof search
procedures in first-order logic. This approach shows very little promise. [Sha02, pages 3–4]
Of the various proof assistants, only Isabelle saw a sustained effort to incorporate ideas from resolution.
As mentioned in Section 3 above, Isabelle supported both unification and backtracking from the start, with
the aim of incorporating ideas from first-order automatic proof procedures. In the context of interactive
proof, unification provided the ability to prove a subgoal of the form ∃x. φ(x) by removing the quantifier and
proving φ(?t), where ?t stood as a placeholder for a concrete term to be supplied later. Through unification,
this term could even be built up incrementally. Dually, unification provided a means of using a universally
quantified fact ∀x. φ(x), when the required instances were not immediately obvious.
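A tiny example in today's Isabelle/HOL: applying the introduction rule for ∃ leaves a schematic placeholder, which a later step fills in by unification.

lemma ∃x. x = (2 :: nat)
  apply (rule exI)    (* subgoal ?x = 2, with placeholder ?x *)
  apply (rule refl)   (* unification instantiates ?x to 2 *)
  done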
5.2. Sledgehammer
By the early 2000s, resolution theorem provers [RV02, Sch04, Wei01] were demonstrating power far beyond
anything the classical reasoner could ever achieve. The idea of some sort of interface between Isabelle and
these systems was beguiling. This idea wasn't new: such linkups had been attempted on a number of past
occasions, always unsuccessfully. The key was to make such a linkup useful:
The guiding idea is that user interaction should be minimal. The system should invoke automatic
provers spontaneously or in response to a trivial gesture such as a mouse click. These proof attempts
should run in the background, not disturbing the user unless a proof is found. Proofs should refer to
a large library of known lemmas: users should not have to select the relevant ones. The automatic
prover should not be trusted; instead, proofs should be translated back into the formalism of the
interactive prover. Proofs should be delivered in source form to the user, who can simply paste them
into her proof script. [MQP06, p. 1576]
A major difficulty with building an interface between Isabelle/HOL and first-order automatic theorem provers
is that they operate on quite different formalisms. Isabelle/HOL has λ-abstractions, types and type classes,
while first-order logic has none of these. One-click invocation could only be achieved if the system itself
took care of everything: the translation of higher-order syntax, some representation of type constraints, etc.
An additional requirement was relevance filtering: to identify the most suitable of the thousands of facts
available in an Isabelle session, since providing too many would overwhelm the first-order provers.
The final difficulty was to translate the proofs discovered by the external reasoning tools into Isabelle’s
proof kernel. This translation moreover had to be expressed as a source-level proof so that the expensive
proof search would not have to be repeated. Resolution theorem provers typically print a formal justification
of their reasoning, but it is difficult to interpret and frequently ambiguous. The approach eventually adopted
was to extract from their output nothing but the list of axioms—each a known Isabelle theorem—used in the
proof. This list would typically contain no more than eight axioms, yielding a problem simple enough to be
proved by simple tactics integrated with Isabelle's kernel [PS07]. This tool became known as Sledgehammer.

Fig. 1. Sledgehammer GUI in Isabelle/jEdit: clicking on highlighted output inserts the proposed proof snippet into the text
The original version was developed at Cambridge, but Sledgehammer was comprehensively rewritten at
Munich, particularly by Blanchette [BBP13], who greatly increased its scope and power.
Sledgehammer helps beginners by identifying and using theorems that they didn’t know existed. It puts
the latest theorem provers, such as Vampire and Z3 [dMB08], at their disposal with a single click (Figure 1).
But even advanced users are often surprised by the proofs Sledgehammer comes up with. It is now seen as
indispensable, and similar subsystems are being developed for other interactive theorem provers.
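A typical interaction looks as follows (schematic; the proof actually found depends on the installed provers): the user states a lemma and invokes the tool, which replies with a one-line proof to paste into the text.

lemma length (xs @ ys) = length xs + length ys
  sledgehammer
  (* a typical reply: Try this: by simp *)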
6. Counterexample Search
Isabelle’s proof methods and Sledgehammer are effective for proving theorems, but given an invalid conjecture
they normally fail to detect the invalidity, let alone produce an informative counterexample. Novices and
experts alike state invalid theorems and find themselves wasting hours on impossible proofs. To make proving
more enjoyable and productive, Isabelle includes counterexample generators that complement the proof tools.
The main ones are Quickcheck and Nitpick. As a simple example, suppose that the user types in this lemma
statement (where rev reverses a list and @ appends two lists):
lemma rev (xs @ ys) = rev xs @ rev ys
Quickcheck is invoked automatically and displays the counterexample xs = [a₁], ys = [a₂].
6.1. Quickcheck
As the name suggests, Quickcheck is Isabelle’s counterpart to the QuickCheck testing tool [CH00] for Haskell.
The original Quickcheck [BN04] combined Isabelle’s code generation infrastructure (Section 7) with random
testing, covering both recursive functions and inductive predicates. It aimed at providing fully automatic
counterexample search, in contrast to Haskell’s QuickCheck, which is an infrastructure for building special-
ized random generators for testing. Therefore it worked automatically whenever all functions and inductive
predicates were executable, but, depending on the property, it might take forever to find a counterexample.
Consider especially conditional properties φ =⇒ ψ where most of the random values simply falsify φ; this
situation requires a special random generator that yields only values that satisfy φ.
Therefore Bulwahn [Bul12a, Bul12b, Bul12c, BBN11] refined Quickcheck in three directions:
• Exhaustive enumeration of small values, an idea due to Runciman et al. [RNL08].
• Generating values not randomly but synthesizing generators from the premises of conditional properties.
• Symbolic testing with narrowing, a technique from functional-logic programming [AH10].
Isabelle’s Quickcheck was inspired by the work of Dybjer et al. [DHT03] in the theorem prover Agda: they
had followed the original QuickCheck design, which expected users to set up specialized random generators
in the host language/logic. Over the next decade, other major theorem provers adopted analogous checkers:
PVS [Owr06], ACL2 [CDKM11] and Coq [PHD+ 15].
Quickcheck is one of Isabelle’s best-loved tools (after Sledgehammer), partly because it is invoked auto-
matically and silently every time the user types in a lemma. Users may not even be aware of its existence
until it suddenly announces that it has found a counterexample to their purported lemma. The result is a
surprised and grateful user. Quickcheck is most effective in the context of functional programming combined
with inductive predicates.
6.2. Nitpick
A radically different approach from that of Quickcheck is based on systematic model enumeration using a SAT solver.
This approach was pioneered by the tool Refute [Web05, Web08] and is now embodied by Nitpick [BN10,
Bla12]. Nitpick looks for finite fragments (substructures) of infinite countermodels, soundly approximating
problematic constructs. Common Isabelle idioms, such as inductive and coinductive predicates and datatypes
as well as recursive and corecursive functions, are treated specially to ensure efficient SAT solving [Bla13].
The actual reduction to SAT is performed by the Kodkod library [TJ07]. Given a conjecture, Nitpick (via
Kodkod and the SAT solver) searches for a standard set-theoretic model that falsifies it while satisfying any
relevant axioms and definitions. Nitpick is innately better suited to problems from set theory and logic than
Quickcheck. Nitpick revels in particular in finite combinatorial problems.
The first tool to exploit SAT solving for finding counterexamples in a theorem prover (ACL2) seems
to be due to Sumners [Sum02]. A second prototype tool, again for ACL2, was developed by Spiridonov and
Khurshid [SK07] and was based on Kodkod. Blanchette was unaware of this work when developing Nitpick.
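For example, given a plausible-looking but invalid set identity (a toy conjecture of ours), Nitpick finds a countermodel:

lemma (A ∪ B) - B = A
  nitpick
  (* a counterexample might be A = {a₁}, B = {a₁} *)
  oops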
7. Code Generation
Code generation is the process of generating efficiently executable code from definitions in the logic of a
theorem prover. It serves three main purposes: to obtain actual runnable software, to validate definitions by
executing them on concrete values, and to search for counterexamples to properties by testing.
Most major theorem provers are based on logics that have an executable sublanguage. In the Boyer and
Moore theorem prover, from its earliest incarnation [BM79] to present-day ACL2 [HKMS17], the logic is
itself a purely functional fragment of Lisp: all expressions can be executed according to Lisp semantics. In
Martin-Löf type theory [ML84], the term language — including proof terms — has a well-defined operational
semantics. Terms are also executable, subject to certain conditions, in other type-theory based systems such
as Coq [BC04]. Note that the types (which correspond to the formulas of predicate logic) are not executable.
In the case of HOL, a sublanguage can be identified that corresponds to a functional programming
language. Then we superimpose an operational semantics on this fragment: programs are sets of equations
that are to be used as rewrite rules, from left to right. This idea goes back to term rewriting and was
expressed succinctly by titles like Computing in Systems Described by Equations [O’D77] and Programming
with Equations [HO82]. The step-by-step execution of an equational program by rewriting corresponds to
performing a proof in equational logic. If the rewrite rules come from recursive function definitions in the first
place, it is natural to translate them to programs in an ML-like functional language with pattern-matching.
The first such tool was created by Rajan [Raj93] for Gordon’s HOL system [GM93].
7.1. History
Berghofer and Nipkow [BN02] realised the approach above, creating a compiler in Isabelle from a subset
of HOL into ML. Datatypes and recursive functions defined by pattern matching are translated directly
into their ML counterparts. They also extended the approach by making a subset of inductive predicates
executable. The idea is to view them as Prolog programs and to perform a mode analysis [DW88]. Each
possible mode partitions the arguments of an inductive predicate into inputs and outputs, and for each mode
the inductive predicate is compiled into an ML function that maps an input tuple to a stream of output
tuples. A subset of HOL terms and certain queries involving inductive predicates can now be compiled and
executed in ML. This formed the basis of Isabelle’s initial Quickcheck (Section 6.1).
Isabelle’s code generator is part of the trusted kernel. Work by Haftmann [Haf09] and others improved
its reliability and functionality significantly: code generation for inductive predicates was moved out of the
kernel by translating inductive predicates into recursive functions inside HOL, where the equivalence is
proved [BBH09]. Reliability of the compilation of the purely functional sublanguage was improved by a
pen-and-paper correctness proof [HN10]. Code generation was extended to type classes by eliminating them
in a first step. Code generation was extended to support data refinement [HKKN13], i.e. the automatic
implementation of abstract types like sets by concrete types like search trees. Due to a new modular design,
further target languages are easier to add: as of this writing, the code generator supports Standard ML,
OCaml, Haskell and Scala.
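For example, a recursive function can be exported directly; a minimal sketch (the function name is ours), where the generated Haskell mirrors the defining equations:

fun total :: nat list ⇒ nat where
  total [] = 0
| total (x # xs) = x + total xs

export_code total in Haskell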
Most recently, Hupel [HN18] has provided an alternative verified code generator that translates HOL into
CakeML, an ML-like functional language with a verified compiler [KMNO14]. CakeML is the only backend
for code generation that doesn’t have to be trusted, since the CakeML compiler has itself been verified
formally. This yields a verified tool chain from HOL to machine code, except that the CakeML compiler was
verified in HOL4 [SN08] rather than Isabelle/HOL. Eliminating this gap is the subject of current research.
7.2. Applications
The code generator has been an enabling technology for a large number of applications. The following are
some representative examples:
Imperative HOL Bulwahn et al. [BKH+ 08] developed a monadic embedding of imperative programs in
HOL. That is, on the HOL level everything is still purely functional, but in a monadic style. They
extended the code generator such that it translates these monadic definitions into actual imperative code
in the target language (SML etc.). This leads to substantially improved performance and is used in a
number of applications below.
Refinement Framework Lammich [Lam13, Lam16, Lam19] has developed an HOL framework for the
stepwise refinement of (possibly nondeterministic) algorithms down to (possibly imperative) executable
code. This framework is used in a number of applications below. Lammich and others [LT12, Lam14, LS19]
have used the framework extensively for the verification of efficient graph algorithms.
Model Checking and SAT Solvers Esparza et al. [ELN+ 13] developed an executable verified LTL
model checker (10–50 times slower than SPIN [Hol97] on standard benchmarks) that was later extended
with partial order reduction [BL18]. Brunner and Lammich built on work by Peled [Pel96] but found
that one of his lemmas was incorrect; thus they were unable to use his actual reduction algorithm.
Siegel [Sie19] found a counterexample to the correctness of Peled’s algorithm with the help of the Alloy
analyzer [Jac06].
Wimmer and Lammich [WL18] developed an executable model checker for timed automata whose through-
put is about one order of magnitude lower than Uppaal’s [LPY97] on standard benchmarks (but degen-
erates for large state spaces).
Lammich [Lam17] verified an executable checker for unsatisfiability certificates emitted by SAT solvers
which is twice as fast as the standard unverified checker.
Term Rewriting Thiemann has been developing a huge formalisation of the theory of term rewriting
called IsaFoR/CeTA5 over more than a decade now [TS09]. Initially, IsaFoR/CeTA was aimed primarily
at checking termination proofs found by automatic tools like AProVE [GAB+ 17]. The code generator
produces these proof checkers from their verified HOL formalizations. Today, IsaFoR/CeTA can also
check proofs of a term rewriting system’s complexity [DJK+ 18] and confluence [NM16].
Computer Algebra As representative examples we mention decision procedures for univariate real polyno-
mials (based on Sturm [Ebe15] and on cylindrical algebraic decomposition [LPP19]) and the Berlekamp-
Zassenhaus factorization algorithm [DJTY19].
5 http://cl-informatik.uibk.ac.at/software/ceta/
Programming Languages One key application of theorem provers has been the formalization of program-
ming languages and compilers [Ler09, KMNO14, NK14]. The code generator was used by Lochbihler and
Bulwahn [LB11] to generate an interpreter directly from the semantics of a Java-like language with
threads.
Isabelle framework, allows independent checking of sub-proofs—even with parallel checking on multiple cores
enabled by default, due to Matthews and Wenzel [MW10, Wen13b].
The Isar proof language reuses some ideas from Mizar, a legendary software tool for doing mathematics by
machine [GKN15]. But the many impressive facilities of the Mizar language are intertwined with its unusual
set theoretic formalism, making the key ideas difficult to extract, especially given Mizar’s notorious lack of
documentation and closed sources. In contrast, Isar’s generic principles of proof—fixing local variables and
making local assumptions—are identified with the corresponding elements of Isabelle’s logical framework.
Thus it works with any object-logic that uses the built-in natural deduction paradigm of Isabelle, e.g. HOL,
FOL, ZF. Various derived Isar language elements provide explicit proof structure for typical reasoning seen
in object-logics: existential elimination and case-splitting, calculational chains of equalities and inequalities
(including substitution), structured induction etc. All of this is parameterised by declarations in the library.
The Isar proof language is parametrised by user-defined Isar proof methods, which are the old idea of ML
tactics fitted into the richer structure of the Isar engine. Thus new reasoning patterns may be added to a
theory library without revisiting the design of the proof language itself. For example, the Isabelle standard
library defines method rule for declarative forward/backward chaining explained in Section 3, and method
auto for a combination of the simplifier and the classical reasoner explained in Section 5.1. Proof methods are
either defined in ML (as in LCF or HOL), or in the Eisbach language [MMW16] (similar to Ltac definitions
in Coq): Eisbach uses the source notation of existing proof methods to define new ones via simple recursion
and pattern matching.
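For instance, assuming the theory imports HOL-Eisbach.Eisbach, a toy method of our own combines existing methods by alternation and repetition:

method obvious = (assumption | rule conjI refl)+

lemma A =⇒ A ∧ x = x
  by obvious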
Despite its importance for the Isabelle ecosystem, the Isar proof language has not been adopted by other
interactive theorem provers. There are no fundamental obstacles to doing so, but it takes some effort to do
properly. Some isolated aspects of Isar have made it into the SSReflect language for Coq [GM10], notably the
have keyword for local claims within a proof. A bit more Isar syntax made it into the Lean prover [dMKA+ 15]
with slightly different meaning, though. Lean’s fix, assume, have, show, obtain construct certain λ-terms in
a more elementary manner than the Isabelle/Isar proof context export and goal refinement operations; e.g.
see the treatment of fix-assume-show in [Wen07, §2.2]. The experiments by Wiedijk towards supporting
structured proofs in HOL Light [Wie01] are not directly related to Isar. They are simply a family of HOL
Light tactics that allow a proof script written in ML to have some similarity to a Mizar text. This style of
working has not caught on in the HOL world.
Although some detailed aspects of Isar are specific to Isabelle—forward/backwards refinement via higher-
order unification (Section 3) and local proof contexts (Section 8.3)—the language design as a whole is generic.
It requires nothing from the underlying calculus but primitive notions of terms and types and logical notions
equivalent to “implies” and “for all”. It’s not too hard to envisage a common language to be shared among
proof assistants, offering portability of proofs exactly as today’s programming languages offer portability
between different machine architectures. The crucial missing ingredient is the ability to write assertions and
prove subgoals while minimising explicit references to a particular calculus.
The original LCF approach (Section 2) declares an abstract ML type thm of theorems, but the background
theory is implicit in the system state: during a session there is only one big theory under development,
which grows monotonically. There are primitives to introduce new types, terms, definitions and axioms
which augment the theory, but without any way to undo such a change. HOL provides some tricks to hide
unwanted constants via name-space manipulation, to help interactive development.
In contrast, as early as 1986, Isabelle declared the abstract ML type theory alongside the type of
theorems. Type theory makes available the initial theory of the logical framework and allows for its extension
as an acyclic graph of application theories: there are operations to extend and merge theories. Multiple
theories can coexist in a single Isabelle/ML session, but end-users only work with a single theory document
at a time. This may import other theory documents—in foundational order from bottom to top. The concrete
syntax looks like this:
theory Test
imports Main HOL-Library.Finite_Map HOL-Library.Finite_Lattice
begin
definition constant = term
[Figure: the theory graph of session HOL-Number_Theory, reaching from Pure and HOL through HOL-Library and HOL-Computational_Algebra to theories such as Fib, Mod_Exp, Totient, Residues, Euler_Criterion, Pocklington, Gauss, Residue_Primitive_Roots and Quadratic_Reciprocity.]
to exist and can be returned to. This also includes add-on content like the ML environment or hints for proof
tools: thus the theory context provides a default set of parameters, according to the imports from the library.
In summary, Isabelle theories provide large-scale structure to formalisation projects, with a built-in notion
of monotonic reasoning over an acyclic graph of theory nodes.
LCF and the HOL family lack an explicit notion of proof context. There are usually some auxiliary structures
to manage the current proof state (maybe just a list of subgoals) for interactive theorem proving. However,
proof tools cannot refer to those; they merely see an isolated goal. Such a goal is essentially a sequent: a
list of formulas (the assumptions) paired with another formula (the goal itself). Proof assistants based on
dependent type theories (like Coq) do have a formal context Γ that declares variables, but this again is
essentially a sequent. When applying HOL tactics it is frequently necessary to re-state the types of variables
present in the statement of the theorem being proved, and one can even introduce two variables called x
with different types. So at its most basic, we need a context to associate a type with each variable involved
in the current proof. In fact, they do much more.
A genuine Local Proof Context implemented by the ML type Proof.context first appeared in Isabelle99,
as infrastructure for the then emerging Isar proof language (Section 8.1). The idea is to support a block-
structured notepad with recursive nesting of local declarations (e.g. type parameters, term parameters,
assumptions) and local conclusions that are generalized when leaving a nested block: there is a generic
export operation to move results from a nested context into the enclosing context: this usually turns context
elements into the rule structure of the Isabelle logical framework (the quantifier ⋀ and the connective =⇒). In
concrete syntax, this looks as follows:
notepad
begin
  {
    fix x y z
    assume A x and B y
    have C x y z ⟨proof⟩
  }
  note ⟨⋀x y z. A x =⇒ B y =⇒ C x y z⟩
end
Here the final note recalls the result from the preceding proof block (enclosed in the curly brackets). This
block introduces three bound variables, assumed to be fixed and to satisfy A x and B y; from these we prove
C x y z. The effect of this block is to prove the formula shown in the note.
The notepad leaves nothing behind in the theory context: it is merely an experiment. In contrast, a
theorem statement produces an initial proof context with a pending claim: the theorem to be proved. The
subsequent proof body needs to solve that in the context, potentially with further nesting of local contexts
and auxiliary claims. The final result extends the enclosing theory (Section 8.2) by the new fact.
Proof tools may access the Proof.context value at each point. It contains the logical content (pa-
rameters, assumptions, proved facts), local syntax, and hints to guide proof strategies. Thus the universal
Proof.context replaces Isabelle’s earlier tool-specific contexts like simpset for the simplifier or claset for
the classical reasoner. This uniformity is important to combine tools: e.g. the simplifier uses the context to
extract its own information (rewrite rules and auxiliary proof procedures); during the simplification process it
augments the context and passes it on to other procedures, which in turn extract their own information from
it (e.g. for classical reasoning). Users can even extend the context with additional components to support
proof procedures specific to their own applications. Contexts provide extensibility along with modularity.
In summary, a proof context in Isabelle represents a local situation derived from the enclosing theory.
Nested contexts appear and disappear; only exported results remain. For proof tools, the context is a universal
environment for storing tool-specific information.
Isabelle locales provide an infrastructure for structured specifications: definitions, statements and proofs of
a theory may depend on local parameters (type and term variables) and local premises (hypotheses). The
subsequent example specifies partial orders axiomatically and defines a derived operation and theorem in
that specification context:
locale partial_order =
  fixes le :: ′a ⇒ ′a ⇒ bool  (infixl ⊑ 50)
  assumes refl : x ⊑ x
    and trans: x ⊑ y =⇒ y ⊑ z =⇒ x ⊑ z
    and antisym: x ⊑ y =⇒ y ⊑ x =⇒ x = y
This introduces a new locale, called partial_order. The locale declares le, effectively a constant to which we
attach infix syntax. The locale asserts what are effectively axioms. But in reality, a locale abbreviates a
predicate taking in this case a single argument, asserting that it satisfies the given assumptions. The point
is to allow this small specification to be imported simply by quoting the name partial_order.
definition (in partial_order )
  less :: ′a ⇒ ′a ⇒ bool  (infixl ⊏ 50)
  where x ⊏ y ←→ x ⊑ y ∧ x ≠ y
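A locale can later be interpreted by a concrete structure, discharging its assumptions and making the derived operations and theorems available. A sketch (this particular interpretation is our own example):

interpretation nat_order : partial_order (≤) :: nat ⇒ nat ⇒ bool
  by unfold_locales (auto intro: order_trans antisym)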
HOL. The original locale concept was inspired by experiments (since abandoned) that had been done in
HOL using higher-order predicates. Coq has a richer logic with a built-in notion of “sections”: the original
Isabelle locales from 1999 [KWP99] were inspired by that, but without augmenting the logical framework
of Isabelle. This design principle of building higher concepts without extending the existing logic was later
adopted by Coq: its type classes are built on top of existing concepts like records, predicates, and implicit
arguments.
LCF and HOL can be seen as nothing but libraries of proof procedures written in ML. The user can call
those procedures via the ML toplevel but does not have to prove theorems and could instead, say, calculate
π to 10,000 digits. In this sense, using LCF or HOL is the same as working with any other subroutine library.
This type of ML toplevel no longer exists in Isabelle. Instead, ML has become a sub-language of the
framework. Syntactically, ML expressions (or whole modules) may appear within the theory and proof
language of Isabelle/Isar: for example, in the command method_setup to define a new proof method.
Semantically, program snippets may depend on symbolic entities from the logical context: types, terms,
facts, etc. The ML compiler is invoked at run-time within an augmented environment; it refers to logical
entities as well as its own environment (for ML types, values, modules). The static result is an updated ML
environment within the context: thus it also conforms to the parallel evaluation model of Isabelle/ML.
Specialised proof procedures can be implemented in this manner within the normal theory document,
using the regular Prover IDE (see Section 9). This integration of programming with logic works without
augmenting the logic: like in original LCF, ML has access to the implementation of the object-logic, but is
not part of the logical formalism. This is in contrast to Coq, where users often implement proof tools inside
the logical language itself (with correctness proofs), but genuine extensions in OCaml need to be assembled
outside the system as “plugins”.
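For example (a sketch; the method name is ours), a new proof method can be defined in the middle of a theory document, with the ML compiler invoked at run-time:

method_setup my_simp = ‹
  Scan.succeed (fn ctxt => SIMPLE_METHOD' (simp_tac ctxt))
› "rewrite with the default simpset"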
The original LCF proof assistant from 1979 used a line-oriented terminal or teletype. This model is known
as a read-eval-print loop (or REPL): the user types one command after another, reacting to output printed
by the prover. When computer screens and multiple windows arrived, there was often a split into two areas:
the editor to work on a growing “proof script” and the terminal with the REPL to update its state, using
manual copy-paste operations from the editor.
Around 1998, the highly influential Proof General interface for Emacs was released for the first time
[Asp00], including support for Coq and Isabelle. Here a refined model of copy-paste and state-synchronization
is baked into the editor (which is freely programmable in Lisp): the user can move a frontier of already checked
text either forwards (apply command) or backwards (undo command); only the unchecked part may be freely
edited. Proof General requires suitable undo operations of the prover, and for robustness it is better to have
a restricted command-language instead of arbitrary ML. Both are missing in HOL (any version), which
consequently still uses the original prover REPL, with some support through Emacs.
Today, Coq remains the main back-end for Proof General Emacs; there is also a popular Proof General
clone written in OCaml: CoqIde. In contrast, Isabelle discontinued both the REPL and its Proof General
mode in 2014: interaction now works exclusively via the document-oriented Prover IDE.
From 2008, multi-threaded ML programming became routinely available in Isabelle, for parallel processing
of theories and proofs within a single Poly/ML process. This posed some challenges to the robustness and
performance of the prover engine, addressed subsequently by Matthews and Wenzel [Wen13b].
Parallel processing is also in conflict with the traditional interaction model: the REPL acts as a single
focus of single-threaded command application. In order to remove the many built-in assumptions of sequential
evaluation from the interaction model, and to provide rich semantic information during the proof authoring
process, Wenzel introduced the document-oriented Prover IDE (PIDE) approach [Wen11, Wen13a, Wen14, Wen19].
The main principles of PIDE are as follows:
• The prover supports document edits and markup reports natively. Interaction works via protocol com-
mands (like Document.update) that take regular prover commands as data (e.g. definition, theorem).
The prover has its own policies for processing proof documents in parallel, according to the structure of the text.
• The editor connects the physical world of editor input events and GUI painting to the mathematical
document-model of the prover. There are pipelines to stream input and output events asynchronously,
with explicit identification of document versions.
• Add-on tools may participate in the ongoing document processing by conventional means, as isolated
functions from input to output that are managed by PIDE. External tools merely need to ensure that
interrupts work correctly: this is required when the user continues editing and old versions of the document
are eventually discontinued.
Unlike Proof General, PIDE never locks the source text: edits by the user may lead to instantaneous
updates by the prover, or significant delays for slow proof tools. Thanks to overall performance improvements
of Isabelle and its underlying Poly/ML implementation, this ambitious interaction model works smoothly.
Isabelle/PIDE is delivered to end-users as a fully integrated desktop application called Isabelle/jEdit: it is
based on the Java text editor jEdit (see http://www.jedit.org). This explains the initial motivation
to use the Java platform for the outward-facing side of PIDE, which is implemented in Scala (on the JVM).
Non-Java editors (e.g. VSCode) may be connected to PIDE via an extra socket connection that exchanges
JSON records. There is also a headless PIDE server with a similar protocol; this allows PIDE to run under
the control of another program.
Generally speaking, Isabelle/ML works best for pure applications of mathematical logic inside the prover,
whereas Isabelle/Scala allows us to connect to the physical world: IDE front-ends, database engines, TCP
services, etc. Such technologies are not available at the same level of quality in Standard ML, nor even in
OCaml (which underlies HOL Light and Coq).
Entries in the Archive of Formal Proofs (AFP) are published with the aim of long-term preservation and
with the purpose of providing a stable interface to users. However, true preservation requires ensuring
that entries continue to work as Isabelle itself evolves. The AFP deals with this conflict as follows. For each
Isabelle release there is a corresponding AFP release. There is also a separate development version of the AFP
that is updated by Isabelle developers and AFP authors: Isabelle developers maintain all entries to be up
to date with the current Isabelle development version; authors can update their articles monotonically by
adding further material, while ensuring that all entries that depend on theirs still work.
As Isabelle evolves, the self-imposed requirement to maintain all AFP articles in working order puts a
burden on the Isabelle developers. At the same time, it acts as a reality check, helping developers to evaluate
the impact of their changes on the user community. Isar’s structured proofs assist the maintenance effort by
localising the impact of changes. Sledgehammer is also invaluable when fixing unfamiliar proofs.
The model for the AFP is the Mizar Mathematical Library [Miz, BBG+ 18], which was started in 1989.
The statistics today are similar: 250 authors, 1300 articles, 60,000 theorems and 3,000,000 lines (97 MB) of text.
Of course, one has to take into account that the languages and logics are different and that Mizar has less
proof automation.
Acknowledgement. We thank the referees, Jasmin Blanchette, Cliff Jones, Michael Norrish and Andrei
Popescu for valuable comments on drafts of this paper. The work reported above was funded by the British
EPSRC, the German DFG and various European Union funding agencies.
6 Included in each release is a file entitled CONTRIBUTORS, but it only goes back to 2005 and has many omissions.
References
[AH10] Sergio Antoy and Michael Hanus. Functional logic programming. Commun. ACM, 53(4):74–85, 2010.
[Art16] Rob Arthan. On definitions of constants and types in HOL. J. Automated Reasoning, 56(3):205–219, 2016.
[Asp00] David Aspinall. Proof General: A generic tool for proof development. In Susanne Graf and Michael Schwartzbach,
editors, European Joint Conferences on Theory and Practice of Software (ETAPS), volume 1785 of LNCS. Springer,
2000.
[Bal06] Clemens Ballarin. Interpretation of locales in Isabelle: Theories and proof contexts. In Jonathan M. Borwein and
William M. Farmer, editors, Mathematical Knowledge Management, 5th International Conference, MKM 2006,
volume 4108 of Lecture Notes in Computer Science, pages 31–43. Springer, 2006.
[Bal14] Clemens Ballarin. Locales: A module system for mathematical theories. J. Autom. Reasoning, 52(2):123–153,
2014.
[Bar77] J. Barwise. An introduction to first-order logic. In J. Barwise, editor, Handbook of Mathematical Logic, pages
5–46. North-Holland, 1977.
[BBG+ 18] Grzegorz Bancerek, Czeslaw Bylinski, Adam Grabowski, Artur Kornilowicz, Roman Matuszewski, Adam Naumowicz,
and Karol Pak. The role of the Mizar Mathematical Library for interactive proof development in Mizar. J.
Autom. Reasoning, 61(1-4):9–32, 2018.
[BBH09] Stefan Berghofer, Lukas Bulwahn, and Florian Haftmann. Turning inductive into equational specifications. In
Stefan Berghofer, Tobias Nipkow, Christian Urban, and Makarius Wenzel, editors, Theorem Proving in Higher
Order Logics, volume 5674 of LNCS, pages 131–146. Springer, 2009.
[BBN11] Jasmin Christian Blanchette, Lukas Bulwahn, and Tobias Nipkow. Automatic proof and disproof in Isabelle/HOL.
In C. Tinelli and V. Sofronie-Stokkermans, editors, Frontiers of Combining Systems (FroCoS 2011), volume 6989
of LNCS, pages 12–27. Springer, 2011.
[BBP13] Jasmin Christian Blanchette, Sascha Böhme, and Lawrence C. Paulson. Extending Sledgehammer with SMT
solvers. Journal of Automated Reasoning, 51(1):109–128, 2013.
[BC04] Yves Bertot and Pierre Castéran. Interactive Theorem Proving and Program Development: Coq’Art: The Calculus
of Inductive Constructions. Springer, 2004.
[BKH+ 08] Lukas Bulwahn, Alexander Krauss, Florian Haftmann, Levent Erkök, and John Matthews. Imperative functional
programming with Isabelle/HOL. In Otmane Aït Mohamed, César A. Muñoz, and Sofiène Tahar, editors, Theorem
Proving in Higher Order Logics, 21st International Conference, TPHOLs 2008, volume 5170 of Lecture Notes in
Computer Science, pages 134–149. Springer, 2008.
[BL18] Julian Brunner and Peter Lammich. Formal verification of an executable LTL model checker with partial order
reduction. J. Autom. Reasoning, 60(1):3–21, 2018.
[Bla12] Jasmin Christian Blanchette. Automatic Proofs and Refutations for Higher-Order Logic. PhD thesis, Technical
University Munich, 2012.
[Bla13] Jasmin Christian Blanchette. Relational analysis of (co)inductive predicates, (co)algebraic datatypes, and
(co)recursive functions. Software Quality Journal, 21(1):101–126, 2013.
[BM79] Robert S. Boyer and J Strother Moore. A Computational Logic. Academic Press, 1979.
[BN02] Stefan Berghofer and Tobias Nipkow. Executing higher order logic. In P. Callaghan, Z. Luo, J. McKinna, and
R. Pollack, editors, Types for Proofs and Programs (TYPES 2000), volume 2277 of LNCS, pages 24–40. Springer,
2002.
[BN04] Stefan Berghofer and Tobias Nipkow. Random testing in Isabelle/HOL. In J. Cuellar and Z. Liu, editors, Software
Engineering and Formal Methods (SEFM 2004), pages 230–239. IEEE Computer Society, 2004.
[BN10] Jasmin Christian Blanchette and Tobias Nipkow. Nitpick: A counterexample generator for higher-order logic based
on a relational model finder. In Matt Kaufmann and Lawrence C. Paulson, editors, Interactive Theorem Proving,
volume 6172 of LNCS, pages 131–146. Springer, 2010.
[Bul12a] Lukas Bulwahn. Counterexample Generation for Higher-Order Logic using Functional and Logic Programming.
PhD thesis, Technical University Munich, 2012.
[Bul12b] Lukas Bulwahn. The new Quickcheck for Isabelle: Random, exhaustive and symbolic testing under one roof. In
Chris Hawblitzel and Dale Miller, editors, Certified Programs and Proofs, volume 7679 of LNCS, pages 92–108.
Springer, 2012.
[Bul12c] Lukas Bulwahn. Smart testing of functional programs in Isabelle. In Nikolaj Bjørner and Andrei Voronkov, editors,
Logic for Programming, Artificial Intelligence, and Reasoning, volume 7180 of LNCS, pages 153–167. Springer,
2012.
[CDKM11] Harsh Raju Chamarthi, Peter C. Dillinger, Matt Kaufmann, and Panagiotis Manolios. Integrating testing and
interactive theorem proving. In David Hardin and Julien Schmaltz, editors, 10th International Workshop on the
ACL2 Theorem Prover and its Applications, ACL2 2011, volume 70 of EPTCS, pages 4–19, 2011.
[CH00] Koen Claessen and John Hughes. QuickCheck: a lightweight tool for random testing of Haskell programs. In Martin
Odersky and Philip Wadler, editors, Fifth ACM SIGPLAN International Conference on Functional Programming
(ICFP ’00), pages 268–279. ACM, 2000.
[CM87] W. F. Clocksin and C. S. Mellish. Programming in Prolog. Springer, 3rd edition, 1987.
[DHT03] P. Dybjer, Q. Haiyan, and M. Takeyama. Combining testing and proving in dependent type theory. In Theorem
Proving in Higher Order Logics, volume 2758 of LNCS, pages 188–203. Springer, 2003.
[DJK+ 18] Jose Divasón, Sebastiaan J. C. Joosten, Ondrej Kuncar, René Thiemann, and Akihisa Yamada. Efficient
certification of complexity proofs: formalizing the Perron-Frobenius theorem (invited talk paper). In June Andronick
and Amy P. Felty, editors, 7th ACM SIGPLAN International Conference on Certified Programs and Proofs, CPP
2018, pages 2–13. ACM, 2018.
[DJTY19] Jose Divasón, Sebastiaan J. C. Joosten, René Thiemann, and Akihisa Yamada. A verified implementation of the
Berlekamp-Zassenhaus factorization algorithm. J. Autom. Reasoning, 2019. Published online.
[dMB08] Leonardo de Moura and Nikolaj Bjørner. Z3: An efficient SMT solver. In C. Ramakrishnan and Jakob Rehof,
editors, Tools and Algorithms for the Construction and Analysis of Systems, volume 4963 of Lecture Notes in
Computer Science, pages 337–340. Springer, 2008.
[dMKA+ 15] Leonardo Mendonça de Moura, Soonho Kong, Jeremy Avigad, Floris van Doorn, and Jakob von Raumer. The
Lean theorem prover (system description). In Amy P. Felty and Aart Middeldorp, editors, Automated Deduction
— CADE-25, volume 9195 of Lecture Notes in Computer Science, pages 378–388. Springer, 2015.
[DW88] Saumya K. Debray and David S. Warren. Automatic mode inference for logic programs. The Journal of Logic
Programming, 5(3):207–229, 1988.
[Ebe15] Manuel Eberl. A decision procedure for univariate real polynomials in Isabelle/HOL. In 2015 Conference on
Certified Programs and Proofs, CPP ’15, pages 75–83. ACM, 2015.
[ELN+ 13] Javier Esparza, Peter Lammich, René Neumann, Tobias Nipkow, Alexander Schimpf, and Jan-Georg Smaus. A
fully verified executable LTL model checker. In Natasha Sharygina and Helmut Veith, editors, Computer Aided
Verification - 25th International Conference, CAV 2013, volume 8044 of Lecture Notes in Computer Science, pages
463–478. Springer, 2013.
[FGJM85] Kokichi Futatsugi, Joseph A. Goguen, Jean-Pierre Jouannaud, and José Meseguer. Principles of OBJ2. In 12th
ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages (POPL), pages 52–66, New York,
NY, USA, 1985. ACM.
[GAB+ 17] Jürgen Giesl, Cornelius Aschermann, Marc Brockschmidt, Fabian Emmes, Florian Frohn, Carsten Fuhs, Jera
Hensel, Carsten Otto, Martin Plücker, Peter Schneider-Kamp, Thomas Ströder, Stephanie Swiderski, and René
Thiemann. Analyzing program termination and complexity automatically with AProVE. J. Autom. Reasoning,
58(1):3–31, 2017.
[GH98] David Griffioen and Marieke Huisman. A comparison of PVS and Isabelle/HOL. In Jim Grundy and Malcolm
Newey, editors, Theorem Proving in Higher Order Logics: TPHOLs ’98, pages 123–142. Springer, 1998.
[GKMB17] Victor B. F. Gomes, Martin Kleppmann, Dominic P. Mulligan, and Alastair R. Beresford. Verifying strong eventual
consistency in distributed systems. Proc. ACM Program. Lang., 1(OOPSLA):109:1–109:28, October 2017.
[GKN15] Adam Grabowski, Artur Kornilowicz, and Adam Naumowicz. Four decades of Mizar. Journal of Automated
Reasoning, 55(3):191–198, Oct 2015.
[GM93] M. J. C. Gordon and T. F. Melham, editors. Introduction to HOL: A Theorem Proving Environment for Higher
Order Logic. Cambridge University Press, 1993.
[GM10] Georges Gonthier and Assia Mahboubi. An introduction to small scale reflection in Coq. J. Formalized Reasoning,
3(2), 2010.
[GMW79] Michael J. C. Gordon, Robin Milner, and Christopher P. Wadsworth. Edinburgh LCF: A Mechanised Logic of
Computation. LNCS 78. Springer, 1979.
[Gog79] Joseph A. Goguen. Some design principles and theory for OBJ-O, a language to express and execute algebraic
specification for programs. In Edward K. Blum, Manfred Paul, and Satoru Takasu, editors, Mathematical Studies
of Information Processing, volume 75 of LNCS, pages 425–473. Springer, 1979.
[Gor86] Michael J. C. Gordon. Why higher-order logic is a good formalism for specifying and verifying hardware. In
G. Milne and P. A. Subrahmanyam, editors, Formal Aspects of VLSI Design, pages 153–177. North-Holland, 1986.
[Gor00] Michael J. C. Gordon. From LCF to HOL: a short history. In Gordon Plotkin, Colin Stirling, and Mads Tofte,
editors, Proof, Language, and Interaction: Essays in Honor of Robin Milner, pages 169–185. MIT Press, 2000.
[Gor15] M. J. C. Gordon. Tactics for mechanized reasoning: A commentary on Milner (1984) ‘The use of machines to assist
in rigorous proof’. Philosophical Transactions of the Royal Society. Series A, 373(2039), 2015.
[HAB+ 17] Thomas Hales, Mark Adams, Gertrud Bauer, Tat Dat Dang, John Harrison, Le Truong Hoang, Cezary Kaliszyk,
Victor Magron, Sean Mclaughlin, Tat Thang Nguyen, et al. A formal proof of the Kepler conjecture. Forum of
Mathematics, Pi, 5:e2, 2017.
[Haf09] Florian Haftmann. Code Generation from Specifications in Higher Order Logic. PhD thesis, Technische Universität
München, 2009.
[Har96] John Harrison. HOL Light: A tutorial introduction. In Mandayam K. Srivas and Albert John Camilleri, editors,
Formal Methods in Computer-Aided Design: FMCAD ’96, LNCS 1166, pages 265–269. Springer, 1996.
[HHP93] Robert Harper, Furio Honsell, and Gordon Plotkin. A framework for defining logics. Journal of the ACM,
40(1):143–184, 1993.
[HIH13] Johannes Hölzl, Fabian Immler, and Brian Huffman. Type classes and filters for mathematical analysis in
Isabelle/HOL. In S. Blazy, C. Paulin-Mohring, and D. Pichardie, editors, Interactive Theorem Proving — 4th
International Conference, LNCS 7998, pages 279–294. Springer, 2013.
[HKKN13] Florian Haftmann, Alexander Krauss, Ondřej Kunčar, and Tobias Nipkow. Data refinement in Isabelle/HOL.
In S. Blazy, C. Paulin-Mohring, and D. Pichardie, editors, Interactive Theorem Proving — 4th International
Conference, volume 7998 of LNCS, pages 100–115. Springer, 2013.
[HKMS17] Warren A. Hunt, Jr., Matt Kaufmann, J Strother Moore, and Anna Slobodova. Industrial hardware and software
verification with ACL2. Philosophical Transactions of the Royal Society. Series A, 375(2104), 2017.
[HN10] Florian Haftmann and Tobias Nipkow. Code generation via higher-order rewrite systems. In M. Blume,
N. Kobayashi, and G. Vidal, editors, Functional and Logic Programming (FLOPS 2010), volume 6009 of LNCS,
pages 103–117. Springer, 2010.
[HN18] Lars Hupel and Tobias Nipkow. A verified compiler from Isabelle/HOL to CakeML. In A. Ahmed, editor, European
Symposium on Programming (ESOP 2018), volume 10801 of LNCS, pages 999–1026. Springer, 2018.
[HO82] Christoph M. Hoffmann and Michael J. O’Donnell. Programming with equations. ACM Trans. Program. Lang.
Syst., 4(1):83–112, 1982.
[Hol97] Gerard J. Holzmann. The model checker SPIN. IEEE Trans. Software Eng., 23(5):279–295, 1997.
[Hue75] G. P. Huet. A unification algorithm for typed λ-calculus. Theoretical Computer Science, 1:27–57, 1975.
[HW06] Florian Haftmann and Makarius Wenzel. Constructive type classes in Isabelle. In Thorsten Altenkirch and Conor
McBride, editors, Types for Proofs and Programs, volume 4502 of LNCS, pages 160–174. Springer, 2006.
[HW09] Florian Haftmann and Makarius Wenzel. Local theory specifications in Isabelle/Isar. In Stefano Berardi, Ferruccio
Damiani, and Ugo de Liguoro, editors, Types for Proofs and Programs, TYPES 2008, volume 5497 of LNCS.
Springer, 2009.
[Jac06] Daniel Jackson. Software Abstractions. Logic, Language, and Analysis. MIT Press, 2006.
[KAE+ 10] Gerwin Klein, June Andronick, Kevin Elphinstone, Gernot Heiser, David Cock, Philip Derrin, Dhammika Elkaduwe,
Kai Engelhardt, Rafal Kolanski, Michael Norrish, Thomas Sewell, Harvey Tuch, and Simon Winwood. seL4: Formal
verification of an operating-system kernel. Communications of the ACM, 53(6):107–115, June 2010.
[Kal91] Sara Kalvala. HOL around the world. In M. Archer, J. J. Joyce, K. N. Levitt, and P. J. Windley, editors,
International Workshop on the HOL Theorem Proving System and its Applications, pages 4–12. IEEE Computer
Society, 1991.
[KMNO14] Ramana Kumar, Magnus O. Myreen, Michael Norrish, and Scott Owens. CakeML: a verified implementation of
ML. In Suresh Jagannathan and Peter Sewell, editors, The 41st Annual ACM SIGPLAN-SIGACT Symposium on
Principles of Programming Languages, POPL ’14, pages 179–192. ACM, 2014.
[KP18] Ondřej Kunčar and Andrei Popescu. Safety and conservativity of definitions in HOL and Isabelle/HOL. PACMPL,
2(POPL):24:1–24:26, 2018.
[KP19] Ondřej Kunčar and Andrei Popescu. A consistent foundation for Isabelle/HOL. J. Automated Reasoning,
62(4):531–555, 2019.
[KWP99] Florian Kammüller, Markus Wenzel, and Lawrence C. Paulson. Locales: A sectioning concept for Isabelle. In
Y. Bertot, G. Dowek, A. Hirschowitz, C. Paulin, and L. Thery, editors, Theorem Proving in Higher Order Logics:
TPHOLs ’99, volume 1690 of LNCS. Springer, 1999.
[Lam13] Peter Lammich. Automatic data refinement. In Sandrine Blazy, Christine Paulin-Mohring, and David Pichardie,
editors, Interactive Theorem Proving - 4th International Conference, ITP 2013, volume 7998 of Lecture Notes in
Computer Science, pages 84–99. Springer, 2013.
[Lam14] Peter Lammich. Verified efficient implementation of Gabow’s strongly connected component algorithm. In Gerwin
Klein and Ruben Gamboa, editors, Interactive Theorem Proving — 5th International Conference, ITP 2014,
volume 8558 of Lecture Notes in Computer Science, pages 325–340. Springer, 2014.
[Lam16] Peter Lammich. Refinement based verification of imperative data structures. In Jeremy Avigad and Adam Chlipala,
editors, 5th ACM SIGPLAN Conference on Certified Programs and Proofs, pages 27–36. ACM, 2016.
[Lam17] Peter Lammich. Efficient verified (UN)SAT certificate checking. In Leonardo de Moura, editor, Automated
Deduction — CADE-26, volume 10395 of Lecture Notes in Computer Science, pages 237–254. Springer, 2017.
[Lam19] Peter Lammich. Refinement to imperative HOL. J. Autom. Reasoning, 62(4):481–503, 2019.
[LB11] Andreas Lochbihler and Lukas Bulwahn. Animating the formalised semantics of a Java-like language. In Marko C.
J. D. van Eekelen, Herman Geuvers, Julien Schmaltz, and Freek Wiedijk, editors, Interactive Theorem Proving -
Second International Conference, ITP 2011, volume 6898 of Lecture Notes in Computer Science, pages 216–232.
Springer, 2011.
[Ler09] Xavier Leroy. A formally verified compiler back-end. J. Automated Reasoning, 43:363–446, 2009.
[LPP19] Wenda Li, Grant Olney Passmore, and Lawrence C. Paulson. Deciding univariate polynomial problems using
untrusted certificates in Isabelle/HOL. J. Autom. Reasoning, 62(1):69–91, 2019.
[LPY97] Kim Guldstrand Larsen, Paul Pettersson, and Wang Yi. UPPAAL in a nutshell. STTT, 1(1-2):134–152, 1997.
[LS19] Peter Lammich and S. Reza Sefidgar. Formalizing network flow algorithms: A refinement approach in Isabelle/HOL.
J. Autom. Reasoning, 62(2):261–280, 2019.
[LT12] Peter Lammich and Thomas Tuerk. Applying data refinement for monadic programs to Hopcroft’s algorithm. In
Lennart Beringer and Amy P. Felty, editors, Interactive Theorem Proving - Third International Conference, ITP
2012, volume 7406 of Lecture Notes in Computer Science, pages 166–182. Springer, 2012.
[Mil85] Robin Milner. The use of machines to assist in rigorous proof. In C. A. R. Hoare and J. C. Shepherdson, editors,
Mathematical Logic and Programming Languages, pages 77–88. Prentice-Hall, 1985.
[Mil91] Dale Miller. A logic programming language with lambda-abstraction, function variables, and simple unification.
J. Log. Comput., 1(4):497–536, 1991.
[Miz] The Mizar Mathematical Library. http://mizar.org.
[ML84] P. Martin-Löf. Constructive mathematics and computer programming. Philosophical Transactions of the Royal
Society. Series A, 312(1522):501–518, 1984.
[MMW16] Daniel Matichuk, Toby C. Murray, and Makarius Wenzel. Eisbach: A proof method language for Isabelle. Journal
of Automated Reasoning, 56(3), 2016.
[MQP06] Jia Meng, Claire Quigley, and Lawrence C. Paulson. Automation for interactive proof: First prototype. Information
and Computation, 204(10):1575–1596, 2006.
[MW10] D. Matthews and M. Wenzel. Efficient parallel programming in Poly/ML and Isabelle/ML. In ACM SIGPLAN
Workshop on Declarative Aspects of Multicore Programming (DAMP 2010), 2010.
[Nip91a] Tobias Nipkow. Higher-order critical pairs. In Proc. 6th IEEE Symp. Logic in Computer Science, pages 342–349.
IEEE Press, 1991.
[Nip91b] Tobias Nipkow. Higher-order unification, polymorphism, and subsorts. In S. Kaplan and M. Okada, editors, Proc.
2nd Int. Workshop Conditional and Typed Rewriting Systems, volume 516 of LNCS. Springer, 1991.
[Nip93a] Tobias Nipkow. Functional unification of higher-order patterns. In Proc. 8th IEEE Symp. Logic in Computer
Science, pages 64–74, 1993.
[Nip93b] Tobias Nipkow. Order-sorted polymorphism in Isabelle. In Gérard Huet and Gordon Plotkin, editors, Logical
Environments, pages 164–188. Cambridge University Press, 1993.
[NK14] Tobias Nipkow and Gerwin Klein. Concrete Semantics with Isabelle/HOL. Springer, 2014. 298 pp.
http://concrete-semantics.org.
[NM16] Julian Nagele and Aart Middeldorp. Certification of classical confluence results for left-linear term rewrite
systems. In Jasmin Christian Blanchette and Stephan Merz, editors, Interactive Theorem Proving - 7th International
Conference, ITP 2016, volume 9807 of Lecture Notes in Computer Science, pages 290–306. Springer, 2016.
[NO80] Greg Nelson and Derek C. Oppen. Fast decision procedures based on congruence closure. Journal of the ACM,
27(2):356–364, 1980.
[NP92] Tobias Nipkow and Lawrence C. Paulson. Isabelle-91. In Deepak Kapur, editor, Automated Deduction —
CADE-11, volume 607 of LNCS, pages 673–676. Springer, 1992.
[NP93] Tobias Nipkow and Christian Prehofer. Type checking type classes. In Principles of Programming Languages,
POPL ’93, pages 409–418, New York, NY, USA, 1993. ACM.
[NP98] Tobias Nipkow and Christian Prehofer. Higher-order rewriting and equational reasoning. In W. Bibel and
P. Schmitt, editors, Automated Deduction — A Basis for Applications. Volume I: Foundations, volume 8 of
Applied Logic Series, pages 399–430. Kluwer, 1998.
[NPW02] Tobias Nipkow, Lawrence C. Paulson, and Markus Wenzel. Isabelle/HOL: A Proof Assistant for Higher-Order
Logic. Springer, 2002. Online at http://isabelle.in.tum.de/dist/Isabelle/doc/tutorial.pdf.
[NS91] Tobias Nipkow and Gregor Snelting. Type classes and overloading resolution via order-sorted unification. In
J. Hughes, editor, Proc. 5th ACM Conf. Functional Programming Languages and Computer Architecture, volume
523 of LNCS, pages 1–14. Springer, 1991.
[Obu06] Steven Obua. Checking conservativity of overloaded definitions in higher-order logic. In Frank Pfenning, editor,
Term Rewriting and Applications, volume 4098 of LNCS, pages 212–226. Springer, 2006.
[O’D77] Michael J. O’Donnell. Computing in Systems Described by Equations, volume 58 of LNCS. Springer, 1977.
[Ove75] R. Overbeek. An implementation of hyper-resolution. Computers and Mathematics with Applications, 1:201–214,
1975.
[Owr06] Sam Owre. Random testing in PVS. In Workshop on Automated Formal Methods (AFM), 2006. Online at
http://fm.csl.sri.com/AFM06/papers/5-Owre.pdf.
[Pau86] Lawrence C. Paulson. Natural deduction as higher-order resolution. Journal of Logic Programming, 3:237–258,
1986.
[Pau89] Lawrence C. Paulson. The foundation of a generic theorem prover. J. Automated Reasoning, 5(3):363–397, 1989.
[Pau90] Lawrence C. Paulson. Isabelle: The next 700 theorem provers. In P. Odifreddi, editor, Logic and Computer Science,
pages 361–386. Academic Press, 1990.
[Pau93] Lawrence C. Paulson. Isabelle’s object-logics. Technical Report 286, Cambridge University Computer Laboratory,
1993.
[Pau94] Lawrence C. Paulson. Isabelle - A Generic Theorem Prover (with contributions by T. Nipkow), volume 828 of
Lecture Notes in Computer Science. Springer, 1994.
[Pau98] Lawrence C. Paulson. The inductive approach to verifying cryptographic protocols. Journal of Computer Security,
6(1-2):85–128, 1998.
[Pau99] Lawrence C. Paulson. A generic tableau prover and its integration with Isabelle. Journal of Universal Computer
Science, 5(3):73–87, 1999.
[Pau03] Lawrence C. Paulson. The relative consistency of the axiom of choice — mechanized using Isabelle/ZF. LMS
Journal of Computation and Mathematics, 6:198–248, 2003. http://www.lms.ac.uk/jcm/6/lms2003-001/.
[Pau04] Lawrence C. Paulson. Organizing numerical theories using axiomatic type classes. J. Automated Reasoning,
33(1):29–49, 2004.
[Pau18] Lawrence C. Paulson. Computational logic: Its origins and applications. Proceedings of the Royal Society of London
A: Mathematical, Physical and Engineering Sciences, 474(2210), 2018.
[Pel96] Doron A. Peled. Combining partial order reductions with on-the-fly model-checking. Formal Methods in System
Design, 8(1):39–64, 1996.
[PG96] Lawrence C. Paulson and Krzysztof Grabczewski. Mechanizing set theory: Cardinal arithmetic and the axiom of
choice. J. Automated Reasoning, 17(3):291–323, December 1996.
[PHD+ 15] Zoe Paraskevopoulou, Catalin Hritcu, Maxime Dénès, Leonidas Lampropoulos, and Benjamin C. Pierce.
Foundational property-based testing. In Christian Urban and Xingyuan Zhang, editors, Interactive Theorem Proving,
volume 9236 of LNCS, pages 325–343. Springer, 2015.
[PS07] Lawrence C. Paulson and Kong Woei Susanto. Source-level proof reconstruction for interactive theorem proving.
In Klaus Schneider and Jens Brandt, editors, Theorem Proving in Higher Order Logics: TPHOLs 2007, LNCS
4732, pages 232–245. Springer, 2007.
[Raj93] Sreeranga P. Rajan. Executing HOL specifications: Towards an evaluation semantics for classical higher order
logic. In Luc J. M. Claesen and Michael J. C. Gordon, editors, Higher Order Logic Theorem Proving and its
Applications, volume A-20 of IFIP Transactions, pages 527–536. North-Holland/Elsevier, 1993.
[RNL08] Colin Runciman, Matthew Naylor, and Fredrik Lindblad. SmallCheck and lazy SmallCheck: Automatic exhaustive
testing for small values. In Andy Gill, editor, Proc. 1st ACM SIGPLAN Symposium on Haskell, pages 37–48.
ACM, 2008.
[Rob65] J. A. Robinson. A machine-oriented logic based on the resolution principle. Journal of the ACM, 12:23–41, 1965.
[RV02] Alexandre Riazanov and Andrei Voronkov. The design and implementation of VAMPIRE. AI Communications,
15(2):91–110, 2002.
[Sch04] Stephan Schulz. System description: E 0.81. In David Basin and Michaël Rusinowitch, editors, Automated
Reasoning — Second International Joint Conference, IJCAR 2004, LNAI 3097, pages 223–228. Springer, 2004.
[Sha02] Natarajan Shankar. Little engines of proof. In L.-H. Eriksson and P. Lindsay, editors, FME 2002: Formal Methods
— Getting IT Right: International Symposium of Formal Methods Europe, LNCS 2391, pages 1–20. Springer, 2002.
[Sie19] Stephen F. Siegel. What’s wrong with on-the-fly partial order reduction. In I. Dillig and S. Tasiran, editors,
Computer Aided Verification (CAV 2019), LNCS. Springer, 2019.
[SK07] Alexander Spiridonov and Sarfraz Khurshid. Automatic generation of counterexamples for ACL2 using Alloy. In
Seventh Int. Workshop on the ACL2 Theorem prover and its Applications, 2007.
[SN08] Konrad Slind and Michael Norrish. A brief overview of HOL4. In Otmane Aït Mohamed, César Muñoz, and Sofiène
Tahar, editors, Theorem Proving in Higher Order Logics, TPHOLs 2008, pages 28–32, 2008.
[Sum02] Rob Sumners. Checking ACL2 theorems via SAT checking. In Third Int. Workshop on the ACL2 Theorem prover
and its Applications, 2002.
[TJ07] Emina Torlak and Daniel Jackson. Kodkod: A relational model finder. In Orna Grumberg and Michael Huth,
editors, Tools and Algorithms for the Construction and Analysis of Systems, volume 4424 of LNCS, pages 632–647.
Springer, 2007.
[TS09] René Thiemann and Christian Sternagel. Certification of termination proofs using CeTA. In Stefan Berghofer,
Tobias Nipkow, Christian Urban, and Makarius Wenzel, editors, Theorem Proving in Higher Order Logics, 22nd
International Conference, TPHOLs 2009, volume 5674 of Lecture Notes in Computer Science, pages 452–468.
Springer, 2009.
[Web05] Tjark Weber. Bounded model generation for Isabelle/HOL. In Wolfgang Ahrendt, Peter Baumgartner, Hans
de Nivelle, Silvio Ranise, and Cesare Tinelli, editors, Selected Papers from the Workshops on Disproving and the
Second International Workshop on Pragmatics of Decision Procedures (PDPAR 2004), volume 125(3) of Electronic
Notes in Theoretical Computer Science, pages 103–116. Elsevier, July 2005.
[Web08] Tjark Weber. SAT-based finite model generation for higher-order logic. PhD thesis, Technical University Munich,
Germany, 2008.
[Wei01] Christoph Weidenbach. Combining superposition, sorts and splitting. In Alan Robinson and Andrei Voronkov,
editors, Handbook of Automated Reasoning, volume II, chapter 27, pages 1965–2013. Elsevier Science, 2001.
[Wen97] Markus Wenzel. Type classes and overloading in higher-order logic. In Theorem Proving in Higher Order Logics,
volume 1275 of LNCS, pages 307–322. Springer, 1997.
[Wen07] Makarius Wenzel. Isabelle/Isar — a generic framework for human-readable proof documents. Studies in Logic,
Grammar, and Rhetoric, 10(23):277–297, 2007. From Insight to Proof — Festschrift in Honour of Andrzej Trybulec.
[Wen11] M. Wenzel. Isabelle as document-oriented proof assistant. In J. H. Davenport et al., editors, Conference on
Intelligent Computer Mathematics (CICM 2011), volume 6824 of LNAI. Springer, 2011.
[Wen13a] Makarius Wenzel. READ-EVAL-PRINT in parallel and asynchronous proof-checking. In Cezary Kaliszyk and
Christoph Lüth, editors, User Interfaces for Theorem Provers (UITP 2012), volume 118 of Electronic Proceedings
in Theoretical Computer Science, 2013.
[Wen13b] Makarius Wenzel. Shared-memory multiprocessing for interactive theorem proving. In Sandrine Blazy, Christine
Paulin-Mohring, and David Pichardie, editors, Interactive Theorem Proving (ITP 2013), volume 7998 of Lecture
Notes in Computer Science. Springer, 2013.
[Wen14] Makarius Wenzel. Asynchronous user interaction and tool integration in Isabelle/PIDE. In Gerwin Klein and
Ruben Gamboa, editors, Interactive Theorem Proving (ITP 2014), volume 8558 of LNCS. Springer, 2014.
[Wen19] Makarius Wenzel. Interaction with formal mathematical documents in Isabelle/PIDE. In Cezary Kaliszyk, Edwin
Brady, Andrea Kohlhase, and Claudio Sacerdoti Coen, editors, Intelligent Computer Mathematics (CICM 2019),
volume 11617 of LNAI. Springer, 2019. https://arxiv.org/abs/1905.01735.
[Wie01] Freek Wiedijk. Mizar Light for HOL Light. In Richard J. Boulton and Paul B. Jackson, editors, Theorem Proving
in Higher Order Logics, TPHOLs 2001, pages 378–393. Springer, 2001.
[WL18] Simon Wimmer and Peter Lammich. Verified model checking of timed automata. In Dirk Beyer and Marieke
Huisman, editors, Tools and Algorithms for the Construction and Analysis of Systems, TACAS 2018, volume
10805 of Lecture Notes in Computer Science, pages 61–78. Springer, 2018.
[Woo18] Charlie Wood. The strange numbers that birthed modern algebra. Online at https://www.quantamagazine.org/the-strange-numbers-that-birthed-modern-algebra-20180906/, September 2018.