Comparison of Two Theorem Provers: Isabelle/HOL and Coq
Artem Yushkovskiy1,2
arXiv:1808.09701v2 [cs.LO] 6 Sep 2018
Autumn 2017
Abstract
The need for formal definition of the very basis of mathematics arose in the last century. The scale and
complexity of mathematics, along with discovered paradoxes, revealed the danger of accumulating errors
across theories. Although, according to Gödel's incompleteness theorems, it is not possible to construct
a single formal system that describes all phenomena in the world while being complete and consistent
at the same time, this effort gave rise to rather practical areas of logic, such as the theory of
automated theorem proving. This is a set of techniques used to verify mathematical statements
mechanically using logical reasoning. Moreover, it can be used to solve complex engineering problems
as well, for instance, to prove the security properties of a software system or an algorithm. This paper
compares two widespread tools for automated theorem proving, Isabelle/HOL [1] and Coq [2], with respect
to expressiveness, limitations and usability. To this end, it first gives a brief introduction to the
basics of formal systems and automated deduction theory, their main problems and challenges, and then
provides a detailed comparison of the most notable features of the selected theorem provers, supported
by illustrative proof examples.
KEYWORDS: proof assistants, Coq, Isabelle/HOL, logics, proof theory, formal method, classical logic,
intuitionistic logic, usability.
1 Introduction
Nowadays, the search for foundations of mathematics has become one of the key questions in
philosophy of mathematics, which eventually has an impact on numerous problems in modern
life. As a result, the formal approach was developed as a new methodology for manipulating
abstract entities in a verifiable way. In other words, it is possible to follow the sequence of such
manipulations in order to check the validity of each statement and, as a result, of the system as a
whole. Moreover, automating such a verification process can significantly increase the reliability of
formal models and systems based on them.
At present, a large number of tools have been developed to automate this process. Generally,
these tools can be divided into two broad classes. The first class contains tools that aim to
validate the input statement (theorem) with respect to a sequence of inference transitions
(user-defined proof) according to a set of inference rules (defined by the logic). Such tools are
sometimes called proof assistants; their purpose is to help users develop new proofs. Isabelle [1],
Coq [2] and PVS [3] are well-known examples of such systems and have been commonly used in recent
years.
In proceedings of the Seminar in Computer Science (CS-E4000), Aalto University, Autumn 2017
The second class consists of tools that automatically discover the formal proof, which can rely
either on induction, on meta-arguments, or on higher-order logic. Such tools are often called
automated theorem provers; they apply techniques of automated logical reasoning to develop the
proof automatically. The systems Otter [4] and ACL2 [5] are commonly known examples of such
tools. This paper considers only systems of the first class, in order to assess the usability
of such systems.
This paper is organised as follows. Section 2 describes the basic foundations of logic necessary
for understanding theorem provers. In particular, Section 2.1 provides formal definitions, and
Sections 2.2–2.4 describe different types, basic properties and theoretical limitations of formal
systems. Section 3 presents the comparison itself and provides illustrative examples of different
kinds of proofs in both systems under consideration.
The formulas consist of propositional variables connected with logical connectives (or logical
operators) according to rules defined by a formal language. Formulas that satisfy these rules
are called well-formed formulas (wff). Only wffs can form judgements in a formal system.
A propositional variable is an atomic formula that can be claimed to be either true or false. A
logical connective is a symbol of the formal language that builds a new wff from existing ones.
Typically, the set of logical connectives contains the negation ¬, conjunction ∧, disjunction ∨, and
implication → operators, although negation combined with any one of the other three operators is
already functionally complete (i.e., any formula can be expressed using only these two logical
connectives).
The formal system described above does not place any restriction on the form of propositional
variables; such a logic is called propositional logic. However, if these variables are quantified
over sets, the logic is called first-order or predicate logic. Commonly, first-order logic operates
with two quantifiers, the universal quantifier ∀ and the existential quantifier ∃. Second-order
logic extends it by adding quantifiers over first-order quantified sets, i.e., relations defining
sets of sets. In turn, it can be extended to higher-order logic, which contains quantifiers
over arbitrarily nested sets (for instance, the expression ∀ f : bool → bool, f ( f ( f x )) = f x can
be stated in higher-order logic), or to type theory, which assigns a type to every expression
in the formal language (see Section 2.4).
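The higher-order statement mentioned above can be written and checked directly in a proof assistant. The following is a minimal Coq sketch, not a reproduction of the paper's own figures; the theorem name f_applied_thrice is hypothetical, and x is assumed to be universally quantified as well.

```coq
(* Hypothetical name; the statement quantifies over every function
   f : bool -> bool, which is what makes it higher-order. *)
Theorem f_applied_thrice :
  forall (f : bool -> bool) (x : bool), f (f (f x)) = f x.
Proof.
  intros f x.
  (* Case analysis on x and on the two possible outputs of f;
     the recorded equations Ht and Hf are then used for rewriting. *)
  destruct x; destruct (f true) eqn:Ht; destruct (f false) eqn:Hf;
    rewrite ?Ht, ?Hf; reflexivity.
Qed.
```

The proof is a plain enumeration of the eight cases, which is feasible only because bool has two inhabitants.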
A formal system determines the set of derivable formulas (judgements that are provable with
respect to the rules of the formal system). Let Φ be a set of formulas. Initially, it consists only
of hypotheses, i.e., formulas that are assumed a priori to be true. The notation Φ ⊢ φ
means that the formula φ is provable from Φ, i.e., there exists a proof that infers φ from Φ. A
formula that is provable without additional premises is called a tautology and is denoted ⊢ φ
(meaning ∅ ⊢ φ). A formula φ is called a contradiction if ⊢ ¬φ. All contradictions are
equivalent within one formal system; they are denoted ⊥.
In this paper, the notation (1), borrowed from the Isabelle documentation, is used for
expressing rules of inference. In this notation, the sign =⇒ denotes logical implication, which
is right-associative, see formula (3). Notation (1) is equivalent to the standard
notation (2):
[[ A1 ; A2 ; . . . ; An ]] =⇒ B (1)
≡ { A1 , A2 , . . . , An } ⊢ B (2)
A1 =⇒ A2 =⇒ · · · =⇒ An =⇒ B ≡ A1 =⇒ ( A2 =⇒ (· · · =⇒ ( An =⇒ B))) (3)
The formulas below describe the principal inference rule present in most logical systems, the
modus ponens (MP) rule, and two main axioms of classical logic:
[[ A; A =⇒ B ]] =⇒ B (MP)
A =⇒ ( B =⇒ A) (A1)
( A =⇒ ( B =⇒ C )) =⇒ (( A =⇒ B) =⇒ ( A =⇒ C )) (A2)
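As an illustration, the rule (MP) and the axioms (A1) and (A2) can be stated as Coq lemmas, reading =⇒ as Coq's implication ->. This is only a sketch under that reading; the section name and lemma names are hypothetical.

```coq
Section HilbertSketch.
  Variables A B C : Prop.

  (* (MP): from A and A -> B, conclude B. *)
  Lemma modus_ponens : A -> (A -> B) -> B.
  Proof.
    intros a ab. apply ab. exact a.
  Qed.

  (* (A1): a true statement is implied by anything. *)
  Lemma axiom_A1 : A -> (B -> A).
  Proof.
    intros a _. exact a.
  Qed.

  (* (A2): implication distributes over itself. *)
  Lemma axiom_A2 : (A -> (B -> C)) -> ((A -> B) -> (A -> C)).
  Proof.
    intros abc ab a. apply abc.
    - exact a.
    - apply ab. exact a.
  Qed.
End HilbertSketch.
```

None of the three proofs needs a classical axiom, so all of them are accepted by Coq's intuitionistic core.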
Together with axioms (A1) and (A2), the modus ponens rule forms the Hilbert proof system,
which can process statements of classical propositional logic. Other classical logic systems
often include the axiom of excluded middle (EM), and may derive the double negation
introduction (DNi) and elimination (DNe) laws:
A ∨ ¬ A. (EM)
A =⇒ ¬¬ A (DNi)
¬¬ A =⇒ A (DNe)
Many classical logics can derive De Morgan's laws (DM1), (DM2), the law of contraposition
(CP), Peirce's law (PL) and many other tautologies:
¬( A ∧ B) ⇐⇒ ¬ A ∨ ¬ B (DM1)
¬( A ∨ B) ⇐⇒ ¬ A ∧ ¬ B (DM2)
( A → B) =⇒ (¬ B → ¬ A) (CP)
(( A → B) → A) =⇒ A (PL)
The axiom of excluded middle means that every logical statement is decidable, which might
not be true in some applications. Adding this axiom to a formal system leads to reasoning
from true statements, in contrast to natural deduction systems, which use reasoning from assumptions.
Although the difference between these two kinds of formal systems seems subtle, the latter
can serve as a framework that allows new systems to be built on the logical base of pre-defined
premises and formal proof rules.
• consistent, if a formula and its negation cannot both be proved in the system:
∄φ ∈ Γ : Γ ⊢ φ ∧ Γ ⊢ ¬φ ⇔ Γ ⊬ ⊥;
• complete, if all true statements can be inferred:
∀φ ∈ U : A ⊢ φ ∨ A ⊢ ¬φ ;
• independent, if no axiom can be inferred from the others:
∄a ∈ A : A \ {a} ⊢ a.
For instance, the Hilbert system described above is consistent and independent, yet incomplete
under the classical semantics. In 1931, Kurt Gödel proved his first incompleteness theorem,
which states that any consistent, effectively axiomatised formal system capable of expressing
elementary arithmetic is incomplete. Later, in 1936, Alfred Tarski extended this result by proving
his undefinability theorem, which states that the concept of truth for such a system cannot be
defined within the system itself. Accordingly, modern tools, such as Coq, often treat
propositions as either provable or unprovable, rather than true or false.
2.3 Lambda-calculus
The necessity of building automatic reasoning systems has led to the development of models
that abstract the computation process. At that time, the concept of effective computability was
evolving rapidly, which led to multiple formalisations of computation, such as Turing machines,
normal Markov algorithms, recursive functions, and others. One of the first and most effective
models was the λ-calculus, invented by Alonzo Church in the 1930s. This formalism
provides a solid theoretical foundation for the family of functional programming languages [8]. In
λ-calculus, functions are first-class objects, which means that functions can be passed as arguments
to other functions.
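The idea of passing functions as arguments can be illustrated with a short Coq sketch. The names twice, twice_succ and twice_lambda are illustrative and do not come from the paper.

```coq
(* A function that takes another function f as an argument
   and applies it two times. *)
Definition twice {A : Type} (f : A -> A) (x : A) : A := f (f x).

(* Passing the successor constructor S as the argument. *)
Example twice_succ : twice S 0 = 2.
Proof. reflexivity. Qed.

(* Passing an anonymous function (a lambda abstraction). *)
Example twice_lambda : twice (fun n => n + 3) 1 = 7.
Proof. reflexivity. Qed.
```

Both examples are proved by reflexivity, i.e., purely by reducing the two sides to the same normal form.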
The central concept in λ-calculus is an expression, which can be defined as an object to which
the rewriting rules are applied [9]. The basic rewriting rules of λ-calculus are listed below:
The λ-calculus described above is called the type-free λ-calculus. Stronger calculi can be
constructed by adding types of expressions to the system, for which some useful properties
can be proven (e.g., termination and memory safety) [10].
The Martin-Löf type theory, also known as the intuitionistic type theory¹, is based on the principles
of constructive mathematics, which require an explicit definition of the way of "constructing" an
object in order to prove its existence. Therefore, an important place in intuitionistic type theory
is held by inductive types, which are constructed recursively from a basic element (zero) and a
successor function that defines the "next" element.
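A minimal Coq sketch of such an inductive type is given below. It mirrors the way natural numbers are defined in Coq's standard library, but is wrapped in a hypothetical module NatSketch so that it does not clash with the built-in nat; the function add and the example name are illustrative as well.

```coq
Module NatSketch.

  (* A natural number is either zero (O) or the successor (S) of
     another natural number. *)
  Inductive nat : Set :=
  | O : nat
  | S : nat -> nat.

  (* Addition defined by structural recursion on the first argument. *)
  Fixpoint add (n m : nat) : nat :=
    match n with
    | O => m
    | S p => S (add p m)
    end.

  (* 2 + 1 = 3, written with explicit constructors. *)
  Example add_two_one : add (S (S O)) (S O) = S (S (S O)).
  Proof. reflexivity. Qed.

End NatSketch.
```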
Intuitionistic type theory also uses a wide class of dependent types, whose definition depends
on a value. For instance, the n-ary tuple is a dependent type that is defined by the value
of n. However, type checking for such a system is, in general, an undecidable problem, since
determining the equality of two arbitrary dependent types amounts to deciding the equivalence
of two non-trivial programs (which is undecidable in the general case according to
Rice's theorem [14]).
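The n-ary tuple mentioned above can be sketched in Coq as a type computed from the value n. This is one possible encoding chosen for illustration (nested pairs terminated by unit); the names tuple and triple_of_nats are hypothetical.

```coq
(* The type of n-ary tuples over A, computed by recursion on n:
   tuple A 0 is unit, tuple A (S p) is a pair of A and tuple A p. *)
Fixpoint tuple (A : Type) (n : nat) : Type :=
  match n with
  | O => unit
  | S p => (A * tuple A p)%type
  end.

(* The type tuple nat 3 reduces to nat * (nat * (nat * unit)),
   so a nested pair of three numbers inhabits it. *)
Example triple_of_nats : tuple nat 3 := (1, (2, (3, tt))).
```

Here the type checker has to evaluate tuple nat 3 before it can accept the term, which is a small instance of types depending on computation.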
1 In this paper, in terms intuitionistic type theory and intuitionistic logic, the word intuitionistic is used as a synonym for
constructive.
Higher-Order Logic (HOL), first-order logic theories such as Zermelo-Fraenkel Set Theory (ZF), Classical Computational
Logic (CCL), etc. In this paper, Isabelle/HOL has been taken as the starting point for exploring the power of this
proof assistant.
describe logic behind Isabelle. In particular, the theory of higher-order logic is implemented as
Isabelle/HOL, and it is commonly used because of its expressivity and relative conciseness.
Isabelle exploits classical logic, so even the propositional type is declared as a set of two
elements, true and false (thus any n-ary logic can be easily formalised). In proofs, Isabelle
combines several languages: HOL as a functional programming language (whose terms must always be
enclosed in quotes), and Isar as the language for describing procedures that manipulate the proof.
programs, including the general-purpose operating system kernel seL4 (2009) [21], the C
standard (2015) [22], and others.
Both Isabelle and Coq have their own Integrated Development Environment (IDE) to work
in (the jEdit Prover IDE and the GTK-based CoqIDE, respectively). In general, the native IDEs of
both theorem provers allow scripts to be executed interactively step by step while preserving
the state of the proof (environment), which for each step describes the set of premises along with
already proved statements (context) and the set of statements to be proven (goals). However,
Isabelle's native IDE allows the proof state to be changed arbitrarily, in contrast to CoqIDE, which
can only move the proof state backward or forward linearly. Alternatively, both theorem provers
have numerous plugins for many popular IDEs; for instance, Proof General [23] is a plugin for
Emacs that supports numerous proof assistants. During the work on this paper, only the native IDE
of each proof assistant was used, in order to minimise the impact of third-party tools on the
research.
Both systems accept proofs written in an imperative fashion (forward proof), i.e., such a proof
is a sequence of tactic calls that implicitly change the proof state at each step, combined
by control-flow operators called tacticals, which compose tactics, separate their results,
repeat calls, etc. In addition, the syntax of Isar permits writing goals explicitly in
the proof (backward proof, see Appendix A.6 Fig. 19 and Fig. 15).
Figure 1: Proof failure of the (DNe) rule in Coq
Figure 2: Proof of the (DNi) rule in Coq
In addition, the double-negated axiom of excluded middle can be proven solely in
intuitionistic logic, see Appendix A.2 Fig. 11. This is a way of embedding classical
propositional logic into intuitionistic logic, known as Glivenko's double-negation translation [24],
which maps all classical tautologies to intuitionistic ones by double-negating them. Furthermore,
there are other translation schemes for other classical logics, such as the Gödel-Gentzen
translation, Kuroda's translation, etc. [25].
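The two intuitionistically provable statements discussed here, (DNi) and the double-negated excluded middle, can be sketched in Coq as follows. The scripts below are written for illustration and may differ from the paper's own figures; the theorem names are hypothetical.

```coq
(* (DNi): A implies not not A. No classical axiom is needed. *)
Theorem double_negation_intro : forall A : Prop, A -> ~ ~ A.
Proof.
  intros A a not_a.
  apply not_a. exact a.
Qed.

(* Double-negated excluded middle: not not (A or not A). *)
Theorem not_not_excluded_middle : forall A : Prop, ~ ~ (A \/ ~ A).
Proof.
  intros A H.
  (* H : ~ (A \/ ~ A); use it once to commit to the right disjunct ... *)
  apply H. right.
  (* ... and a second time, now with a proof of A available. *)
  intro a. apply H. left. exact a.
Qed.
```

The hypothesis H is used twice in the second proof, which is the usual trick behind this statement: the proof never decides whether A holds.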
Therefore, numerous theorems, such as the classical tautology Peirce's law (PL), cannot
be proved in intuitionistic logic while being valid in classical logic, which makes intuitionistic
logic strictly weaker [26] and incomplete with respect to classical semantics (Coq's tactic for
automatic reasoning about propositional statements, tauto, fails to prove this law automatically).
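A hedged Coq sketch of this situation: Peirce's law, in its usual form ((A -> B) -> A) -> A, is attempted with tauto, which is expected to fail as stated in the text, and is then proved once the excluded middle is assumed. The sketch relies on the axiom classic from Coq's standard Classical library; the theorem name peirce_law is hypothetical.

```coq
(* Without classical axioms, tauto is expected to fail on this goal;
   "Fail" makes the script succeed exactly when the tactic fails. *)
Goal forall A B : Prop, ((A -> B) -> A) -> A.
Proof.
  Fail tauto.
Abort.

(* Brings in the axiom classic : forall P : Prop, P \/ ~ P. *)
Require Import Classical.

Theorem peirce_law : forall A B : Prop, ((A -> B) -> A) -> A.
Proof.
  intros A B H.
  destruct (classic A) as [a | not_a].
  - (* Case A holds: nothing to do. *)
    exact a.
  - (* Case ~A: A -> B holds vacuously, so H yields A. *)
    apply H. intro a. contradiction.
Qed.
```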
In classical logic, some proofs remain valid, yet are practically inapplicable. For instance, the
following non-constructive proof of the statement "there exist irrational numbers x and y
such that $x^y$ is rational" may serve as a classic example. The proof relies on the axiom of
excluded middle [27]. Consider the number $\sqrt{2}^{\sqrt{2}}$; if it is rational, then consider
$x = \sqrt{2}$ and $y = \sqrt{2}$, which are both irrational; if $\sqrt{2}^{\sqrt{2}}$ is irrational,
then consider $x = \sqrt{2}^{\sqrt{2}}$ and $y = \sqrt{2}$, so that
$x^y = (\sqrt{2}^{\sqrt{2}})^{\sqrt{2}} = \sqrt{2}^{2} = 2$ is rational, q.e.d. Although this proof
is clear and concise, it reveals no information about whether the number $\sqrt{2}^{\sqrt{2}}$ is
rational or irrational. More importantly, it gives no algorithm for finding other such numbers.
Therefore, the main purpose of constructive proofs is to define such a solution schema for a problem,
in addition to proving the claim. Commonly, proofs of existence3 of an element are non-constructive,
as in order to prove such a statement it is enough to find a single valid example.
the type nat, the simple formula 2 · S_n = n · (n + 1) for the sum S_n of the first n natural numbers
has been chosen (see the proof in Isabelle in Appendix A.6 Fig. 19 and the proof in Coq in
Appendix A.6 Fig. 20). Note that the proof in Coq uses the library Coq.omega.Omega, which contains
powerful tactics for simplifying and proving formulas over natural numbers.
Extracted Haskell code:
range_sum :: Nat -> Nat
range_sum n =
  case n of {
   O -> O;
   S p -> add (range_sum p) (S p)}

Extracted OCaml code:
(** val range_sum : nat -> nat **)
let rec range_sum = function
| O -> O
| S p -> add (range_sum p) (S p)
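Listings of this shape can be obtained with Coq's extraction mechanism. The sketch below assumes a recursive definition of range_sum that is consistent with the extracted code above (the paper's own definition is not shown in this part of the text); the exact formatting of the generated code depends on the Coq version.

```coq
(* Load the extraction plugin. *)
Require Extraction.

(* Assumed definition: the sum 0 + 1 + ... + n by structural recursion. *)
Fixpoint range_sum (n : nat) : nat :=
  match n with
  | O => O
  | S p => range_sum p + S p
  end.

(* Print OCaml code for range_sum and everything it depends on. *)
Extraction Language OCaml.
Recursive Extraction range_sum.

(* The same for Haskell. *)
Extraction Language Haskell.
Recursive Extraction range_sum.
```

Recursive Extraction also prints the definitions of nat and add, which is why the extracted function can refer to O, S and add.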
• Expressiveness of syntax:
◦ Both systems have powerful built-in functional languages, which can be used to define
complex recursive structures;
◦ Both systems accept forward proofs (written in imperative style as a sequence of tactic
calls). This method may seem mathematically unnatural, as the search for a proof is
performed "blindly", with the goal kept implicit;
◦ In contrast to Coq, the backward proof supported by Isabelle first states the target goal
explicitly for every tactic (with keywords show, have, assume, etc.), so that the proof becomes
much more readable, yet it takes more time to write.
• Usability of the syntax:
◦ Although Isabelle recognises common mathematical symbols in proofs, which makes them
much more readable, it may seem inconvenient to type them within the IDE (e.g., the character ∀
is encoded as \<forall>, ∑ as \<Sum>, etc.);
◦ The syntax of Coq is closer to the syntax of a programming language than to that of
mathematics; apparently it was designed for convenient work with a keyboard.
• Usability of the native IDE:
◦ The author is inclined to consider Isabelle's jEdit Prover IDE more user-friendly,
as the whole proof is rechecked every time the user changes the syntax tree, which
makes it easy to inspect the proof state at any arbitrary step of the proof;
◦ In contrast, CoqIDE can only move the proof state backward and forward linearly, which,
however, implies a lower system load.
• Additional comparison information:
◦ Coq has an essential feature that distinguishes it from most other theorem provers: it can
extract verified code whose compliance with the specification has been proved in a
constructive way. This encourages using Coq as a software verification tool.
4 Future work
Although this paper does not claim to give a fully exhaustive comparative analysis of two
such complex systems as Coq and Isabelle, the author hopes it will help users without an advanced
background in mathematics to get involved in working with proof assistants more quickly and
easily. In the future, this paper is intended to serve as a foundation for a more advanced survey of
automatic tools used in software verification.
Acknowledgements
I wish to thank Prof. Stavros Tripakis for letting me dive into the exciting world of Logic, for
providing feedback on my paper at all stages of the work, for answering all my countless questions
and supporting me.
References
[1] "Isabelle, a generic proof assistant." https://www.cl.cam.ac.uk/research/hvg/Isabelle/.
Appendices
A.1 Basic type definitions
(* In Coq, False is an unobservable proposition, which
is defined as a propositional type without constructor *)
Inductive False : Prop := .
Figure 9: Definition of addition over nat in Isabelle
Figure 10: Definition of addition over nat in Coq
Theorem DeMorganPropositional_Coq:
forall P Q : Prop, ~(P \/ Q) <-> ~P /\ ~Q.
Proof.
(* 'tauto' proves this statement automatically; a manual proof follows *)
intros P Q. unfold iff.
split.
- intros H_not_or. unfold not. constructor.
+ intro H_P. apply H_not_or. left. apply H_P.
+ intro H_Q. apply H_not_or. right. apply H_Q.
- intros H_and_not H_or.
destruct H_and_not as [H_not_P H_not_Q].
destruct H_or as [H_P | H_Q].
+ apply H_not_P. assumption.
+ apply H_not_Q. assumption.
Qed.
(* define notations: *)
Notation "a || b" := (orb a b).
Notation "a && b" := (andb a b).
Theorem DeMorganBoolean_Coq:
forall a b: bool, negb (a || b) = ((negb a) && (negb b)).
Proof.
try tauto. (* automatic tactic fails here *)
intros a b.
destruct a; simpl; reflexivity.
Qed.
Figure 15: Proof of De Morgan's law for first-order propositions in Isabelle
Figure 16: Proof of De Morgan's law for first-order propositions in Coq
5 This proof was originally taken from the set of examples in Isabelle's documentation, see
https://github.com/seL4/isabelle/blob/master/src/HOL/Isar_Examples/Drinker.thy
Figure 17: Higher-order statement definition in Isabelle
Figure 18: Higher-order statement definition in Coq
A.6 Example proofs of the formula for the sum of the first n numbers using inductive types
fun range_sum :: "nat ⇒ nat"
where "range_sum n = (∑k::nat=0..n . k)"
value "range_sum 10" (* check the function *)
Figure 19: Proof of the formula for the sum of the first n numbers in Isabelle
Theorem SimpleArithProgressionSumFormula_Coq:
forall n, 2 * range_sum n = n * (n + 1).
Proof.
intros.
induction n.
(* goal: ’2 * range_sum 0 = 0 * (0 + 1)’ *)
- simpl; reflexivity.
(* goal: ’2 * range_sum (S n) = S n * (S n + 1)’ *)
- rewrite -> Nat.mul_add_distr_l. (* ’2*range_sum(S n) = S n * S n + S n * 1’ *)
rewrite -> Nat.mul_1_r. (* ’2*range_sum(S n) = S n * S n + S n’ *)
rewrite -> (Nat.mul_succ_l n). (* ’2*range_sum(S n) = n * S n + S n + S n’ *)
rewrite <- (Nat.add_1_r n). (* ’2*range_sum(n+1) = n*(n+1)+(n+1)+(n+1)’ *)
rewrite -> range_sum_lemma. (* ’2*(range_sum(n)+(n+1)) = n*(n+1)+(n+1)+(n+1)’ *)
omega. (* automatically solve arithmetic equation *)
Qed.
Figure 20: Proof of the formula for the sum of the first n numbers in Coq
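The proof above depends on a definition of range_sum and on a lemma range_sum_lemma that are not shown in this listing, and on the omega tactic, which was removed in Coq 8.14 in favour of lia. The following self-contained variant is therefore only a sketch under assumptions: it uses an assumed recursive definition of range_sum, a hypothetical theorem name, and the ring tactic instead of omega.

```coq
Require Import Arith.

(* Assumed definition: the sum 0 + 1 + ... + n by structural recursion. *)
Fixpoint range_sum (n : nat) : nat :=
  match n with
  | O => O
  | S p => range_sum p + S p
  end.

Theorem range_sum_formula : forall n, 2 * range_sum n = n * (n + 1).
Proof.
  induction n as [| p IH].
  - (* Base case: both sides compute to 0. *)
    reflexivity.
  - (* Unfold one step of the recursion. *)
    change (range_sum (S p)) with (range_sum p + S p).
    (* Distribute the factor 2, then apply the induction hypothesis. *)
    rewrite (Nat.mul_add_distr_l 2), IH.
    (* The remaining goal is a polynomial identity over nat. *)
    ring.
Qed.
```

After the rewriting steps the goal no longer mentions range_sum, so it can be closed by normalising both sides as polynomials in p.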