sml chapter10
For raw power Hal cannot compete with specialized theorem provers. What Hal
lacks in power it makes up in flexibility. A typical resolution theorem prover
supports pure classical logic with equality, but without induction. Tactical the-
orem provers allow a mixture of automatic and interactive working, in virtually
any logic.
Hal works in classical logic for familiarity’s sake, but it can easily be extended
to include induction, modal operators, set theory or whatever. Its tactics must be
changed to reflect the new inference rules; the tacticals remain the same, ready
to express search procedures for the new logic.
Chapter outline
The chapter contains the following sections:
A sequent calculus for first-order logic. The semantics of first-order logic is
10 A Tactical Theorem Prover
sound rule must yield a valid conclusion provided its premises are valid. A set
of inference rules for a logic is called a proof system or a formalization.
Of the many proof systems for classical first-order logic, easiest to automate is
the sequent calculus. The tableau method, which is sometimes used to automate
first-order logic, is a compact notation for the sequent calculus.
Validity and basic sequents. A valid sequent is one that is true under every struc-
ture and assignment. The theorems of our sequent calculus will be precisely the
valid sequents.
A sequent is called basic if both sides share a common formula φ. This can
be formalized as the axiom
    φ, Γ ⊢ Δ, φ.
In the notation just described, φ, Γ and Δ, φ are multisets containing φ. Such
sequents are clearly valid.
The other formulæ, those contained in Γ and Δ, play no part in the inference.
The sequent calculus is sometimes formulated such that a basic sequent
Sequent rules for the connectives. Sequent calculus rules come in pairs, to introduce
each connective on the left or right of the ⊢ symbol. For example, the rule
∧:left introduces a conjunction on the left, while ∧:right introduces a conjunction
on the right. Here is the latter rule in the usual notation, with its premises
above the line and its conclusion below:
    Γ ⊢ Δ, φ     Γ ⊢ Δ, ψ
    ──────────────────────  ∧:right
         Γ ⊢ Δ, φ ∧ ψ
To show that ∧:right is a sound rule, let us assume that its premises are valid and
demonstrate that its conclusion is valid. Suppose that, under some structure and
assignment, every formula in Γ is true; we must demonstrate that some formula
in Δ, φ ∧ ψ is true. If no formula in Δ is true, then both φ and ψ are true by the
premises. Therefore φ ∧ ψ is true.
Now let us justify the rule ∧:left.
    φ, ψ, Γ ⊢ Δ
    ─────────────  ∧:left
    φ ∧ ψ, Γ ⊢ Δ
To show that this rule is sound, we proceed as above. Suppose that every formula
in Γ, φ ∧ ψ is true. Then both φ and ψ are true. Assuming that the premise is
valid, some formula of Δ must be true, and this establishes the conclusion.
Figure 10.1 presents the rules for the propositional connectives ∧, ∨, →, ↔
and ¬. All the rules are justified similarly.
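            :left                                  :right

 ∧      φ, ψ, Γ ⊢ Δ                      Γ ⊢ Δ, φ     Γ ⊢ Δ, ψ
       ──────────────                    ──────────────────────
        φ ∧ ψ, Γ ⊢ Δ                          Γ ⊢ Δ, φ ∧ ψ

 ∨      φ, Γ ⊢ Δ     ψ, Γ ⊢ Δ            Γ ⊢ Δ, φ, ψ
       ──────────────────────            ──────────────
            φ ∨ ψ, Γ ⊢ Δ                 Γ ⊢ Δ, φ ∨ ψ

 →      Γ ⊢ Δ, φ     ψ, Γ ⊢ Δ            φ, Γ ⊢ Δ, ψ
       ──────────────────────            ──────────────
            φ → ψ, Γ ⊢ Δ                 Γ ⊢ Δ, φ → ψ

 ↔      φ, ψ, Γ ⊢ Δ     Γ ⊢ Δ, φ, ψ      φ, Γ ⊢ Δ, ψ     ψ, Γ ⊢ Δ, φ
       ────────────────────────────      ────────────────────────────
              φ ↔ ψ, Γ ⊢ Δ                      Γ ⊢ Δ, φ ↔ ψ

 ¬      Γ ⊢ Δ, φ                         φ, Γ ⊢ Δ
       ───────────                       ───────────
        ¬φ, Γ ⊢ Δ                        Γ ⊢ Δ, ¬φ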
Under the backward reading, each rule attacks a formula in the goal. Applying
∧:left breaks down a conjunction on the left side, while ∧:right breaks down a
conjunction on the right. If all the resulting subgoals are basic sequents, then
the initial goal has been proved. For propositional logic, this procedure must
terminate.
A sequent may have several different proofs, depending on which formulæ are
broken down first. The proof (∗) first breaks down the conjunction on the left in
φ ∧ ψ ⊢ ψ ∧ φ. For a different proof, begin by breaking down the conjunction
on the right:
    φ, ψ ⊢ ψ               φ, ψ ⊢ φ
    ───────────  ∧:left    ───────────  ∧:left
    φ ∧ ψ ⊢ ψ              φ ∧ ψ ⊢ φ
    ─────────────────────────────────  ∧:right
             φ ∧ ψ ⊢ ψ ∧ φ
This is larger than the proof (∗) in that ∧:left is applied twice. Applying ∧:right
to the initial goal produced two subgoals, each with a conjunction on the left.
Shorter proofs usually result if the rule that produces the fewest subgoals is
chosen at each step.
To summarize, we have the following proof procedure:
• Take the sequent to be proved as the initial goal. The root of the proof
tree, and its only leaf, is this goal.
• Select some subgoal that is a leaf of the proof tree and apply a rule to it,
turning the leaf into a branch node with one or more leaves.
• Stop whenever all the leaves are basic sequents (success), or when no
rules can be applied to a leaf (failure).
This procedure is surprisingly effective, though its search is undirected. Both
∨:left and ∧:right may be applied to the subgoal p ∨ q, r ⊢ r ∧ r. The former
rule performs case analysis on the irrelevant formula p ∨ q; the latter rule yields
two basic subgoals, succeeding immediately.
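The procedure above can be sketched for propositional logic in a few lines of ML. This is only an illustration, not Hal's code: the datatype form and the functions member and prove are names assumed for the example, and the search order is fixed (left side first) rather than chosen to minimize subgoals. Because every propositional rule removes one connective, the recursion must terminate.

```sml
(* Illustrative formula type; these names are not Hal's. *)
datatype form = Atom of string
              | Neg  of form
              | Conj of form * form
              | Disj of form * form
              | Imp  of form * form;

fun member (x, xs) = List.exists (fn y => x = y) xs;

(* prove (ps, qs) tests whether the sequent ps |- qs is provable.
   Atoms are set aside in las and ras; every other formula is broken
   down by the rule for its outermost connective. *)
fun prove (ps, qs) =
  let fun go (las, ras, [], []) =
            (* only atoms remain: succeed iff the sequent is basic *)
            List.exists (fn a => member (a, ras)) las
        | go (las, ras, Atom a :: ps, qs) = go (a::las, ras, ps, qs)
        | go (las, ras, Neg p :: ps, qs) = go (las, ras, ps, p::qs)
        | go (las, ras, Conj (p,q) :: ps, qs) = go (las, ras, p::q::ps, qs)
        | go (las, ras, Disj (p,q) :: ps, qs) =
            go (las, ras, p::ps, qs) andalso go (las, ras, q::ps, qs)
        | go (las, ras, Imp (p,q) :: ps, qs) =
            go (las, ras, ps, p::qs) andalso go (las, ras, q::ps, qs)
        | go (las, ras, [], Atom a :: qs) = go (las, a::ras, [], qs)
        | go (las, ras, [], Neg q :: qs) = go (las, ras, [q], qs)
        | go (las, ras, [], Conj (p,q) :: qs) =
            go (las, ras, [], p::qs) andalso go (las, ras, [], q::qs)
        | go (las, ras, [], Disj (p,q) :: qs) = go (las, ras, [], p::q::qs)
        | go (las, ras, [], Imp (p,q) :: qs) = go (las, ras, [p], q::qs)
  in go ([], [], ps, qs) end;
```

For instance, prove ([Conj (Atom "p", Atom "q")], [Conj (Atom "q", Atom "p")]) explores the same rule applications as the proofs of φ ∧ ψ ⊢ ψ ∧ φ discussed above.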
Exercise 10.6 Show that any sequent containing both φ and ¬φ to the left of
the ⊢ symbol is provable.
10.3 Sequent rules for the quantifiers
They are dual to the rules for the universal quantifier and can be justified simi-
larly. Note that ∃x . φ is equivalent to ¬∀x . ¬φ.
The rules ∀:left and ∃:right have one feature that is not present in any of the
other rules: in backward proof, they do not remove any formulæ from the goal.
They expand a quantified formula, substituting a term into its body; and they
retain the formula to allow repeated expansion. It is impossible to determine in
advance how many expansions of a quantified formula are required for a proof.
Because of this, our proof procedure can fail to terminate; first-order logic is
undecidable.
Exercise 10.7 If the premise of ∀:right is ignored, can a proof involving this
rule reach an inconsistent conclusion? (This means a sequent ⊢ φ such that ¬φ
is a valid formula.)
The topmost sequent is not basic; to finish the proof we must again apply ∀:left.
The first application of this rule has accomplished nothing. We have a general
heuristic: never apply ∀:left or ∃:right to a goal if a different rule can usefully
be applied.
10.4 Theorem proving with quantifiers
Working upwards from the goal, ∃:right is applied, introducing z as a free vari-
able. Although the existential formula remains in the subgoal, it remains dor-
mant until we again reach a goal where no other rule is applicable. The next
inference, →:right, moves φ(z ) to the left. Since x is not free in the subgoal,
∀:right can be applied, replacing ∀x . φ(x ) by φ(x ). In the resulting subgoal,
∃:right is again applied (there is no alternative), substituting x for z . The final
subgoal after →:right is a basic sequent containing φ(x ) on both sides.
Observe that ∃z . φ(z ) → ∀x . φ(x ) is expanded twice by ∃:right. The se-
quent cannot be proved otherwise. Sequents requiring n expansions of a quan-
tifier, for any given n, are not hard to devise.
Footnote 2: To see that ∃z. φ(z) → ∀x. φ(x) is a theorem, first note that in fully
parenthesized form it is ∃z. [φ(z) → (∀x. φ(x))]. Pushing the existential quantifier
inside the implication changes it to a universal quantifier. The formula is thus
equivalent to (∀z. φ(z)) → (∀x. φ(x)), which is trivially true.
variable:
    φ[?a/x], ∀x. φ, Γ ⊢ Δ
    ──────────────────────  ∀:left
        ∀x. φ, Γ ⊢ Δ
Replacing ?b and ?c by a transforms both φ(?c, ?c) and φ(a, ?b) into φ(a, a),
completing the proof. The parameter a is not labelled with any variables because
there are none in the goal supplied to ∀:right.
Further reading. A number of textbooks present logic from the viewpoint of
computer science. They emphasize proof procedures and unification, avoiding
the more traditional concerns of mathematical logic, such as model theory. See Galton
(1990) or Reeves and Clarke (1990) for a gentle introduction to logic. Gallier (1986)
gives a more technical treatment centred around the sequent calculus.
Exercise 10.8 Reconstruct the first three quantifier proofs above, this time us-
ing (meta) variables and parameters.
The signature. Signature FOL defines the representation of first-order terms and
formulæ:
signature FOL =
sig
datatype term = Var of string
| Param of string * string list
| Bound of int
| Fun of string * term list
datatype form = Pred of string * term list
| Conn of string * form list
| Quant of string * string * form
type goal = form list * form list
val precOf : string -> int
val abstract : int -> term -> form -> form
val subst : int -> term -> form -> form
val termVars : term * string list -> string list
val goalVars : goal *string list -> string list
val termParams : term * (string * string list) list
-> (string * string list) list
val goalParams : goal *(string * string list) list
-> (string * string list) list
end;
Type term realizes the methods described in the previous section. A variable
(constructor Var ) has a name. A Bound variable has an index. A Fun appli-
cation has a function’s name and argument list; a function taking no arguments
is simply a constant. A parameter (Param) has a name and a list of forbidden
variables.
Type form is elementary. An atomic formula (Pred ) has a predicate’s name
and argument list. A connective application (Conn) has a connective and a list
of formulæ, typically "~", "&", "|", "-->", or "<->" paired with one or
two formulæ. A Quant formula has a quantifier (either "ALL" or "EX"), a
bound variable name and a formula for the body.
Type goal abbreviates the type of pairs of formula lists. Some older ML com-
pilers do not allow type abbreviations in signatures. We could specify goal
simply as a type: its declaration inside the structure would be visible outside.
The function precOf defines the precedences of the connectives, as required
for parsing and printing.
Functions abstract and subst resemble their namesakes of the previous chap-
ter, but operate on formulæ. Calling abstract i t p replaces each occurrence of t
in p by the index i (which is increased within quantifications); typically i = 0
and t is an atomic term. Calling subst i t p replaces the index i (increased
within quantifications) by t in the formula p.
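A minimal sketch of abstract and subst, written to match the description above, may make the index arithmetic concrete. The datatypes repeat those of signature FOL; the helper replace and the bodies of the functions are one plausible implementation, assumed for illustration rather than quoted from structure Fol.

```sml
datatype term = Var   of string
              | Param of string * string list
              | Bound of int
              | Fun   of string * term list;
datatype form = Pred  of string * term list
              | Conn  of string * form list
              | Quant of string * string * form;

(* Replace the term u1 by u2 throughout the term t. *)
fun replace (u1, u2) t =
      if t = u1 then u2 else
      case t of Fun (a, ts) => Fun (a, map (replace (u1, u2)) ts)
              | _           => t;

(* Replace occurrences of t by the index i, which grows under quantifiers. *)
fun abstract i t (Pred (a, ts))      = Pred (a, map (replace (t, Bound i)) ts)
  | abstract i t (Conn (b, ps))      = Conn (b, map (abstract i t) ps)
  | abstract i t (Quant (qnt, b, p)) = Quant (qnt, b, abstract (i+1) t p);

(* Replace the index i, which grows under quantifiers, by t. *)
fun subst i t (Pred (a, ts))      = Pred (a, map (replace (Bound i, t)) ts)
  | subst i t (Conn (b, ps))      = Conn (b, map (subst i t) ps)
  | subst i t (Quant (qnt, b, p)) = Quant (qnt, b, subst (i+1) t p);

(* ALL x. P(x) --> Q(x), built by abstracting over the 'constant' x. *)
val x    = Fun ("x", []);
val body = Conn ("-->", [Pred ("P", [x]), Pred ("Q", [x])]);
val allx = Quant ("ALL", "x", abstract 0 x body);

(* Expanding the quantifier with the meta-variable ?a. *)
val expanded = subst 0 (Var "a") (abstract 0 x body);
```

Here allx holds Bound 0 wherever x stood, and expanded holds Var "a" in the same places.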
10.5 Representing terms and formulæ
The function termVars collects the list of variables in a term (without repe-
titions); termVars(t, bs) inserts all the variables of t into the list bs. The argu-
ment bs may appear to be a needless complication, but it eliminates costly list
appends while allowing termVars to be extended to formulæ and goals. This
will become clear when we examine the function definitions.
The function goalVars, which also takes two arguments, collects the list of
variables in a goal. A goal in Hal is a sequent. Although sequents are represented
in ML using formula lists, not multisets, we shall be able to implement the style
of proof discussed above.
The functions termParams and goalParams collect the list of parameters in
a term or goal, respectively. Each parameter consists of its name paired with a
list of variable names.
The structure. Structure Fol (Figure 10.2) implements signature FOL. The da-
tatype declarations of term and form are omitted to save space; they are
identical to those in the signature. The structure declares several functions not
specified in the signature.
Calling replace (u1, u2) t replaces the term u1 by u2 throughout the term t.
This function is called by abstract and subst.
Functionals accumForm and accumGoal demonstrate higher-order program-
ming. Suppose that f has type term × τ → τ , for some type τ , where f (t, x )
accumulates some information about t in x . (For instance, f could be termVars,
which accumulates the list of free variables in a term.) Then foldr f extends f
to lists of terms. The function accumForm f has type form × τ → τ , extend-
ing f to operate on formulæ. It lets foldr f handle the arguments of a predi-
cate P (t1 , . . . , tn ); it recursively lets foldr (accumForm f ) handle the formula
lists of connectives. The functional accumGoal calls foldr twice, extending a
function of type form × τ → τ to one of type (form list × form list) × τ → τ .
It extends a function involving formulæ to one involving goals.
Functionals accumForm and accumGoal provide a uniform means of travers-
ing formulæ and goals. They define the functions goalVars and goalParams
and could have many similar applications. Moreover, they are efficient: they
create no lists or other data structures.
The functions termVars and termParams are defined by recursion, scan-
ning a term to accumulate its variables or parameters. They use foldr to traverse
argument lists. The function insert (omitted to save space) builds an ordered
list of strings without repetitions. Note that termVars does not regard the
parameter b_{?a1,...,?ak} as containing ?a1, ..., ?ak; these forbidden variables are not
logically part of the term and perhaps ought to be stored in a separate table.
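The following sketch shows how these pieces could fit together. It follows the types stated above, but the function bodies (including insert, which the text omits) are assumptions made for illustration and are not copied from Figure 10.2.

```sml
datatype term = Var   of string
              | Param of string * string list
              | Bound of int
              | Fun   of string * term list;
datatype form = Pred  of string * term list
              | Conn  of string * form list
              | Quant of string * string * form;

(* Extend f : term * 'a -> 'a from terms to formulæ. *)
fun accumForm f (Pred (_, ts), z)     = foldr f z ts
  | accumForm f (Conn (_, ps), z)     = foldr (accumForm f) z ps
  | accumForm f (Quant (_, _, p), z)  = accumForm f (p, z);

(* Extend f : form * 'a -> 'a to goals, calling foldr on both sides. *)
fun accumGoal f ((ps, qs), z) = foldr f (foldr f z qs) ps;

(* Insert a string into an ordered list without repetitions. *)
fun insert (a : string, []) = [a]
  | insert (a, b :: bs) =
      if a < b then a :: b :: bs
      else if a = b then b :: bs
      else b :: insert (a, bs);

(* Accumulate the variables of a term into the list bs.
   Forbidden variables of parameters are deliberately ignored. *)
fun termVars (Var a, bs)      = insert (a, bs)
  | termVars (Fun (_, ts), bs) = foldr termVars bs ts
  | termVars (_, bs)           = bs;

val goalVars = accumGoal (accumForm termVars);

val vs = goalVars (([Pred ("P", [Var "b", Fun ("f", [Var "a"])])],
                    [Pred ("Q", [Var "a"])]), []);
```

In this sketch vs is the ordered list ["a", "b"]. No intermediate lists are appended: each variable is inserted directly into the accumulator.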
Exercise 10.13 Sketch how FOL and Fol can be modified to adopt a new repre-
sentation of terms. Bound variables are identified by name, but are syntactically
distinct from parameters and meta-variables. Would this representation work for
the λ-calculus?
Exercise 10.14 Change the declaration of type form, replacing Conn by sep-
arate constructors for each connective, say Neg, Conj , Disj , Imp, Iff . Modify
FOL and Fol appropriately.
TermPack = ( TermList )
| Empty
Term = Id TermPack
| ? Id
Formulæ (Form) are defined in mutual recursion with primaries, which consist
of atomic formulæ and their negations:
Primary = ~ Primary
| ( Form )
| Id TermPack
The quantifiers are rendered into ASCII characters as ALL and EX; the following
table gives the treatment of the connectives:
Usual symbol:    ¬    ∧    ∨    →      ↔
ASCII version:   ~    &    |    -->    <->
since ASCII lacks Greek letters. Hal requires a quantified formula to be enclosed
in parentheses if it is the operand of a connective.
Parsing. The signature for parsing is minimal. It simply specifies the function
read , for converting strings to formulæ:
signature PARSE_FOL =
sig
val read : string -> Fol.form
end;
Before we can implement this signature, we must build structures for the lexical
analysis and parsing of first-order logic. Structure FolKey defines the lexical
syntax. Let us apply the functors described in Chapter 9:
structure FolKey =
struct val alphas = ["ALL","EX"]
and symbols = ["(", ")", ".", ",", "?", "~",
"&", "|", "<->", "-->", "|-"]
end;
structure FolLex = Lexical (FolKey);
structure FolParsing = Parsing (FolLex);
Figure 10.3 presents the corresponding structure. It is fairly simple, but a few
points are worth noting.
Functions list and pack express the grammar phrases TermList and TermPack .
They are general enough to define ‘lists’ and ‘packs’ of arbitrary phrases.
The parser cannot distinguish constants from parameters or check that func-
tions have the right number of arguments: it keeps no information about the
functions and predicates of the first-order language. It regards any identifier
as a constant, representing x by Fun("x", []). When parsing the quantifica-
tion ∀x . φ(x ), it abstracts the body φ(x ) over its occurrences of the ‘constant’ x .
As discussed in the previous chapter, our parser cannot accept left-recursive
grammar rules such as
Form = Form Conn Form.
10.6 Parsing and displaying formulæ
Displaying. Signature DISPLAY_FOL specifies the pretty printing operators for
formulæ and goals (which are sequents):
signature DISPLAY_FOL =
sig
val form: Fol.form -> unit
val goal : int -> Fol.goal -> unit
end;
The integer argument of function goal is displayed before the goal itself. It
represents the subgoal number; a proof state typically has several subgoals. The
sessions in Section 10.14 illustrate the output.
Structure DisplayFol implements this signature; see Figure 10.4. Our pretty
printer must be supplied with symbolic expressions that describe the formatting.
Function enclose wraps an expression in parentheses, while list inserts commas
between the elements of a list of expressions. Together, they format argument
lists as (t1 , . . . , tn ).
A parameter’s name is printed, but not its list of forbidden variables. Another
part of the program will display that information as a table.
The precedences of the connectives govern the inclusion of parentheses. Call-
ing formp k q formats the formula q — enclosing it in parentheses, if necessary,
to protect it from an adjacent connective of precedence k . In producing the
string q & (p | r), it encloses p | r in parentheses because the adjacent
connective (&) has precedence 3 while | has precedence 2.
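The parenthesization rule can be sketched with plain strings in place of the pretty printer. This is an illustration under stated assumptions, not the code of Figure 10.4: the simplified datatype form, the function show, and the precedences given to ~, --> and <-> are assumed here; only the values 3 for & and 2 for | come from the text.

```sml
(* Simplified formulæ: predicates carry no arguments in this sketch. *)
datatype form = Pred of string | Conn of string * form list;

(* Assumed precedence table; only "&" = 3 and "|" = 2 are given in the text. *)
fun precOf "~"   = 4
  | precOf "&"   = 3
  | precOf "|"   = 2
  | precOf "<->" = 1
  | precOf "-->" = 1
  | precOf _     = ~1;

(* show k q renders q, adding parentheses if its connective has
   precedence no greater than that of the adjacent connective, k. *)
fun show k (Pred a) = a
  | show k (Conn ("~", [p])) = "~" ^ show (precOf "~") p
  | show k (Conn (c, [p, q])) =
      let val pc = precOf c
          val sh = show (Int.max (pc, k))
          val s  = sh p ^ " " ^ c ^ " " ^ sh q
      in if pc <= k then "(" ^ s ^ ")" else s end
  | show _ _ = "??";

val s1 = show ~1 (Conn ("&", [Pred "q", Conn ("|", [Pred "p", Pred "r"])]));
```

Passing the larger of the two precedences to the operands reproduces the behaviour described above, including parentheses that are not strictly needed.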
Exercise 10.16 Explain the workings of each of the functions supplied to >>
in ParseFol .
Exercise 10.17 Alter the parser to admit q --> ALL x. p as correct syntax
for q → (∀x . p), for example. It should no longer demand parentheses around
quantified formulæ.
Exercise 10.18 The inner parenthesis pair in q & (p1 --> (p2 | r)) is
redundant because | has greater precedence than -->; our pretty printing often
includes such needless parentheses. Suggest modifications to the function form
that would prevent this.
10.7 Unification
Hal attempts to unify atomic formulæ in goals. Its basic unification
algorithm takes terms containing no bound variables. Given a pair of terms,
it computes a set of (variable, term) replacements to make them identical, or
reports that the terms cannot be unified. Performing the replacements is called
instantiation. Unification involves three cases:
Function applications. Two function applications can be unified only if they ap-
ply the same function; clearly no instantiation can transform f (?a) and g(b, ?c)
into identical terms. To unify g(t1 , t2 ) with g(u1 , u2 ) involves unifying t1 with u1
and t2 with u2 consistently — thus g(?a, ?a) cannot be unified with g(b, c) be-
cause a variable (?a) cannot be replaced by two different constants (b and c).
The unification of f (t1 , . . . , tn ) with f (u1 , . . . , un ) begins by unifying t1 with
u1 , then applies the resulting replacements to the remaining terms. The next
step is unifying t2 with u2 and applying the new replacements to the remaining
terms, and so forth. If any unifications fail then the function applications are
not unifiable. The corresponding arguments can be chosen for unification in any
order without significantly affecting the outcome.
Parameters. Two parameters can be unified only if they have the same name. A
parameter cannot be unified with a function application.
This is the notorious occurs check, which most Prolog interpreters omit because
of its cost. For theorem proving, soundness must have priority over efficiency;
the occurs check must be performed.
Examples. To unify g(?a, f (?c)) with g(f (?b), ?a), first unify ?a with f (?b), a
trivial step. After replacing ?a by f (?b) in the remaining arguments, unify f (?c)
with f(?b). This replaces ?c by ?b. The outcome can be given as the set
{?a ↦ f(?b), ?c ↦ ?b}. The unified formula is g(f(?b), f(?b)).
Here is another example. To unify g(?a, f (?a)) with g(f (?b), ?b), the first
step again replaces ?a by f (?b). The next task is unifying f (f (?b)) with ?b —
which is impossible because f (f (?b)) contains ?b. Unification fails.
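Both examples can be replayed with a small self-contained sketch. It is not structure Unify: replacements are kept in an association list rather than a StringDict, parameters carry no forbidden variables, and the names lookup, chase, occurs, unify, unifyLists and inst are assumed for this illustration. Replacements are cumulative, so a variable's binding may itself mention bound variables.

```sml
datatype term = Var of string | Param of string | Fun of string * term list;
exception Failed;

fun lookup (a, []) = NONE
  | lookup (a, (b, t) :: env) = if a = b then SOME t else lookup (a, env);

(* Follow the bindings of a variable until an unbound variable or a
   non-variable term is reached. *)
fun chase env (Var a) =
      (case lookup (a, env) of SOME t => chase env t | NONE => Var a)
  | chase _ t = t;

(* The occurs check: does variable a occur in the term, given env? *)
fun occurs env a (Var b) =
      a = b orelse
      (case lookup (b, env) of SOME t => occurs env a t | NONE => false)
  | occurs env a (Fun (_, ts)) = List.exists (occurs env a) ts
  | occurs _ _ _ = false;

(* Extend env so that t and u become identical, or raise Failed. *)
fun unify (t, u, env) =
      case (chase env t, chase env u) of
          (Var a, t') => if t' = Var a then env
                         else if occurs env a t' then raise Failed
                         else (a, t') :: env
        | (t', Var a) => unify (Var a, t', env)
        | (Param a, Param b) => if a = b then env else raise Failed
        | (Fun (f, ts), Fun (g, us)) =>
              if f = g then unifyLists (ts, us, env) else raise Failed
        | _ => raise Failed
and unifyLists ([], [], env) = env
  | unifyLists (t :: ts, u :: us, env) = unifyLists (ts, us, unify (t, u, env))
  | unifyLists _ = raise Failed;

(* Apply the cumulative replacements in env to a term. *)
fun inst env (Var a) =
      (case lookup (a, env) of SOME t => inst env t | NONE => Var a)
  | inst env (Fun (f, ts)) = Fun (f, map (inst env) ts)
  | inst _ t = t;

fun f t = Fun ("f", [t]);
fun g (t, u) = Fun ("g", [t, u]);

val env1 = unify (g (Var "a", f (Var "c")), g (f (Var "b"), Var "a"), []);
val t1   = inst env1 (g (Var "a", f (Var "c")));
```

Here t1 is g(f(?b), f(?b)), as in the first example. Unifying g(?a, f(?a)) with g(f(?b), ?b) raises Failed because the occurs check finds ?b inside the term it would be bound to.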
The function atoms attempts to unify two atomic formulæ, while instTerm,
instForm and instGoal apply replacements to terms, formulæ and goals, respec-
tively.
We represent a set of replacements by a dictionary, using structure StringDict
(Section 7.10); variable names are strings.
An atomic formula consists of a predicate applied to an argument list, such
as P (t1 , . . . , tn ). Unifying two atomic formulæ is essentially the same as uni-
fying two function applications; the predicates must be the same and the corre-
sponding argument pairs must be simultaneously unifiable.
Structure Unify (Figure 10.5) implements unification. The key functions are
declared within unifyLists in order to have access to env , the environment of re-
placements. Collecting the replacements in env is more efficient than applying
each replacement as it is generated. Replacements are regarded as cumulative
rather than simultaneous, just as in the λ-calculus interpreter’s treatment of def-
initions (Section 9.7). Simultaneous substitution by
would replace ?a by f (?b), but our functions replace ?a by f (g(z )). This is the
correct treatment for our unification algorithm.
Here are some remarks about the functions declared in unifyLists:
original state unchanged so that other tactics can be tried. A unification algo-
rithm could employ imperative techniques provided they were invisible outside.
The unification function raises exception Failed when two terms cannot be
unified. As in parsing, the failure may be detected in deeply nested recursive
calls; the exception propagates upwards. This is a typical case where exceptions
work well.
Function instTerm substitutes in parameters as described above. Each for-
bidden variable is replaced by the list of variables in the term resulting from
the substitution. This could be done using List.concat, but the combination of
foldr and termVars performs less copying.
Efficient unification algorithms. The algorithm presented here can take expo-
nential time, in highly exceptional cases. In practice, it is quite usable. More
efficient algorithms exist. The linear time algorithm of Paterson and Wegman (1978)
is usually regarded as too complicated for practical use. The algorithm of Martelli and
Montanari (1982) is almost linear and is intended to be usable. However, Corbin and
Bidoit (1983) propose an algorithm based upon the naïve one, but representing terms
by graphs (really, pointers) instead of trees. They claim it to be superior to the almost
linear algorithms because of its simplicity, despite needing quadratic time. Ružička and
Prívara (1988) have refined this approach to be almost linear too.
Exercise 10.20 What could happen if this line were omitted from unify?
if t = Fol.Var a then env else
to prove. The leaves, or current subgoals, are the sequents that remain to be
proved.
A goal φ paired with the singleton subgoal list [ ⊢ φ ] represents the initial
state of a proof of φ; no rules have yet been applied. A goal φ paired with the
empty subgoal list is a final state, and represents a finished proof.
If the full proof tree is not stored, how can we be certain that a Hal proof is
correct? The answer is to hide the representation of proof states using an abstract
type state, providing a limited set of operations — to create an initial state, to
examine the contents of a state, to test for a final state, and to transform a state
into a new state by some rule of inference.
If greater security is required, the proof could be saved and checked by a
separate program. Bear in mind that proofs of real theorems can be extremely
large, and that no amount of machine checking can provide absolute security.
Our programs and proof systems are fallible — as are the theories we use to
reduce ‘real world’ tasks to logic.
Approaches to formalizing an inference system. While developing Edinburgh
LCF, Robin Milner conceived the idea of defining an inference system as an
abstract type. He designed ML’s type system to support this application. LCF’s type
thm denotes the set of theorems of the logic. Functions with result type thm implement
the axioms and inference rules.
Implementing the inference rules as functions from theorems to theorems supports
forward proof, LCF’s primitive style of reasoning. To support backward proof, LCF
provides tactics. LCF tactics represent a partial proof by a function of type thm list →
thm. This function proves the main goal, using inference rules, when supplied with
theorems for each of the subgoals. A finished proof can be supplied with the empty list
to prove the main goal. The classic description (Gordon et al., 1979) is out of print, but
my book on LCF also describes this work (Paulson, 1987).
Hal differs from LCF in implementing the inference rules as functions on proof states,
not on theorems. These functions are themselves tactics and support backward proof
as the primitive style. They do not support forward proof. The approach supports
unification; tactics may update meta-variables in the proof state.
Isabelle (Paulson, 1994) uses yet another approach. Rules and proof states have a
common representation in the typed λ-calculus. Combining these objects yields both
forward and backward proof. This requires some form of higher-order unification (Huet,
1975).
signature RULE =
sig
type state
type tactic = state -> state ImpSeq.t
val main : state -> Fol.form
val subgoals : state -> Fol.goal list
val initial : Fol.form -> state
val final : state -> bool
val basic : int -> tactic
val unify : int -> tactic
val conjL : int -> tactic
val conjR : int -> tactic
val disjL : int -> tactic
val disjR : int -> tactic
val impL : int -> tactic
val impR : int -> tactic
val negL : int -> tactic
val negR : int -> tactic
val iffL : int -> tactic
val iffR : int -> tactic
val allL : int -> tactic
val allR : int -> tactic
val exL : int -> tactic
val exR : int -> tactic
end;
10.9 The ML signature
where ImpSeq is the structure for lazy lists presented in Section 8.4. A tactic
maps a state to a sequence of possible next states. The primitive tactics generate
finite sequences, typically of length zero or one. A complex tactic, say for depth-
first search, could generate an infinite sequence of states.
The function initial creates initial states containing a given formula as the
main goal and the only subgoal. The predicate final tests whether a proof state
is final, containing no subgoals.
The other functions in the signature are the primitive tactics, which define the
inference rules of the sequent calculus. Later, we shall introduce tacticals for
combining tactics.
The subgoals of a proof state are numbered starting from 1. Each primitive
tactic, given an integer argument i and a state, applies some rule of the sequent
calculus to subgoal i , creating a new state. For instance, calling
conjL 3 st
applies ∧:left to subgoal 3 of state st. If this subgoal has the form φ ∧ ψ, Γ ⊢ Δ
then subgoal 3 of the next state will be φ, ψ, Γ ⊢ Δ. Otherwise, ∧:left is not
applicable to the subgoal and there can be no next state; conjL will return the
empty sequence.
If subgoal 5 of st is Γ ⊢ Δ, φ ∧ ψ, then
conjR 5 st
yields a single next state in which subgoal 5 is replaced by the two subgoals
Γ ⊢ Δ, φ and Γ ⊢ Δ, ψ. A tactic that can succeed in several ways, such as unify,
generates a longer sequence of next states. Only the first of these is computed,
with the others available upon demand, since sequences are lazy.
Declaring type state. The datatype declaration introduces type state with
its constructor State. The constructor is not exported, allowing access to the
representation only inside the structure body. Type tactic is declared to abbre-
viate the type of functions from state to state sequences.
Functions main and subgoals return the corresponding parts of a proof state.
The third component of a proof state is an integer, for generating unique names
in quantifier rules. Its value is initially 0 and is increased as necessary when
the next state is created. If this name counter were kept in a reference cell and
updated by assignment, much of the code would be simpler — especially where
the counter plays no rôle. However, applying a quantifier rule to a state would
affect all states sharing that reference. Resetting the counter to 0, while produc-
ing shorter names, could also lead to re-use of names and faulty reasoning. It is
safest to ensure that all tactics are purely functional.
Calling initial p creates a state containing the sequent ⊢ p as its only subgoal,
with p as its main goal and 0 for its variable counter. Predicate final tests for an
empty subgoal list.
The definitions of basic and unify. All tactics are expressed using spliceGoals,
a function to replace subgoal i by a new list of subgoals in a state. The List
functions take and drop extract the subgoals before and after i , so that the new
subgoals can be spliced into the correct place.
The declaration of propRule illustrates how proof states are processed. This
function makes a tactic from a function goalF of type goal → goal list. Applied
to an integer i and a state, it supplies subgoal i to goalF and splices in the
resulting subgoals; it returns the new state as a one-element sequence. It returns
the empty sequence if any exception is raised. Exception Subscript results from
the call List.nth(gs,i-1) if there is no ith subgoal; recall that nth numbers
a list’s elements starting from zero. Other exceptions, such as Match, can result
from goalF .
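A self-contained sketch of this machinery follows. To keep it runnable on its own it makes several simplifying assumptions: the datatype seq stands in for ImpSeq, formulæ carry no terms, and the constructor State is left visible, whereas the real structure hides it. The tactic basic is included as one example of a goalF.

```sml
(* A tiny lazy sequence type, standing in for ImpSeq (an assumption). *)
datatype 'a seq = Nil | Cons of 'a * (unit -> 'a seq);

datatype form = Pred of string | Conn of string * form list;
type goal = form list * form list;

(* Subgoals, main goal, and the counter for generating names. *)
datatype state = State of goal list * form * int;
type tactic = state -> state seq;

fun initial p = State ([([], [p])], p, 0);
fun final (State (gs, _, _)) = null gs;

(* Replace subgoal i (numbered from 1) by the list newgs. *)
fun spliceGoals gs newgs i = List.take (gs, i-1) @ newgs @ List.drop (gs, i);

(* Make a tactic from goalF; any exception yields the empty sequence. *)
fun propRule (goalF : goal -> goal list) i (State (gs, p, n)) =
      let val gs2 = spliceGoals gs (goalF (List.nth (gs, i-1))) i
      in Cons (State (gs2, p, n), fn () => Nil) end
      handle _ => Nil;

(* Delete a subgoal that is a basic sequent; otherwise raise Match. *)
val basic : int -> tactic =
      propRule (fn (ps, qs) =>
                   if List.exists (fn p => List.exists (fn q => p = q) qs) ps
                   then [] else raise Match);

val st = State ([([Pred "p"], [Pred "q", Pred "p"])], Pred "p", 0);
```

Applying basic 1 to st gives a one-element sequence whose state is final, while basic 2 st gives Nil because there is no second subgoal.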
The tactic basic is a simple application of propRule. It supplies as goalF
a function that checks whether the goal (ps, qs) is a basic sequent. If so then
10.10 Tactics for basic sequents
it returns the empty list of subgoals; the effect is to delete that subgoal from
the next state. But if (ps, qs) is not a basic sequent then the function raises an
exception.
The tactic unify is more complicated: it can return multiple next states. It
calls unifiable to generate a sequence of unifying environments, and inst to
apply them to the other subgoals. Function next, which performs the final pro-
cessing, is applied via the functional ImpSeq.map.
The function unifiable takes lists ps and qs of formulæ. It returns the se-
quence of all environments obtained by unifying some p of ps with some q
of qs. The function find handles the ‘inner loop,’ searching in qs for some-
thing to unify with p. It generates a sequence whose head is an environment and
whose tail is generated by the recursive call find qs, but if Unify.atoms raises
an exception then the result is simply find qs.
Look out for other goals. When unify solves a subgoal, it may update the state
so that some other subgoal becomes unprovable. Success of this tactic does
not guarantee that it is the right way to find a proof; in some cases, a different tactic
should be used instead. Any search procedure involving unify should use backtracking.
On the other hand, solving a goal by basic is always safe.
(* Find the first formula in qs built from connective a. Return its operands
   paired with qs minus that formula; raise Match if there is none. *)
fun splitConn a qs =
  let fun get [] = raise Match
        | get (Fol.Conn(b,ps) :: qs) = if a=b then ps else get qs
        | get (q::qs) = get qs;
      fun del [] = []
        | del ((q as Fol.Conn(b,_)) :: qs) = if a=b then qs
                                             else q :: del qs
        | del (q::qs) = q :: del qs
  in (get qs, del qs) end;

(* Apply a rule function to the goal after analysing its left (right) side. *)
fun propL a leftF = propRule (fn (ps,qs) => leftF (splitConn a ps, qs));
fun propR a rightF = propRule (fn (ps,qs) => rightF (ps, splitConn a qs));

(* The fn patterns are deliberately partial: Match makes the tactic fail. *)
val conjL = propL "&" (fn (([p1,p2], ps), qs) => [(p1::p2::ps, qs)]);
val conjR = propR "&"
      (fn (ps, ([q1,q2], qs)) => [(ps, q1::qs), (ps, q2::qs)]);
val disjL = propL "|"
      (fn (([p1,p2], ps), qs) => [(p1::ps, qs), (p2::ps, qs)]);
val disjR = propR "|" (fn (ps, ([q1,q2], qs)) => [(ps, q1::q2::qs)]);
val impL = propL "-->"
      (fn (([p1,p2], ps), qs) => [(p2::ps, qs), (ps, p1::qs)]);
val impR = propR "-->" (fn (ps, ([q1,q2], qs)) => [(q1::ps, q2::qs)]);
val negL = propL "~" (fn (([p], ps), qs) => [(ps, p::qs)]);
val negR = propR "~" (fn (ps, ([q], qs)) => [(q::ps, qs)]);
val iffL = propL "<->"
      (fn (([p1,p2], ps), qs) => [(p1::p2::ps, qs), (ps, p1::p2::qs)]);
val iffR = propR "<->"
      (fn (ps, ([q1,q2], qs)) => [(q1::ps, q2::qs), (q2::ps, q1::qs)]);
another function leftF , which creates new subgoals. The functional propR is
similar, but searches on the right side.
The tactics are given by val declarations, since they have no explicit argu-
ments. Each tactic consists of a call to propL or propR. Each passes in fn nota-
tion the argument leftF or rightF . Each function takes an analysed subgoal and
returns one or two subgoals. Thus conjL searches for a conjunction in the left
part and inserts the two conjuncts into the new subgoal, while conjR searches
for a conjunction in the right part and makes two subgoals.
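The effect of analysing a subgoal can be tried in isolation. The sketch below copies the idea of splitConn onto a simplified formula type (predicates without arguments, no Fol prefix); the function conjLGoal is a hypothetical name for the goal function that underlies conjL.

```sml
datatype form = Pred of string | Conn of string * form list;

(* Operands of the first formula with connective a, and the other formulæ. *)
fun splitConn a qs =
  let fun get [] = raise Match
        | get (Conn (b, ps) :: qs) = if a = b then ps else get qs
        | get (q :: qs) = get qs
      fun del [] = []
        | del ((q as Conn (b, _)) :: qs) = if a = b then qs else q :: del qs
        | del (q :: qs) = q :: del qs
  in (get qs, del qs) end;

(* The goal function of ∧:left: one subgoal, with both conjuncts on the left. *)
fun conjLGoal (ps, qs) =
      case splitConn "&" ps of
          ([p1, p2], rest) => [(p1 :: p2 :: rest, qs)]
        | _ => raise Match;

val new = conjLGoal ([Pred "r", Conn ("&", [Pred "p", Pred "q"])], [Pred "s"]);
```

The conjunction is found behind r, removed, and its conjuncts are placed at the front, so new is the single subgoal p, q, r |- s. If the left side contains no conjunction, Match is raised and the enclosing tactic returns the empty sequence.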
Tactics allR and exL select a quantified formula and substitute a parameter
into its body. The parameter has the name b and carries, as forbidden variables,
all the variables in the subgoal.
As we reach the end of Rule, we should remember that the tactics declared
in it are the only means of creating values of type state. All proof procedures
— even if they demonstrate validity using sophisticated data structures — must
ultimately apply these tactics, constructing a formal proof. If the code given
above is correct, and the ML system is correct, then Hal proofs are guaranteed to
be sound. No coding errors after this point can yield faulty proofs. This security
comes from defining state as an abstract type.
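The security argument can be illustrated on a much smaller scale. The following sketch is not Hal's RULE signature: TOY_RULE, ToyRule and their three operations are hypothetical names, and formulae are reduced to strings. It shows only how an opaque signature constraint leaves the declared rules as the sole way of making a state.

```sml
signature TOY_RULE =
sig
  type state                                   (* abstract: no constructors *)
  val initial : string list * string list -> state
  val basic   : state -> state option          (* solve subgoal 1 if basic  *)
  val final   : state -> bool                  (* no subgoals left?         *)
end;

structure ToyRule :> TOY_RULE =
struct
  (* A state is just its list of subgoals (left side, right side). *)
  type state = (string list * string list) list

  fun initial goal = [goal]

  (* Subgoal 1 is basic if some formula occurs on both sides. *)
  fun basic [] = NONE
    | basic ((ps, qs) :: rest) =
        if List.exists (fn p => List.exists (fn q => p = q) qs) ps
        then SOME rest
        else NONE

  fun final st = null st
end;

val proved =
  case ToyRule.basic (ToyRule.initial (["P", "Q"], ["Q"])) of
      SOME st => ToyRule.final st
    | NONE => false;
```

Outside the structure, an expression such as `[] : ToyRule.state` is rejected by the type checker, so a final state can only be reached by applying basic.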
Exercise 10.22 Suggest a representation of type state that would store the
entire proof tree. Best would be an encoding that uses little space while allowing
the proof tree to be reconstructed. Sketch the modifications to RULE and Rule.
Exercise 10.23 Our set of tactics provides no way of using a previously proved
theorem in a proof. A tactic based on the rule
\[ \frac{\vdash \phi \qquad \phi, \Gamma \vdash \Delta}{\Gamma \vdash \Delta} \]
could insert the theorem ⊢ φ as a lemma into a goal. Describe how such a
tactic could be implemented.
The interface consists of the following items, which (except pr ) act upon a
stored proof state:
b not in ?c ?d
for b_{?c,?d}; it prints nothing at all for a parameter that has no forbidden
variables. Function printgoals prints a list of numbered subgoals. With the help of
these functions, pr prints a state: its main goal, its subgoal list, and its table of
parameters.
Exercise 10.25 Design and implement an undo command that cancels the ef-
fect of the most recent by command. Repeated undo commands should revert
to earlier and earlier states.
Exercise 10.26 There are many ways of managing the search tree of states.
The interface could explore a single path through the tree. Each node would
store a sequence of possible next states, marking one as the active branch.
Changing the active branch at any node would select a different path. Develop
this idea.
Now φ ∧ ψ → ψ ∧ φ is the main goal and the only subgoal. We must apply
→:right to subgoal 1; no other step is possible:
by (Rule.impR 1);
> P & Q --> Q & P
> 1. P & Q |- Q & P
> 1. P, Q |- Q & P
Tactics are usually applied to subgoal 1; let us tackle subgoal 2 for variety. It is
a basic sequent, so it falls to Rule.basic.
by (Rule.basic 2);
> P & Q --> Q & P
> 1. P, Q |- Q
Most theorem provers provide some means of storing theorems once proved, but
this is not possible in Hal. We go on to the next example, ∃z . φ(z ) → ∀x . φ(x ),
which was discussed earlier.
goal "EX z. P(z) --> (ALL x. P(x))";
> EX z. P(z) --> (ALL x. P(x))
> 1. empty |- EX z. P(z) --> (ALL x. P(x))
The only possible step is to apply ∃:right to subgoal 1. The tactic generates a
variable called ?_a.
by (Rule.exR 1);
> EX z. P(z) --> (ALL x. P(x))
> 1. empty
> |- P(?_a) --> (ALL x. P(x)),
> EX z. P(z) --> (ALL x. P(x))
We could apply ∃:right again, but it seems sensible to analyse the other formula
in subgoal 1. So we apply →:right.
by (Rule.impR 1);
> EX z. P(z) --> (ALL x. P(x))
> 1. P(?_a)
> |- ALL x. P(x),
> EX z. P(z) --> (ALL x. P(x))
Continuing to work on the first formula, we apply ∀:right. The tactic generates a
parameter called _b, with ?_a as its forbidden variable. A table of parameters
is now displayed.
by (Rule.allR 1);
> EX z. P(z) --> (ALL x. P(x))
> 1. P(?_a) |- P(_b),
> EX z. P(z) --> (ALL x. P(x))
> _b not in ?_a
Since the subgoal contains P(?_a) on the left and P(_b) on the right, we
could try unifying these formulæ by calling Rule.unify. However, the forbid-
den variable of _b prevents this unification. Replacing ?_a by _b would violate
the proviso of ∀:right.
by (Rule.unify 1);
> ** Tactic FAILED! **
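The failure can be understood as an occurs check. The sketch below assumes a term type in which a parameter carries the names of its forbidden variables; the constructors and the function bind are chosen for illustration and need not match Hal's unifier. A variable may not be bound to a term in which it occurs, and the forbidden variables of a parameter count as occurring in it.

```sml
(* Illustrative term type (an assumption, not necessarily Hal's). *)
datatype term = Var of string
              | Param of string * string list   (* name, forbidden variables *)
              | Fun of string * term list;

(* Does variable a occur in the term, counting forbidden variables? *)
fun occurs a (Var b) = (a = b)
  | occurs a (Param (_, bs)) = List.exists (fn b => a = b) bs
  | occurs a (Fun (_, ts)) = List.exists (occurs a) ts;

(* Try to bind variable a to term t; NONE signals failure. *)
fun bind (a, t) = if occurs a t then NONE else SOME (a, t);

val b = Param ("_b", ["_a"]);
val blocked = bind ("_a", b);   (* NONE: ?_a is forbidden for _b *)
val allowed = bind ("_c", b);   (* SOME: ?_c is not forbidden    *)
```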
The situation is like it was at the start of the proof, except that the subgoal
contains two new atomic formulæ. Since they are not unifiable, we have no
choice but to expand the quantifier again, using ∃:right. The variable ?_c is
created.
by (Rule.exR 1);
> EX z. P(z) --> (ALL x. P(x))
> 1. P(?_a)
> |- P(?_c) --> (ALL x. P(x)), P(_b),
> EX z. P(z) --> (ALL x. P(x))
> _b not in ?_a
The proof continues as it did before, with the two atomic formulæ carried along.
We avoid applying ∃:right a third time and instead apply →:right.
by (Rule.impR 1);
> EX z. P(z) --> (ALL x. P(x))
> 1. P(?_c), P(?_a)
> |- ALL x. P(x), P(_b),
> EX z. P(z) --> (ALL x. P(x))
> _b not in ?_a
The subgoal has a new formula on the left, namely P(?_c), and ?_c is not a
forbidden variable of _b. Therefore P(?_c) and P(_b) are unifiable.
by (Rule.unify 1);
Although the first attempt with Rule.unify failed, a successful proof was finally
found. This demonstrates how parameters and variables behave in practice.
10.15 Tacticals
The sample proofs of the previous section are unusually short. The
proof of even a simple formula can require many steps. To convince yourself of
this, try proving
((φ ↔ ψ) ↔ χ ) ↔ (φ ↔ (ψ ↔ χ )).
Although proofs are long, each step is usually obvious. Often, only one or two
rules can be applied to a subgoal. Moreover, the subgoals can be tackled in any
order because a successful proof must prove them all. We can always work on
subgoal 1. A respectable proof procedure can be expressed using tactics, with
the help of a few control structures.
The basic tacticals. Operations on tactics are called tacticals by analogy with
functions and functionals. The simplest tacticals implement the control struc-
tures of sequencing, choice and repetition. They are analogous to the parsing
operators --, || and repeat (see Section 9.2). So they share the same names,
with the additional infix operator |@|.
Tacticals in Hal involve operations on sequences. Type multifun abbreviates
types in the signature (Figure 10.11). The tacticals are not restricted to tactics.
They are all polymorphic; type state appears nowhere. Let us describe these
tacticals by their effect on arbitrary functions of suitable type, not just tactics.
The tactical -- composes two functions sequentially. When the function
f -- g is applied to x, it computes the sequence f(x) = [y1, y2, ...] and returns
the concatenation of the sequences g(y1), g(y2), .... With tactics, -- applies
one tactic and then another to a proof state, returning all ‘next next’ states that
result.
The tactical || chooses between two functions. When the function f || g
is applied to x, it returns f(x) if this sequence is non-empty, and otherwise
returns g(x). With tactics, || applies one tactic to a proof state, and if it fails,
tries another. The tactical |@| provides a less committal form of choice; when
f |@| g is applied to x, it concatenates the sequences f(x) and g(x).
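The behaviour of these three tacticals can be tried out on ordinary lists. The sketch below is only a model: strict lists stand in for the lazy sequences of ImpSeq, and halve and neighbours are hypothetical functions invented for the demonstration.

```sml
infix 5 --;
infix 0 || |@|;

(* Strict lists stand in for lazy sequences (an assumption). *)
fun (f -- g) x = List.concat (map g (f x));          (* sequencing         *)
fun (f || g) x = case f x of [] => g x | ys => ys;   (* committed choice   *)
fun (f |@| g) x = f x @ g x;                         (* both alternatives  *)

(* halve succeeds only on even numbers; neighbours always succeeds. *)
fun halve n = if n mod 2 = 0 then [n div 2] else [];
fun neighbours n = [n - 1, n + 1];

val r1 = (neighbours -- halve) 3;    (* halve 2 @ halve 4 = [1, 2]        *)
val r2 = (halve || neighbours) 3;    (* halve fails, so [2, 4]            *)
val r3 = (halve || neighbours) 4;    (* halve succeeds, so [2]            *)
val r4 = (halve |@| neighbours) 4;   (* results of both: [2, 3, 5]        *)
```

Note that || discards its second function as soon as the first succeeds, whereas |@| keeps the outcomes of both.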
The tactics all and no can be used with tacticals to obtain effects such as
repetition. For all x, all(x) returns the singleton sequence [x] while no(x)
infix 6 $--;
infix 5 --;
infix 0 || |@|;
signature TACTICAL =
  sig
  type ('a,'b) multifun = 'a -> 'b ImpSeq.t
  val -- : ('a,'b) multifun * ('b,'c) multifun -> ('a,'c) multifun
  val || : ('a,'b) multifun * ('a,'b) multifun -> ('a,'b) multifun
  val |@| : ('a,'b) multifun * ('a,'b) multifun -> ('a,'b) multifun
  val all : ('a,'a) multifun
  val no : ('a,'b) multifun
  val try : ('a,'a) multifun -> ('a,'a) multifun
  val repeat : ('a,'a) multifun -> ('a,'a) multifun
  val repeatDeterm : ('a,'a) multifun -> ('a,'a) multifun
  val depthFirst : ('a -> bool) -> ('a,'a) multifun -> ('a,'a) multifun
  val depthIter : ('a -> bool) * int -> ('a,'a) multifun -> ('a,'a) multifun
  val firstF : ('a -> ('b,'c) multifun) list -> 'a -> ('b,'c) multifun
  end;
returns the empty sequence. Thus, all succeeds with all arguments while no
succeeds with none. Note that all is the identity element for --:
Implementing the tacticals. Let us turn to the structure Tactical (Figure 10.12).
The rôle of sequence concatenation in -- is clear, but its rôle in |@| may be
obscure. What is wrong with this obvious definition?
fun (tac1 |@| tac2) x = ImpSeq.append (tac1 x, tac2 x);
This version of |@| may prematurely (or needlessly) call tac2. Defining |@|
using ImpSeq.concat ensures that tac2 is not called until the elements pro-
duced by tac1 have been exhausted. In a lazy language, the obvious definition
of |@| would behave properly.
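The difference can be observed with a counter. The sketch below defines its own small lazy sequence type instead of using ImpSeq, and the names eagerOr, delayedOr, tac1 and tac2 are hypothetical. It is meant to show the evaluation order only, not the definition in Figure 10.12.

```sml
(* A minimal lazy sequence: the tail is a suspended computation. *)
datatype 'a seq = Nil | Cons of 'a * (unit -> 'a seq);

fun fromList [] = Nil
  | fromList (x :: xs) = Cons (x, fn () => fromList xs);

fun append (Nil, ys) = ys
  | append (Cons (x, xf), ys) = Cons (x, fn () => append (xf (), ys));

fun head (Cons (x, _)) = x
  | head Nil = raise Empty;

val calls = ref 0;                       (* counts invocations of tac2 *)
fun tac1 x = fromList [x, x + 1];
fun tac2 x = (calls := !calls + 1; fromList [x * 10]);

(* The obvious definition: ML evaluates both arguments of append at once. *)
fun eagerOr (f, g) x = append (f x, g x);

(* A delayed variant: g is called only when f's outcomes are exhausted. *)
fun delayedOr (f, g) x =
  let fun app Nil = g x
        | app (Cons (y, yf)) = Cons (y, fn () => app (yf ()))
  in app (f x) end;

val h1 = head (eagerOr (tac1, tac2) 1);
val callsEager = !calls;                 (* tac2 has already run once *)
val () = calls := 0;
val h2 = head (delayedOr (tac1, tac2) 1);
val callsDelayed = !calls;               (* tac2 has not run at all   *)
```

Both versions deliver the same first element; only the eager one has paid for a call to tac2 that the caller never needed.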
The tactical try attempts to apply its argument; if the argument fails, try
returns its input unchanged instead of failing.
The tactical repeat applies a function repeatedly. The result of repeat f x
is a sequence of values obtained from x by repeatedly applying f , such that a
further application of f would fail. The tactical is defined recursively, like the
analogous parsing operator.
The tactical repeatDeterm also provides repetition. It is deterministic: it
considers only the first outcome returned at each step. When the other outcomes
are not needed, repeatDeterm is much more efficient than repeat.
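The two forms of repetition can be compared in a list-based model. This is a sketch under the assumption that strict lists replace lazy sequences, so it shows which outcomes are produced rather than the saving in work; step is a hypothetical function with two outcomes per call.

```sml
infix 5 --;
infix 0 ||;

(* List-based stand-ins for the basic tacticals (an assumption). *)
fun (f -- g) x = List.concat (map g (f x));
fun (f || g) x = case f x of [] => g x | ys => ys;
fun all x = [x];

(* Apply f as long as it succeeds, following every outcome. *)
fun repeat f x = (f -- repeat f || all) x;

(* Apply f as long as it succeeds, following only the first outcome. *)
fun repeatDeterm f x =
  case f x of
      [] => [x]
    | y :: _ => repeatDeterm f y;

(* Two possible successors until the bound 4 is reached. *)
fun step n = if n >= 4 then [] else [n + 1, n + 2];

val every = repeat step 2;         (* [4, 5, 4]: one result per path   *)
val first = repeatDeterm step 2;   (* [4]: the path 2, 3, 4 only       *)
```

Every element of either result is a value on which step fails, which is the stopping condition described above.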
The tactical depthFirst explores the search tree generated by a function. Call-
ing depthFirst pred f x returns a sequence of values, all satisfying the predi-
cate pred , that were obtained from x by repeatedly applying f .
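A list-based sketch may make the search order concrete. It assumes strict lists in place of lazy sequences, so it terminates only on finite search trees; children and wanted are hypothetical examples.

```sml
(* Stop at any value satisfying pred; otherwise search below each
   successor in turn, from left to right. *)
fun depthFirst pred f x =
  if pred x then [x]
  else List.concat (map (depthFirst pred f) (f x));

(* A finite binary tree on the integers: n has children 2n and 2n+1. *)
fun children n = if n < 8 then [2 * n, 2 * n + 1] else [];
fun wanted n = n mod 5 = 0;

val found = depthFirst wanted children 1;   (* [5, 15] *)
```

The search from 1 finds 5 in the left subtree before 15 in the right one, and it does not look below a value once that value satisfies the predicate.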
The tactical depthIter explores the search tree using depth-first iterative deep-
ening. It searches first to depth d , then depth 2d , then 3d and so forth; this en-
sures that no solutions are missed. Its other arguments are as in depthFirst. Its
rather messy implementation is based upon the code discussed in Section 5.20.
Finally, firstF is a convenient means of combining primitive inference rules;
see Figure 10.13 below.
Some examples. In order to demonstrate the tacticals, we first open their struc-
ture, making available the infixes.
open Tactical;
Now let us prove the following formula, which concerns the associative law for
conjunction:
goal "(P & Q) & R --> P & (Q & R)";
> (P & Q) & R --> P & (Q & R)
> 1. empty |- (P & Q) & R --> P & (Q & R)
The only rule that can be applied is →:right. Looking ahead a bit, we can
foresee two applications of ∧:left. With repeat we can apply both rules as often
as necessary:
by (repeat (Rule.impR 1 || Rule.conjL 1));
> (P & Q) & R --> P & (Q & R)
> 1. P, Q, R |- P & (Q & R)
We have proved the theorem using only two by commands; a rule-by-rule proof
would have needed eight commands. For another demonstration, let us prove a
theorem using one fancy tactic. Take our old quantifier example:
Let us repeat the tactics used in Section 10.14, choosing their order carefully.
Certainly Rule.unify should be tried first, since it might solve the goal alto-
gether. And Rule.exR must be last; otherwise it will apply every time and
cause an infinite loop.
by (repeat (Rule.unify 1 || Rule.impR 1 ||
Rule.allR 1 || Rule.exR 1));
> EX z. P(z) --> (ALL x. P(x))
> No subgoals left!
Exercise 10.27 What does the tactic repeat (f -- f) (x) do?
The components of depth are themselves useful for interactive proof, espe-
cially when depth fails. They are specified in signature TAC:
signature TAC =
sig
val safeSteps: int -> Rule.tactic
val quant : int -> Rule.tactic
val step : int -> Rule.tactic
val depth : Rule.tactic
val depthIt : int -> Rule.tactic
end;
4 Since the goal formula does not fit on one line, the \...\ escape sequence
divides the string over two lines.
It is worth reiterating that our tactics cannot compete with automatic theorem
provers. They work by applying primitive inference rules, whose implemen-
tation was designed for interactive use. Their ‘inner loop’ (the tactic safe)
searches for connectives in a profligate manner. No heuristics govern the ex-
pansion of quantifiers. This simple-looking example (problem 43) is not solved
in a reasonable time:
goal "(ALL x. ALL y. q(x,y) <-> (ALL z. p(z,x) <-> p(z,y))) \
\ --> (ALL x. (ALL y. q(x,y) <-> q(y,x)))";
Tactics work best when the logic has no known automatic proof procedure. Tac-
ticals allow experimentation with different search procedures, while the abstract
type state guards against faulty reasoning.
Other theorem provers. Most automatic theorem provers are based on the res-
olution principle (Chang and Lee, 1973). They prove a formula A by convert-
ing ¬A to clause form (based upon conjunctive normal form) and deriving a contradic-
tion. A popular resolution prover is W. McCune’s Otter. For example, Quaife (1992) has
used Otter for proofs in Peano arithmetic, geometry and set theory. Another impressive
system is SETHEO (Letz et al., 1992).
Tableau provers are less powerful, but more natural than resolution provers, since
they do not require conversion to clause form. Examples include HARP (Oppacher and
Suen, 1988) and the amazingly simple leanTAP (Beckert and Posegga, 1995), which
consists of a few lines of Prolog. Tactic depthIt is loosely based upon leanTAP but is
much slower.
The tactical approach combines modest automation with great flexibility. Systems
apply it not for classical first-order logic, but for other logics of computational impor-
tance. LCF supports a logic of domain theory (Gordon et al., 1979; Paulson, 1987). The
HOL system supports Church’s higher-order logic (Gordon and Melham, 1993). Nuprl
supports a form of constructive type theory (Constable et al., 1986). Isabelle is a generic
theorem prover, supporting several different logics (Paulson, 1994).
Exercise 10.31 Draw a diagram showing the structures, signatures and func-
tors of Hal and their relationships.
Exercise 10.32 Implement a tactic for the rule of mathematical induction, in-
volving the constant 0 and the successor function suc:
\[ \frac{\Gamma \vdash \Delta, \phi[0/x] \qquad \phi, \Gamma \vdash \Delta, \phi[\mathit{suc}(x)/x]}{\Gamma \vdash \Delta, \forall x.\,\phi} \]
proviso: x must not occur free in the conclusion
Can you foresee any difficulties in adding the tactic to an automatic proof pro-
cedure?
Exercise 10.33 Declare a tactical someGoal such that, when applied to a state
with n subgoals, someGoal f is equivalent to
f(n) || f(n − 1) || ... || f(1).
What does repeat (someGoal Rule.conjR) do to a proof state?
Exercise 10.34 Our proof procedure always works on subgoal 1. When might
it be better to choose other subgoals?