
The Teaching Tool CALCCHECK:
A Proof-Checker for Gries and Schneider's
"Logical Approach to Discrete Math"
Wolfram Kahl
McMaster University, Hamilton, Ontario, Canada
[email protected]
Abstract. Students following a first-year course based on Gries and Schneider's LADM textbook had frequently been asking: "How can I know whether my solution is good?"
We now report on the development of a proof-checker designed to answer exactly that question, while intentionally not helping to find the solutions in the first place. CALCCHECK provides detailed feedback to LaTeX-formatted calculational proofs, and thus helps students to develop confidence in their own skills in "rigorous mathematical writing".
Gries and Schneider's book emphasises rigorous development of mathematical results, while striking one particular compromise between full formality and customary, more informal, mathematical practices, and thus teaches aspects of both. This is one source of several unusual requirements for a mechanised proof-checker; other interesting aspects arise from details of their notational conventions.
1 Introduction
When teaching a first-year course on Logic and Discrete Mathematics for Computer Science following Gries and Schneider's textbook "A Logical Approach to Discrete Math" ("LADM" for short) [GS93] for the first time, I obtained feedback from students feeling that the book did not contain sufficiently many worked examples, that insufficient solutions for exercises were available¹, and, especially, that they felt at a loss since they did not see any way of knowing how good their answers were before the marked assignment was returned to them.
The following year (2011), I therefore started to implement CALCCHECK, a tool intended mainly as a proof-checker for the calculational proof style taught by LADM. For the time being, the usage paradigm of CALCCHECK is the same as that of Spivey's Z type-checker fuzz: CALCCHECK also operates on LaTeX source by parsing and analysing the contents of specific formal environments, and providing feedback on those. Using LaTeX as input syntax has the advantage that students learn a general-purpose skill, with only very little formalism-specific overhead.
¹ An Instructor's Manual containing solutions exists, but is made available explicitly only to instructors, with the proviso that answers to selected exercises may be used in lectures or distributed to students as answers to homeworks or tests.
J.-P. Jouannaud and Z. Shao (Eds.): CPP 2011, LNCS 7086, pp. 216-230, 2011.
© Springer-Verlag Berlin Heidelberg 2011
For example, the following proof can be found on p. 46 of LADM (without the "Proving" line):
Proving (3.16) (p ≢ q) ≡ (q ≢ p):
  p ≢ q
= ⟨ Def. of ≢ (3.10) ⟩
  ¬(p ≡ q)
= ⟨ Symmetry of ≡ (3.2) ⟩
  ¬(q ≡ p)
= ⟨ Def. of ≢ (3.10), with p, q := q, p ⟩
  q ≢ p
Using the LaTeX macro package accompanying CALCCHECK, this proof rendering has been generated from the following LaTeX source:
\begin{calc}[(3.16) $(p \nequiv q) \equiv (q \nequiv p)$]
p \nequiv q
\CalcStep{=}{Def.~of $\nequiv$ (3.10)}
\lnot(p \equiv q)
\CalcStep{=}{Symmetry of $\equiv$ (3.2)}
\lnot(q \equiv p)
\CalcStep{=}{Def.~of $\nequiv$ (3.10), with $p, q \becomes q, p$}
q \nequiv p
\end{calc}
The LaTeX macros have been kept as unobtrusive as possible, with the aim of letting the skill of producing CALCCHECK-checked proofs directly improve the skill of producing hand-written proofs in the exams.
Running CALCCHECK on an input file containing the above LaTeX fragment produces the following output to an HTML file, and also in Unicode to the terminal:²
Proving (3.16) (p ≢ q) ≡ (q ≢ p):
p ≢ q
Def.~of $\nequiv$ (3.10)
CalcCheck-0.2.12: (3.10) Definition of ≢ - OK
¬(p ≡ q)
Symmetry of $\equiv$ (3.2)
CalcCheck-0.2.12: (3.2) Symmetry of ≡ - OK (no change)
¬(q ≡ p)
Def.~of $\nequiv$ (3.10), with $p, q \becomes q, p$
CalcCheck-0.2.12: (3.10) Definition of ≢ - OK
q ≢ p
CalcCheck-0.2.12: Proof matches goal - OK
² CALCCHECK output included in this paper has been rendered by a WWW browser from the CALCCHECK-generated HTML files.
This output is only produced if there are no syntax errors, and contains the relevant parts of the input together with additional annotations:
- The optional argument of the {calc} environment is the proof goal; in this case, the goal is recognised as one of the numbered LADM theorems.
- CALCCHECK attempts to verify that the whole proof, (p ≢ q) = . . . = . . . = (q ≢ p), is actually a proof of the goal, assuming all steps are correct. LADM includes a number of different patterns by which such calculational proofs can satisfy their goals (similar to the optional method argument of "proof" in Isabelle/Isar [Nip03], but rarely made explicit in LADM).
- In LADM, each proof step requires a "hint" stating the theorem(s) applied in this step; CALCCHECK attempts to verify for each proof step that it can be obtained from the theorems mentioned in the hint. Currently, CALCCHECK relies on the theorem numbers, e.g., "(3.10)", but it is planned to make it recognise also theorem names, e.g. "Def. ≢", that are perfectly acceptable in the context of hand-written mathematics.³ Therefore, first of all CALCCHECK reports which theorems it recognises as mentioned in the hint, or "Could not extract information" if it recognised none. Following that, it adds "OK" if it can derive the proof step from these theorems, and "could not justify this step" otherwise.
For an example of the latter, here is the CALCCHECK output for one student proof. The first "could not justify" should really have alerted the student to the simple typo here (v for r in the second expression), and looking closely at the second "could not justify" would have revealed that the referenced theorem number belongs to a different theorem:
p ⇒ q ∧ r
(3.59) Alternative Definition of Implication
CalcCheck-0.2.12: (3.59) Definition of ⇒; could not justify this step!
¬p ∨ (q ∧ v)
(3.46) Distributivity of $\lor$ over $\land$
CalcCheck-0.2.12: (3.46) Distributivity of ∧ over ∨; could not justify this step!
(¬p ∨ q) ∧ (¬p ∨ r)
(3.59) Alternative Definition of Implication
CalcCheck-0.2.12: (3.59) Definition of ⇒ - OK
(p ⇒ q) ∧ (p ⇒ r)
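The first stage of hint handling described above, recognising theorem numbers such as "(3.10)" inside a free-text hint, can be illustrated with a small sketch. It is written in Python purely for illustration and is not taken from CALCCHECK (which is implemented in Haskell); the names extract_theorem_numbers and describe, and the exact shape of the number pattern, are assumptions.

```python
import re

# Hypothetical helper, not CalcCheck's actual code: pull LADM-style theorem
# numbers such as (3.10) or (8.21p) out of the free-text hint of a proof step.
THEOREM_NUMBER = re.compile(r"\((\d+\.\d+[a-z]?)\)")


def extract_theorem_numbers(hint: str) -> list[str]:
    """Return all theorem numbers mentioned in a hint, in order of appearance."""
    return THEOREM_NUMBER.findall(hint)


def describe(hint: str) -> str:
    """Mimic the first part of the feedback line produced for one hint."""
    numbers = extract_theorem_numbers(hint)
    if not numbers:
        return "Could not extract information"
    return ", ".join(f"({n})" for n in numbers)


if __name__ == "__main__":
    print(describe(r"Def.~of $\nequiv$ (3.10), with $p, q \becomes q, p$"))
    print(describe("(8.20) Nesting, (8.14) One-point rule"))
    print(describe("Changing relational notation"))
```

A hint without any recognisable number falls through to the "Could not extract information" case, mirroring the behaviour reported above; recognising theorem names would need a lookup table in addition to this pattern.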
CALCCHECK is not complete, that is, it cannot justify all acceptable correct proof steps, and, due to the not-fully-formal nature of LADM proofs, also never will be complete. However, for the central LADM Chapter 3 "Propositional Calculus", CALCCHECK can certify all correct proofs that are given in sufficient detail, which is rarely more than that given in typical LADM proofs.
³ The course website continues to list the same rules as in the previous year:
- Theorem numbers are never necessary for marks in this course
- Theorem numbers are nice for disambiguation [. . . ]
- Typically, a hint with just one of [name of the theorem], [theorem number], and [the theorem [...], that is, the Boolean expression] is acceptable, although not necessarily nice. [. . . ]
For predicate logic (chapters 8-9) and the theories of sets, functions, and relations (chapters 11 and 14), occasionally more detail is required; for example in the following proof about the domain of relations, one would normally probably hope to be able to contract another two or three of the eight steps into larger steps:
  x ∈ {p | p ∈ R • fst.p}
= ⟨ (11.3) Membership in set comprehension, with ¬occurs('p', 'b') ⟩
  (∃ p | p ∈ R • x = fst.p)
= ⟨ (8.21p) Pair dummy expansion ⟩
  (∃ b, c | (p ∈ R)[p := ⟨b, c⟩] • (x = fst.p)[p := ⟨b, c⟩])
= ⟨ (14.4p) Pair projection ⟩
  (∃ b, c | ⟨b, c⟩ ∈ R • x = b)
= ⟨ (9.19) Trading for ∃, (1.3) Symmetry of = ⟩
  (∃ b, c | b = x • ⟨b, c⟩ ∈ R)
= ⟨ (8.20) Nesting, (8.14) One-point rule ⟩
  (∃ c • ⟨x, c⟩ ∈ R)
= ⟨ Changing relational notation ⟩
  (∃ c • x R c)
= ⟨ (11.7) ⟩
  x ∈ {x | (∃ c • x R c)}
= ⟨ (14.16) Domain of relations ⟩
  x ∈ Dom.R
The resulting CALCCHECK output below demonstrates some additional features:
- Provisos concerning variable binding are derived automatically from the theorem statement, and always documented in the output.
- Proviso handling is still incomplete: ¬occurs('b, c', 'R') fails to interpret R as a meta-variable. This proviso should be a global assumption, but handling of such assumptions is also still missing. Nevertheless, the listing of the used ¬occurs assumptions is helpful especially for students who are new to the intricacies of variable binding.
- In cases where CALCCHECK does not understand the hint ("Could not extract information"), it still accepts certain trivial steps, in the case here a change of input notation that is not reflected in the abstract syntax, and therefore also does not influence the CALCCHECK output. (Merging this change of notation step with one of the previous steps would of course be accepted, too, but has been left separate here for demonstration.)
- CALCCHECK can evaluate substitutions; this happens here at the occurrence of the one-point rule (8.14). However, second-order matching is not yet implemented. Therefore, certain applications of rules involving substitution require the user to make this matching explicit; here, this is the case for the result of the second step, which uses the following rule not found in LADM:
(8.21p) Pair Dummy Expansion: Provided ¬occurs('x, y', 'R, P'),
(⋆ p : t₁ × t₂ | R • P) = (⋆ x : t₁; y : t₂ | R[p := ⟨x, y⟩] • P[p := ⟨x, y⟩])
(The CALCCHECK output below also demonstrates some deviations from LADM notation: Quantification and set comprehension {. . . | . . . • . . .} use a bullet "•" instead of a colon, since the colon is used also for typing, and is less visually separating. Also, pairs are displayed "(x, y)" instead of "⟨x, y⟩", but both notations are accepted in input.)
x ∈ {p | p ∈ R • fst.p}
(11.3) Membership in set comprehension, with $\lnot\occurs{p}{b}$
CalcCheck-0.2.12: (11.3) Set membership
- OK: ¬occurs('p', 'x')
(∃ p | p ∈ R • x = fst.p)
(8.21p) Pair dummy expansion
CalcCheck-0.2.12: (8.21p) Pair dummy expansion
- OK: ¬occurs('b, c', 'x = fst.p, p ∈ R')
(∃ b, c | (p ∈ R)[p := (b, c)] • (x = fst.p)[p := (b, c)])
(14.4p) Pair projection
CalcCheck-0.2.12: (14.4p) Pair projection - OK
(∃ b, c | (b, c) ∈ R • x = b)
(9.19) Trading for $\exists$, (1.3) Symmetry of $=$
CalcCheck-0.2.12: (9.19) Trading for ∃, (1.3) Symmetry of = - OK
(∃ b, c | b = x • (b, c) ∈ R)
(8.20) Nesting, (8.14) One-point rule
CalcCheck-0.2.12: (8.20) Quantification nesting, (8.14) One-point rule
- OK: ¬occurs('b', 'x')
(∃ c • (x, c) ∈ R)
Changing relational notation
CalcCheck-0.2.12: Could not extract information - OK (no change)
(∃ c • (x, c) ∈ R)
(11.7)
CalcCheck-0.2.12: (11.7) x ∈ {x | R} ≡ R - OK
x ∈ {x | (∃ c • (x, c) ∈ R)}
(14.16) Domain of relations
CalcCheck-0.2.12: (14.16) Domain
- OK: ¬occurs('c', 'x, R')
x ∈ Dom.R
In addition to this support for checking calculational proofs, CALCCHECK also has initial support for checking declarations produced as part of "formalisation" exercises, or "English to Logic translation" (LADM chapters 2, 5, and sections 8.1 and 9.3).
Using CALCCHECK during the work on their assignments does give students a useful first taste of proof certification, and increases their ability to produce and appreciate rigorous proofs.
Section 3 presents additional features of CALCCHECK, and in Sect. 4 we further explain the use of CALCCHECK in the course setting. Section 5 explains the main challenges encountered in producing formal support for the particular kind of semi-formal mathematics practised in LADM, and in Sect. 6 we briefly describe the current implementation.
During the term of initial development, CALCCHECK was made available to the students both as source code and as compiled executables for their most common computing platform; it is now available via http://CalcCheck.McMaster.CA/.
2 Related Work
The only related system I am currently aware of that uses a LaTeX-based input syntax is Spivey's Z type-checker fuzz [Spi08], which analyses declarations and expressions of the Z specification notation [Spi89], and performs syntax- and type-checking. An "argue" environment is provided for typesetting calculational proofs, but fuzz does no proof-checking, and also does not type-check the contents of argue environments. It is possible to turn argue proofs into legal zed expressions by commenting out the proof hints at the TeX level; although these zed expressions can then be type-checked by fuzz, this is still an unsatisfactory kludge.
All other systems use their own specific input syntax.
A general-purpose proof assistant that has been used for teaching, including first-year courses [HR96,BZ07], is Mizar, which pioneered formalisation of the structure of conventional mathematical proofs. The resulting large proof structure language also appears to be a central topic of the Mizar-based courses, which makes that approach quite different in flavour from the emphasis of LADM on calculational proofs.
SASyLF [ASS08] is a proof checker designed specifically for teaching programming language theory and type theory (to graduate students); it has special syntax to present definitions of syntax, semantics, and typing rules of object languages, and checks structured proofs of language-theoretical properties. Aldrich et al. [ASS08] report extensively on their efforts to evaluate the pedagogic effects of using their proof checker, and emphasise in particular the early feedback aspect.
Several systems are available that provide support for Hilbert-style proofs, including Tutch [ACP01] (which concentrates on intuitionistic logics), ETPS [ABP+04], and the "Logic Daemon" interactive website accompanying Allen and Hand's "Logic Primer" [AH01]. While ETPS seems to be used mainly via an interactive user interface, and the Logic Daemon is available only as a web service, Abel et al. argue [ACP01] that the batch-mode operation of Tutch, where editing is separate from proof checking, and the proof checker is used similarly to a programming language compiler, is advantageous for acquiring tool-independent proof skills. (The "proof programming" facilities of Tutch also allow more structured proofs.)
Yet another approach to tool support for teaching logic concentrates on model construction and exploration; several of the systems described in [GRB93] fall into this category.
3 CALCCHECK Overview
The current usage paradigm of CALCCHECK follows that of Spivey's Z type-checker fuzz [Spi08]: The user writes a LaTeX source file using a dedicated LaTeX package defining the rendering of special-purpose TeX macros, and while this file can directly be processed using LaTeX for typesetting, it can also be passed to CALCCHECK for analysis of the formal content. Not all TeX mathematics is analysed, but only that contained in the following special environments:
- {calc} environments contain calculational proofs, and also displayed mathematical expressions (which could be understood as zero-step calculational proofs).
- {decls} environments contain declarations and definitions.
For declarations, inside the decls environment the following special macros are available:
- \declType for type declarations (type annotations in other contexts just use ":").
- \declEquiv for definition of propositions and predicates "declared as equivalent"
- \declEqu for definition of other constants and functions "declared as equal"
- \remark for remarks at the end of a line
- \also to separate multiple declarations
Furthermore, natural-language fragments are permitted in \mbox{. . . }, making it possible to assign, in a formal manner, informal meaning to formal identifiers, following the practice of LADM. (To avoid confusion with the use of the colon in type declarations and annotations, we render \declEquiv as ":≡" and \declEqu as ":=", whereas LADM tends to use just ":" there, too.)
With this, the formalisation of the LADM example sentence "Henry VIII had one son and Cleopatra had two" proceeds as follows (LaTeX source first, followed by its rendering):
We declare:
\begin{decls}
h \declEquiv \mbox{Henry VIII had one son}
\also
c \declEquiv \mbox{Cleopatra had two sons}
\end{decls}
Then the original sentence is formalised as:
\begin{calc}
h \land c
\end{calc}
We declare:
h :≡ Henry VIII had one son
c :≡ Cleopatra had two sons
Then the original sentence is formalised as:
h ∧ c
Relating formal identifiers to their informal meaning can be achieved via embedding informal material inside formal definitions in \mbox{. . . }, or by adding a \remark{. . . } to a formal declaration; both are ignored by CALCCHECK.
\begin{decls}
P \declEqu \mbox{set of persons}
\also
A \declType P \remark{Alex}
\also
J \declType P
\also
J \declEqu \mbox{Jane}
\end{decls}
P := set of persons
A : P    Alex
J : P
J := Jane
Functions and predicates can be introduced with conventional definitions, again either informal, that is, in \mbox{. . . }, or formal. For hard line breaks inside formal material, there is a \BREAK macro (that can also be used in {calc} environments). CALCCHECK ignores most common LaTeX spacing commands.
\begin{decls}
called \declType P \times P \tfun \BB
\also
called(p,q)
\declEquiv
\mbox{$p$ called $q$}
\also
lonely \declType P \tfun \BB
\also
lonely . p
\declEquiv
\lnot (\exists \ q : P
\BREAK \strut\;
\withspot called(q,p) )
\end{decls}
called : P × P → 𝔹
called(p, q) :≡ p called q
lonely : P → 𝔹
lonely.p :≡ ¬(∃ q : P
    • called(q, p))
Most features of the {calc} environment have already been introduced in Sect. 1. If the optional goal argument is provided, the goal may be shown also by proving it equal to an already-known theorem; the special macro \ThisIs{. . . } is used to refer to that theorem in what is typeset as a comment (following LADM practice), but checked by CALCCHECK. Such a \ThisIs{. . . } annotation can follow either the first or the last line of a proof.
\begin{calc}[(3.5)
Reflexivity of $\equiv$, $p \equiv p$
]
p \equiv p
\CalcStep{=}{(3.3) Identity of $\equiv$}
\true
\ThisIs{(3.4)}
\end{calc}
Proving (3.5) Reflexivity of ≡, p ≡ p:
  p ≡ p
= ⟨ (3.3) Identity of ≡ ⟩
  true        This is (3.4)
Throughout these examples, it should be obvious that the effort involved in producing CALCCHECK input is almost completely contained in the effort necessary for producing LaTeX source for the desired output. CALCCHECK occasionally prescribes the use of particular LaTeX macros, but rarely requires truly additional effort.
Even with respect to the choice of LaTeX macros, CALCCHECK is more lenient than fuzz, by allowing also standard LaTeX macros like \wedge and \vee instead of the more mnemonic \land and \lor proposed for use with CALCCHECK. This decision was made to lower the friction for students who are not only new to CALCCHECK, but at the same time also new to LaTeX, and, at least in some instances, tended to use the first macro they found in any LaTeX-related material for the symbol they had to produce.
4 Teaching with CALCCHECK
The first time that CALCCHECK was used, it was developed while the course was delivered. Once CALCCHECK had been fully introduced into the course, the following rule was added to the weekly assignments:
You must submit a LaTeX file with correct syntax; with CALCCHECK syntax errors or LaTeX errors, your submission earns 0 points.
To emphasise the difference between the phases of syntax analysis and proof checking, CALCCHECK produces, after successful parsing, the following message:
CalcCheck-0.2.11: No syntax errors.
CalcCheck-0.2.11: Now checking...
At the same time it was emphasised that the students retained full responsibility for the correctness of their submitted proofs: If CALCCHECK were to "OK" an incorrect step, it would still count as a mistake. This rule was stated only for pedagogical reasons, to alert students to the fact that even mechanised proving systems are not necessarily to be trusted; although CALCCHECK is not formally verified, I still have high confidence that it is sound.
On the other hand, where CALCCHECK "could not justify" a step that the markers found to be correct, it still earned full marks. In the context of propositional logic, such cases were limited to single proof steps involving more than two rewrite steps, since for certain rules, even two steps could involve a lengthy search. Therefore, for propositional logic, students never had to submit a proof with steps that CALCCHECK could not justify; they always had the choice of making the intermediate steps explicit to obtain a fully checked proof (with the CALCCHECK run finishing much faster). Some students nevertheless had the confidence to submit correct but uncertified larger steps.
Since rules with provisos were not implemented during the course, proofs in predicate logic and set theory were expected to contain steps that CALCCHECK could not justify. (And some had the confidence to submit incorrect steps, in files without syntax errors, so that one would expect that they had seen that CALCCHECK could not justify their work.)
5 Formalising LADM
Even though LADM is certainly one of the most rigorous textbooks for discrete mathematics currently available, it makes no claim to present a formal system in full detail, and implementing mechanical support for LADM does in fact show up a number of issues that are not covered conclusively by the book.
For example, the conjunction and disjunction operators ∧ and ∨ are assigned the same precedence, but no rule is given for aggregation among occurrences of different operators of the same precedence. The rule that "All nonassociative operators associate to the left, except . . ." does not cover the expression 5−3+2, although terms like that do occur in LADM, and are, as usual, interpreted as unambiguously denoting (5−3)+2. The current version of CALCCHECK generalises this to arbitrary (non-right associating) operators of the same precedence, so that p ∧ q ∨ r (which does not occur in LADM, but is also not explicitly forbidden) denotes (p ∧ q) ∨ r by virtue of association to the left. However, since many students routinely omit any parentheses between ∧ and ∨, no matter what the intended structure is, it is probably more useful to just forbid unparenthesised occurrences of these operators together. (In such cases, CALCCHECK does show the inserted parentheses in its output, but at least some students do not use this as help.)
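The left-association convention just described can be made concrete with a short sketch. This is an illustration in Python under simplifying assumptions (a flat, already tokenised chain of operands and operators of one precedence level); the function names are hypothetical and this is not how CALCCHECK's Parsec-based front-end is written.

```python
# Illustrative sketch only: fold a flat chain of operands and operators of
# equal precedence into a left-nested tree, as in 5 - 3 + 2 = (5 - 3) + 2.
def associate_left(tokens):
    """tokens alternates operands and operators: [a, op1, b, op2, c, ...]."""
    if len(tokens) % 2 == 0:
        raise ValueError("expected: operand (operator operand)*")
    tree = tokens[0]
    for i in range(1, len(tokens), 2):
        # Each new operator takes everything parsed so far as its left argument.
        tree = (tree, tokens[i], tokens[i + 1])
    return tree


def show(tree):
    """Render a tree with all inserted parentheses made explicit."""
    if isinstance(tree, tuple):
        left, op, right = tree
        return f"({show(left)} {op} {show(right)})"
    return str(tree)


if __name__ == "__main__":
    print(show(associate_left(["5", "-", "3", "+", "2"])))
    print(show(associate_left(["p", "∧", "q", "∨", "r"])))
```

Printing the fully parenthesised form corresponds to the feedback mentioned above, where the inserted parentheses are shown in the output.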
Another precedence-related issue concerns infix notation for membership in relations, where LADM says (p. 269):
In general, for any relation ρ:
⟨b, c⟩ ∈ ρ and b ρ c are interchangeable notations.
[. . . ] By convention, the precedence of a name ρ of a relation that is used as a binary infix operator is the same as the precedence of =; furthermore, ρ is considered to be conjunctional.
Now the "name of a relation that is used as a binary infix operator" can be a complex expression; LADM p. 272 contains a proof where not only "(ρ ∘ σ)" is used as infix operator, but also "(ρ ∘ σ) ∘ θ" (without enclosing parentheses), producing the expression "a (ρ ∘ σ) ∘ θ d". Extending this, LADM appears to allow us to write
a (ρ ∘ σ) ∘ θ (b + c) ∈ S ,
and, due to conjunctionality, this has to parse as
⟨a, b + c⟩ ∈ ((ρ ∘ σ) ∘ θ) ∧ (b + c) ∈ S ,
although locally, both parenthesised expressions could also be arguments of function applications, which means that the following parse would be legal, too:
((a (ρ ∘ σ)) ∘ (θ (b + c))) ∈ S
Therefore, the grammar resulting from a strict reading of the LADM rules for infix relations is ambiguous. Although this ambiguity probably can always be resolved via type checking, where sufficient typing information has been supplied, this is still not only non-trivial to implement, but also potentially quite confusing for students. Currently, CALCCHECK does not accept unparenthesised binary operator applications as infix relation names.
Another area full of pitfalls for any not-fully-formal system is that of variable binding. The approach of LADM towards variable binding is probably best characterised as first-order abstract syntax with implicit metavariable binding, and with a slight tendency to use object-level language also on the meta-level, and to treat substitutions as explicit, as demonstrated most clearly by the extension of the definition of textual substitution to cover quantification:
(8.11) Provided ¬occurs('y', 'x, F'),
(⋆ y | R • P)[x := F] = (⋆ y | R[x := F] • P[x := F])
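To make the role of the proviso in (8.11) concrete, the following sketch implements textual substitution on a tiny term representation and refuses to push a substitution under a quantifier when ¬occurs('y', 'x, F') fails. It is a Python illustration only; the term encoding and the names free_vars, occurs, and subst are assumptions of this sketch, and a full treatment would rename the dummy instead of refusing.

```python
# Terms: a variable is a str; an application is ("app", op, [args]);
# a quantification (* y | R : P) is ("quant", op, y, R, P).

def free_vars(t):
    """Set of variables occurring free in term t."""
    if isinstance(t, str):
        return {t}
    if t[0] == "app":
        return set().union(*[free_vars(a) for a in t[2]])
    _, _, dummy, rng, body = t
    return (free_vars(rng) | free_vars(body)) - {dummy}


def occurs(variables, exprs):
    """True if some listed variable occurs free in some listed expression."""
    return any(v in free_vars(e) for v in variables for e in exprs)


def subst(t, x, f):
    """Textual substitution t[x := f]; raises when the (8.11) proviso fails."""
    if isinstance(t, str):
        return f if t == x else t
    if t[0] == "app":
        return ("app", t[1], [subst(a, x, f) for a in t[2]])
    _, op, dummy, rng, body = t
    if occurs([dummy], [x, f]):
        raise ValueError(f"proviso ¬occurs('{dummy}', 'x, F') violated")
    return ("quant", op, dummy, subst(rng, x, f), subst(body, x, f))


if __name__ == "__main__":
    # (∃ y | y > z • x = y)
    t = ("quant", "∃", "y", ("app", ">", ["y", "z"]), ("app", "=", ["x", "y"]))
    print(subst(t, "x", ("app", "+", ["z", "w"])))  # allowed: y is not free in z + w
    try:
        subst(t, "x", "y")                          # would capture y
    except ValueError as err:
        print("rejected:", err)
```

The second call shows the kind of variable capture that the proviso exists to exclude.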
LADM introduces a general quantification syntax for arbitrary abelian monoids; if ⋆ is a symmetric and associative operator and has a unit, then "Expression (⋆ x : X | R : P) denotes the application of operator ⋆ to the values P for all x in X for which range R is true."⁴ They do point out that, as a result, not all quantifications are defined, and some theorems include definedness of certain quantifications as provisos, which will, in general, not be decidable for an automatic proof checker.
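For a finite domain X, the reading of general quantification quoted above amounts to a fold over the abelian monoid. The following Python sketch illustrates this; the function name quantify and its argument order are assumptions of this sketch. It deliberately covers only the finite case, which sidesteps the definedness problems mentioned above for infinite ranges.

```python
from functools import reduce
import operator


def quantify(op, unit, domain, rng, body):
    """(* x : X | R : P): combine body(x) with op over all x in X where rng(x) holds.

    op must be associative and symmetric with the given unit; the unit is
    also the value of a quantification with an empty range.
    """
    return reduce(op, (body(x) for x in domain if rng(x)), unit)


if __name__ == "__main__":
    # (Σ x : 0..9 | even.x : x·x)
    print(quantify(operator.add, 0, range(10), lambda x: x % 2 == 0, lambda x: x * x))
    # (∃ x : 0..9 | x > 3 : x mod 7 = 0)
    print(quantify(operator.or_, False, range(10), lambda x: x > 3, lambda x: x % 7 == 0))
    # Empty range: the result is the unit of the operator.
    print(quantify(operator.mul, 1, range(10), lambda x: False, lambda x: x))
```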
It appears to me that the provided axioms for quantification are insufficient to prove (⋆ x, y | R • P) = (⋆ y, x | R • P) without side conditions restricting variable occurrences, but this is silently used in the proof of (8.22) on p. 151, so I assume this as an additional quantification axiom.
In the chapters introducing quantification (8 and 9), potential capture or rebinding of variables is dealt with carefully via explicit provisos formulated using the formal meta-level predicate occurs(_, _) taking a list of variables and a list of expressions as arguments. However, in the chapter on set theory, many necessary provisos are omitted from theorem statements without warning; it even happens that a proviso is checked in a proof hint where the invoked theorem was stated without provisos. As mentioned in the introduction, CALCCHECK calculates these binding-related provisos from the theorem statement by checking for each metavariable whether it occurs under different sets of binders. This calculation needs to take implicitly bound metavariables into account, too; for example, the following theorem needs no proviso:
(11.7) x ∈ {x | R • x} ≡ R
This is because both occurrences of R are in the scope of a binder for x, where the binder for the RHS occurrence is the implicit meta-level binder induced by the free occurrence of x in the LHS.
An area where LADM is more explicitly informal is that of higher-level proof structures. Although proof techniques like assuming the antecedent and case analysis are introduced in chapter 4 with a formal-looking syntax, this syntax is not used in later applications of these techniques. It therefore appears to be sensible to refer to existing systems that do offer high-level proof structuring like Mizar [GKN10,NK09] or Isabelle/Isar [Nip03] for the purpose of designing an appropriate variant for future versions of CALCCHECK.
⁴ After putting this to a vote in the course, we replaced the second ":" in the quantification notation with the "•" also used in the Z notation in that place.
6 Implementation Aspects
CALCCHECK is currently implemented in fewer than 5000 lines of Haskell [HHPJW07]. The front-end uses the monadic parser combinator library Parsec [LM01].
Since LADM pushes its readers very early and very hard to think about theorems up to permutations of the arguments of associative and commutative operators, AC matching is required, and Contejean's AC matching algorithm [Con04] was adapted for the implementation. Currently, proof checking is almost purely based on breadth-first search in the rewriting relation generated by the non-AC laws. (AC laws do not need to be applied since they are identities on the abstract syntax for AC expressions.) This can fan out very quickly for certain rules in larger terms, but in most cases, performance is not an issue. The depth of the search is currently limited to two applications of rewrite rules induced by any of the theorems that CALCCHECK could identify as referenced by the \CalcStep hint (currently only by their theorem number). As a matter of proof presentation, two steps are almost always adequate: One would occasionally wish for three or, in very large, repetitive expressions, even four rule applications in a single proof step, but rarely for more.
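The depth-limited breadth-first search described above can be sketched as follows. This is a Python illustration and not the Haskell implementation: terms are nested tuples, each theorem is modelled as a function yielding the results of applying it at the root of a term, and, as a simplification that differs from CALCCHECK, symmetry of ≡ is treated as an ordinary rewrite rule rather than as an identity on AC abstract syntax. All names are assumptions of this sketch.

```python
from collections import deque


def rewrites_anywhere(term, rule):
    """All terms obtained by applying rule once at the root or inside a subterm."""
    results = list(rule(term))
    if isinstance(term, tuple):
        head, *args = term
        for i, arg in enumerate(args):
            for new_arg in rewrites_anywhere(arg, rule):
                results.append((head, *args[:i], new_arg, *args[i + 1:]))
    return results


def justifies(start, goal, rules, max_depth=2):
    """Breadth-first search: is goal reachable from start in at most max_depth rewrites?"""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        term, depth = queue.popleft()
        if term == goal:
            return True
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for rule in rules:
            for nxt in rewrites_anywhere(term, rule):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, depth + 1))
    return False


def def_nequiv(t):
    """(3.10), usable in both directions: p ≢ q = ¬(p ≡ q)."""
    if isinstance(t, tuple) and t[0] == "nequiv":
        yield ("not", ("equiv", t[1], t[2]))
    if isinstance(t, tuple) and t[0] == "not" and isinstance(t[1], tuple) and t[1][0] == "equiv":
        yield ("nequiv", t[1][1], t[1][2])


def sym_equiv(t):
    """(3.2) as a plain rewrite rule (simplification): p ≡ q = q ≡ p."""
    if isinstance(t, tuple) and t[0] == "equiv":
        yield ("equiv", t[2], t[1])


if __name__ == "__main__":
    rules = [def_nequiv, sym_equiv]
    start = ("nequiv", "p", "q")
    print(justifies(start, ("not", ("equiv", "q", "p")), rules))        # two rewrites
    print(justifies(start, ("nequiv", "q", "p"), rules))                # needs three
    print(justifies(start, ("nequiv", "q", "p"), rules, max_depth=3))
```

With the limit of two, the whole of (3.16) cannot be contracted into a single step in this toy setting, which illustrates why intermediate steps sometimes have to be made explicit.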
Once all requirements are settled, we envisage a reimplementation that itself
has a mechanised correctness proof, and might therefore move from Haskell to a
dependently-typed programming language, for example Agda [Nor07].
7 Conclusion and Future Work
Although LADM essentially concentrates on teaching rigorous informal mathematics, at least large parts are accessible to formal treatment. Since there appears to be no previous mechanised support for LADM, we contributed a mechanised proof checker, CALCCHECK, intended to be used for teaching with LADM, and therefore required to be useful without demanding significant extra effort of formality beyond the use of LaTeX.
The LADM logic is not intended for mechanisation, but rather for training students in successful communication of rigorous mathematical arguments. Forcing proofs to be integrated into LaTeX documents is, in our opinion, more conducive to this goal than using a stand-alone syntax, and is, in fact, very similar to the spirit of literate programming [Knu84,Knu92].
In addition, acquiring the "CV-listable" skill of using LaTeX for document formatting appears to be more attractive for many students than learning the special-purpose syntax of an academic tool. Since a CALCCHECK-checked assignment submission is first of all a LaTeX document, and the CALCCHECK-specific syntax has intentionally been designed as a set of minor constraints on the use of a particular set of LaTeX macros, the use of CALCCHECK appears to be perceived by the students to come with the cost of having to learn (some) LaTeX, but otherwise to just do its job, namely to aid them with producing correct proofs, without requiring non-reusable special-purpose skills.
Future continued development of CALCCHECK will strive to at least preserve this accessibility.
For improving the user experience, we plan to more fully support Unicode source files; CALCCHECK already parses Unicode representations of most LADM symbols where this is appropriate; work is now needed mostly on the LaTeX side. With that, not only will the LaTeX and CALCCHECK outputs be similar in appearance to the handwritten variant that students will continue to be expected to produce, but so will the source file they are editing, hopefully further increasing the overall accessibility of the CALCCHECK experience.
Another significant usability improvement would come from flexible understanding of theorem names in the proof step hints, so that students do not need to memorise or look up the theorem numbers all the time, but can instead concentrate on learning the theorem names, which will be much more useful in the long term.
As mentioned previously, support for higher-level proof structure is still missing, and this includes explicit assumption management, also for the purpose of properly treating global ¬occurs assumptions.
Proper dependency management needs to be added, and this goes beyond comparing
theorem numbers, where usually, the proof of a theorem with number n
may only use theorems with smaller numbers. However, if a theorem with a larger
number n + k can be shown using only theorems with numbers smaller than n,
then theorem n + k may be used in the proof of theorem n, and such detours can be
important for didactic purposes. CALCCHECK will therefore need to keep track of the
precise dependencies between the proofs contained in the checked file in addition
to the theorem number ordering of the reference theorem list.
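The rule described above can be sketched as a small dependency check. This is an illustration under stated assumptions, not the CALCCHECK implementation: the names `number_key`, `justified` and `inadmissible_uses` are hypothetical, theorem numbers are assumed to be purely numeric labels such as "3.12", and a checked file is modelled as a dictionary from each proved theorem to the set of theorems its proof uses.

```python
from typing import Dict, FrozenSet, List, Set, Tuple


def number_key(label: str) -> Tuple[int, ...]:
    """Turn a theorem number such as '3.12' into the sortable tuple (3, 12)."""
    return tuple(int(part) for part in label.split("."))


def justified(thm: str, used: str, proofs: Dict[str, Set[str]],
              visiting: FrozenSet[str] = frozenset()) -> bool:
    """Decide whether `used` may appear in the proof of `thm`.

    A theorem without a proof in the file is taken from the reference list
    and must have a smaller number than `thm`. A theorem proved in the file
    is acceptable if everything its proof uses is acceptable for `thm` too.
    """
    if used == thm or used in visiting:
        return False  # circular chains cannot justify anything
    if used not in proofs:
        return number_key(used) < number_key(thm)
    return all(justified(thm, dep, proofs, visiting | {used})
               for dep in proofs[used])


def inadmissible_uses(proofs: Dict[str, Set[str]]) -> List[Tuple[str, str]]:
    """List all (theorem, used theorem) pairs that the rule above rejects."""
    return sorted((thm, used)
                  for thm, deps in proofs.items()
                  for used in deps
                  if not justified(thm, used, proofs))
```

For example, with `{"3.10": {"3.20"}, "3.20": {"3.4", "3.5"}}` the detour through theorem 3.20 is accepted, whereas it is rejected if the proof of 3.20 itself relies on 3.15, a reference theorem that comes after 3.10.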
Dependency management also affects matching, and even display of expressions:
Only operators for which associativity and commutativity laws are in scope
can be treated accordingly by the AC matching mechanism, and have parentheses
omitted in output. It might be useful to add, for example, self-inverse operations
like Boolean negation and relation converse, to special treatment by future extensions
of the current AC matching mechanism.
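The effect of scoping on AC treatment can be illustrated by a much simpler problem than full AC matching, namely equality of two expressions modulo associativity and commutativity. The sketch below assumes a hypothetical representation of expressions as nested tuples `(operator, argument, ...)` with strings as atoms; the parameter `ac_ops` stands for the set of operators whose AC laws are currently in scope. It is not the matching algorithm used by CALCCHECK.

```python
from typing import Set, Tuple, Union

# An expression is an atom (a string) or a tuple (operator, arg1, arg2, ...).
Expr = Union[str, Tuple]


def normalise(expr: Expr, ac_ops: Set[str]) -> Expr:
    """Flatten and sort the arguments of operators whose AC laws are in scope."""
    if isinstance(expr, str):
        return expr
    op, *args = expr
    args = [normalise(arg, ac_ops) for arg in args]
    if op in ac_ops:
        flat = []
        for arg in args:
            # Associativity: splice in the arguments of a nested use of op.
            if isinstance(arg, tuple) and arg[0] == op:
                flat.extend(arg[1:])
            else:
                flat.append(arg)
        # Commutativity: bring the arguments into a canonical order.
        args = sorted(flat, key=repr)
    return (op, *args)


def equal_modulo_ac(left: Expr, right: Expr, ac_ops: Set[str]) -> bool:
    """Compare two expressions up to the AC laws of the operators in ac_ops."""
    return normalise(left, ac_ops) == normalise(right, ac_ops)
```

With `ac_ops = {"∧"}`, the expressions `("∧", "p", ("∧", "q", "r"))` and `("∧", ("∧", "r", "p"), "q")` are identified; with an empty `ac_ops`, as would be the case before the corresponding laws have been proved, they are kept apart.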
Since substitution theorems like
(3.84a) e = f ∧ E[z := e] ≡ e = f ∧ E[z := f]
are normally applied without making the substitution involved explicit, second-order
matching is necessary. However, being able to switch it off may still be useful
for didactic purposes.
Particularly pressing is the addition of type-checking, with understandable type
error messages. For this, it should be possible to build on previous research concerning
type error messages in programming languages, e.g. [Hee05]. Note that the
LADM notations used for universe sets and set complements normally depend
on implicit type arguments (and possibly on explicitly fixing the universe of discourse
to an arbitrary set, which does not need to be a type), so without type-checking,
it is impossible to check most proofs involving properties of complement.
Sometimes, students appear to give up in their attempts at producing a fully
OK-ed proof, and assume that their "could not justify" messages are due only to
limitations of CALCCHECK, even in cases where the step in question is invalid, so that
no possible hint can justify it. Not only in propositional logic, but also in purely
propositional steps of predicate logic proofs, validity of proof steps is decidable,
and reporting invalid steps will be a useful aid for students.
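Decidability of propositional steps can be made concrete with a truth-table check. The following sketch assumes a hypothetical representation of formulae as nested tuples with the operator names "not", "and", "or", "implies" and "equiv"; the function `counterexample` is an illustration of the idea and not part of CALCCHECK. It treats a calculation step as claiming that the formulae before and after the step are equivalent.

```python
from itertools import product
from typing import Dict, Optional, Set, Tuple, Union

# A formula is a variable name or a tuple (operator, operand, ...).
Formula = Union[str, Tuple]


def variables(formula: Formula) -> Set[str]:
    """Collect the propositional variables occurring in a formula."""
    if isinstance(formula, str):
        return {formula}
    return set().union(*(variables(arg) for arg in formula[1:]))


def evaluate(formula: Formula, env: Dict[str, bool]) -> bool:
    """Evaluate a formula under an assignment of truth values to variables."""
    if isinstance(formula, str):
        return env[formula]
    op, *args = formula
    vals = [evaluate(arg, env) for arg in args]
    if op == "not":
        return not vals[0]
    if op == "and":
        return vals[0] and vals[1]
    if op == "or":
        return vals[0] or vals[1]
    if op == "implies":
        return (not vals[0]) or vals[1]
    if op == "equiv":
        return vals[0] == vals[1]
    raise ValueError(f"unknown operator: {op}")


def counterexample(before: Formula,
                   after: Formula) -> Optional[Dict[str, bool]]:
    """Return an assignment refuting `before ≡ after`, or None if it is valid."""
    names = sorted(variables(before) | variables(after))
    for values in product([False, True], repeat=len(names)):
        env = dict(zip(names, values))
        if evaluate(before, env) != evaluate(after, env):
            return env
    return None
```

A step that yields a counterexample is invalid regardless of the hint given, so it could be reported as such rather than with a "could not justify" message; the assignment found might also serve as feedback to the student.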
Although individual students reported that they found that CALCCHECK taught them
to know more precisely what they were doing when doing mathematics, the main
measurable didactic effect of using CALCCHECK in the past year appears to have been
that students now routinely produced syntactically correct formulae even on the
hand-written exam and outside calculational proofs, that is, in particular in formalisation
exercises; this was not the case in the previous year. Once the students
are using an accessible system that is similarly strict in pointing out type
errors and invalid proof steps, this should make a further noticeable difference in
the resulting active language skills in the language of discrete mathematics.
References
ABP+04. Andrews, P.B., Brown, C.E., Pfenning, F., Bishop, M., Issar, S., Xi, H.:
ETPS: A system to help students write formal proofs. Journal of Automated
Reasoning 32, 75–92 (2004),
doi:10.1023/B:JARS.0000021871.18776.94
ACP01. Abel, A., Chang, B.-Y.E., Pfenning, F.: Human-readable machine-verifiable
proofs for teaching constructive logic. In: Proceedings of Workshop
on Proof Transformation, Proof Presentation and Complexity of
Proofs (PTP 2001). Università degli Studi Siena, Dipartimento di Ingegneria
dell'Informazione, Tech. Report 13/0 (2001),
http://www2.tcs.ifi.lmu.de/~abel/tutch/
AH01. Allen, C., Hand, M.: Logic Primer, 2nd edn. MIT Press (2001),
http://logic.tamu.edu/
ASS08. Aldrich, J., Simmons, R.J., Shin, K.: SASyLF: An educational proof assistant
for language theory. In: Huch, F., Parkin, A. (eds.) Proceedings of the
2008 International Workshop on Functional and Declarative Programming
in Education, FDPE 2008, pp. 31–40. ACM (2008)
BZ07. Borak, E., Zalewska, A.: Mizar Course in Logic and Set Theory. In: Kauers,
M., Kerber, M., Miner, R., Windsteiger, W. (eds.) MKM/CALCULEMUS
2007. LNCS (LNAI), vol. 4573, pp. 191–204. Springer, Heidelberg (2007)
Con04. Contejean, E.: A Certified AC Matching Algorithm. In: van Oostrom, V.
(ed.) RTA 2004. LNCS, vol. 3091, pp. 70–84. Springer, Heidelberg (2004)
GKN10. Grabowski, A., Kornilowicz, A., Naumowicz, A.: Mizar in a nutshell. J.
Formalized Reasoning 3(2), 153–245 (2010)
GRB93. Goldson, D., Reeves, S., Bornat, R.: A review of several programs for the
teaching of logic. The Computer Journal 36, 373–386 (1993)
GS93. Gries, D., Schneider, F.B.: A Logical Approach to Discrete Math. Mono-
graphs in Computer Science. Springer, Heidelberg (1993)
Hee05. Heeren, B.: Top Quality Type Error Messages. PhD thesis, Universiteit
Utrecht, The Netherlands (September 2005)
HHPJW07. Hudak, P., Hughes, J., Jones, S.P., Wadler, P.: A history of Haskell: Being
lazy with class. In: Third ACM SIGPLAN History of Programming Languages
Conference (HOPL-III), pp. 12-1–12-55. ACM (2007)
HR96. James Hoover, H., Rudnicki, P.: Teaching freshman logic with mizar-mse.
Mathesis Universalis, 3 (1996),
http://www.calculemus.org/MathUniversalis/3/; ISSN 1426-3513
Knu84. Knuth, D.E.: Literate programming. The Computer Journal 27(2), 97–111
(1984)
Knu92. Knuth, D.E.: Literate Programming. CSLI Lecture Notes, vol. 27. Center
for the Study of Language and Information (1992)
LM01. Leijen, D., Meijer, E.: Parsec: Direct style monadic parser combinators for
the real world. Technical Report UU-CS-2001-27, Department of Computer
Science, Universiteit Utrecht (2001),
http://www.cs.uu.nl/~daan/parsec.html
Nip03. Nipkow, T.: Structured Proofs in Isar/HOL. In: Geuvers, H., Wiedijk, F.
(eds.) TYPES 2002. LNCS, vol. 2646, pp. 259–278. Springer, Heidelberg
(2003)
NK09. Naumowicz, A., Kornilowicz, A.: A Brief Overview of Mizar. In:
Berghofer, S., Nipkow, T., Urban, C., Wenzel, M. (eds.) TPHOLs 2009.
LNCS, vol. 5674, pp. 67–72. Springer, Heidelberg (2009)
Nor07. Norell, U.: Towards a Practical Programming Language Based on Depen-
dent Type Theory. PhD thesis, Department of Computer Science and En-
gineering, Chalmers University of Technology (September 2007)
Spi89. Spivey, J.M.: The Z Notation: A Reference Manual. Prentice Hall International
Series in Computer Science. Prentice Hall (1989), Out of print;
available via http://spivey.oriel.ox.ac.uk/mike/zrm/
Spi08. Spivey, M.: The fuzz type-checker for Z, Version 3.4.1, and The fuzz Manual,
2 edn. (2008), http://spivey.oriel.ox.ac.uk/mike/fuzz/ (last accessed
June 17, 2011)
