The document describes a proof checker called CALCCHECK designed for Gries and Schneider's textbook "A Logical Approach to Discrete Math". CALCCHECK takes LaTeX-formatted proofs as input and checks that each step is justified by the theorems referenced. It provides feedback to help students determine if their proofs are correct and develop skills in rigorous mathematical writing. While not complete, CALCCHECK can verify most proofs in the textbook's chapters on propositional and predicate logic.
The Teaching Tool CalcCheck:
A Proof-Checker for Gries and Schneider's "Logical Approach to Discrete Math"

Wolfram Kahl
McMaster University, Hamilton, Ontario, Canada

Abstract. Students following a first-year course based on Gries and Schneider's LADM textbook had frequently been asking: "How can I know whether my solution is good?" We now report on the development of a proof-checker designed to answer exactly that question, while intentionally not helping to find the solutions in the first place. CalcCheck provides detailed feedback to LaTeX-formatted calculational proofs, and thus helps students to develop confidence in their own skills in rigorous mathematical writing.

Gries and Schneider's book emphasises rigorous development of mathematical results, while striking one particular compromise between full formality and customary, more informal, mathematical practices, and thus teaches aspects of both. This is one source of several unusual requirements for a mechanised proof-checker; other interesting aspects arise from details of their notational conventions.

1 Introduction

When teaching a first-year course on Logic and Discrete Mathematics for Computer Science following Gries and Schneider's textbook "A Logical Approach to Discrete Math" (LADM for short) [GS93] for the first time, I obtained feedback from students feeling that the book did not contain sufficiently many worked examples, that insufficient solutions for exercises were available (footnote 1), and, especially, that they felt at a loss since they did not see any way of knowing how good their answers were before the marked assignment was returned to them.

The following year (2011), I therefore started to implement CalcCheck, a tool intended mainly as a proof-checker for the calculational proof style taught by LADM. For the time being, the usage paradigm of CalcCheck is the same as that of Spivey's Z type-checker fuzz: CalcCheck also operates on LaTeX source by parsing and analysing the contents of specific formal environments, and providing feedback on those.
Using LaTeX as input syntax has the advantage that students learn a general-purpose skill, with only very little formalism-specific overhead.

Footnote 1: An Instructor's Manual containing solutions exists, but is made available explicitly only to instructors, with the proviso that answers to selected exercises "may be used in lectures or distributed to students as answers to homeworks or tests".

J.-P. Jouannaud and Z. Shao (Eds.): CPP 2011, LNCS 7086, pp. 216–230, 2011. © Springer-Verlag Berlin Heidelberg 2011

For example, the following proof can be found on p. 46 of LADM (without the "Proving" line):

    Proving (3.16) (p ≢ q) ≡ (q ≢ p):

        p ≢ q
      = ⟨ Def. of ≢ (3.10) ⟩
        ¬(p ≡ q)
      = ⟨ Symmetry of ≡ (3.2) ⟩
        ¬(q ≡ p)
      = ⟨ Def. of ≢ (3.10), with p, q := q, p ⟩
        q ≢ p

Using the LaTeX macro package accompanying CalcCheck, this proof rendering has been generated from the following LaTeX source:

    \begin{calc}[(3.16) $(p \nequiv q) \equiv (q \nequiv p)$]
      p \nequiv q
    \CalcStep{=}{Def.~of $\nequiv$ (3.10)}
      \lnot(p \equiv q)
    \CalcStep{=}{Symmetry of $\equiv$ (3.2)}
      \lnot(q \equiv p)
    \CalcStep{=}{Def.~of $\nequiv$ (3.10), with $p, q \becomes q, p$}
      q \nequiv p
    \end{calc}

The LaTeX macros have been kept as unobtrusive as possible, with the aim of letting the skill of producing CalcCheck-checked proofs directly improve the skill of producing hand-written proofs in the exams. Running CalcCheck on an input file containing the above LaTeX fragment produces the following output to an HTML file, and also in Unicode to the terminal (footnote 2):

    Proving (3.16) (p ≢ q) ≡ (q ≢ p):

        p ≢ q
      = ⟨ Def.~of $\nequiv$ (3.10) ⟩
          CalcCheck-0.2.12: (3.10) Definition of ≢ - OK
        ¬(p ≡ q)
      = ⟨ Symmetry of $\equiv$ (3.2) ⟩
          CalcCheck-0.2.12: (3.2) Symmetry of ≡ - OK (no change)
        ¬(q ≡ p)
      = ⟨ Def.~of $\nequiv$ (3.10), with $p, q \becomes q, p$ ⟩
          CalcCheck-0.2.12: (3.10) Definition of ≢ - OK
        q ≢ p
    CalcCheck-0.2.12: Proof matches goal - OK

Footnote 2: CalcCheck output included in this paper has been rendered by a WWW browser from the CalcCheck-generated HTML files.
This output is only produced if there are no syntax errors, and contains the relevant parts of the input together with additional annotations:

- The optional argument of the {calc} environment is the proof goal; in this case, the goal is recognised as one of the numbered LADM theorems. CalcCheck attempts to verify that the whole proof, (p ≢ q) = ... = ... = (q ≢ p), is actually a proof of the goal, assuming all steps are correct. LADM includes a number of different patterns of how such calculational proofs can satisfy their goals (similar to the optional "method" argument of "proof" in Isabelle/Isar [Nip03], but rarely made explicit in LADM).
- In LADM, each proof step requires a "hint" stating the theorem(s) applied in this step; CalcCheck attempts to verify for each proof step that it can be obtained from the theorems mentioned in the hint. Currently, CalcCheck relies on the theorem numbers, e.g., "(3.10)", but it is planned to make it also recognise theorem names, e.g. "Def. of ≢", that are perfectly acceptable in the context of hand-written mathematics (footnote 3). Therefore, CalcCheck first of all reports which theorems it recognises as mentioned in the hint, or "Could not extract information" if it recognised none. Following that, it adds "OK" if it can derive the proof step from these theorems, and "could not justify this step" otherwise.

For an example of the latter, here is the output for one student proof. The first "could not justify" should really have alerted the student to the simple typo here (v for r in the second expression), and looking closely at the second "could not justify" would have revealed that the referenced theorem number belongs to a different theorem:

        p ⇒ q ∧ r
      = ⟨ (3.59) Alternative Definition of Implication ⟩
          CalcCheck-0.2.12: (3.59) Definition of ⇒; could not justify this step!
        ¬p ∨ (q ∧ v)
      = ⟨ (3.46) Distributivity of $\lor$ over $\land$ ⟩
          CalcCheck-0.2.12: (3.46) Distributivity of ∧ over ∨; could not justify this step!
        (¬p ∨ q) ∧ (¬p ∨ r)
      = ⟨ (3.59) Alternative Definition of Implication ⟩
          CalcCheck-0.2.12: (3.59) Definition of ⇒ - OK
        (p ⇒ q) ∧ (p ⇒ r)

CalcCheck is not complete, that is, it cannot justify all acceptable correct proof steps, and, due to the not-fully-formal nature of LADM proofs, also never will be complete. However, for the central LADM Chapter 3 "Propositional Calculus", CalcCheck can certify all correct proofs that are given in sufficient detail, which is rarely more than that given in typical LADM proofs.

Footnote 3: The course website continues to list the same rules as in the previous year:
- "Theorem numbers are never necessary for marks in this course"
- "Theorem numbers are nice for disambiguation [...]"
- "Typically, a hint with just one of [name of the theorem], [theorem number], and [the theorem [...], that is, the Boolean expression] is acceptable, although not necessarily nice. [...]"

For predicate logic (chapters 8 and 9) and the theories of sets, functions, and relations (chapters 11 and 14), occasionally more detail is required; for example in the following proof about the domain of relations, one would normally probably hope to be able to contract another two or three of the eight steps into larger steps:

        x ∈ {p | p ∈ R • fst.p}
      = ⟨ (11.3) Membership in set comprehension, with ¬occurs('p', 'b') ⟩
        (∃ p | p ∈ R • x = fst.p)
      = ⟨ (8.21p) Pair dummy expansion ⟩
        (∃ b, c | (p ∈ R)[p := ⟨b, c⟩] • (x = fst.p)[p := ⟨b, c⟩])
      = ⟨ (14.4p) Pair projection ⟩
        (∃ b, c | ⟨b, c⟩ ∈ R • x = b)
      = ⟨ (9.19) Trading for ∃, (1.3) Symmetry of = ⟩
        (∃ b, c • b = x ∧ ⟨b, c⟩ ∈ R)
      = ⟨ (8.20) Nesting, (8.14) One-point rule ⟩
        (∃ c • ⟨x, c⟩ ∈ R)
      = ⟨ Changing relational notation ⟩
        (∃ c • x R c)
      = ⟨ (11.7) ⟩
        x ∈ {x | (∃ c • x R c)}
      = ⟨ (14.16) Domain of relations ⟩
        x ∈ Dom.R

The resulting output below demonstrates some additional features:

- Provisos concerning variable binding are derived automatically from the theorem statement, and always documented in the output. Proviso handling is still incomplete: ¬occurs('b, c', 'R') fails to interpret R as a meta-variable.
  This proviso should be a global assumption, but handling of such assumptions is also still missing. Nevertheless, the listing of the used ¬occurs assumptions is helpful especially for students who are new to the intricacies of variable binding.
- In cases where CalcCheck does not "understand" the hint ("Could not extract information"), it still accepts certain trivial steps, in the case here a change of input notation that is not reflected in the abstract syntax, and therefore also does not influence the output. (Merging this "change of notation" step with one of the previous steps would of course be accepted, too, but has been left separate here for demonstration.)
- CalcCheck can evaluate substitutions; this happens here at the occurrence of the one-point rule (8.14). However, second-order matching is not yet implemented. Therefore, certain applications of rules involving substitution require the user to make this matching explicit; here, this is the case for the result of the second step, which uses the following rule not found in LADM:

      (8.21p) Pair Dummy Expansion: Provided ¬occurs('x, y', 'R, P'),
          (⋆ p : t1 × t2 | R • P) = (⋆ x : t1; y : t2 | R[p := ⟨x, y⟩] • P[p := ⟨x, y⟩])

(The output below also demonstrates some deviations from LADM notation: Quantification and set comprehension {... | ... • ...} use a bullet "•" instead of a colon, since the colon is used also for typing, and is less visually separating. Also, pairs are displayed "(x, y)" instead of "⟨x, y⟩", but both notations are accepted in input.)
        x ∈ {p | p ∈ R • fst.p}
      = ⟨ (11.3) Membership in set comprehension, with $\lnot\occurs{p}{b}$ ⟩
          CalcCheck-0.2.12: (11.3) Set membership - OK: ¬occurs('p', 'x')
        (∃ p | p ∈ R • x = fst.p)
      = ⟨ (8.21p) Pair dummy expansion ⟩
          CalcCheck-0.2.12: (8.21p) Pair dummy expansion - OK: ¬occurs('b, c', 'x = fst.p, p ∈ R')
        (∃ b, c | (p ∈ R)[p := (b, c)] • (x = fst.p)[p := (b, c)])
      = ⟨ (14.4p) Pair projection ⟩
          CalcCheck-0.2.12: (14.4p) Pair projection - OK
        (∃ b, c | (b, c) ∈ R • x = b)
      = ⟨ (9.19) Trading for $\exists$, (1.3) Symmetry of $=$ ⟩
          CalcCheck-0.2.12: (9.19) Trading for ∃, (1.3) Symmetry of = - OK
        (∃ b, c • b = x ∧ (b, c) ∈ R)
      = ⟨ (8.20) Nesting, (8.14) One-point rule ⟩
          CalcCheck-0.2.12: (8.20) Quantification nesting, (8.14) One-point rule - OK: ¬occurs('b', 'x')
        (∃ c • (x, c) ∈ R)
      = ⟨ Changing relational notation ⟩
          CalcCheck-0.2.12: Could not extract information - OK (no change)
        (∃ c • (x, c) ∈ R)
      = ⟨ (11.7) ⟩
          CalcCheck-0.2.12: (11.7) x ∈ {x | R} ≡ R - OK
        x ∈ {x | (∃ c • (x, c) ∈ R)}
      = ⟨ (14.16) Domain of relations ⟩
          CalcCheck-0.2.12: (14.16) Domain - OK: ¬occurs('c', 'x, R')
        x ∈ Dom.R

In addition to this support for checking calculational proofs, CalcCheck also has initial support for checking declarations produced as part of formalisation exercises, or "English to Logic" translation (LADM chapters 2, 5, and sections 8.1 and 9.3).

Using CalcCheck during the work on their assignments does give students a useful first taste of proof certification, and increases their ability to produce and appreciate rigorous proofs.

Section 3 presents additional features of CalcCheck, and in Sect. 4 we further explain the use of CalcCheck in the course setting. Section 5 explains the main challenges encountered in producing formal support for the particular kind of semi-formal mathematics practised in LADM, and in Sect. 6 we quickly describe the current implementation.
During the term of initial development, CalcCheck was made available to the students both as source code and as compiled executables for their most common computing platform; it is now available via http://CalcCheck.McMaster.CA/.

2 Related Work

The only related system I am currently aware of that uses a LaTeX-based input syntax is Spivey's Z type-checker fuzz [Spi08], which analyses declarations and expressions of the Z specification notation [Spi89], and performs syntax- and type-checking. An "argue" environment is provided for typesetting calculational proofs, but fuzz does no proof-checking, and also does not type-check the contents of argue environments. It is possible to turn argue proofs into legal zed expressions by commenting out the proof hints at the TeX level; although these zed expressions can then be type-checked by fuzz, this is still an unsatisfactory kludge. All other systems use their own specific input syntax.

A general-purpose proof assistant that has been used for teaching, including first-year courses [HR96,BZ07], is Mizar, which pioneered formalisation of the structure of conventional mathematical proofs. The resulting large proof structure language also appears to be a central topic of the Mizar-based courses, which makes that approach quite different in flavour from the emphasis of LADM on calculational proofs.

SASyLF [ASS08] is a proof checker designed specifically for teaching programming language theory and type theory (to graduate students); it has special syntax to present definitions of syntax, semantics, and typing rules of object languages, and checks structured proofs of language theoretical properties. Aldrich et al. [ASS08] report extensively on their efforts to evaluate the pedagogic effects of using their proof checker, and emphasise in particular the "early feedback" aspect.
Several systems are available that provide support for Hilbert-style proofs, including Tutch [ACP01] (which concentrates on intuitionistic logics), ETPS [ABP+04], and the "Logic Daemon" interactive website accompanying Allen and Hand's "Logic Primer" [AH01]. While ETPS seems to be used mainly via an interactive user interface, and the Logic Daemon is available only as a web service, Abel et al. argue [ACP01] that the batch-mode operation of Tutch, where editing is separate from proof checking, and the proof checker is used similarly to a programming language compiler, is advantageous for acquiring tool-independent proof skills. (The "proof programming" facilities of Tutch also allow more structured proofs.)

Yet another approach to tool support for teaching logic concentrates on model construction and exploration; several of the systems described in [GRB93] fall into this category.

3 CalcCheck Overview

The current usage paradigm of CalcCheck follows that of Spivey's Z type-checker fuzz [Spi08]: The user writes a LaTeX source file using a dedicated LaTeX package defining the rendering of special-purpose TeX macros, and while this file can directly be processed using LaTeX for typesetting, it can also be passed to CalcCheck for analysis of the formal content. Not all TeX mathematics is analysed, but only that contained in the following special environments:

- {calc} environments contain calculational proofs, and also displayed mathematical expressions (which could be understood as zero-step calculational proofs).
- {decls} environments contain declarations and definitions.

For declarations, inside the decls environment the following special macros are available:

- \declType for type declarations (type annotations in other contexts just use ":").
- \declEquiv for definition of propositions and predicates declared as equivalent
- \declEqu for definition of other constants and functions declared as equal
- \remark for remarks at the end of a line
- \also to separate multiple declarations

Furthermore, natural-language fragments are permitted in \mbox{...}, making it possible to assign, in a formal manner, informal meaning to formal identifiers, following the practice of LADM. (To avoid confusion with the use of the colon in type declarations and annotations, we render \declEquiv as ":≡" and \declEqu as ":=", whereas LADM tends to use just ":" there, too.)

With this, the formalisation of the LADM example sentence "Henry VIII had one son and Cleopatra had two" proceeds as follows. LaTeX source:

    We declare:
    \begin{decls}
      h \declEquiv \mbox{Henry VIII had one son}
    \also
      c \declEquiv \mbox{Cleopatra had two sons}
    \end{decls}
    Then the original sentence is formalised as:
    \begin{calc}
      h \land c
    \end{calc}

Rendering:

    We declare:
        h :≡ Henry VIII had one son
        c :≡ Cleopatra had two sons
    Then the original sentence is formalised as:
        h ∧ c

Relating formal identifiers to their informal meaning can be achieved via embedding informal material inside formal definitions in \mbox{...}, or by adding a \remark{...} to a formal declaration; both are ignored by CalcCheck. LaTeX source:

    \begin{decls}
      P \declEqu \mbox{set of persons}
    \also
      A \declType P \remark{Alex}
    \also
      J \declType P
    \also
      J \declEqu \mbox{Jane}
    \end{decls}

Rendering:

        P := set of persons
        A : P          Alex
        J : P
        J := Jane

Functions and predicates can be introduced with conventional definitions, again either informal, that is, in \mbox{...}, or formal. For hard line breaks inside formal material, there is a \BREAK macro (that can also be used in {calc} environments). CalcCheck ignores most common LaTeX spacing commands. LaTeX source:

    \begin{decls}
      called \declType P \times P \tfun \BB
    \also
      called(p,q) \declEquiv \mbox{$p$ called $q$}
    \also
      lonely \declType P \tfun \BB
    \also
      lonely .
      p \declEquiv \lnot (\exists \ q : P \BREAK
        \strut\; \withspot called(q,p) )
    \end{decls}

Rendering:

        called : P × P → 𝔹
        called(p, q) :≡ p called q
        lonely : P → 𝔹
        lonely.p :≡ ¬(∃ q : P • called(q, p))

Most features of the {calc} environment have already been introduced in Sect. 1. If the optional goal argument is provided, the goal may be shown also by proving it equal to an already-known theorem; the special macro \ThisIs{...} is used to refer to that theorem in what is typeset as a comment (following LADM practice), but checked by CalcCheck. Such a \ThisIs{...} annotation can follow either the first or the last line of a proof. LaTeX source:

    \begin{calc}[(3.5) Reflexivity of $\equiv$, $p \equiv p$ ]
      p \equiv p
    \CalcStep{=}{(3.3) Identity of $\equiv$}
      \true \ThisIs{(3.4)}
    \end{calc}

Rendering:

    Proving (3.5) Reflexivity of ≡, p ≡ p:

        p ≡ p
      = ⟨ (3.3) Identity of ≡ ⟩
        true          This is (3.4)

Throughout these examples, it should be obvious that the effort involved in producing CalcCheck input is almost completely contained in the effort necessary for producing LaTeX source for the desired output. CalcCheck occasionally prescribes the use of particular LaTeX macros, but rarely requires truly additional effort.

Even with respect to the choice of LaTeX macros, CalcCheck is more lenient than fuzz, by allowing also standard LaTeX macros like \wedge and \vee instead of the more mnemonic \land and \lor proposed for use with CalcCheck. This decision was made to lower the friction for students who are not only new to CalcCheck, but at the same time also new to LaTeX, and, at least in some instances, tended to use the first macro they found in any LaTeX-related material for the symbol they had to produce.

4 Teaching with CalcCheck

This first time that CalcCheck was used, it was developed while the course was delivered. Once CalcCheck had been fully introduced into the course, the following rule was added to the weekly assignments:

    You must submit a LaTeX file with correct syntax: with CalcCheck syntax errors or LaTeX errors, your submission earns 0 points.
To emphasise the difference between the phases of syntax analysis and proof checking, CalcCheck produces, after successful parsing, the following message:

    CalcCheck-0.2.11: No syntax errors.
    CalcCheck-0.2.11: Now checking...

At the same time it was emphasised that the students retained full responsibility for the correctness of their submitted proofs: If CalcCheck were to "OK" an incorrect step, it would still count as a mistake. This rule was stated only for pedagogical reasons, to alert students to the fact that even mechanised proving systems are not necessarily to be trusted; although CalcCheck is not formally verified, I still have high confidence that it is sound.

On the other hand, where CalcCheck "could not justify" a step that the markers found to be correct, it still earned full marks. In the context of propositional logic, such cases were limited to single proof steps involving more than two rewrite steps, since for certain rules, even two steps could involve a lengthy search. Therefore, for propositional logic, students never had to submit a proof with steps that CalcCheck could not justify; they always had the choice of making the intermediate steps explicit to obtain a fully checked proof (with the CalcCheck run finishing much faster). Some students nevertheless had the confidence to submit correct but uncertified larger steps.

Since rules with provisos were not implemented during the course, proofs in predicate logic and set theory were expected to contain steps that CalcCheck could not justify. (And some had the confidence to submit incorrect steps, in files without syntax errors, so that one would expect that they had seen that CalcCheck could not justify their work.)

5 Formalising LADM

Even though LADM is certainly one of the most rigorous textbooks for discrete mathematics currently available, it makes no claim to present a formal system in full detail, and implementing mechanical support for LADM does in fact show up a number of issues that are not covered conclusively by the book.
For example, the conjunction and disjunction operators ∧ and ∨ are assigned the same precedence, but no rule is given for aggregation among occurrences of different operators of the same precedence. The rule that "All nonassociative operators associate to the left, except ..." does not cover the expression 5−3+2, although terms like that do occur in LADM, and are, as usual, interpreted as unambiguously denoting (5−3)+2. The current version of CalcCheck generalises this to arbitrary (non-right associating) operators of the same precedence, so that p ∧ q ∨ r (which does not occur in LADM, but is also not explicitly forbidden) denotes (p ∧ q) ∨ r by virtue of association to the left. However, since many students routinely omit any parentheses between ∧ and ∨, no matter what the intended structure is, it is probably more useful to just forbid unparenthesised occurrences of these operators together. (In such cases, CalcCheck does show the inserted parentheses in its output, but at least some students do not use this as help.)

Another precedence-related issue concerns infix notation for membership in relations, where LADM says (p. 269):

    In general, for any relation ρ: ⟨b, c⟩ ∈ ρ and b ρ c are interchangeable notations. [...] By convention, the precedence of a name ρ of a relation that is used as a binary infix operator is the same as the precedence of =; furthermore, ρ is considered to be conjunctional.

Now "the name ρ of a relation that is used as a binary infix operator" can be a complex expression; LADM p. 272 contains a proof where not only "(ρ ∘ σ)" is used as infix operator, but also "(ρ ∘ σ) ∘ θ" (without enclosing parentheses), producing the expression "a (ρ ∘ σ) ∘ θ d".
Extending this, LADM appears to allow us to write

    a (ρ ∘ σ) ∘ θ (b + c) ∈ S,

and, due to conjunctionality, this has to parse as

    ⟨a, b + c⟩ ∈ ((ρ ∘ σ) ∘ θ) ∧ (b + c) ∈ S,

although locally, both parenthesised expressions could also be arguments of function applications, which means that the following parse would be legal, too:

    ((a (ρ ∘ σ)) ∘ (θ (b + c))) ∈ S

Therefore, the grammar resulting from a strict reading of the LADM rules for infix relations is ambiguous. Although this ambiguity probably can always be resolved via type checking, where sufficient typing information has been supplied, this is still not only non-trivial to implement, but also potentially quite confusing for students. Currently, CalcCheck does not accept unparenthesised binary operator applications as infix relation names.

Another area full of pitfalls for any not-fully-formal system is that of variable binding. The approach of LADM towards variable binding is probably best characterised as first-order abstract syntax with implicit metavariable binding, and with a slight tendency to use object-level language also on the meta-level, and to treat substitutions as explicit, as demonstrated most clearly by the extension of the definition of textual substitution to cover quantification:

    (8.11) Provided ¬occurs('y', 'x, F'),
        (⋆ y | R • P)[x := F] = (⋆ y | R[x := F] • P[x := F])

LADM introduces a general quantification syntax for arbitrary abelian monoids; if ⋆ is a symmetric and associative operator and has a unit, then

    "Expression (⋆ x : X | R : P) denotes the application of operator ⋆ to the values P for all x in X for which range R is true." (footnote 4)

They do point out that, as a result, not all quantifications are defined, and some theorems include definedness of certain quantifications as provisos, which will, in general, not be decidable for an automatic proof checker.
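For finite domains, the general quantification quoted above can be read as a fold over the monoid. The following Python sketch is purely illustrative and is not taken from CalcCheck; the names `quantify`, `op`, `unit`, `domain`, `rng`, and `body` are hypothetical, and the caller is assumed to supply an operator that really is symmetric and associative with the given unit.

```python
from functools import reduce
import operator


def quantify(op, unit, domain, rng, body):
    """Finite-domain reading of (* x : X | R . P).

    op and unit are assumed to form an abelian monoid; domain is a finite
    iterable standing in for the type X; rng and body are Python functions
    standing in for the range R and the body P (hypothetical names).
    """
    # Keep only the values of x for which the range holds, then combine
    # the body values with op, starting from the unit of the monoid.
    return reduce(op, (body(x) for x in domain if rng(x)), unit)


# Sum of the squares of the even numbers below 6: 0 + 4 + 16.
sum_even_squares = quantify(operator.add, 0, range(6),
                            lambda x: x % 2 == 0, lambda x: x * x)

# Universal quantification uses conjunction with unit True, so an empty
# range yields True.
all_over_empty_range = quantify(operator.and_, True, range(6),
                                lambda x: False, lambda x: x > 100)

# Existential quantification uses disjunction with unit False.
some_square_is_nine = quantify(operator.or_, False, range(6),
                               lambda x: True, lambda x: x * x == 9)
```

The empty-range case shows why a unit is required. For an infinite domain, a fold of this kind need not terminate, which gives some intuition for why definedness provisos are hard to discharge mechanically.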
Footnote 4: After putting this to a vote in the course, we replaced the second ":" in the quantification notation with the "•" also used in the Z notation in that place.

It appears to me that the provided axioms for quantification are insufficient to prove (⋆ x, y | R • P) = (⋆ y, x | R • P) without side conditions restricting variable occurrences, but this is silently used in the proof of (8.22) on p. 151, so I assume this as an additional quantification axiom.

In the chapters introducing quantification (8 and 9), potential capture or rebinding of variables is dealt with carefully via explicit provisos formulated using the formal meta-level predicate occurs(_, _), which takes a list of variables and a list of expressions as arguments. However, in the chapter on set theory, many necessary provisos are omitted from theorem statements without warning; it even happens that a proviso is checked in a proof hint where the invoked theorem was stated without provisos.

As mentioned in the introduction, CalcCheck calculates these binding-related provisos from the theorem statement by checking for each metavariable whether it occurs under different sets of binders. This calculation needs to take implicitly bound metavariables into account, too; for example, the following theorem needs no proviso:

    (11.7) x ∈ {x | R} ≡ R

This is because both occurrences of R are in the scope of a binder for x, where the binder for the RHS occurrence is the implicit meta-level binder induced by the free occurrence of x in the LHS.

An area where LADM is more explicitly informal is that of higher-level proof structures. Although proof techniques like "assuming the antecedent" and "case analysis" are introduced in chapter 4 with a formal-looking syntax, this syntax is not used in later applications of these techniques.
It therefore appears to be sensible to refer to existing systems that do offer high-level proof structuring, like Mizar [GKN10,NK09] or Isabelle/Isar [Nip03], for the purpose of designing an appropriate variant for future versions of CalcCheck.

6 Implementation Aspects

CalcCheck is currently implemented in fewer than 5000 lines of Haskell [HHPJW07]. The front-end uses the monadic parser combinator library Parsec [LM01].

Since LADM pushes its readers very early and very hard to think about theorems up to permutations of the arguments of associative and commutative operators, AC matching is required, and Contejean's AC matching algorithm [Con04] was adapted for the implementation.

Currently, proof checking is almost purely based on breadth-first search in the rewriting relation generated by the non-AC laws. (AC laws do not need to be applied since they are identities on the abstract syntax for AC expressions.) This can fan out very quickly for certain rules in larger terms, but in most cases, performance is not an issue. The depth of the search is currently limited to two applications of rewrite rules induced by any of the theorems that CalcCheck could identify as referenced by the \CalcStep hint (currently only by their theorem number). As a matter of proof presentation, two steps are almost always adequate: One would occasionally wish for three or, in very large, repetitive expressions, even four rule applications in a single proof step, but rarely for more.

Once all requirements are settled, we envisage a reimplementation that itself has a mechanised correctness proof, and might therefore move from Haskell to a dependently-typed programming language, for example Agda [Nor07].

7 Conclusion and Future Work

Although LADM essentially concentrates on teaching rigorous informal mathematics, at least large parts are accessible to formal treatment.
Since there appears to be no previous mechanised support for LADM, we contributed a mechanised proof checker, CalcCheck, intended to be used for teaching with LADM, and therefore required to be useful without demanding significant extra effort of formality beyond the use of LaTeX.

The LADM logic is not intended for mechanisation, but rather for training students in successful communication of rigorous mathematical arguments. Forcing proofs to be integrated into LaTeX documents is, in our opinion, more conducive to this goal than using a stand-alone syntax, and is, in fact, very similar to the spirit of literate programming [Knu84,Knu92].

In addition, acquiring the CV-listable skill of using LaTeX for document formatting appears to be more attractive for many students than learning the special-purpose syntax of an academic tool. Since a CalcCheck-checked assignment submission is first of all a LaTeX document, and the CalcCheck-specific syntax has intentionally been designed as a set of minor constraints on the use of a particular set of LaTeX macros, the use of CalcCheck appears to be perceived by the students to come with the cost of having to learn (some) LaTeX, but otherwise to "just do its job", namely to aid them with producing correct proofs, without requiring non-reusable special-purpose skills.

Future continued development of CalcCheck will strive to at least preserve this accessibility. For improving the user experience, we plan to more fully support Unicode source files; CalcCheck already parses Unicode representations of most LADM symbols where this is appropriate; work is now needed mostly on the LaTeX side. With that, not only will the LaTeX and CalcCheck outputs be similar in appearance to the handwritten variant that students will continue to be expected to produce, but so will the source file they are editing, hopefully further increasing the overall accessibility of the CalcCheck experience.
Another significant usability improvement would come from flexible understanding of theorem names in the proof step hints, so that students do not need to memorise or look up the theorem numbers all the time, but can instead concentrate on learning the theorem names, which will be much more useful in the long term.

As mentioned previously, support for higher-level proof structure is still missing, and this includes explicit assumption management, also for the purpose of properly treating global ¬occurs assumptions.

Proper dependency management needs to be added, and this goes beyond comparing theorem numbers, where usually, the proof of a theorem with number n may only use theorems with smaller numbers. However, if a theorem with a larger number n + k can be shown using only theorems with numbers smaller than n, then theorem n + k may be used in the proof of theorem n, and such detours can be important for didactic purposes. CalcCheck will therefore need to keep track of the precise dependencies between the proofs contained in the checked file in addition to the theorem number ordering of the reference theorem list.

Dependency management also affects matching, and even display of expressions: Only operators for which associativity and commutativity laws are in scope can be treated accordingly by the AC matching mechanism, and have parentheses omitted in output.

It might be useful to add, for example, self-inverse operations like Boolean negation and relation converse, to special treatment by future extensions of the current AC matching mechanism.

Since substitution theorems like

    (3.84a) e = f ∧ E[z := e] ≡ e = f ∧ E[z := f]

are normally applied without making the substitution involved explicit, second-order matching is necessary. However, being able to switch it off may still be useful for didactic purposes.

Particularly pressing is the addition of type-checking, with understandable type error messages.
For this, it should be possible to build on previous research concerning type error messages in programming languages, e.g. [Hee05]. Note that the LADM notations used for universe sets and set complements normally depend on implicit type arguments (and possibly on explicitly fixing the universe of discourse to an arbitrary set, which does not need to be a type), so without type-checking, it is impossible to check most proofs involving properties of complement.

Sometimes, students appear to give up in their attempts to produce a fully OK-ed proof, and assume that their "could not justify" messages are due only to limitations of CALCCHECK, even in cases where the step in question is invalid, so that no possible hint can justify it. Not only in propositional logic, but also in purely propositional steps of predicate logic proofs, validity of proof steps is decidable, and reporting invalid steps will be a useful aid for students.

Although individual students reported that they found CALCCHECK taught them to know more precisely what they were doing when doing mathematics, the main measurable didactic effect of using CALCCHECK in the past year appears to have been that students now routinely produced syntactically correct formulae even on the hand-written exam and outside calculational proofs, that is, in particular, in formalisation exercises; this was not the case in the previous year. Once the students are using an accessible system that is similarly strict in pointing out type errors and invalid proof steps, this should make a further noticeable difference in the resulting active language skills in the language of discrete mathematics.

References

ABP+04. Andrews, P.B., Brown, C.E., Pfenning, F., Bishop, M., Issar, S., Xi, H.: ETPS: A system to help students write formal proofs. Journal of Automated Reasoning 32, 75–92 (2004), doi:10.1023/B:JARS.0000021871.18776.94
ACP01. Abel, A., Chang, B.-Y.E., Pfenning, F.: Human-readable machine-verifiable proofs for teaching constructive logic. In: Proceedings of Workshop on Proof Transformation, Proof Presentation and Complexity of Proofs (PTP 2001). Università degli Studi Siena, Dipartimento di Ingegneria dell'Informazione, Tech. Report 13/0 (2001), http://www2.tcs.ifi.lmu.de/~abel/tutch/
AH01. Allen, C., Hand, M.: Logic Primer, 2nd edn. MIT Press (2001), http://logic.tamu.edu/
ASS08. Aldrich, J., Simmons, R.J., Shin, K.: SASyLF: An educational proof assistant for language theory. In: Huch, F., Parkin, A. (eds.) Proceedings of the 2008 International Workshop on Functional and Declarative Programming in Education, FDPE 2008, pp. 31–40. ACM (2008)
BZ07. Borak, E., Zalewska, A.: Mizar Course in Logic and Set Theory. In: Kauers, M., Kerber, M., Miner, R., Windsteiger, W. (eds.) MKM/CALCULEMUS 2007. LNCS (LNAI), vol. 4573, pp. 191–204. Springer, Heidelberg (2007)
Con04. Contejean, E.: A Certified AC Matching Algorithm. In: van Oostrom, V. (ed.) RTA 2004. LNCS, vol. 3091, pp. 70–84. Springer, Heidelberg (2004)
GKN10. Grabowski, A., Kornilowicz, A., Naumowicz, A.: Mizar in a nutshell. J. Formalized Reasoning 3(2), 153–245 (2010)
GRB93. Goldson, D., Reeves, S., Bornat, R.: A review of several programs for the teaching of logic. The Computer Journal 36, 373–386 (1993)
GS93. Gries, D., Schneider, F.B.: A Logical Approach to Discrete Math. Monographs in Computer Science. Springer, Heidelberg (1993)
Hee05. Heeren, B.: Top Quality Type Error Messages. PhD thesis, Universiteit Utrecht, The Netherlands (September 2005)
HHPJW07. Hudak, P., Hughes, J., Jones, S.P., Wadler, P.: A history of Haskell: Being lazy with class. In: Third ACM SIGPLAN History of Programming Languages Conference (HOPL-III), pp. 12-1–12-55. ACM (2007)
HR96. James Hoover, H., Rudnicki, P.: Teaching freshman logic with mizar-mse. Mathesis Universalis, 3 (1996), http://www.calculemus.org/MathUniversalis/3/; ISSN 1426-3513
Knu84. Knuth, D.E.: Literate programming. The Computer Journal 27(2), 97–111 (1984)
Knu92. Knuth, D.E.: Literate Programming. CSLI Lecture Notes, vol. 27. Center for the Study of Language and Information (1992)
LM01. Leijen, D., Meijer, E.: Parsec: Direct style monadic parser combinators for the real world. Technical Report UU-CS-2001-27, Department of Computer Science, Universiteit Utrecht (2001), http://www.cs.uu.nl/~daan/parsec.html
Nip03. Nipkow, T.: Structured Proofs in Isar/HOL. In: Geuvers, H., Wiedijk, F. (eds.) TYPES 2002. LNCS, vol. 2646, pp. 259–278. Springer, Heidelberg (2003)
NK09. Naumowicz, A., Kornilowicz, A.: A Brief Overview of Mizar. In: Berghofer, S., Nipkow, T., Urban, C., Wenzel, M. (eds.) TPHOLs 2009. LNCS, vol. 5674, pp. 67–72. Springer, Heidelberg (2009)
Nor07. Norell, U.: Towards a Practical Programming Language Based on Dependent Type Theory. PhD thesis, Department of Computer Science and Engineering, Chalmers University of Technology (September 2007)
Spi89. Spivey, J.M.: The Z Notation: A Reference Manual. Prentice Hall International Series in Computer Science. Prentice Hall (1989), Out of print; available via http://spivey.oriel.ox.ac.uk/mike/zrm/
Spi08. Spivey, M.: The fuzz type-checker for Z, Version 3.4.1, and The fuzz Manual, 2nd edn. (2008), http://spivey.oriel.ox.ac.uk/mike/fuzz/ (last accessed June 17, 2011)