Teaching the Art of Functional Programming Using Automated Grading

Authors’ addresses: Aliya Hameer, School of Computer Science, McGill University, Canada, [email protected]; Brigitte Pientka, School of Computer Science, McGill University, Canada, [email protected].

This work is licensed under a Creative Commons Attribution 4.0 International License.
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).
© 2019 Copyright held by the owner/author(s).
2475-1421/2019/8-ART115
https://doi.org/10.1145/3341719
1 INTRODUCTION
Online programming environments, such as Ask-Elle [Gerdes et al. 2017] and the one used by
the popular learning platform Codecademy [Sims and Bubinski 2011], offer immense potential to
enhance students’ educational experience in courses with high programming content. For one,
having an entire development environment just a click away in a browser drastically lowers the
entry barrier; students can start programming right away and concentrate on learning the course
material, rather than getting frustrated and bogged down in the technical aspects of installing a new
language and possibly learning a new editor. This is particularly the case for a lot of typed functional
programming languages, such as OCaml, SML, or Haskell, where few integrated development
environments (IDEs) exist, and support across platforms and operating systems differs widely. More
significantly, online programming platforms feature autograding facilities that allow students to
run and get feedback on their code while they are developing their programs, giving them the
opportunity to fix bugs and address errors in their understanding right away. Having access to
immediate feedback on their code has been recognized to significantly improve student learning
outcomes and engagement with their assignments (see, e.g., [Gramoli et al. 2016; Sherman et al.
2013; Wilcox 2015]).
However, these automatic graders tend to focus heavily or even solely on the input/output
correctness of an implementation, since this is the most straightforward property to assess without
human intervention. This approach neglects several other aspects of a student’s code that are
important to consider in order to give useful feedback; for instance, input/output testing cannot
determine whether a student used the intended algorithm, language constructs such as pattern
matching, or even good coding practices to solve a particular problem. As a consequence, students
miss out on an entire class of valuable feedback that is extremely important for their learning.
In the Fall 2018 term, we piloted the use of the Learn-OCaml online programming platform
[Canou et al. 2017, 2016] to deliver the course assignments in a second-year university course on
programming languages and paradigms. The goal of the course is to provide a new perspective
on fundamental concepts that are common in many programming languages through the lens of
functional programming. It is taught primarily in OCaml and has had a combined enrollment of
642 students over the Fall 2018/Winter 2019 terms.
Learn-OCaml is an exercise environment originally developed for the massive open online course
(MOOC) Introduction to Functional Programming in OCaml [Di Cosmo et al. 2015], colloquially
known as "The OCaml MOOC". The platform allows students to write, typecheck, and run OCaml
code directly in their browser. It includes an integrated grader: while working on their assignments,
students can run the grader, see the grade they would get, and then improve their code until
they are satisfied with the results. Graders are written in OCaml using the high-level grading
combinators and lower-level grading primitives provided by Learn-OCaml’s testing library. The
bulk of the high-level portion of the library focuses on input/output testing, with much of the other
functionality as yet underdeveloped; however, what makes the Learn-OCaml platform
truly interesting, and why it offers so much promise for grading, is that grader code is given access
to the student’s code AST. This opens up a number of possibilities for performing static checks on
the student’s code using OCaml’s compiler front-end libraries.
In this paper, we report on our experience in delivering and grading assignments using the
Learn-OCaml platform in a classroom environment. Topics covered in our course and evaluated
using Learn-OCaml include: higher-order functions; stateful vs. state-free computation; modelling
objects and closures; using exceptions and continuations to defer control; and lazy programming.
We discuss which concepts were easy to test using the platform and which were unexpectedly
difficult; problems that arose when students started using the grader in ways other than what
was intended; and the extensions to the platform that our experience motivated in order to better
accomplish our teaching goals. The concrete contributions of this paper are the following:
(1) We describe how we used the built-in functionality of the Learn-OCaml grading library to
evaluate students on a variety of functional programming concepts (Section 2).
(2) We release a set of homework problems and associated graders used in the Fall 2018 offering
of the course, available upon request, for use by other instructors wishing to incorporate
Learn-OCaml into their curriculum.
(3) In order to emphasize good software testing practices, even in the presence of an automated
grader, we provide a mutation testing framework for Learn-OCaml. This framework can be
used by instructors to require students to write unit tests for every function they implement,
and evaluate these test cases on both correctness and coverage (Section 3).
(4) We present a set of extensions we have written for Learn-OCaml’s grading library (Section
4). In particular, to teach students about good functional programming style, we release an
extensible framework that supports style checking and provides suggestions (Section 4.2).1
In summary, we provide an account of our own experience and extend the out-of-the-box
functionality of the Learn-OCaml platform in order to promote functional programming educa-
tion, enhance students’ educational experience, and make teaching and learning typed functional
programming more accessible to both instructors and students.
The built-in functionality of the Learn-OCaml platform worked quite well for this exercise.
The grading library provided us with the student’s code AST, and a simple syntactic check could
determine whether the student used pattern matching to implement their functions. Input/output
testing was then sufficient to verify that the student’s code worked as intended. There was one
issue we did not anticipate: several students tried to brute-force a function to compare two cards
by writing a pattern match with a case for each of the 81 possible pairs of ranks. This is easy to
prevent by extending the syntactic check to confirm that the pattern match contains a reasonable
number of cases, and our grader has been modified to do this. To give an idea of the scope of this
issue, we found that 63% of submissions used more than the 10 pattern matching cases that
were needed for our solution.

1 Our extensions to Learn-OCaml can be found at https://github.com/teaching-the-art-of-fp/learn-ocaml/tree/teaching-fp
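To give a concrete flavour of such a check, here is a minimal sketch (assuming OCaml's compiler-libs and its Ast_iterator module; it is not the exact code of our grader) that counts the cases of every match expression in the student's parsed code, so that submissions exceeding a chosen threshold can be penalized:

    (* Sketch: count the total number of match cases in a parsed structure. *)
    open Parsetree
    open Ast_iterator

    let count_match_cases (code : structure) : int =
      let count = ref 0 in
      let expr self e =
        (match e.pexp_desc with
         | Pexp_match (_, cases) -> count := !count + List.length cases
         | _ -> ());
        default_iterator.expr self e  (* keep traversing subexpressions *)
      in
      let iterator = { default_iterator with expr } in
      iterator.structure iterator code;
      !count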
Modelling Objects with Closures. One homework assignment asked students to write a function
new_account which, given an initial password, would return a new value of type bank_account as
defined in Figure 3. All operations on a bank account required that the correct password be given
along with the operation, and providing the wrong password three times in a row was supposed to
lock the account, not allowing any more operations to be performed. Problems involving modelling
objects with state using closures are not difficult to grade automatically, just as it is not difficult to
test implementations of simple classes in an object-oriented language.
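Since Figure 3 is not reproduced here, the following sketch conveys the flavour of the exercise; the record fields and the omitted details (such as handling insufficient funds) are illustrative assumptions rather than the actual handout.

    (* Illustrative interface: a bank account "object" as a record of closures. *)
    type bank_account = {
      update_password : string -> string -> unit;
      deposit         : string -> int -> unit;
      retrieve        : string -> int -> unit;
      show_balance    : string -> int;
    }

    exception Wrong_password
    exception Account_locked

    let new_account (initial_pass : string) : bank_account =
      let password = ref initial_pass in
      let balance  = ref 0 in
      let failures = ref 0 in
      let check p =
        if !failures >= 3 then raise Account_locked
        else if p <> !password then (incr failures; raise Wrong_password)
        else failures := 0
      in
      { update_password = (fun p p' -> check p; password := p');
        deposit         = (fun p n  -> check p; balance := !balance + n);
        retrieve        = (fun p n  -> check p; balance := !balance - n);
        show_balance    = (fun p    -> check p; !balance) }

Each call to new_account captures fresh references for the password, balance, and consecutive-failure count, so every account carries its own private state.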
The unexpected difficulties in grading this assignment came from the fact that student code
returned a bank account "object", which was a record of several functions. The high-level utilities
in Learn-OCaml’s grading library focus on comparing the output of running the student’s code to
some expected output; in this case the outputs were "objects" on which we wished to run our own
series of tests, rather than simple values to be compared against expected results. Testing
the outputs in this way while providing the students with informative feedback required us to copy
a lot of code from the grading library and modify it slightly for our purposes. This was a problem
that came up repeatedly in other contexts and is not specific to the topic of modelling objects.
Control Flow. Students implemented a parser for a simple arithmetic language twice: once using
exceptions for control flow, and once using continuations. Both were very easy to test using
the built-in functionality of the grading library. Since the testing functions by default compare
exceptions raised by the student’s code and the solution as part of input/output testing, the only
extra work we had to do in this case was adjust how exceptions carrying more complex values
like lists were printed, so that the feedback given to the students was properly meaningful. In the
case of the implementation using continuations, testing the student’s code with a few predefined
continuations with different output types was sufficient to confirm that they were indeed calling
the given continuation with the expected inputs.
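To illustrate the idea on a smaller example than the parser (the function below is made up for exposition and is not part of the assignment), a grader can apply a continuation-passing function to predefined continuations with different result types and compare the values produced:

    (* find_cps returns sc x for the first x satisfying p, and fc () otherwise. *)
    let rec find_cps (p : 'a -> bool) (l : 'a list)
                     (sc : 'a -> 'r) (fc : unit -> 'r) : 'r =
      match l with
      | [] -> fc ()
      | x :: xs -> if p x then sc x else find_cps p xs sc fc

    (* Probe the function with continuations of two different output types. *)
    let test_string = find_cps (fun x -> x > 2) [1; 2; 3] string_of_int (fun () -> "none")    (* = "3" *)
    let test_option = find_cps (fun x -> x > 9) [1; 2; 3] (fun x -> Some x) (fun () -> None)  (* = None *)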
Lazy Programming. One of the final assignments of the term focused on lazy lists, as defined
in Figure 4. Delayed computation is modelled by a function that takes an input of type unit; lazy
lists are defined as a record of a head element and the delayed computation for the tail of the list.
Students were asked to implement map, append, and flatten functions for lazy lists, and then
to use these to write a function permute which lazily computes all the permutations of a given
(non-lazy) list.
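Figure 4 is not reproduced here; a plausible reconstruction of the definitions it describes (the exact names, and in particular how the end of a finite list is marked, may differ) is the following, shown together with a possible map:

    (* A suspended computation, forced on demand. *)
    type 'a susp = Susp of (unit -> 'a)
    let force (Susp f) = f ()

    (* A lazy list pairs an available head with a suspended tail. *)
    type 'a lazy_list = { hd : 'a; tl : 'a lazy_list susp }

    (* map applies f to the head eagerly and suspends the recursive call on the tail. *)
    let rec map (f : 'a -> 'b) (s : 'a lazy_list) : 'b lazy_list =
      { hd = f s.hd; tl = Susp (fun () -> map f (force s.tl)) }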
Students were also tasked with implementing a function hailstones that lazily generates the
hailstone sequence (also known as the Collatz sequence) starting from a given positive integer. The
hailstone sequence $(a_i)_{i \in \mathbb{N}}$ starting from a positive integer $n$ is defined as follows:

$$a_0 = n, \qquad a_{i+1} = \begin{cases} a_i / 2 & \text{if } a_i \text{ is even} \\ 3a_i + 1 & \text{if } a_i \text{ is odd} \end{cases}$$
The hailstone sequence starting from n has been checked to eventually reach the cycle
4, 2, 1, 4, 2, 1, · · · for all starting values n up to extremely large bounds. However, while it is
conjectured that this holds for every positive integer n, it has not been proven; students were
therefore asked to generate the sequences using lazy lists.
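With the illustrative lazy_list type sketched above, such a function might look as follows (a sketch, not the official solution):

    let rec hailstones (n : int) : int lazy_list =
      { hd = n;
        tl = Susp (fun () ->
               hailstones (if n mod 2 = 0 then n / 2 else 3 * n + 1)) }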
Automatically grading these functions was predictably quite difficult, since potentially infinite
structures were involved. One difficulty was in simply giving the students meaningful feedback.
Lazy lists, by their definition, do not have an easy-to-read printed representation, and since they
have the potential to be infinite, naïvely trying to print out the elements of a lazy list is not practical.
The default printed representation of a lazy list is shown in Figure 5a, and is clearly not useful
to students trying to replicate a failed test case. An acceptable solution might be to print out the
elements of the list only up to a certain bound.
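A bounded printer along these lines is easy to write against the illustrative lazy_list type above (a sketch; the printer actually registered with the grader may differ):

    let to_string_bounded (string_of_elt : 'a -> string) (bound : int)
                          (s : 'a lazy_list) : string =
      (* Force at most [bound] elements, then cut the list off with "...". *)
      let rec take n s =
        if n = 0 then ["..."]
        else string_of_elt s.hd :: take (n - 1) (force s.tl)
      in
      "[" ^ String.concat "; " (take bound s) ^ "]"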
Comparing a lazy list against an expected output could not be done with 100% certainty, as the
structures involved could be infinite. Indeed, in the case of the hailstones function, all outputs
were expected to represent infinite sequences. We wrote a custom tester that steps through two
lazy lists to compare them pairwise until a certain value is reached (in the case of hailstones,
that value was 1). An example error report given by this tester is shown in Figure 5b.
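The comparison strategy can be sketched as follows, again against the illustrative type above; a step bound is added here as a simple safeguard against divergent student code, which the actual tester may handle differently:

    let equal_until (stop : 'a) (max_steps : int)
                    (s1 : 'a lazy_list) (s2 : 'a lazy_list) : bool =
      let rec go s1 s2 steps =
        if steps > max_steps then false                  (* give up on runaway lists *)
        else if s1.hd <> s2.hd then false                (* mismatch found *)
        else if s1.hd = stop then true                   (* both reached the sentinel *)
        else go (force s1.tl) (force s2.tl) (steps + 1)
      in
      go s1 s2 0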
Other Topics. Other concepts covered in the assignments over the term include references and
state, and implementation of a type-checker and evaluator for a small toy language. These topics
did not bring any new insights to the table in terms of implementing graders, so we do not give
further detail on them here.
students to do the extra work of writing conversion functions between the new datatype and some
type built in to OCaml, but particularly for earlier assignments we would prefer not to do this.
It is also infeasible to keep coming up with graders for new assignment problems year after year.
Therefore, we plan to build a repository of pre-written graders, starting with the ones we have
written for the Fall 2018 term, and re-use them over the years. As a contribution of this paper, we
release these homework problems and associated graders to other instructors teaching courses
in OCaml upon request. We hope for this to be a starting point for a bank of tried and tested
homework assignments that can be shared among instructors in the community at large, making
adoption of the Learn-OCaml platform more accessible to those who wish to make the switch.
Improper Use of the Grader. A major issue we experienced over the course of the term was that
students were becoming overreliant on the automated grader and using it as a substitute for testing
their code on their own. The Learn-OCaml platform provides an interactive toplevel for students to
test their code; however, in an end-of-term survey, 35% of students responded that they used the
toplevel to test their code "rarely" or "never". On earlier assignments, students could get away with
writing their programs by trial and error based on the grader’s feedback. Later assignments where
the questions were more complicated made this strategy much more difficult and prohibitively
time-consuming. A common complaint through the term was that the grader did not give the
expected output for failed test cases, and students were thus required to figure out the expected
outputs themselves. This phenomenon has been previously observed when relying on automated
grading for programming assignments; for some examples, see [Chen 2004; Edwards 2003].
One possible solution to this problem is to limit the number of times students can run the grader
on their submissions, thus requiring them to test their code carefully before using up one of their
limited tries. This has been observed by Pettit et al. [2015] to increase the average quality of student
submissions.
We have decided to take a different route, in which we require students to write their own tests
and make use of the automated testing infrastructure to grade students on their test quality. A
similar approach has been tried in a third-year computer organization course by Chen [2004], and
is still in use in the course today. We give details on this solution, and how we have implemented it
as a library for Learn-OCaml, in Section 3.
Limitations of the Grading Library. The grading library that Learn-OCaml provides specializes in
testing input/output correctness against a solution and in performing simple syntactic checks on
the student’s code: for instance, checking that a certain name occurs syntactically in the body of a
function, or verifying that a function is implemented without using for loops. More complicated
syntactic checks one might like to perform, such as checking if a certain call is a tail call, are doable
with the library’s provided functionality, but require users to write a lot of extra code from scratch.
A major syntactic property of a student’s code that we wish to evaluate is code style. The meaning
of code style is subjective among instructors, but it is commonly taken into account in some form
when grading programming assignments manually; when using Learn-OCaml’s provided library
out of the box, students miss out on this entire class of valuable feedback. Again, writing style
checks is possible with the current grader library, but involves a lot of extra code.
We have written extensions for Learn-OCaml’s grading library to mitigate this issue, which we
discuss in Section 4.
let pow_tests = [
  ((4, 1), 4);
  ((0, 5), 0);
  ((2, 2), 4)
]

(a) A student's test suite for pow, given as input/expected output pairs

let pow n k = n * k

let pow n k =
  if k = 0 then 0
  else n * pow n (k - 1)

(b) Two example mutants of pow

(c) Sample grading report for mutation testing
platform similar to the idea described in Chen [2004], by which we can leverage the automated
grader to evaluate students not only on their code, but also on unit tests that they write for it.
We assess the quality of a student’s tests using a strategy based on mutation testing [DeMillo
et al. 1978]. Suppose that we wish the student to write tests for a function pow such that, for natural
numbers n and k, pow n k evaluates to n^k (we do not care what pow 0 0 returns). The student
provides the tests as a list of input/expected output pairs as shown in Figure 6a. Programmed
into the grader is a set of buggy implementations of the pow function, which we call mutants. An
example of some mutants of the pow function is given in Figure 6b: one buggy implementation
simply multiplies its inputs instead of performing exponentiation, and the other has a typo in the
first line of the if-expression, resulting in all calls returning 0.
The student’s tests are first checked for correctness by running them against the correct solution
code, as seen in the first section of the report in Figure 6c. Including this functionality allows
students to experiment with how they think the functions they are asked to implement should
work, by writing test cases and then verifying their understanding against the solution.
Once all of the student’s tests have been deemed correct, they are next checked for coverage by
running them against each of the mutants. Given a specific mutant, if at least one of the student’s
test cases results in an output from the mutant that does not match the indicated expected output,
the student’s tests are said to have exposed the bug in that implementation. Otherwise the tests are
insufficient to reveal the bug, and the student must revisit their test cases and consider what cases
they might not be covering. In Figure 6, the given test cases are enough to uncover the bug in the
second mutant in 6b, but not the first, as the test cases are not enough to distinguish exponentiation
from simple multiplication. Adding the test case ((2, 3), 8) to pow_tests in Figure 6a would be
sufficient for the test suite to uncover the bugs in both mutants.
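Stripped of Learn-OCaml specifics, the two checks amount to only a few lines; the following stand-alone sketch (not the code of our library) makes the logic explicit for the pow example:

    (* A test suite is sound if every expected output agrees with the solution. *)
    let sound (solution : int -> int -> int)
              (tests : ((int * int) * int) list) : bool =
      List.for_all (fun ((n, k), expected) -> solution n k = expected) tests

    (* A mutant is exposed if at least one test disagrees with its output. *)
    let exposes (mutant : int -> int -> int)
                (tests : ((int * int) * int) list) : bool =
      List.exists (fun ((n, k), expected) -> mutant n k <> expected) tests

With pow_tests from Figure 6a, exposes returns true for the second mutant of Figure 6b and false for the first, matching the discussion above.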
Normally in a software development setting, a large number of mutants are generated automati-
cally by making small modifications to the code being tested, such as replacing a mathematical
operator with a different one, or replacing a variable name with a constant or a different variable;
this is meant to mimic common programmer errors. A drawback of this is that since so many
mutants need to be generated and then executed against the test suite for this to be effective, the
computational cost of mutation testing can be quite high [Jia and Harman 2011]. In our use case,
we already know what a desired solution looks like. We can thus write a small number of very
specific mutants manually while writing the grader, and use these for mutation testing. Writing
the mutants manually allows us to isolate certain cases that we want to make sure students are
testing for. Mutation testing is thus ideal here since we can achieve a high level of effectiveness
with a low computational cost.
Having students write their unit tests as input/output pairs is simple enough that they can
start writing tests right away on their very first homework assignment, given a proper template.
However, it does not allow them to account for console output, express that a test case should
raise an exception, or write test cases for functions involving randomness, for example. The simple
input/output testing functionality is sufficient for the non-error cases in the vast majority of
functions we have students write, so we believe this is not a significant drawback, and we expect
based on the experience by Chen [2004] that it will help us significantly in achieving our broader
learning goal of emphasizing good software development practices for our students.
aliasing and mutation, and we do not attempt to address such factors in our implementation;
however, we feel that our implementation is sufficient for handling the scenarios that come up in
practice with student code. We traverse the code AST of the function in question and check if we
can find any recursive call that is not a tail call. If so, we flag it as an error and provide the student
with the context in which the function call appears outside of tail position. As an example, consider
an attempt to implement exponentiation by repeated squaring tail-recursively using the helper
function fast_pow_tl given in Figure 8a. This function is not tail recursive, since a recursive call
to fast_pow_tl appears in a binding of a let-expression on line 6. Our syntactic check generates
a report as shown in Figure 8b, informing the student that a recursive call has been found outside
of tail position and pinpointing the location where this happens.
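Figure 8a is not reproduced here; the following reconstruction (illustrative, not the actual figure) shows the shape of the problem: the recursive call is bound by a let and its result is used afterwards, so it is not in tail position.

    let rec fast_pow_tl (n : int) (k : int) (acc : int) : int =
      if k = 0 then acc
      else if k mod 2 = 0 then
        (* the recursive call below is not a tail call: its result is squared *)
        let r = fast_pow_tl n (k / 2) 1 in
        acc * r * r
      else fast_pow_tl n (k - 1) (acc * n)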
[e1] @ e2  =⇒  e1 :: e2

if e1 then true else e2  =⇒  e1 || e2

if e = [] then e1 else e2  =⇒  match e with
                               | [] -> e1
                               | hd :: tl -> e2'
  where e2 = [List.hd e / hd, List.tl e / tl] e2'

(a) Sample of the rewrite rules implemented in the style checker extension

(b) Sample feedback report corresponding to code that triggers the rewrite rules in (a)
a singleton list so that the list e2 can be appended to it, instead of simply using the :: ("cons")
operator which has the same purpose. The second example shows a conditional statement that
could be transformed into a use of the || ("or") operator. For the final example, we emphasize
that pattern matching on list structure is strictly preferable to comparing to the empty list ([])
and using the explicit list selectors List.hd and List.tl. A full list of the rewrite rules we have
implemented can be found in the Appendix. We have also included checks that will emit a warning
if the student’s code uses more than a given number of cases in any one pattern matching or
conditional expression.
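For instance, hypothetical student code along the following lines would trigger the three rewrite rules of Figure 9a:

    let add_front x l = [x] @ l                     (* rewritten to: x :: l *)
    let either a b = if a then true else b          (* rewritten to: a || b *)
    let rec sum l =
      if l = [] then 0
      else List.hd l + sum (List.tl l)              (* rewritten to a match on l *)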
When the style checking extension is enabled, a section on code style will be added to the
student’s feedback report whenever they run the autograder on their code. A sample such report
can be found in Figure 9b. Note that we have two categories of style feedback: suggestions (gray
background) and warnings (yellow background). Suggestions are those rewritings that one may
or may not want to apply depending on concerns such as code readability. Warnings are those
rewrites that we feel should always be applied. Using features already present in Learn-OCaml’s
grading library, it is possible to have the style report contribute points to the student’s grade, or to
refuse to grade the functionality of the student’s code if they have more than a certain number of
warnings.
We believe that by continuously emphasizing ideas of good style to our students with examples
involving their own code, this style checking extension will be a great help in teaching our students
elements of good functional programming style.
5 RELATED WORK
Using automated grading for programming classes has been suggested as early as 1960 [Hollingsworth
1960] and the practice has become common in higher education, with the development of systems
such as ASSYST [Jackson and Usher 1997], Ceilidh [Benford et al. 1995], Web-CAT [Edwards and
Perez-Quinones 2008], BOSS [Joy et al. 2005], and many more. Ala-Mutka [2005] provides a survey
of the field up to the mid-2000s. These systems generally follow a simple model: students write code
on their own machines and submit their work to a server. The server then executes the student’s
code in some sort of sandboxed environment and, usually after some delay, sends some report back
to the student.
In our experience, the Learn-OCaml online programming platform provides a more expressive
and flexible environment. Students do not need to install anything, and can work on and grade their
code all in one place. Instructors can not only take advantage of the simple out-of-the-box support
for checking input/output correctness of a student's program, but also tap into its potential to teach
software testing practices via mutation testing and suggestions on elements of programming style.
We believe this can substantially enhance the learning experience of the student.
The DrRacket IDE, augmented with the Handin Server plugin2 , provides a similar experience
for students as the Learn-OCaml platform. Students download and install DrRacket and a course-
specific plugin and are then able to submit their work simply by clicking a "Handin" button in
the IDE, which uploads their work to a server, runs some instructor-defined tests, and returns the
results of these tests to the student. The test automation framework allows instructors to test for
input/output correctness, perform syntactic checks on the student’s code AST, and evaluate the
code coverage of student test suites. Student submissions remain on the server after tests are run
for teaching staff to carry out further manual grading if desired. Our work on the Learn-OCaml
platform is in a similar spirit to the DrRacket project: the DrRacket IDE has made languages in
the Scheme family popular in the university setting, but there exists no such platform for typed
functional programming languages in the ML family.
Other tools exist that suggest code refactorings for better style in the sense of our style checker.
Examples of these include Tidier, for Erlang [Sagonas and Avgerinos 2009], and HLint3 , for Haskell.
Tidier functions similarly to our style checker, suggesting a variety of program transformations
based on ideas such as using appropriate language constructs and making good use of pattern
matching. HLint focuses heavily on pointing out functional equivalences, such as replacing concat
(map f x) with concatMap f x, though it includes rules for several of the transformations we
have implemented for our style checker as well. The set of checks can be extended with custom
rules added via a configuration file.
6 CONCLUSION
In this paper we have reported on our experience in using the Learn-OCaml platform, which was
originally developed as an exercise environment for a MOOC, in a university setting. We have
discussed how we used the features of the platform to test students on a variety of concepts that
come up in a programming languages course, the problems that came up while doing so, and the
extensions to the platform that we subsequently implemented for evaluating students on properties
of their code beyond input/output correctness.
There still remains work to be done. In the future, we plan to work on several fronts to incorporate
new ideas into Learn-OCaml:
(1) Plagiarism detection. Systems for measuring code similarity such as MOSS4 [Schleimer et al.
2003] are commonly used in large programming classes to detect possible cases of plagiarism
by comparing student code submissions. No such mechanism currently exists in the Learn-
OCaml platform.
2 https://docs.racket-lang.org/handin-server/Handin-Server_and_Client.html
3 https://github.com/ndmitchell/hlint
4 https://theory.stanford.edu/~aiken/moss/
(2) Interactive lessons. In addition to the exercise functionality, the Learn-OCaml platform allows
instructors to create interactive tutorials and lessons for their students. In the future we
would like to convert our course notes to this format to help facilitate independent learning.
(3) Typed holes. We encourage our students to develop their programs incrementally by "following
the types"; as such, it would be quite useful to be able to type-check partial programs, as
in the live programming environment Hazel [Omar et al. 2017]. Being able to insert typed
holes into incomplete programs would allow students to check, for example, the type of an
argument that they need to provide to a function, or the type that a certain sub-expression
needs to evaluate to in order for the full expression to have a desired type.
(4) Incremental stepping. The DrRacket environment includes an algebraic stepper5 which al-
lows students to step through their code and visualize what conceptually happens when
evaluating a given program. Cong and Asai [2016] present an implementation of a stepper
for a small subset of OCaml, implemented in a fork of the Caml language.
(5) Complexity analysis. Aside from correctness, test quality, and code style, another metric of
student code that would be useful to assess automatically is time complexity. Hoffmann
et al.’s work on automatically analyzing resource bounds of OCaml programs has led to
the development of Resource Aware ML (RAML) [Hoffmann et al. 2012, 2017], a resource-
aware version of the OCaml language. Currently only a subset of OCaml is supported and
the language is not ready for deployment in a learning setting. In the future it would be
interesting to see if these ideas could be ported to Learn-OCaml.
By sharing our experience and our tools together with a suite of homework problems, we hope
to make it easier for other instructors to offer high-quality functional programming courses and
allow us to promote best functional programming practices in our community and beyond.
ACKNOWLEDGMENTS
This material is based upon work supported by the Natural Sciences and Engineering Research Council
(NSERC) under Grant No. 206263 and by the OCaml Software Foundation under Grant No. 249299.
Any opinions, findings, and conclusions or recommendations expressed in this material are those
of the author and do not necessarily reflect the views of NSERC or the OCaml Software Foundation.
REFERENCES
Kirsti M Ala-Mutka. 2005. A Survey of Automated Assessment Approaches for Programming Assignments. Computer
Science Education 15, 2 (2005), 83–102. https://doi.org/10.1080/08993400500150747
Steve D Benford, Edmund K Burke, Eric Foxley, and Christopher A Higgins. 1995. The Ceilidh system for the automatic
grading of students on programming courses. In Proceedings of the 33rd annual on Southeast regional conference. ACM,
176–182.
Benjamin Canou, Roberto Di Cosmo, and Grégoire Henry. 2017. Scaling up functional programming education: under
the hood of the OCaml MOOC. Proceedings of the ACM on Programming Languages 1, ICFP (Aug. 2017), 1–25. https://doi.org/10.1145/3110248
Benjamin Canou, Grégoire Henry, Çagdas Bozman, and Fabrice Le Fessant. 2016. Learn OCaml, An Online Learning Center
for OCaml. In OCaml Users and Developers Workshop 2016.
Peter M Chen. 2004. An Automated Feedback System for Computer Organization Projects. IEEE Transactions on Education
47, 2 (May 2004), 232–240. https://doi.org/10.1109/te.2004.825220
Youyou Cong and Kenichi Asai. 2016. Implementing a stepper using delimited continuations. contract 1 (2016), r1.
Richard A DeMillo, Richard J Lipton, and Frederick G Sayward. 1978. Hints on test data selection: Help for the practicing
programmer. Computer 11, 4 (1978), 34–41.
Roberto Di Cosmo, Yann Regis-Gianas, and Ralf Treinen. 2015. Introduction to Functional Programming in OCaml. (October
2015). https://www.fun-mooc.fr/courses/parisdiderot/56002/session01/about
5 https://docs.racket-lang.org/stepper/index.html
Stephen H Edwards. 2003. Using test-driven development in the classroom: Providing students with automatic, concrete
feedback on performance. In International Conference on Education and Information Systems: Technologies and Applications
(EISTA’03), Vol. 3.
Stephen H Edwards and Manuel A Perez-Quinones. 2008. Web-CAT: automatically grading programming assignments. In
ACM SIGCSE Bulletin, Vol. 40. ACM, 328–328.
Alex Gerdes, Bastiaan Heeren, Johan Jeuring, and L Thomas van Binsbergen. 2017. Ask-Elle: an adaptable programming
tutor for Haskell giving automated feedback. International Journal of Artificial Intelligence in Education 27, 1 (2017),
65–100.
Vincent Gramoli, Michael Charleston, Bryn Jeffries, Irena Koprinska, Martin McGrane, Alex Radu, Anastasios Viglas, and
Kalina Yacef. 2016. Mining Autograding Data in Computer Science Education. In Proceedings of the Australasian Computer
Science Week Multiconference (ACSW ’16). ACM, New York, NY, USA, Article 1, 10 pages. https://doi.org/10.1145/2843043.2843070
Robert W. Harper. 2013. Programming in Standard ML. (draft available at https://www.cs.cmu.edu/~rwh/isml/book.pdf).
Jan Hoffmann, Klaus Aehlig, and Martin Hofmann. 2012. Resource Aware ML. Lecture Notes in Computer Science (2012),
781–786. https://doi.org/10.1007/978-3-642-31424-7_64
Jan Hoffmann, Ankush Das, and Shu-Chun Weng. 2017. Towards automatic resource bound analysis for OCaml. ACM
SIGPLAN Notices 52, 1 (Jan 2017), 359–373. https://doi.org/10.1145/3093333.3009842
Jack Hollingsworth. 1960. Automatic graders for programming classes. Commun. ACM 3, 10 (1960), 528–529.
David Jackson and Michelle Usher. 1997. Grading Student Programs Using Assyst. ACM SIGCSE Bulletin 29, 1 (1997),
335–339. https://doi.org/10.1145/268085.268210
Yue Jia and Mark Harman. 2011. An Analysis and Survey of the Development of Mutation Testing. IEEE Transactions on
Software Engineering 37, 5 (Sep. 2011), 649–678. https://doi.org/10.1109/TSE.2010.62
Mike Joy, Nathan Griffiths, and Russell Boyatt. 2005. The boss online submission and assessment system. Journal on
Educational Resources in Computing (JERIC) 5, 3 (2005), 2.
Greg Michaelson. 1996. Automatic analysis of functional program style. In Australian Software Engineering Conference.
38–46. https://doi.org/10.1109/aswec.1996.534121
Cyrus Omar, Ian Voysey, Michael Hilton, Jonathan Aldrich, and Matthew A Hammer. 2017. Hazelnut: a bidirectionally
typed structure editor calculus. ACM SIGPLAN Notices 52, 1 (2017), 86–99.
Raymond Pettit, John Homer, Roger Gee, Susan Mengel, and Adam Starbuck. 2015. An Empirical Study of Iterative
Improvement in Programming Assignments. In 46th ACM Technical Symposium on Computer Science Education (SIGCSE
’15). ACM, 410–415. https://doi.org/10.1145/2676723.2677279
Konstantinos Sagonas and Thanassis Avgerinos. 2009. Automatic refactoring of Erlang programs. Proceedings of the
11th ACM SIGPLAN conference on Principles and practice of declarative programming - PPDP ’09 (2009). https://doi.org/10.1145/1599410.1599414
Saul Schleimer, Daniel S. Wilkerson, and Alex Aiken. 2003. Winnowing. Proceedings of the 2003 ACM SIGMOD international
conference on Management of data - SIGMOD ’03 (2003). https://doi.org/10.1145/872757.872770
Mark Sherman, Sarita Bassil, Derrell Lipman, Nat Tuck, and Fred Martin. 2013. Impact of Auto-grading on an Introductory
Computing Course. J. Comput. Sci. Coll. 28, 6 (June 2013), 69–75. http://dl.acm.org/citation.cfm?id=2460156.2460171
Zach Sims and Ryan Bubinski. 2011. Codecademy. (2011). http://www.codecademy.com
Chris Wilcox. 2015. The role of automation in undergraduate computer science education. In Proceedings of the 46th ACM
Technical Symposium on Computer Science Education. ACM, 90–95.