Status Report: Specifying Javascript With ML: David Herman Cormac Flanagan
Evaluation:                                     fun evalCondExpr (regs:REGS)
 1. Evaluate LogicalORExpression.                                (cond:EXPR)
 2. Call GetValue(Result(1)).                                    (thn:EXPR)
 3. Call ToBoolean(Result(2)).                                   (els:EXPR)
 4. If Result(3) is false, go to step 8.            : VAL =
 5. Evaluate the first AssignmentExpression.        let
 6. Call GetValue(Result(5)).                           val v = evalExpr regs cond
 7. Return Result(6).                                   val b = toBoolean v
 8. Evaluate the second AssignmentExpression.       in
 9. Call GetValue(Result(8)).                           if b
10. Return Result(9).                                   then evalExpr regs thn
                                                        else evalExpr regs els
                                                    end
Figure 1. Pseudocode from ECMAScript Edition 3 (left) compared with corresponding Standard ML code (right).
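The ML half of Figure 1 depends on types and helpers (REGS, EXPR, VAL, evalExpr, toBoolean) that are defined elsewhere in the reference implementation. As a minimal sketch, assuming a toy expression language, the fragment can be made self-contained as follows. The constructors Bool, Str, Lit, and Cond, the unit-valued REGS, and the simplified toBoolean are hypothetical stand-ins introduced only for illustration; they are not the definitions used in the actual interpreter.

```sml
(* Hypothetical miniature types; the real REGS, EXPR, and VAL are far richer. *)
type REGS = unit

datatype VAL = Bool of bool
             | Str of string
             | Undef

datatype EXPR = Lit of VAL
              | Cond of EXPR * EXPR * EXPR

(* Simplified ToBoolean covering only the toy values above:
   the empty string and Undef convert to false. *)
fun toBoolean (Bool b) = b
  | toBoolean (Str s)  = s <> ""
  | toBoolean Undef    = false

(* Toy evaluator: literals evaluate to themselves; conditionals
   delegate to evalCondExpr, which mirrors Figure 1. *)
fun evalExpr (regs:REGS) (e:EXPR) : VAL =
    case e of
        Lit v => v
      | Cond (c, t, f) => evalCondExpr regs c t f

and evalCondExpr (regs:REGS)
                 (cond:EXPR)
                 (thn:EXPR)
                 (els:EXPR)
    : VAL =
    let
        val v = evalExpr regs cond
        val b = toBoolean v
    in
        if b
        then evalExpr regs thn
        else evalExpr regs els
    end
```

In this sketch the explicit GetValue steps of the pseudocode disappear because the toy evalExpr returns values directly rather than references; only one branch is ever evaluated, as in steps 4 through 10.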
more problematic was the fact that the pseudocode was not executable, which precluded testing. This resulted in quite a few bugs in the standard (Horwat 2003a).

Due to the limitations of earlier, informal specification mechanisms, there was a clear desire on the part of the committee for some sort of formal or executable specification, whereby the ability to execute this specification on a variety of JavaScript programs would help detect errors early in the language design process, and would provide additional confidence in the correctness, completeness, and consistency of the final specification.

In the initial stages of development of ECMAScript Edition 4, Waldemar Horwat addressed the lack of precision in previous standards by defining an Algol-like, typed metalanguage. Early proposals used this metalanguage to specify the language constructs, and an implementation in Common Lisp served as an early reference implementation (Horwat 2003b,c). Horwat attempted to provide a denotational interpretation for the types and terms of the metalanguage, but this proved unwieldy.

Beginning in early 2006, we explored the use of term-rewriting languages such as Stratego (Visser 2001) or PLT Redex (Matthews et al. 2004) to develop an executable operational semantics. In order to accommodate the non-trivial static semantics and syntactic sugar in the language, we considered designing yet another intermediate language that would be close in flavor to the pseudocode in previous specifications, while still being fully formalized. However, this approach would essentially have required designing two languages concurrently (ECMAScript Edition 4 and its specification language), introducing significant additional work and perhaps unnecessary complexity.

Hence, in November 2006, we decided to use an existing programming language as the specification language for Edition 4, with ML being an obvious choice for the specification language. There was some subsequent discussion of which dialect of ML to use, with the committee initially leaning towards OCaml (in part due to somewhat better tool support, error messages, etc.), but eventually choosing SML (based in part on arguments that it is a more mature language and is formally specified (LtU 2006)).

Over the next several months, much of the work of the committee became essentially a software engineering effort, based around a version control system (Monotone 2007) and, later, a bug tracking database (Trac 2007). This work has largely involved reifying the current language design as code. There has been a fair amount of discussion of various implementation details (for example, the exact structure of the abstract syntax tree), and a rather surprising amount of iteration and refactoring. The reference implementation is now over 20 KLOC of ML code, with the main phases being:

  - parsing (7 KLOC),
  - a definition phase that includes name resolution and identifying compile-time constants (3 KLOC),
  - type checking (2 KLOC),
  - and evaluation (5 KLOC).

In addition, there is around 10 KLOC of ES4 code that defines most of the JavaScript standard libraries. Writing this code in ES4 helps reduce the complexity of the core semantics, with some cost in performance.

A pre-release of this reference implementation is available at http://www.ecmascript.org/download.php.

5. Language specification styles

In this section we briefly reflect on our experiences to date in using ML to write a definitional interpreter for ECMAScript Edition 4, and compare this approach with two commonly used alternatives:

  - informal prose, such as is used in the Java Language Specification (Gosling et al. 2000);
  - formal, mathematical specifications, such as that used to specify Standard ML (Milner et al. 1997).

Thus, the choice of language specification styles can be succinctly summarized as

    Code vs. Prose vs. Math

5.1 Language specifications: Code vs. Prose

Our initial discussions before November 2006 almost exclusively used prose, together with some JavaScript code fragments, in person, on whiteboards, and on a wiki (Ecma 2007). Many of these discussions were at a fairly high level, and assumed a fairly substantial amount of background knowledge regarding JavaScript implementations. As might be expected, underlying assumptions were often left implicit and occasionally misunderstood, and the interactions between various features were not always explored in complete detail.

This communication style worked well early in the design process, as various design alternatives were being compared, and there was little benefit to fully formalizing a design alternative that may later be discarded.

Once we switched to a definitional interpreter, the interaction style of the committee changed substantially, from monthly 1½-day discussion-oriented meetings to 3-day hackathons, interspersed with technical discussions, as various corner cases in the language design and implementation were discovered and resolved. The definitional interpreter worked well in forcing the committee to clarify many unspoken assumptions, and provided a concrete artifact that grounded many discussions that might otherwise have been overly abstract. It also provided valuable implementation experience for Edition 4.

The style of code is an important aspect of a definitional interpreter. Overall, there was fairly clear agreement on clarity over performance; that is, the primary goal of the definitional interpreter is to define the language specification, rather than describe a realistic, efficient implementation of that language. We strive to emphasize clarity, readability, and abstractness in our code, never efficiency. Of course, this results in a slow implementation, but the purpose of the reference implementation is specification rather than usability.

Another important guideline we have followed is to keep the core semantics as small as possible by modeling most of the standard library in JavaScript, minimizing the reliance on "magic" hooks into the semantics. For example, the reference implementation does not implement regular expressions natively in ML, even though most realistic implementations would do so to improve performance.

As might be expected, writing a definitional interpreter for a large and realistic language such as ECMAScript Edition 4 involved a substantial time investment, and required significant communication and cooperation by committee members. This time investment included both essential and accidental complexity (Brooks 1986): the essential complexity being the actual cost of specifying the language semantics in full detail; the accidental complexity included the learning curve with SML and its tool suite, wrestling with unintuitive parts of the SML language, and dealing with imprecise error messages (e.g., "there is a type error somewhere in these 200 lines of code"). We partially overcame the latter problem by providing explicit types for all top-level functions. Also, the SML module system provides limited support for mutually-recursive modules, with the result that mutually-recursive conceptual modules must sometimes be coalesced into a monolithic SML module, with some loss in clarity.

Overall, despite the overheads and costs of the definitional interpreter, our experience to date suggests that it works much better in several regards (consistency, completeness, implementation experience, early defect detection, etc.) than an informal English specification.

5.2 Language specifications: Code vs. Math

The definitional interpreter has essentially two goals:

  - to precisely define the language semantics, and
  - to communicate this semantics to the intended audience (to other committee members, to language implementors, and to other language users).

Other language definition styles, such as operational or denotational semantics, could also have satisfied the first goal but not the second, in large part because mathematical semantics involves specialized notation that is unfamiliar to large parts of the target audience. (Additional formal notations would be necessary to also specify the type system of the language.)

This limitation became quite clear in the committee's discussions of lightweight strategies for gradual typing. Our English discussions and descriptions always felt overly vague and imprecise. One of the authors (Flanagan) developed several formal models of the operational semantics and type systems for gradual typing, but these were inaccessible to many committee members. More recently, we developed a definitional interpreter for the gradually-typed lambda calculus, which finally provided a concise and precise description that all committee members could understand and discuss. That is, in this instance, code succeeded where prose and math had both failed.

Many of the committee members found formal semantics daunting, especially when working on an aggressive timeline. However, every single member of the committee is an expert programmer. Expressing semantics in a programming language, albeit one unfamiliar to some, turned out to be far more accessible than many semantics formalisms. This allowed more committee members to contribute to the specification rather than leaving a small subset of the members on the critical path. We expect this will have benefits for the readability of the specification as well, since more people in the target audience are likely to be familiar with functional programming than with formal semantics.

Definitional interpreters do work at a somewhat lower level of abstraction than operational or denotational semantics, in part because they deal with more low-level details. Nevertheless, we believe that it has been significantly easier for the committee to formalize the Edition 4 semantics as code than as mathematics, because:

  1. it requires much less specialized training;
  2. it leverages prior experience with programming language implementation (as opposed to semantics);
  3. SML provides various linguistic features (side effects, callcc, etc.) that have proven quite useful; and
  4. as mentioned above, type systems and test suites are invaluable in debugging the language semantics.

5.3 Language specifications: Code and Prose

The increased precision of code over prose can also be a drawback: because code operates at a lower level of abstraction than semantics, it can result in overspecification. For example, libraries often leave portions unspecified to allow for multiple implementation strategies; but an actual implementation does not have the freedom to leave anything undefined. Often such implementation decisions are observable to user programs. For instance, a library function may document its result type as an abstract interface, but reflection facilities would allow programs to observe the concrete class used to implement that interface.

To avoid overspecification, the reference implementation does not stand on its own as a complete specification, and parts of it will not even be included in the normative standard. Rather, the document will excerpt portions of the interpreter where appropriate, surrounding code with prose where necessary. The reference implementation will likely be provided as an informative appendix or companion document.

6. Implementation overview

In this section we describe some of the techniques we use for modeling JavaScript features in ML. Because of the feature set of Standard ML, it is possible for us to model JavaScript in a direct style, using the implicit control and store of ML to model those of JavaScript. Of course, we could write the interpreter in continuation-passing and store-passing style, using ML as little more than an executable lambda calculus. This would bring the model closer to a formal semantics. Indeed, in some cases the price we pay for direct style is the need for somewhat less natural models
of individual features. But writing in direct style allows us to keep these representations localized, resulting in a more modular and comprehensible language definition.

datatype VAL = Object of OBJ
             | Null
             | Undef

     and OBJ =
         Obj of { ident: OBJ_IDENT,
                  tag: VAL_TAG,
                  props: PROP_BINDINGS,
                  proto: VAL ref,
                  magic: MAGIC option ref }

     and VAL_TAG =
         ObjectTag of FIELD_TYPE list
       | ArrayTag of TYPE_EXPR list
       | FunctionTag of FUNC_TYPE
       | ClassTag of NAME

     and MAGIC =
         UInt of Word32.word
       | Int of Int32.int
       | ...

withtype OBJ_IDENT = int

     and PROP = { ty: TYPE_EXPR,
                  attrs: ATTRS,
                  state: VAL }

     and PROP_BINDINGS = (NAME * PROP) list ref

Figure 2. Definition of runtime values in ECMAScript Edition 4.

catches instances of TailCallException and invokes the associated thunk.

First-Class Continuations

The one non-standard feature of SML that we are considering exploiting is callcc. Because the semantics of generators (see Section 3) involves suspending and reifying a delimited portion of the current continuation, some amount of reification of control is necessary. To convert the entire interpreter to continuation-passing style just to support this one, largely orthogonal language feature would be unfortunate. Instead, we could use a non-native encoding of the delimited continuation operators shift and reset implemented with native callcc (Herman 2007). While non-standard, the semantics of continuations are well understood and widely implemented.

6.2 Engineering the reference implementation

The current pre-release of the reference implementation is built with Standard ML of New Jersey (Appel and MacQueen 1991). We are currently working on ports to MLton (Weeks 2006) and SML.NET (Benton et al. 2004). Porting to multiple implementations of SML has helped us to discover non-standard features we used unwittingly, improving portability and forcing us to code to the standard language. We also hope to reap the benefits each implementation has to offer, specifically performance from MLton and interoperability from SML.NET.

The reference implementation is already delivering on its promise to help with testing. The first and probably most important tests we have performed are regression tests: both Mozilla and Adobe have contributed sizeable test suites from their own implementations of ECMAScript. The current build passes more than 93% of the Mozilla regression tests. The failing test cases are caused either by out-of-date tests (written for previous versions of the language) or by tests that use features that the reference implementation does not yet implement correctly.
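The text of Section 6 mentions a handler that catches instances of TailCallException and invokes the associated thunk, but the surrounding description is not reproduced here. As a minimal sketch of that general idea, assuming the exception simply carries a thunk for the pending call, a trampoline can be written in plain SML as follows. The integer result type and the names trampoline and sumTo are hypothetical; the actual interpreter presumably carries a thunk producing a VAL, and its details may differ.

```sml
(* Assumption: the exception carries a thunk representing the pending
   tail call. The int result type is a stand-in for the interpreter's VAL. *)
exception TailCallException of unit -> int

(* Run a computation; whenever it signals a tail call by raising
   TailCallException, unwind to this handler and invoke the thunk.
   The handler body runs outside the handled expression, so the
   ML stack does not grow with the number of tail calls. *)
fun trampoline (f : unit -> int) : int =
    f () handle TailCallException thunk => trampoline thunk

(* Hypothetical client: each iteration raises instead of recursing directly. *)
fun sumTo (n : int, acc : int) : int =
    if n = 0
    then acc
    else raise TailCallException (fn () => sumTo (n - 1, acc + n))

val total = trampoline (fn () => sumTo (1000, 0))
```

The design trades speed for simplicity: raising an exception per tail call is slow, but it keeps the rest of the interpreter in direct style, which is consistent with the stated preference for clarity over performance.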