
KernelF - an Embeddable and Extensible Functional Language

Markus Voelter
independent/itemis AG
[email protected]

Abstract
Expressions and simple functional abstractions are at the core of most DSLs we have built over the last years, in domains ranging from embedded software to medical systems to insurance contracts. To avoid reimplementing this functional core over and over again, we have built KernelF, an extensible and embeddable functional language. It is implemented based on JetBrains MPS, which facilitates extension and embedding. Because of this focus on embedding and the reliance on a language workbench, the design decisions driving KernelF are quite different from those of other functional languages. In this paper we give an overview of the language and describe the design goals and the resulting design decisions. We use a set of case studies to evaluate the degree to which KernelF achieves the design goals.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, contact the Owner/Author. Request permissions from [email protected] or Publications Dept., ACM, Inc., fax +1 (212) 869-0481. Copyright held by Owner/Author. Publication Rights Licensed to ACM.
Copyright © ACM [to be supplied]. . . $15.00

1. Introduction
After designing and implementing dozens of domain-specific languages (DSLs) over the last years, we have found a recurring pattern in the high-level structure of DSLs (see Fig. 1). All DSLs rely on domain-specific data structures, be they the structure of refrigerators, data schemas for legal contracts or insurance products, or sensor and actor definitions in industrial automation. No two DSLs are similar in these structures. The behavioral aspects of DSLs are often based on versions of established behavioral paradigms, such as functional or object-oriented programming, rules executed by solvers or other rule engines, data flow models or state machines. Using an established behavioral paradigm makes the semantics of DSLs easier to tackle – and checkers and analyzers easier to build. However, at the core of the vast majority of these behavioral paradigms one can find expressions and, by extension, a small functional language: all of the mentioned paradigms require arithmetic, conditions or other simple "calculations". Many of them can benefit from having functions, records or enums.

Figure 1. The three typical layers of a DSL: domain-specific data structures, behavior based on an existing paradigm, and at the core, functional expressions.

Assuming the language implementation technology supports modular, reusable (embeddable and extensible) languages, reinventing this core functional language for each DSL is a huge waste of effort. A better solution is to develop a small functional language that can be reused in (and adapted for) all these DSLs. More specifically, the language should be extensible (so new expressions can be added) and embeddable (so it can be used in all the contexts mentioned above).

In this paper we describe the design and implementation of KernelF, a modern functional core language built on top of MPS, a language workbench that facilitates modular and reusable languages.

Access to the Code The core of KernelF (i.e., everything that is described outside the case studies in Sec. 7) is open source software. It lives in the IETS3 repository at
https://github.com/IETS3/iets3.opensource
The examples discussed in Sec. 2 can be found in the following root of that repository:

project: org.iets3.core (repo/code/languages/org.iets3.core)
module: sandbox.core.expr.os
model: sandbox.core.expr.os.expressions
node: Paper [TestSuite] (root)
url: http://localhost:8080/select/org.iets3.core/r:3dff0a9d-8b1d-4556-8482-b8653b921cfb/7740953487934666415/

1.1 Design Goals
Simplicity KernelF should be used as the kernel of DSLs. The users of these DSLs may or may not be programmers – the overwhelming majority will not be experts in functional programming. These users should not be "surprised" or "overwhelmed". Thus, the language should use familiar or easy-to-learn abstractions and notations wherever possible.

Extensibility Extensibility refers to the ability to add new language constructs to the language to customize it for a domain-specific purpose. Specifically, it must be possible to add new types, new operators or completely new expressions, such as decision tables. These must be added to the language without invasively changing the implementation of KernelF itself.

Embeddability Embedding refers to the ability to use the language as the core of arbitrary other languages. To enable this, several ingredients are needed: the existing set of primitive types must be replaceable, because alternative types may be provided by the host language. More generally, the parts of the language that may not be needed must be removable. And finally, extension also plays into embedding, because embedding into a new context always requires extending the language with expressions that connect to (i.e., reference) elements from this context (e.g., expressions that refer to event arguments in a state machine language).

Robustness The users of the DSLs that embed KernelF may not be experienced programmers – in fact, they may not see themselves as programmers at all. This means that the language should not have features that make it easy to make dangerous mistakes (such as pointer arithmetic). On the contrary, the language should be structured in a way that makes it straightforward to "do the right thing". For example, error handling should be integrated into the type system, as opposed to C's approach of making the checking of errno completely optional. The language should also make advanced analyses possible, for example through solvers. Importantly, it should ship with language abstractions for writing and running unit tests to facilitate test-driven development.

IDE Support In our experience, DSLs must come with an IDE, otherwise they are not accepted by users. This means that an IDE must be available for the language, but also that the language should be designed so that it can be supported well by IDEs. Such support includes code completion, type checking, refactoring and debugging. In addition, programs should be executable (by an interpreter) directly in the IDE to support quick turnaround and the ability of end users to "play" with the programs.

Portability The various languages into which KernelF will be embedded will probably use different ways of execution. Likely examples include code generation to Java and C, direct execution by interpreting the AST, as well as transformation into intermediate languages for execution in cloud or mobile applications. KernelF should not contain features that prevent execution on any of these platforms. Also, while not a core feature of the language, a sufficient set of language tests should be provided to align the semantics of the various execution platforms.

2. KernelF Described
In this section we describe the KernelF language. The description is complete in the sense that it describes every important feature. However, it is incomplete in that it does not mention every detail; for example, several of the obvious binary operators or collection functions are not mentioned. They can easily be discovered through code completion in the editor.

2.1 Types and Literals
Three basic types are part of KernelF: boolean, number, and string. This is a very limited set, but it can be extended through language engineering. The types can also be restricted or entirely replaced if a particular host language wants to use other types.

val aBool: boolean = true
val anInt: number = 42
val aReal: number{2} = 33.33
val aString: string = "Hello"

Boolean types are obvious; for strings, it is worth mentioning that KernelF also supports string interpolation, because this is usually more understandable to non-programmers than concatenating strings with +:

val concatString = "Hello " + anInt + " and " + (3 + anInt)
val interpolString = '''Hello $(anInt) and $(3 + anInt)'''

The number type needs a little bit more explanation. A number has a range and a precision. The following patterns exist to specify number types:

// integer type, unlimited range
number => number[-inf|inf]{0}
// non-negative integer
number[0|inf] => number[0|inf]{0}
// integer type, range as specified
number[10|20] => number[10|20]{0}
// decimal type with 2 decimal places, unlimited range
number{2} => number[-inf|inf]{2}
// range as specified, precision derived from range decimals
number[3.3|4.5] => number[3.3|4.5]{1}
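The expansion rules listed above are mechanical enough to sketch in code. The following Python function is a hypothetical illustration, not part of KernelF or MPS: normalize_number_type accepts only the shorthand forms shown above, represents unbounded limits as -inf/inf, and assumes that an omitted precision is the larger number of decimal places found in the two bounds, which is what the number[3.3|4.5] row suggests.

```python
import re

# Hypothetical helper (not part of KernelF): expands KernelF-style number type
# shorthands such as "number[3.3|4.5]" into the explicit "number[lo|hi]{p}" form.
_PATTERN = re.compile(
    r"^number"
    r"(?:\[(?P<lo>-?inf|-?\d+(?:\.\d+)?)\|(?P<hi>-?inf|-?\d+(?:\.\d+)?)\])?"
    r"(?:\{(?P<prec>\d+)\})?$"
)


def _decimals(bound: str) -> int:
    """Digits after the decimal point; 0 for integers and infinities."""
    if "inf" in bound or "." not in bound:
        return 0
    return len(bound.split(".", 1)[1])


def normalize_number_type(text: str) -> str:
    m = _PATTERN.match(text.replace(" ", ""))
    if m is None:
        raise ValueError(f"not a number type: {text!r}")
    lo = m.group("lo") or "-inf"  # a missing range means an unlimited range
    hi = m.group("hi") or "inf"
    if m.group("prec") is not None:
        prec = int(m.group("prec"))  # an explicit precision always wins
    else:
        # Assumption: precision is derived from the decimals used in the bounds.
        prec = max(_decimals(lo), _decimals(hi))
    return f"number[{lo}|{hi}]{{{prec}}}"


if __name__ == "__main__":
    for t in ["number", "number[0|inf]", "number[10|20]", "number{2}", "number[3.3|4.5]"]:
        print(t, "=>", normalize_number_type(t))
```

The bounds are kept as strings rather than floats on purpose: as floats, 0.00 and 0 would be indistinguishable, and the precision rule depends on exactly those written decimal places.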
The precision of numbers can be modified with the precision operator:

type preciseT: number[0|10]{5}
type roundedT: number[0|10]{2}
type wholeT: number[0|10]{0}
val precisePI: preciseT = 3.14156
val roundedPI: roundedT = precision<round up to 2>(precisePI)
val wholePI: wholeT = precision<cut to 0>(precisePI)
test case Precision {
  assert precisePI equals 3.14156 <number[0|10]{5}>
  assert roundedPI equals 3.15 <number[0|10]{2}>
  assert wholePI equals 3 <number[0|10]>
}

There are also operators to ensure that a value stays within its bounds by cutting off values that are too big or too small:

val high = limit<wholeT>(20)
val mid = limit<wholeT>(5)
val low = limit<wholeT>(-1)
test case TestLimit {
  assert high equals 10 <number[0|10]>
  assert mid equals 5 <number[0|10]>
  assert low equals 0 <number[0|10]>
}

We tried to use the various brackets consistently. We use regular round parentheses for value constructors, function calls, built-in functions (like limit above) and for precedence. We use angle brackets for everything that relates to types, specifically type arguments (as in list<number>). Finally, we use square brackets for tuples, indexed collection access and number ranges (as shown above). Curly braces are used for blocks and in the special case of number precision.

2.2 Basic Operators
KernelF provides the usual unary and binary operators, using infix notation. Precedence is similar to Java's, and parentheses are available. Note that the type system performs type inference (discussed in more detail in Section 5.2). As part of that, it performs basic arithmetic computations on the ranges of numeric types.

42 + 33 ==> 75 <number[75|75]{0}>
42 + 2 * 3 ==> 48 <number[48|48]{0}>
aReal + anInt ==> 75.33 <number[75.33|75.33]>
if aBool then 42 else 33 ==> 42 <number[33|42]{0}>

type tt: number[-10|10]
val n3, n4: tt = 0
val n34: number[-100|100] = n3 * n4

A few less trivial operators are also available, expressed as member functions. For example, you can test for membership in a list of values or a range:

val fourtyTwo = 42
fourtyTwo.oneOf[33, 42, 666] ==> true <boolean>
fourtyTwo.inRange[0..42] ==> true <boolean>
// notice open upper bracket: excluded upper limit
fourtyTwo.inRange[0..42[ ==> false <boolean>

2.3 Null Values and Option Types
Option types are used to handle null values in a typesafe way. The constant maybe in the code below can either be an actual number value or nothing (i.e., none), depending on the if condition and the value of aBool. This is why the constant is typed as an option<number> instead of just number. The if expression then produces either none or 42.

val maybe : option<number> = if aBool then 42 else none

Most operators, as well as many dot operations, are overloaded to also work with option<T> if they are defined for T. If one of the arguments is none, then the whole expression evaluates to none. In this sense, a none value "bubbles" up. Note that the type system represents this: the results of the + operator and the length call in the example below are also option types!

val nothing : opt<number> = none
val something : opt<number> = 10
val noText : opt<string> = none

nothing + 10 ==> none <option[number[-inf|inf]{0}]>
something + 10 ==> 20 <option[number[-inf|inf]{0}]>
noText.length ==> none <option[number[0|inf]{0}]>

To test whether an option actually contains a value, you can use the isSome expression as shown below:

val maybeHasAValue : boolean = isSome(maybe)

To explicitly extract the value from an option type (i.e., to essentially transform an option<T> into a T), a special form of the if expression can be used, as shown in the example below. As mentioned above, the isSome expression is a query that tests whether the option contains a value; inside the then part, the val expression refers to the value extracted from the option. val cannot be used in the else branch, so it is syntactically impossible to access the (then non-existent) value in the option.

if isSome(maybe) then val else 0 ==> 42 <number[42]>

If the name val is ambiguous, the name can be changed using an as clause; the example also illustrates that several expressions can be tested at the same time.

if isSome(f(a)) as t1 && isSome(f(c)) as t2
  then t1 + t2 else 0

A shorthand operator opt ?: alt is also available; it returns the value inside the option if the option is a some, and the alt value otherwise:

val anInt = maybe(a, b) ?: 0

2.4 Error Handling using Attempt Types
In the same way that KernelF encodes null checks into the type system using option types, KernelF also provides type system support for handling errors using attempt types. An attempt type has a base type that represents the payload (e.g., the return value of a function) if the attempt succeeds. It also has a number of error
literals that have to be handled by the client code. An attempt type is written down as attempt<baseType|err1, err2,..,errN>. As a consequence of type inference, such a type is hardly ever written down in a program.

Error handling has two ingredients. The first step is reporting the error. In the example below, this is performed in the getHTML function. Depending on what happens when it attempts to retrieve the HTML, it either returns the payload or reports an error using error(<error>). The type inference mechanism infers the type attempt<string|timeout, error404> for the alt expression and, transitively, for the function getHTML.

fun getHTML(url: string) : attempt<string|timeout, error404>
  = alt | ..successful..  => theHTML         |
        | ..timeout..     => error(timeout)  |
        | ..unreachable.. => error(error404) |

The client has to "unpack" the payload from the attempt type using the try expression. In the successful case, the val expression provides access to the payload of the attempt type. Errors can either be handled one by one (as shown below) or with a generic error clause.

val toDisplay : string =
  try getHTML("http://mbeddr.com") => val
      error<timeout> => "Timeout"
      error<error404> => "Not Found"

As with the unpacking of options using isSome, it is possible to assign a name to the result of the called function, so that this name can be used instead of val in the success case:

try getHTML("http://mbeddr.com") as data => data
...

If not all errors are handled, the type of the try expression remains an attempt type. In the above example, we may not handle the error404 case:

val toDisplay =
  try getHTML("http://mbeddr.com") => val
      error<timeout> => "Timeout"

In this case, the type of try, and hence of toDisplay, would be attempt<string|error404>. This way, error handling can be delegated to an upstream caller. To force complete handling of all errors, two strategies can be applied. The first one involves a type constraint to express that the success type is expected:

val toDisplay: string =
  try getHTML("http://mbeddr.com") => val
      error<timeout> => "Timeout"

In an incomplete case, where not all errors are handled (either individually or with a generic error clause), the type of try remains an attempt type with the non-handled errors. If an explicitly declared type requires a non-attempt type, this type incompatibility results in an error. The second way of forcing the try expression to handle all errors is to use the complete flag, as shown below. It reports an error on the try expression directly if not all errors are handled:

val toDisplay =
  // try will have an error because error404 is not handled
  try complete getHTML("http://mbeddr.com") => val
      error<timeout> => "Timeout"

Similar to option types, operators and dot expressions are also overloaded for attempt types with respect to their success type. The error literals are propagated accordingly.

getHTML("http://mbeddr.com").length ==> 4
  <attempt[number[0|inf]{0}|[error404, timeout]]>
getHTML("http://doesntExist.com").length ==> error(error404)
  <attempt[number[0|inf]{0}|[error404, timeout]]>

2.5 Functions and Extension Functions
Even though function syntax may be domain-specific, KernelF includes a default abstraction for functions. Functions have a name, a list of arguments, an optional return type and an expression as the body; the code below shows a few examples. The body can use the block expression, which supports values as temporary variables (similar to a let expression, but with a friendlier syntax). As with the type of a value, the return type is optional.

fun add(a: number, b: number) = a + b
fun addWithType(a: number, b: number) : number = a + b
fun biggerFun(a: number) = {
  val t1 = 2 * a
  val t2 = t1 + a
  t2
}

KernelF also supports extension functions. They must have at least one argument, the one that acts as the this variable. They can then be called using dot notation on an expression of the type of the first argument. Compared to regular functions, the advantage lies in IDE support: code completion only shows those functions that are valid for the first argument. Note that, at least for now, no polymorphism is supported.

ext fun isSomethingInIt(this: list<number>) = this.size != 0
list(1, 2, 3).isSomethingInIt() ==> true <boolean>

2.6 Function Types, Closures, Function References and Higher-Order Functions
KernelF has full support for function types, closures and function references as well as higher-order functions. We start by using a typedef to define abbreviations for two function types. The first one, INT_BINOP, is the type of functions that take two numbers and return a number. The second one represents functions that map one number to another. Using typedefs is not necessary for function types; they can also be used directly. But since these types become longish, using a typedef makes sense.
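Readers used to mainstream languages may find a point of comparison useful before the KernelF listings of this subsection. The Python sketch below is only an analogy, under the assumption that KernelF's number can be approximated by Python's int; IntBinop, IntUnop, mul_fun, mul_cls, do_with_two_ints, do_with_one_int and multiply_with_two are illustrative names, not KernelF API. Typing aliases stand in for the function typedefs, a lambda for a closure, and functools.partial for binding an argument.

```python
from functools import partial
from typing import Callable

# Python analogues of the two function typedefs; number is approximated by int.
IntBinop = Callable[[int, int], int]  # plays the role of INT_BINOP
IntUnop = Callable[[int], int]        # plays the role of INT_UNOP


def mul(a: int, b: int) -> int:
    return a * b


mul_fun: IntBinop = mul                 # function reference
mul_cls: IntBinop = lambda a, b: a * b  # closure (anonymous function)


def do_with_two_ints(x: int, y: int, op: IntBinop) -> int:
    # Higher-order function: executes the passed-in function with both arguments.
    return op(x, y)


def do_with_one_int(x: int, op: IntUnop) -> int:
    return op(x)


# Partial application: fix the first argument, leaving a one-argument function.
multiply_with_two: IntUnop = partial(mul_cls, 2)

if __name__ == "__main__":
    print(do_with_two_ints(2, 3, mul_cls))        # 6
    print(do_with_two_ints(2, 3, mul_fun))        # 6
    print(do_with_one_int(5, multiply_with_two))  # 10
```

One difference worth keeping in mind: Python only checks these annotations with an external type checker, whereas KernelF checks function types in its own type system.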
type INT_BINOP : (number, number => number)
type INT_UNOP : (number => number)

Next, we define a function mul that is of type INT_BINOP. We can verify this by assigning a reference to that function (using the colon operator) to a variable mulFun : INT_BINOP. Alternatively, we can define a closure, i.e., an anonymous function, and assign it to a similarly typed variable mulCls. Closures use the vertical bar for delineation.

fun mul(a: number, b: number) = a * b
val mulFun: INT_BINOP = :mul
val mulCls: INT_BINOP = |a: number, b: number => a * b|

We can now define a higher-order function doWithTwoInts that takes two integers as arguments, as well as a value of type INT_BINOP. The body of the function executes the function or closure, forwarding the two arguments. The next two lines verify this behavior by calling doWithTwoInts with both mulFun and mulCls.

fun doWithTwoInts(x: number, y: number, op: INT_BINOP) =
  op.exec(x, y)
doWithTwoInts(2, 3, mulCls) ==> 6 <number>
doWithTwoInts(2, 3, mulFun) ==> 6 <number>

Finally, KernelF also supports binding some of a function's arguments (partial application, often loosely called currying), which returns a new function with correspondingly fewer arguments. The value multiplyWithTwo in the example below is a function that takes one argument, because the other one has already been bound to the value 2 using bind. We could add an optional type to the constant (val multiplyWithTwo: INT_UNOP = ...) to verify that the type is indeed INT_UNOP. For demonstration purposes, we define another higher-order function and call it.

val multiplyWithTwo = mulCls.bind(2)
fun doWithOneInt(x: number, op: INT_UNOP) = op.exec(x)
doWithOneInt(5, multiplyWithTwo) ==> 10 <number>

2.7 Collections
KernelF has lists, sets and maps. All are subtypes of collection. While KernelF does not have generics in general, the collections are parametrized with their element types. They are also covariant.

val reals = list(1.41, 2.71, 3.14)
val names = set("Markus", "Markus", "Tamas")
val hometowns = map("Markus" -> "Heidenheim",
                    "Tamas" -> "Puspokladany")
val col : collection<number> = reals

The collections support the usual simple operations; a few are shown in the following example code. Of course, like all other values in KernelF, collections are immutable; the operations do not modify the value on which they are called, they return a modified copy. This is illustrated by the second line, where the original reals list is still list(1.41, 2.71, 3.14).

reals ==> list(1.41, 2.71, 3.14) <list<number[0.00|100.00]{2}>>
reals.add(1.00) ==> list(1.41, 2.71, 3.14, 1.00)
  <list<number[0.00|100.00]{2}>>
reals.at(1) ==> 2.71 <number[0.00|100.00]{2}>
reals[2] ==> 3.14 <number[0.00|100.00]{2}>
names.isEmpty ==> false <boolean>
names.size ==> 2 <number>
hometowns["Tamas"] ==> "Puspokladany" <string>

Notice that, without an explicit type, reals.add(1.00) leads to an error, because it tries to add 1.00 to a list of number[1.41|3.14]{2}, i.e., 1.00 is out of range! To fix this, the reals collection must be given an explicit element type, for example number[0.00|100.00]{2}.¹

¹ In later versions of the type system, a suitable type might be derived automatically. Currently, the element added to a list must be a subtype of the list's element type.

The usual higher-order functions on collections are also available. They can be used in three forms: you can pass in a function reference, a closure (both introduced before), or a shorthand version of the closure where the it argument is implicit. The latter is the default.

val ints = list(1, 2, 3, 4)
fun isGreaterTwo(it: number) = it > 2
ints.where(:isGreaterTwo) ==> list(3, 4) <list<number[1|4]>>
ints.where(|number r => r > 2|) ==> list(3, 4) <list<number[1|4]>>
ints.where(|it > 2|) ==> list(3, 4) <list<number[1|4]>>

More examples are shown below; the list of operations is expected to grow over time.

ints.map(|it + 1|) ==> list(2, 3, 4, 5) <list<number>>
ints.any(|it < 0|) ==> false <boolean>
ints.all(|it > 3|) ==> false <boolean>

There is also a foreach, which requires the closure inside to have a side effect; it "performs" the side effect and then returns the original list. Inside where, foreach and map, the variable counter is available; it holds the zero-based index of the current iteration (i.e., 0 in the first iteration, 1 in the second, etc.).

2.8 Tuples
Tuples are non-declared multi-element values. The type is written as [T1, T2, .., Tn], and the literals look essentially the same: [expr1, expr2, .., exprN]. Tuple elements can be accessed using an array-access-like bracket notation.

ext fun minMax(this: list<number>) = [this.min, this.max]
ints.minMax() ==> [1, 4] <[number, number]>
ints.minMax()[0] ==> 1 <number>
ints.minMax()[1] ==> 4 <number>

2.9 Records and Path Expressions
Like tuples, records are structured data, but they are explicitly declared. KernelF has them primarily for
completeness; we expect most data structures to be domain-specific and hence contributed by a language that embeds KernelF.

record Company {
  offices: list<Office>
  emps   : list<Person>
}
record Person {
  firstName    : string
  middleInitial: option<string>
  lastName     : string
}
record Office {
  branchName: string
}

A literal syntax is also supported:

val officeLuenen = #Office{"Luenen"}
val comp = #Company{
  list(#Office{"Stuttgart"}, officeLuenen),
  list(#Person{"Markus", none, "Voelter"},
       #Person{"Tamas", "M", "Szabo"})
}

Path expressions can be used to navigate along nested record structures, as shown in the examples below.

comp.emps.firstName
  ==> list("Markus", "Tamas") <collection<string>>
comp.emps.firstName.last
  ==> "Tamas" <string>
comp.emps.map(|Person p => "Hello " + p.firstName|).first
  ==> "Hello Markus" <string>

In addition, a semi-graphical builder expression is available for constructing complex structures. An example is shown in Figure 2. It can be used for any hierarchical structure, not just records, if a suitable adapter is provided.

Figure 2. The builder expression uses collapsible trees to build hierarchical structures.

Like all other values in KernelF, record instances are immutable. However, there is a convenient syntax to "modify" record instances, i.e., to create copies with some member values changed. Inside with, old refers to the original value of the member being assigned, and member names such as lastName refer to the values of the original record; this is why meSwitched swaps the two names:

val me = #Person{"Markus", none, "Voelter"}
val meWithX = me.with(firstName = old + "X",
                      lastName = lastName + "X")
val meSwitched = me.with(firstName = lastName,
                         lastName = firstName)
val brother = me.with(firstName = "Mathias")

brother ==> #Person{"Mathias", none, "Voelter"} <Person>
meWithX ==> #Person{"MarkusX", none, "VoelterX"} <Person>
meSwitched ==> #Person{"Voelter", none, "Markus"} <Person>

Grouping The groupBy operation supports grouping the entries in an existing collection by a key. The result is a new collection of type group<KT, MT>, where KT is the type of the key expression and MT is the type of the members of each group (the element type of the original collection). In the example in Figure 3, KT is string and MT is Item. On a variable of type group<KT, MT>, one can use the key operation to retrieve the current group's key, and members to access all the members of that group.

Anonymous Records The project operation supports the on-the-fly creation of anonymous records. In the example in Figure 3, we create one that has two fields, author and cats.² project is typed to a collection of this anonymous record. As a consequence of type inference, the anonymous record can be used with full IDE support; however, since the type has no name, it cannot be mentioned in the program. So, for example, the authorCats value could not be annotated with an explicit type, and it cannot be used as a function argument (because this would require an explicit type).

² This is short for categories and is not related to the animal :-).

String Lists Lists of strings can be transformed into a single string using the join(s) and terminate(s) expressions. join separates two subsequent strings by s, whereas terminate terminates each one with s. Figure 3 shows an example.

Figure 3. Example code showing grouping, projection of anonymous records and string joining/termination.

2.10 Enums
Enums are also supported in KernelF, with regular and valued flavors. Regular enums just define a list of literals; their type is the enum itself (see the use of Color in the code snippet below). Enums can be marked as qualified, which means that their literals have to be referenced using the enum name before the colon; this deals with potentially overlapping literal names.

enum Color { red, green, blue }
enum Starbucks qualified { large, venti, monster }
val ocean: Color = blue
val coffee = Starbucks:large

Valued enums associate an arbitrary value with each literal; all values of a particular enum must be of the same type. That type is declared after the name of the enum; adding that type makes an enum a valued enum. From an enum literal reference, you can get the associated value using the value operation.

enum StarbuckSizes<number> {
big -> 100
venti -> 200
mega -> 300
}

enum Family<Person> {
me -> #Person{"Markus", none, "Voelter"}
myBrother -> #Person{"Mathias", none, "Voelter"}
}

me.value.firstName ==> "Markus" <string>
big.value ==> 100 <number>
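The valued Family enum above stores Person records as its values. As a rough analogy (not how KernelF is implemented), the same idea can be sketched in Python, assuming frozen dataclasses as a stand-in for immutable KernelF records and the standard Enum class for valued enums. The names markus, me_with_x and me_switched are illustrative. dataclasses.replace plays the role of with: it builds a modified copy, and every new field value is computed from the original instance.

```python
from dataclasses import FrozenInstanceError, dataclass, replace
from enum import Enum
from typing import Optional


@dataclass(frozen=True)
class Person:
    """Immutable stand-in for the KernelF record Person."""
    firstName: str
    middleInitial: Optional[str]
    lastName: str


markus = Person("Markus", None, "Voelter")

# with-style copies: all right-hand sides are evaluated against the original
# instance, so swapping two fields needs no temporary variable.
me_with_x = replace(markus, firstName=markus.firstName + "X",
                    lastName=markus.lastName + "X")
me_switched = replace(markus, firstName=markus.lastName,
                      lastName=markus.firstName)


class StarbuckSizes(Enum):
    """Valued enum: every literal carries a value of the same type (int)."""
    big = 100
    venti = 200
    mega = 300


class Family(Enum):
    """Valued enum whose values are immutable Person records."""
    me = markus
    myBrother = replace(markus, firstName="Mathias")


if __name__ == "__main__":
    print(Family.me.value.firstName)  # Markus
    print(StarbuckSizes.big.value)    # 100
    print(me_switched)
    try:
        markus.firstName = "Max"      # records cannot be changed in place
    except FrozenInstanceError:
        print("immutable")
```

Unlike KernelF, Python does not check that all values of an Enum share one type; that constraint is only a convention in this sketch.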

2.11 Unit Tests and Constraints
Tests Built-in support for unit tests is important because, as we describe in Sec. 5.3, the semantics of KernelF is defined via a test suite; so we needed the ability to conveniently write collections of unit tests even during the implementation of KernelF. Test support is also essential to help users write good code.

At the core of the unit test support is the test case: a test case has a name and a number of test case items. The default item is an assertion that compares an expected and an actual value. The comparison operator itself is equality by default, but can be changed through language extension. The second test case item is confail, which expects a constraint failure to occur as part of the evaluation of the actual result (see below for constraints). The constructs that can go into a test case, the test case items, can be extended as well. For example, users can add setup or teardown code if they want to test expressions with side effects. Finally, a test suite groups tests plus other top-level contents (records, functions, constants). It is also possible to reference entities outside the test suite. Fig. 10 shows an example.

Constraints KernelF also supports checking of runtime constraints. Several forms exist, all illustrated in Fig. 4:

• Attached to a value: the constraint is checked after the value has been computed.
• Attached to a typedef or record: the constraint is checked whenever a value is checked against an explicitly declared type: when assigning to a value, when returning from a function, and when passing an argument into a function. For chained typedefs, the constraints are joined in a conjunction ("anded" together). Constraints for records are also checked when the record is instantiated using a record literal #R{..} or when it is "changed" using the with operation.
• Type check on an expression: it checks the type and also the constraints associated with the type.
• Attached to functions in the form of pre- and postconditions. They are checked before and after the execution of the function, respectively.

Figure 4. Constraints can be attached to values, to functions (in the form of pre- and postconditions), to records, and to types. In the latter case, they are checked whenever a type is explicitly specified in values, function arguments, return types and type constraint expressions.

Constraint failures lead to a target platform-specific form of diagnostic output. The default implementation in the interpreter throws a ConstraintFailureException (whose occurrence can be tested using the confail test item). The output of the exception logs a stack trace of the failed constraint; see below. The long URL in line two is the URL of the node in the MPS source code that failed; you can paste it into your browser, and MPS will select the particular node.

1 ERROR: Postcondition failed for res.inRange[0..1]
2 http://localhost:8080/select/DEFAULT/r:3dff0a9...
3 at [Function] PaperDescription.oddOrEven(10)
4 [Function] PaperDescription.function1(10)
5 [Function] PaperDescription.function2(10)

In case of a failed constraint, execution terminates. If, in the example above, the error should be communicated back to the caller, regular error handling should be used:

fun oddOrEven(i: number) = alt | i == 1    => success(0)   |
                               | i == 2    => success(1)   |
                               | i == 3    => success(0)   |
                               | i == 4    => success(1)   |
                               | otherwise => error(range) |

For constraints on types, it is also possible to query the conformance of a value against the type explicitly from the program (i.e., without throwing a runtime exception). Types can contribute constraints as well as custom error messages that can be reported to the user.

type Speed: number[-50|250]
type FwdSpeed: Speed where it >= 0

val validSpeed1 = check<Speed>(-10)
val validSpeed2 = check<Speed>(50)
val invSpeed = check<Speed>(300)
val invFwdSpeed1 = check<FwdSpeed>(-10)
val invFwdSpeed2 = check<FwdSpeed>(300)
test case TestConstraintsCheck {
  assert validSpeed1.ok equals true
  assert validSpeed2.ok equals true
  assert if validSpeed1 then 1 else 0 equals 1
  assert invSpeed.ok equals false
  assert invSpeed.err equals "value is over minimum (250)"
  assert invFwdSpeed1.ok equals false
  assert invFwdSpeed2.ok equals false
}

If you want to test for constraints explicitly, you cannot declare the constrained type on the value itself, because this would lead to a constraint failure before the explicit test gets invoked. Thus, the following code would be illegal: the assertion in the test case would never be executed, because the runtime constraint check in val aSpeed: Speed = 300 would occur first.

val aSpeed: Speed = 300
val validSpeed = check<Speed>(aSpeed)
test case TestConstraintsCheck {
  assert validSpeed.ok equals true
}

Using an unconstrained number type (or not specifying a type at all) solves this problem:

val aSpeed: number = 300
val validSpeed = check<Speed>(aSpeed)

Forcing Types Assigning a "bigger" type to a "smaller" type is prevented by the type system; thus the following is an error:

val bigRge : number[0|100] = 50
val smallRge : number[10|20] = bigRge // error

However, in the following piece of code, we know that the value will fit into number[10|20], even though the type system cannot figure it out3 and will report an error.

val smallRge : number[10|20] =
  if bigRge > 20 then 20 else bigRge

To solve this issue, you need an explicit type cast:

val smallRge : number[10|20] =
  cast<number[10|20]>(if bigRge > 20 then 20 else bigRge)

A cast essentially suppresses the static type check and delegates checking to runtime; in other words, the runtime constraint checks of the target type are applied to the value returned by the cast expression (range between 10 and 20 in this case). Note that, because of type inference, the type of the val can be omitted, resulting in the following code:

val smallRge =
  cast<number[10|20]>(if bigRge > 20 then 20 else bigRge)

To recap: a type specified on an argument or value is checked by the type system. A cast type is not checked by the type system, but the value has to conform to the target type at runtime. Note that there is no way to avoid all static and runtime checks; KernelF always provides at least runtime safety.

3 Future versions of the type system may be able to figure it out by improving the number range calculation.

2.12 Type Tags

A type tag [1] is additional information attached to a type that is tracked and checked by the type system. Consider a web application that processes data entered by the user. A function process(txt: string) may be defined to handle the data entered by the user. To ensure that txt does not contain executable code (cf. code-injection attacks), the string has to be sanitized. Until this happens, the data must be considered tainted [2]. Type tags can be used to ensure that a function can only work with sanitized strings:

// returns an arbitrary string
fun getData(url: string) : string { "data" }
// accepts a string that must be marked as sanitized
fun storeInDB(data: string<sanitized>) : boolean = ...
...
// v is a regular string
val v = getData("http://voelter.de")
// trying to pass it to storeInDB fails because it
// does not have the sanitized tag
val invalid = storeInDB(v) // error
// sanitize is a special operator that cleans up the string,
// and then marks it as sanitized; passing to storeInDB works
val valid = storeInDB(sanitize[v])

The sanitized tag is an example of a unary tag. A type can be marked to have the tag (<tag>), to not have the tag (<!tag>), or to be unspecified. The tag definition determines the type compatibility rules between those three options. For sanitized, a type with no specification corresponds to <!sanitized>; in other words, if we don't know, we cannot assume the string has been sanitized.

In addition, the system supports n-ary tags. They define a set of tag values (e.g., confidential, secret, topsecret) with an ordering between them (e.g., confidential < secret < topsecret). The type checking for tags takes this ordering into account, as illustrated by the code below:

val somethingUnclassified : string = "hello"
val somethingConfidential : string<confidential> = "hello"
val somethingSecret : string<secret> = "hello"
val somethingTopSecret : string<topsecret> = "hello"

fun publish(data: string) = ...
val p1 = publish(somethingUnclassified)
val p2 = publish(somethingConfidential) // ERROR
val p3 = publish(somethingSecret) // ERROR
val p4 = publish(somethingTopSecret) // ERROR

fun putIntoCIAArchive(data: string<confidential+>) = ...
val a1 = putIntoCIAArchive(somethingUnclassified) // ERROR
val a2 = putIntoCIAArchive(somethingConfidential)
val a3 = putIntoCIAArchive(somethingTopSecret)
val a4 = putIntoCIAArchive(somethingSecret)

fun tellANavyGeneral(data: string<secret->) = ...
val g1 = tellANavyGeneral(somethingConfidential)
val g2 = tellANavyGeneral(somethingSecret)
val g3 = tellANavyGeneral(somethingTopSecret) // ERROR
val g4 = tellANavyGeneral(somethingUnclassified)

3. Stateful KernelF

3.1 Effects Tracking

KernelF at its core is a functional language, and none of the expressions in KernelF have a side effect. This means, for example, that an execution engine can cache the results of functions that are called repeatedly with the same arguments; the default KernelF interpreter does this. However, KernelF may be extended to support expressions with side effects, or be embedded in a language that has effects. Then, it must be possible to analyze which functions (or other parts of programs) can be cached, and which cannot because they have effects. Similarly, it must be allowed to call a function with an effect without capturing its return value (which is an error otherwise).

To enable this, KernelF supports effects tracking. It distinguishes between read and write effects, and for write effects it also tracks idempotence. Consider the following example:

fun standardize/RM(data: number) {
  val filtered = filter(data)
  effect[data]
  if filtered > data then filtered else data
}

Here, effect[..] is a demo expression, provided by a language extension, that has a side effect. This is signalled to the checker by implementing IMayHaveEffect in the language concept and returning an EffectDescriptor from its effectDescriptor method; the descriptor has Boolean flags for the various supported kinds of effects. Because effect[..] is called inside the standardize function, that function must also be marked to have an effect. This is done by entering /R (reads), /M (modifies) or /RM (reads + modifies) behind the function name; an error is reported otherwise. The mechanism also works for function types: you can mark a function type as allowing effects by entering the flag after the arrow in the function type; this is shown in the argument of the function below. If declared this way, it is legal to pass in functions that have an effect (or not).

fun doSomethingWithAnEffect/RM(f: ( =>/RM string)) =
  f.exec/RM()

Note that the function call (to exec in this case) is automatically marked to have an effect if the called function has an effect.

3.2 Boxes

Immutable data means that you cannot change a value once it has been created. For primitive types, this is intuitive:

val a = 1 + 2
val b = 3 + a
val x = a + b

1 + 2 creates a new value 3, and adding a and b creates a new value x. Values also cannot be reassigned, because anybody who has a reference to x would then see the value of x change.

val x = a + b
x = x + 1 // invalid

Instead, you have to invent a new name for the new value; however, this leads to many new (temporary) names. Let us look at collections. Assume you have a list of three elements and you add a fourth one:

val l1 = list(1, 2, 3)
val l2 = l1.plus(4)
assert l1.size == 3
assert l2.size == 4

Here, too, the original list remains unchanged, and you get a new list, one that now has a fourth element, as the result of l1.plus(4).

So, how do you store changing global state, for example, a database of measurements? Using a new variable for each updated "state of the database" is not a solution, because it is the database itself that is supposed to change. One solution would be to introduce variables (as opposed to the values used so far):

var db = list(1, 2, 3) // note the r instead of the l
fun store/M(x: int) {
  db.add(x)
}

For this to work, you have to mark the add operation as having an effect, which, transitively, also gives store an effect. However, add does not exist on immutable lists, so you need a whole second set of APIs for mutable collections. The list in this example cannot be the same list as the one used earlier; it is a mutable list, maybe called mlist. In conclusion, you need mutable versions of all collections. This approach is a valid solution, and some languages, for example, Scala, use it. However, it is a lot of work and should be avoided.

Boxes Boxes are an alternative approach that does not require mutable versions of all immutable data structures. Boxes explicitly wrap values. The box itself is immutable (i.e., its own reference stays stable), but its contents can change:

val globalcounter: box<int> = box(0)
fun incrCounter() {
  globalcounter.update(globalcounter.val + 1)
}

Apart from creation, boxes have two operations: val, which returns the contents of the box, and update(newval), which sets a new value. The former has a read effect, the latter a modify effect. When you update the box's content, you pass a new value; you do
not need additional APIs for changing the value. The boxes themselves are generic, as shown with the next example of boxed collections:

val db = box(list(1, 2, 3)) // we're back to a value here!
fun store(x: int) {
  db.update(db.val.plus(x))
}

The big advantage of this approach is that no mutable data structures are required; the original immutable APIs (plus the generic box functionality) are enough. However, the syntax is a little chatty. To make it more concise, the it expression provides access to the current content of the box:

val globalcounter = box(0)
val db = box(list(1, 2, 3))
fun incrCounter() { globalcounter.update(it + 1) }
fun store(x: int) { db.update(it.plus(x)) }

Interpreter In terms of implementation, for example in an interpreter, boxes are really just wrapper objects with a method to get and set a generic java.lang.Object box content. The val and update operations call those methods on the runtime Java object.

public class BoxValue {
  private Object value;
  public BoxValue(Object initial) { this.value = initial; }
  // called by the update operation
  public void set(Object newValue) { this.value = newValue; }
  // called by the val operation
  public Object get() { return this.value; }
}

3.3 Native Mutable Data

The reason for boxes is that existing immutable data types can be used in a mutable way. However, this is only useful if you have immutable data structures to reuse this way in the first place; some data structures are inherently mutable, and they can use a box-free syntax.

State Machine Example State machines are the embodiment of changing state, and Figure 5 shows a minimal one that represents a (slightly contrived) counter. It defines two events, one to initialize the machine's counter, and the other one to increment it. It has two states, an initial state and an operational state. The init event goes from the initial init state to the count state, where the machine then accepts inc events. If the by value is less than 10, the counter gets incremented; otherwise a counter of invalid incrementation attempts is increased.

Figure 5. A state machine is an example of a natively mutable data structure.

Since the state machine's purpose is to represent changing state over time, we don't have to pretend anything is immutable. This is why we allow an assignment operator := inside a state machine. Inside a state machine you can also read one of its variables by just mentioning its name (as in invalids + 1); you don't need the val.

The following code shows how to use a state machine from client code:

val ctr = start(CounterToMax).init(0) // start creates instance
fun doStuffWithCounter() {
  ctr.inc(5)  // now 5
  ctr.inc(3)  // now 8
  ctr.inc(20) // invalid; still 8
  assert ctr.counter == 8
  assert ctr.invalids == 1
}

Note that even though there is mutable state (and the various operations on state machines have effects), there are no boxes; no update or val is required. However, internally the state machines still have box semantics (in the implementation, several interfaces for IBoxLike things are used to generalise box-like behavior). But since state machines have been purpose-built to have state, there is no need to reuse existing immutable APIs, as was the case for primitive operators and collections.

Interpreter Let's look at the interpreter. To implement the variable references inside state machines, we use an interface ICanBeUsedAsLValue to mark that they can be used on the left side of an assignment (an "lvalue"). The interface has a method isUsedAsLValue that detects structurally, from the AST, whether a particular variable reference is on the left side of an assignment. The interpreter uses this method to determine what it needs to evaluate to: the box if it is used as an lvalue, and the box contents otherwise. Here is the generic interpreter for the assignment; note how it relies on the runtime representations of things that can be lvalues implementing the ILValue interface, which lets it implement this functionality generically:

Object rvalue = #right; // recursively call interpreter
Object lvalue = #left;  // on the two arguments
if (lvalue instanceof ILValue) { // must be an ILValue,
                                 // which has an update method
  ((ILValue) lvalue).updateValue(rvalue);
} else {
  throw new InvalidValueException(node.left, "not an ILValue");
}
return rvalue;
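The following Java sketch makes the lvalue mechanism concrete in a self-contained form. It is an illustration under simplifying assumptions, not the KernelF interpreter: ILValue, SketchBox, assign and evalVarRef are hypothetical stand-ins, and the operands that the real interpreter obtains via #left and #right are passed in as already-evaluated arguments.

```java
import java.util.Objects;

// Hypothetical, simplified model of the lvalue mechanism; not the KernelF sources.
public class LValueSketch {

    // Stand-in for ILValue: anything that may appear on the left of :=
    public interface ILValue {
        void updateValue(Object newValue);
        Object value();
    }

    // Box-like runtime value: stable identity, replaceable (immutable) content.
    public static final class SketchBox implements ILValue {
        private Object content;
        public SketchBox(Object initial) { this.content = initial; }
        @Override public void updateValue(Object newValue) { this.content = newValue; }
        @Override public Object value() { return content; }
    }

    // Generic assignment; lvalue and rvalue are the already-evaluated operands
    // (the real interpreter obtains them via #left and #right).
    public static Object assign(Object lvalue, Object rvalue) {
        if (lvalue instanceof ILValue) {
            ((ILValue) lvalue).updateValue(rvalue);
            return rvalue;
        }
        throw new IllegalArgumentException("not an ILValue: " + Objects.toString(lvalue));
    }

    // A variable reference evaluates to the box in lvalue position,
    // and to the box contents everywhere else.
    public static Object evalVarRef(SketchBox box, boolean usedAsLValue) {
        return usedAsLValue ? box : box.value();
    }

    public static void main(String[] args) {
        SketchBox invalids = new SketchBox(0);
        // models the state machine assignment: invalids := invalids + 1
        int current = (Integer) evalVarRef(invalids, false);
        assign(evalVarRef(invalids, true), current + 1);
        System.out.println(invalids.value()); // prints 1
    }
}
```

The point of the sketch is the one made in the text: because the variable reference decides structurally whether to yield the box or its contents, a single generic assignment routine suffices for every kind of lvalue.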


In the case of state machines, the interpreter plays together with the VarRef concept that represents references to state machine variables:

SmValue currentMachine = (SmValue) env[SmValue.THIS];
SMVarValue value = currentMachine.getVar(node.var);
if (node.isUsedAsLValue()) {
  return value;         // returns the box
} else {
  return value.value(); // returns box contents
}

It first retrieves the currently executing instance of the state machine from the environment (the triggers put it there), and then asks the current state machine for the variable that it references. Note that this returns the ILValue-implementing class that represents the variable. Then comes the crucial distinction: if the current variable reference is used in lvalue position, we return the ILValue (so that the assignment interpreter can call update). Otherwise we directly return the contents of the box (e.g., an int).

3.4 Transactions

Take a look at the following code:

type intLE5: int where it <= 5
val c1: box<intLE5> = box(0)
val c2: box<intLE5> = box(0)
fun incrementCounters(x1: int, x2: int) {
  c1.update(it + x1)
  c2.update(it + x2)
}
fun main() {
  incrementCounters(1, 1)
  incrementCounters(3, 5)
}

Boxes respect the constraints on their content type: if you set a value that violates a constraint, then the update fails. What actually happens then is configurable, at least in KernelF's default interpreter: output a log message and continue, or throw an exception that terminates the interpreter. In the second case the program stops anyway, so it does not matter which value is set. In the first case, however, we run into the problem that, for the second invocation of incrementCounters, c1 is updated correctly, but the update of c2 is faulty. Transactions can help with this:

fun incrementCounters(x1: int, x2: int) newtx {
  c1.update(it + x1)
  c2.update(it + x2)
}

A transaction block is like a regular block, but if something fails inside it (interpreter: an exception is thrown), it rolls back all the changes to mutable data inside that transaction. Because the box contents themselves are immutable, the interpreter simply stores the value of each box (or more generally, ITransactionalValue) before it performs the update and remembers them in the transaction. On rollback, it just re-sets the value. This also works with state machines, and with combinations of state machines and other boxes, as shown in Figure 6, where the state machine modifies other global data.

Figure 6. An example of using transactions with different mutable data structures.

The language also supports nested transactions (which can be rolled back individually) as well as the distinction between starting a new transaction (with newtx) and a block that must be executed within an existing transaction (using intx).

Interpreter The reason why transactions also work with state machines is that the current total state of a state machine is also an immutable object; in other words, it also implements ITransactionalValue. The implementation of the transaction in the interpreter looks like this:

Transaction tx = new Transaction(node);
env[Transaction.KEY] = tx; // store in env for nested calls
try {
  Object res = #body;      // recursively evaluate the transaction body
  tx.commit();
  return res;
} catch (SomethingWentWrong ex) {
  tx.rollback();           // re-set all values recorded by the transaction
  throw ex;                // propagate the failure after undoing the changes
} finally {
  env[Transaction.KEY] = null; // no tx active anymore
}

This form of transactional memory is also used in Clojure, as far as I understand.

3.5 State Machines

We have introduced basic state machines above. In this section we'll introduce the remaining features of state machines.

Nested States States can be nested. A state S that itself contains states considers the first one, F, as the initial state. Any entry into S automatically enters F, recursively.
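The recursive entry rule for nested states can be sketched in a few lines of Java. This is a hypothetical model for illustration only (the State class, its enter method and the textual trace are assumptions, not KernelF's state machine runtime); entry actions are represented by trace entries, which come out outside-in, matching the ordering KernelF specifies for entry actions of nested states.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of nested-state entry; not the KernelF implementation.
public class NestedStateSketch {

    public static final class State {
        final String name;
        // as in the rule above, the first substate is the initial one
        final List<State> substates;

        public State(String name, State... substates) {
            this.name = name;
            this.substates = Arrays.asList(substates);
        }

        // Entering a state records its own entry first and then enters the
        // initial substate recursively, so entries happen outside-in.
        public void enter(List<String> trace) {
            trace.add("enter " + name);
            if (!substates.isEmpty()) {
                substates.get(0).enter(trace);
            }
        }
    }

    public static void main(String[] args) {
        State s = new State("S",
                new State("F", new State("F1"), new State("F2")),
                new State("G"));
        List<String> trace = new ArrayList<>();
        s.enter(trace);
        System.out.println(trace); // prints [enter S, enter F, enter F1]
    }
}
```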
Actions State machines support entry and exit actions on states as well as transition actions. Their execution is always ordered exit-transition-entry. For nested states, the exit actions are executed inside-out, and the entry actions are executed outside-in.

Automatic Transitions In addition to transitions that are triggered by events (expressed using the on keyword), automatic transitions are also supported. They are introduced by the keyword if and do not include a triggering event, only a guard condition. They are executed upon state entry (after the entry actions) or if no triggered transition fires.

Timeouts A particular use case for automatic transitions is to use the timeInState variable in the guard condition to implement time-dependent behaviours. It contains the time since the last (re-)entry of the state. Notice that if a transition on E -> S fires, this counts as a reentry. If you want to "stay" in the state, then omit the -> S. Note that if you do not specify a target state, then the transition must have an action. A transition with no action and no target state is illegal (because it does not do anything).

3.6 Clocks

KernelF supports clocks. There is a built-in type clock whose values have a time operation that returns the current time (in milliseconds) of the underlying clock. New values of type clock can be created using two expressions: systemclock returns a clock that represents the clock of the underlying system; artificialclock(init) returns a clock initialized to the init value. Note that the value returned by artificialclock is of type artificialclock, which, in addition to time, also has an advanceBy(delta) operation that moves the clock forward by delta units. The tick operation corresponds to advanceBy(1).

Artificial clocks are useful for testing. However, built-in expressions such as the timeInState mentioned above use the global clock, which by default is the systemclock. If you want to use an artificial clock for testing, you must register it as the global clock using the §global-clock pragma.

4. Tooling

4.1 MPS-based IDE

The KernelF language is of course not dependent on any particular IDE. However, what makes KernelF relevant (and not just another functional language) is its extensibility and embeddability. For this, it relies on MPS' meta programming facilities. In other words, KernelF can only be sensibly used within MPS. This also means that the IDE support MPS provides is the IDE support for KernelF. Like for any other language, MPS provides syntax highlighting, code completion, goto definition, find usages, and type checking. Because MPS is a projectional editor, it also implicitly provides formatting. Since all of this is pretty standard, we will not discuss it further.

What is worth mentioning is that this IDE support also automatically works for all extensions of KernelF, and it keeps working if KernelF is embedded into another language. No ambiguities arise from combining grammars, and no disambiguation code has to be written.

4.2 Interpreter

KernelF comes with an in-IDE interpreter that directly interprets MPS' AST. The semantics of the language concepts are implemented in Java. Note that the interpreter is not optimized for performance (which would require a completely different architecture), but for quick feedback on DSL code, in particular for test cases. The interpreter can be executed on assert entries in test cases; it can be started either from the context menu or with Ctrl/Cmd-Alt-Enter. Complete test cases and test suites can also be executed using the same menu/keys.

Notice that the interpreter performs extensive caching for expressions that have no effects. In particular, function calls with the same arguments are executed only once (per interpreter session) if the function has no effect. It is thus important that effect tracking is implemented correctly in language concepts.

4.3 Read-Eval-Print-Loop

KernelF ships with a read-eval-print-loop (REPL; Figure 7 shows an example). It is represented as its own root and is persisted, but its interaction is more like a console in the sense that whenever you evaluate an entry (using Ctrl/Cmd-Alt-Enter) the next one is created and focused. Each entry is numbered, and you can refer to each one using the $N expression.

By default, each entry in a REPL is evaluated once, and you "grow" the REPL by adding new expressions. However, by checking the downstream updates option, you can change any REPL expression, and all the transitively dependent ones are then reevaluated as well.

The easiest way to start a REPL is to select any expression in a KernelF program and use the Open REPL intention. It then creates a new REPL, adds the expression as the first entry and evaluates it. By using the Close and Return button in the REPL, the REPL is deleted and the node from which it was opened is put back in focus.

4.4 Debugger

One of the benefits of a functional language is that there is no program state to evolve; all computations can be
seen as a tree of computed values. This means that debugging does not require the step-and-inspect style we know from imperative languages. Instead, debugging can simply illustrate the computation tree in a convenient way.

Figure 7. An example of a REPL session on a clock expression.

Figure 8. The frame tree as shown in the debugger.

KernelF ships with a debugger that is based on this approach. Fundamentally, a computation in the KernelF interpreter collects a trace, and this trace can be inspected.4 The debugger, also known as the tracer, can be invoked for anything that has no upstream dependencies, i.e., test case assertions, global values and functions that have no arguments. Other domain-specific "main program"-like constructs may be available in a DSL. Whereas the interpreter is invoked via Ctrl-Alt-Enter, the debugger is invoked with Ctrl-Alt-Shift-Enter (or the Show Trace menu item in the context menu of the respective program node).

4 The trace can also be collected from other sources, for example, a KernelF program that has been generated to Java code, as long as the runtime also collects trace data.

Debugger Components The debugger comes with three components: the frame tree, the value inspector and the code decorator; we will discuss each in turn. The frame tree shows a hierarchy of frames. Frames are "coarse-grained" entities in the computation tree such as functions and function calls, local values or if expressions. Importantly, the tree does not show the program nodes; it shows the computation steps involving these program nodes. This is important, because any node may be executed several times during a computation, but with different values, producing a different result (consider val f = x() + x(), recursion, or the lambdas in higher-order functions). The frame tree shows the hierarchical nesting of those computation steps. Each node in the tree has an optional label (for example, cond or then), the (abbreviated) syntax, the (abbreviated) value and the time it took to compute it.5 The tree node shows a yellow [E] if that node has (had) an effect. If the node throws a constraint failure, this is highlighted in red, in place of the blue value.

5 We might evolve the tracer to also support a simple form of profiling in the future.

Next to the frame tree we see the value inspector. When clicking on a node in the tree, the inspector shows the structure (if any) of the value of the tree node. For example, an instance of a record is shown as a tree, and if an expression returns an MPS node, that node is clickable, selecting that node in the MPS editor.

When double-clicking a node in the frame tree, the respective node is decorated in the source. As shown in Figure 9, it associates a value with each AST node. Depending on the node's complexity, it shows no value at all (for literals, because the value would be the same as the node syntax), or shows it next to or below the node. The color is governed by the nesting depth. The decorated code always represents one particular value assignment. Thus, to debug the values for lambda in the iterations of a coll.where(lambda) higher-order function, you would click on the respective nodes in the frame tree, highlighting each instance in the code.

Debugger UI The debugger opens a new frame tree for each root for which the user opens the debugger. The red X closes the current tab. The green arrow re-executes the same root, if it is re-executable (as determined by the debugged program node). This is useful after updating the code. Note that the expansion state of the tree is retained across re-executions. The little grey round X removes all code decorations created by the current tab. The blue filter icon toggles between the regular tree, where only coarse-grained frames are shown, and a view where all interpreter steps are included. While the latter is usually overwhelming, it can sometimes be useful. The reset arrow reverts the tree to its original expansion state (see below). The collapse all and expand all buttons should be obvious.

Breakpoints and Run To Breakpoints and Run To are two features known from classical debuggers. A breakpoint stops execution on a specific statement, and Run To runs the program until it reaches a particular statement. We have adapted these ideas in the tracer to the world of debugging functional code. A program node can be marked as REVEAL using an intention. Marked this way, when the debugger is invoked, the tree is expanded to show all instances of that node, marked with a red
[R]. This way it is easy to identify a particular node in an execution trace. Run To means that you execute the program to a particular point. In the tracer, Select Next Trace selects the next trace for the node on which it is called in the tree. Select All Traces highlights all of them.

Figure 9. Decorated code that associates values with syntax nodes.

Figure 10. Test suites in KernelF. They can either be executed automatically (as part of MPS' type system) or on demand (by pressing Ctrl-Enter at any level in the suite). Color coding highlights success and failure.

4.5 Test Execution

The default execution mechanism for test suites is the built-in interpreter. Depending on the execute automatically flag, tests are run automatically (technically, in the MPS type system) or manually. In the latter case, Ctrl-Alt-Enter triggers a test item, a test case or a test suite, depending on where Ctrl-Alt-Enter is pressed.

4.6 Coverage Measurement

KernelF is being used for a wide range of applications, some of them in safety-critical areas. It is thus important to ensure the quality of the language itself, plus its extensions. This is why KernelF ships with a coverage analyzer for its test cases. The coverage analyzer provides structural and interpreter coverage checking. An assessment reports various statistics on the coverage, as shown in Figure 11.

Structural coverage means that the analyzer checks that all properties, children, and references are used in test cases. Heuristics assess the average complexity and size of the expressions in the test case. Minimum and maximum complexity thresholds can be defined to force developers to write "unit tests" (low complexity/size) and "integration tests" (higher complexity/size).

Interpreter coverage refers to the coverage of the interpreter that runs the language by default. It verifies that the evaluator for all language concepts is executed at least once. By marking branches in the interpreter, one can also ensure that all relevant branches in the interpreter code are executed at least once. Furthermore, if the interpreter works with collections (such as an argument list of a function), one can check that the interpreter runs at least once with an empty list, with a list of one element, and with a list of more than one element. Finally, the interpreter coverage analyzer can also track the ranges and distributions of numeric (and potentially other) values to make transparent the range of numbers used to exercise the interpreter.

The main limitation of the analyzer is that it does not analyze combinatorial coverage, i.e., the possible combinations of language concepts and/or value ranges.

4.7 Test Case Generation

KernelF supports test case generation; an example is shown in Figure 12. While this requires a more detailed explanation, here are the core characteristics. The generator works on any language construct that accepts a vector-style input, such as functions. There are different producers;6 currently we support random (which creates the specified number of random values that are each compatible with the n-th vector element's type) and eqclass (which selects "interesting" values for each type and then generates vectors with all permutations).

6 Currently they work only for primitive types, not for collections or records. This will be improved in the future.

If a vector is executed, several things can happen:

• A precondition (if one is given) can fail, reported as a [PRE] error message. Using an intention, such vectors can be marked as Invalid Input, which, when running the vector again, makes the vector green. A second intention can physically remove all Invalid Input vectors.

• A postcondition (if one is given) can fail. This is a genuine test failure and must be addressed.
• If an expected result is specified and the vector evaluates to something else, this is also a failure that must be addressed.
• If no result value is given and no constraints fail, all vectors will succeed. The actual values can then be copied into the result column using an intention. While this initially looks pointless, such vectors are useful as a safety net for downstream refactorings of the test subject.

Figure 11. Example of interpreter coverage measurement. Users specify the language and the concepts that should be ignored (because they are not interpreted and should hence not be part of the coverage analysis). The analyzer reports missing branches, calculates a coverage ratio, and tracks number ranges.

Figure 12. A couple of examples for test case generation; refer to the text for details.

4.8 Mutation Testing

The testing infrastructure also supports mutation testing. Mutation testing judges the quality of a test suite by making “random” changes to the test subject and then detecting whether one or more tests in the suite fail. If no test fails, the tests are not specific enough. A high-quality test suite is one where at least one test fails for each introduced mutation. The changes performed by the mutator are extensible; currently we support:

• Replacement of boolean subexpressions with true and false
• Negation of boolean expressions
• Replacement of some arithmetic operations with others, e.g. + with *, - with /
• Replacement of some boolean relations with others, e.g. > with >=, == and <=
• Exchange of the then and else part of conditions.

Currently we support mutation testing only for vector test items, i.e., those that define a set of test vectors for a single test subject. They are also used for the test case generation discussed above.

Figure 13 shows an example. A vector test item is used for the function add, and, using an intention, we have attached a mutator to the item. Using another intention, the specified number of mutation attempts can be executed. Technically, we create a clone of the current model for each mutation; those mutations for which the set of tests does not fail are kept around, and the others are deleted (unless keep all is set to true). Another intention can be used to delete all the mutant models.

The original model, the one where we started the mutation process, contains pointers to all the mutated nodes to provide an overview of the problematic code locations; they are attached to the mutator with the -> notation. Following the references leads to the mutated code, which shows the new and the original node side by side. A couple of examples are shown in Figure 14. Note that the mutator can also touch indirectly used functions; the particular scope of the mutations is defined by the test subject adapter.

5. Design Decisions

Based on the goals for KernelF outlined in Sec. 1.1, we have made a number of design decisions, which we outline in this section.

5.1 Exploit Language Workbench Technology

The core functional abstractions and the design for robustness are independent of the technologies used for implementing the language. However, the support for embedding and extension, as well as the IDE support, relies on the fact that KernelF is designed to be used with language workbenches that support modular language extension and embedding. Specifically, we have built it on top of JetBrains MPS.

By deciding to rely on the capabilities of MPS, IDE support comes essentially for free (a few refactorings, such as extracting an expression into a value, have been implemented manually). Similarly, the language does not require an elaborate type system or meta programming support to enable extension and embedding. Instead, we rely on the language workbench to achieve extension and embedding.

5.2 The Type System

Static Types KernelF is statically typed. This means that every type is known by the IDE (as well as the interpreter, or a future generator). If a user is interested
in the type of an expression, they can always press Ctrl-Shift-T to see that type. This helps with the design goals of Simplicity and IDE Support, but also with Robustness, because more aspects of the semantics can be checked statically in the IDE. Examples include subtyping errors as well as violations of number ranges.

Figure 13. An example of mutation testing.

Figure 14. Highlighting of code mutations. The mutated code is red, the original one is grey.

Numeric Types An early version of KernelF had int and real types, implemented as Java long and double in the interpreter. We immediately received feedback that doubles and longs are not suitable and that the implementation should be changed to use BigDecimal and BigInteger to get rid of the range limitations. Further feedback from business domains led to the need for explicitly specified ranges (most quantities in business domains have a range) and an explicitly specified precision (number of decimal digits). Instead of making this an optional (project-specific) extension of KernelF, we decided to replace the int and real types with number[min|max]{prec}, as explained in the introduction. The feedback from our users is very positive.

The type system performs simple range computations, such as those listed below.

• Number literals have a type with a singleton range based on their value and number of decimal digits (e.g., 42.2 has the type number[42.2|42.2]{1}).
• Supertypes of numeric types merge the ranges (for example, the supertype of number[5|5], number[10|20] and number[30|50] is number[5|50]). This is an overapproximation (i.e., a simplification in the type system implementation), because the type system could know that, for example, the value 25 is not allowed. However, to implement this, a number type would have to have several ranges; we decided that this would be too complicated (both for users and the language implementor) and would induce performance penalties in type checking, so we decided to live with the overapproximation.
• For arithmetic operations (currently +, -, * and /), the type system computes the correct result
ranges; for example, if variables of type number[0|5] and number[3|8] are added, the resulting type is number[3|13].
• A division always results in an infinite-precision value; if a different precision is required, the precision<>() operator has to be used. Since we currently cannot represent infinite precision, we approximate it with a precision of 10.

We are making these simplifying tradeoffs consciously because, in the extreme, we would have to implement a type system that supports dependent types (or abstract interpretation of code); this is clearly out of scope.

Type Inference To avoid the need to explicitly specify types (attempt types, collection types and number types in particular can get long), KernelF supports type inference; this also helps with Simplicity. The types of all constructs are inferred, with the following exceptions:

• Arguments and record members always require explicit types because they are declarations without associated expressions from which the type could be inferred.
• Recursive functions require a type because our type system cannot figure out the type of the body if this body contains a call to the same function.

If a required type is missing, an error message is annotated. Users can also use an intention on nodes that have optional type declarations (functions, constants) and have the IDE annotate the inferred type.

No Generics KernelF does not support generics in user-defined functions, another consequence of our goal of Simplicity. However, the built-in collections are generic (users explicitly specify the element type), and operations like map, select, or tail retain the type information thanks to the type system implementation in MPS. As a consequence of the extensibility of KernelF, users can also define their own “generic” language extensions, similar to collections.

Option and Attempt Types To support our goal of Robustness, the type system supports option types and attempt types. Option types are useful to explicitly deal with null values and force client code to handle the situation where null (or none) is returned. Similarly, attempt types deal systematically with errors and force the client code to handle them (or return the attempt type to its own caller).

We decided not to implement full support for monads; for our current use cases, this is acceptable and keeps the implementation of the type system simpler, which supports our goal of extensibility. Note that, because many operations and operators for T also work for opt<T>, users can defer dealing with options and errors until it makes sense to them; no nested if isSome(...) ... are required.

Effect Tracking and Types Effect tracking, as discussed in Sec. 3.1, is not implemented with the type system: an effect is not declared as part of the type signature of a function (or other construct). There are two reasons for this decision. First, for various technical reasons related to the way the MPS type system engine works, this would be inefficient. Second, language extenders and embedders would have to deal with the resulting complexity when integrating with KernelF's type system. Instead, the analysis is based on the AST structure and relies on implementing the IMayHaveEffect interface and overriding its hasEffect method correctly. While this is simpler for the language implementor or extender, a drawback of this approach is an overapproximation in one particular case: if you declare a function to take a function type that has an effect, then, even if a call passes a function without an effect, the call will still be marked as having an effect:

fun f*(g: ( =>* string)) = g.exec()* // declaration
f*(:noEffect) // call

We are working on an interprocedural data flow analysis, which will solve this problem.

5.3 Definition of the Semantics

The semantics of KernelF are given by the interpreter that ships with the language, together with a sufficiently large number of test cases. No other formal definition of the language semantics is provided. KernelF does not ship with a generator because, in the interest of Portability, a generator would always be target platform-specific. To align the semantics of generators with the reference semantics given by the interpreter, one can simply generate the test cases to the target platform and then run them there; if all pass, the (functional) semantics are identical.

5.4 Extension

We provide more details on extension and embedding in Sec. 7, but here is a quick overview of the typical approaches used for extension of KernelF.

Abstract Concepts A few concepts act as implicit extension points. They are defined as abstract concepts or interfaces in KernelF, so that extending languages can extend these concepts. They include Expression itself, IDotTarget (for things on the right side of a dot expression), IFunctionLike (for function-like callable entities with arguments), IContracted (for things with constraints or pre-/postconditions) and Type (as the super concept of all types used in KernelF). IToplevelExprContent is the interface implemented by all declarations (records, functions, typedefs).
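The abstract concepts just described, together with the AST-based effect analysis from Sec. 5.2, fit together roughly as in the following Python sketch. It is an illustration under assumptions, not KernelF's MPS implementation: Expression, has_effect, NumberLit, Plus, FunRef, Call and the extension concept ReadSensor are hypothetical names, with has_effect playing the role of IMayHaveEffect.hasEffect. The default rule is purely structural, an extending concept overrides it locally, and a call to a function declared with an effect stays effectful even when a pure function is passed, which mirrors the overapproximation for f*(:noEffect).

```python
from dataclasses import dataclass, field
from typing import List


class Expression:
    """Hypothetical stand-in for KernelF's abstract Expression concept."""

    def children(self) -> List["Expression"]:
        return []

    def has_effect(self) -> bool:
        # Default structural rule (cf. IMayHaveEffect.hasEffect):
        # an expression has an effect if any of its children has one.
        return any(child.has_effect() for child in self.children())


@dataclass
class NumberLit(Expression):
    value: int


@dataclass
class Plus(Expression):
    left: Expression
    right: Expression

    def children(self) -> List[Expression]:
        return [self.left, self.right]


@dataclass
class FunRef(Expression):
    """A reference to a function, such as :noEffect; referencing is pure."""
    name: str


@dataclass
class Call(Expression):
    declared_effect: bool  # taken from the callee's signature, e.g. f* vs. f
    args: List[Expression] = field(default_factory=list)

    def children(self) -> List[Expression]:
        return self.args

    def has_effect(self) -> bool:
        # The declaration wins: passing a pure function to f* cannot make
        # the call pure (the overapproximation described in the text).
        return self.declared_effect or super().has_effect()


@dataclass
class ReadSensor(Expression):
    """Hypothetical concept contributed by an extending language."""
    sensor: str

    def has_effect(self) -> bool:
        return True


pure = Plus(NumberLit(1), NumberLit(2))
print(pure.has_effect())                                   # False
print(Plus(pure, ReadSensor("temperature")).has_effect())  # True
print(Call(True, [FunRef("noEffect")]).has_effect())       # True
```

Because each concept answers has_effect locally, no effect information has to be encoded in types, which is the trade-off the text describes.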
Syntactic Freedom A core ingredient to extension is MPS' flexibility regarding the concrete syntax itself. As we show in Sec. 7.1, tables, trees, math or diagrams are an important enabler for making KernelF rich in terms of the user experience.

5.5 Embedding

Making a language embeddable is more challenging (at least with MPS) than making it extensible. We outline the core approaches here:

KernelF is Modular The language itself is modular; it consists of several MPS languages that can be (re)used separately, as long as the dependencies shown in Figure 15 are respected. Importantly, it is possible to use only the basic expressions (base), or expressions with functional abstractions (lambda). Nothing depends on the simpleTypes, so these can be exchanged as well (discussed below). We briefly discuss the dependencies (other than those to base) between the languages and explain why they are acceptable:

• A: required because of the higher-order functions (where, map) on the collections
• B: path navigation usually also has 1:n paths, which requires collections
• C: repl is a utility typically used when developing larger systems, which usually also use toplevel expressions, so the dependency does not hurt.
• D: tests are themselves top level elements; also, a dependency on toplevel does not hurt for a test model.
• E: the functions in toplevel require generic function-like support from lambda

Figure 15. Dependencies between the language modules in KernelF.

Removing Concepts In many cases, embedding a language into a host language requires the removal of some of the concepts from the language. One way of achieving this is to use only those language modules that are needed; see the previous paragraph. If a finer granularity is needed, the host language can use constraints to prevent the use of particular concepts in specific contexts. A concept whose use is constrained this way cannot be entered by the user; it behaves exactly as if it were removed.

Exchangeable Primitive Types One particular part of a language that may have to be removed (or, more specifically, exchanged) is the set of primitive types. As explained in the previous paragraph, users can decide not to use kernelF.primitiveTypes or constrain away some of the primitive types. However, the type system rules in the kernelF.base language rely on primitive types (some built-in expressions must be typed to Boolean or integer). This means that the types constructed in those rules must also be exchangeable. To make this possible, KernelF internally uses a factory to construct primitive types. Using an extension point, the host language can contribute a different primitive type factory, thereby completely replacing the primitive types in KernelF.

Structure vs. Types The types and the underlying typing rules can be reused independently of the language concepts. For example, if a language extension defines its own data structures (e.g., a relational data model), the collection types from KernelF can be used to represent the type of a 1:n relation. Examples are given in the case studies.

Scoping Scopes are used to resolve references. Every DSL (potentially) has its own way of looking up constants, functions, records, typedefs or its own domain-specific declarations. To make the lookup strategy configurable, KernelF provides an interface IVisibleElementProvider. Host language root concepts can implement this interface and hence control the visibility of declarations.

Overriding Syntax Imagine one embeds KernelF into a language that uses German keywords. In this case the concrete syntax (in particular, the keywords) of KernelF must be adapted. MPS' support for multiple editors for the same concepts makes this possible.

Extension Finally, embedding KernelF into a host language usually also requires extending KernelF. For example, if KernelF expressions were to be used as guards in a state machine, then a new expression EventArgRef would be required to refer to the arguments of the event that triggered the current transition; an example is the reference to data after the if in the following snippet:

state machine Protocol {
  state Waiting {
    on PacketReceived(data: list<number>)
      if data.size > 0 -> Active
  }
  state Active { ... }
  ...
}

To this end, everything discussed in Sec. 5.4 is relevant to embedding as well.

5.6 Miscellaneous

Algebraic Data Types not Essential Option types can be seen as a special case of algebraic data types, with the following definition:

type option<T> is some<T> | none;

Similarly, attempt types could also be built with a generic algebraic data type language. However, we decided against having algebraic data types in the core of the language (they might become available as an extension) for two reasons. First, as we have outlined at the beginning of Sec. 1, we expect domain-specific data structures to be contributed by the host language, so sophisticated means of modeling data, of which algebraic data types are an example, are unnecessary. Second, by making attempt and option types first-class, we can provide support for them with special syntax and type checks (e.g., the try expression for attempt types) or by making an existing concept aware of them (the if statement with respect to option types).

No Monads We decided not to add a generic facility for (user-definable) monads, for two reasons. First, they are probably at odds with our design goal of Simplicity: our users will probably not be able to understand them. More importantly, however, they make the type system much more complicated to implement in MPS. This, in turn, is a problem for extensibility, because extension developers would have to deal with this complexity.

No Exceptions KernelF does not support exceptions. The reason is that these are hard or expensive to implement on some of the expected target platforms (such as generation to C); Portability would be compromised. Instead, attempt types and the constraints can be used for error handling.

Not Designed for Building Abstractions KernelF is not optimized for building custom structural or behavioral abstractions. For example, it has no classes and no module system. The reason for this apparent deficiency lies in the layered approach to DSL design shown in Fig. 1: the DSLs in which we see KernelF used ship their own domain-specific structural and behavioral abstractions. More generally, if sophisticated abstractions are needed (for example, for concurrency), these can be added as first-class concepts through language engineering in MPS (cf. Sec. 5.1).

Keyword-rich In contrast to the tradition of functional languages, KernelF is relatively keyword-rich, which means it has relatively many first-class language constructs. There are several reasons for this decision, the main one being simplified analyzability: if a language contains first-class abstractions for semantically relevant concepts, analyses are easier to build. These, in turn, enable better IDE support (helping with Simplicity and making the language easier to explore for the DSL users) and also make it easier to build generators for different platforms (Portability). Finally, in contrast to languages that do not rely on a language workbench, the use of first-class concepts does not mean that the language is sealed: new first-class concepts can easily be added through language extension.

5.7 A specific example: “unpacking” options

In this section we provide a more detailed discussion of one particular language design decision to illustrate how user expectations and MPS tool capabilities led to the final solution. We struggled with this one for a while, and this section illustrates the thought process. The example is about “unpacking” option values, i.e., checking whether a value of type option<T> contains a T and not none.

The Starting Point We started with a first-class with some concept, plus an expression val that would provide access to the optioned value if it is some and not none. Having a first-class concept makes analyses simple to build, because it is simple to recognize a check for some: the language concept directly expresses it.

fun f(x: option<number>) = with some x => val + 10

We also experimented with using a dot expression to access the optioned value:

fun f(x: option<number>) = with some x => x.val none 10

This second version would not work for complex expressions such as function calls, since repeating the complex expression before the dot is syntactically ugly and leads to errors if the called function has side effects. We decided on the first alternative.

Naming However, this alternative results in a problem if several with some expressions are nested, because val would be ambiguous. The name used to refer to the value must therefore be changeable. One solution would be to define a value explicitly:

fun f(x: number, y: number) = {
  val xval = with some maybe(x) => val none 10
  with some maybe(y) => val + xval none 20
}

However, this is too verbose. We came up with two versions of an abbreviation to define names for the tested value:

fun f(x: number) = with some v = maybe(x) => v none 10
fun f(x: number) = with some maybe(x) as v => v none 10
We preferred <expr> as <name> over <name> = <expr> because it cannot be confused with an assignment (which we do not support in KernelF). It is also easier from the perspective of the user, because the name can be added (syntactically and in terms of typing sequence) after the expression the user wants to test. Finally, KernelF already has a facility for optionally naming things with an as suffix. The above can then be written as:

fun f(x: number, y: number) = {
  with some maybe(x) as xval
    => with some maybe(y) as yval => xval + yval
       none 0
    none 0
}

To avoid nesting, we allowed comma-separated tests:

fun f(x: number, y: number) =
  with some maybe(x) as xval, maybe(y) as yval
    => xval + yval none 0

Using If Expressions The first-class with some concept turned out to be ugly and also introduced new keywords for something where users intuitively wanted to use an if, so we allowed the if expression to be used, again with the same options:

fun f(x: option<number>) = if isSome(x) then val else 10
fun f(x: option<number>) = if isSome(x) then x.val else 10
fun f(x: number) = if isSome(maybe(x)) then val else 10
fun f(x: number) = if isSome(maybe(x) as v) then v else 10

A problem with using the existing if expression is that users can construct arbitrarily complex expressions, such as the following:

fun f(x: option<number>) =
  if isSome(x) || g(x) then val else 10

In this case it cannot (easily) be statically checked that x always has a value inside the then branch. To enforce this, we ensure that the isSome expression is the topmost expression in the if; it cannot be combined with others. This is trivial to check structurally and avoids the need for advanced semantic analysis of complex expressions.

We had the idea of interpreting an option type as Boolean to allow this syntax:

fun f(x: option<number>) = if x then val else 10

However, we discarded this option because, for our target audience, we think that too much type magic is too complicated. Another idea was to use the name of the tested variable (if it is a simple expression) in the then part, and type it to the content of the option. This would allow the following syntax:

fun f(x: option<number>) = if isSome(x) then x else 10

This is harder to implement because the type of x is now different depending on the location in the source. This is not easily possible with MPS' type system. Alternatively, the second x could be made a different language concept (which comes with a different type), but then one has to prevent the use of the original x in the then part. This would require all reference concepts to be aware of the mechanism; every scoping function would have to call a filter method. While this makes language extension a little bit harder (extenders have to call the filtering function), we decided that it is worth it: since one cannot do anything else inside the then part, providing the “unpacked” value there makes sense.

Final Design We settled on the following syntax. The if conforms to users' expectations, the as avoids confusion with assignments, and we provided the magic of “automatic unpacking” inside the then part.

fun f(x: option<number>) = if isSome(x) then x else 10
fun f(x: number) = if isSome(maybe(x) as v) then v else 10

For multiple tested values we now use && instead of the comma, because && is already used in logical expressions as a conjunction; note that other logical operators are not supported on isSome tests.

fun f(x: number, y: option<number>) =
  if isSome(maybe(x) as xval) && isSome(y)
  then xval + y else 0

6. Evolution over Time

6.1 Number Types

Initially, KernelF had been designed with the usual types for numbers: int and float. However, even in our very first customer projects it turned out that those numeric types are too focused on the needs of programmers (or even processors), and that almost no business domain finds those types useful. Thus we quickly implemented the number types as described earlier. Since this happened during the first real-world use, this evolution did not involve any migration of existing, real-world models of customers, making the evolution process very simple.

6.2 Transparent Options and Attempts

Initially, option types and attempt types were more restricted than what has been described in this paper. For example, if a value of option<T> was expected, users had to explicitly construct a some(t) instead of just returning t. Similarly for attempt types: users had to return a success(t). Options and attempts also were not transparent for operators. For example, the following code was illegal; users first had to unpack the options to get at the actual values, which led to hard-to-read nested if expressions.

val something : opt<number> = 10
val noText : opt<string> = none
something + 10 ==> 20 <option[number[-inf|inf]{0}]>
noText.length ==> none <option[number[0|inf]{0}]>
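The current, transparent behavior shown above, where operators accept option values directly and a none operand yields none, can be sketched as lifting ordinary operators over optional operands. This Python sketch is an approximation under assumptions: None stands in for none, and lift, plus and length are hypothetical helpers; unlike KernelF, Python does not track the resulting option type (such as option[number[0|inf]{0}]) statically.

```python
from typing import Callable, Optional, TypeVar

R = TypeVar("R")


def lift(op: Callable[..., R]) -> Callable[..., Optional[R]]:
    """Make an operator transparent for options: any none operand yields none."""
    def lifted(*args: object) -> Optional[R]:
        if any(arg is None for arg in args):
            return None
        return op(*args)
    return lifted


plus = lift(lambda a, b: a + b)
length = lift(len)

something: Optional[int] = 10  # a plain value where an option is expected
no_text: Optional[str] = None  # corresponds to none

print(plus(something, 10))  # 20
print(length(no_text))      # None
```

Without lifting, every use site needs an explicit check for none before applying an operator, which is what produced the nested if expressions in the original, stricter design.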
The reasons for the initial decision to do it in the stricter way were twofold. First, we thought that the more explicit syntax would make it clearer for users what was going on (less magic). Instead, it turned out that it was perceived as unintuitive and annoying. The second reason was that the original explicit version was easier to implement in terms of the type system and the interpreter, so we decided to go with the simpler option.

The migration to the current version happened after significant end-user code had been written, so we implemented an automatic migration where possible: all some(t) and success(t) were replaced by just t by a migration script that was executed automatically when users opened an existing model once the new language version was installed. The unnecessary unpackings were flagged with a warning that explained the now possible simpler version. We expected users to make that change manually because we were not able to reliably detect and transform all cases, and because automated non-trivial changes to users' code are often not desired by users.

6.3 Enums with data

Originally, enums, as described in Section 2.10, were available only in the traditional form, i.e., without associated values. However, it turned out that one major use case for enums was to use them almost like a database table, where the structured value of one enum literal would refer to another enum literal (using tuples or records as their value type):

enum T<TData> {
  t1 -> #TData(100, true, u1)
  t2 -> #TData(200, false, u2)
  t3 -> #TData(300, true, u2)
}
enum U<number> {
  u1 -> 42
  u2 -> 33
}

6.4 Records

In line with our own design goal to keep KernelF small and simple, and in particular with the assumption that the host language would supply all (non-primitive) data structures, we originally did not have records. However, it turned out that this was a bridge too far: records are useful as temporary data structures, even if the hosting DSL defines the notion of a component, class or insurance contract. Records are also useful for testing many other language constructs.

However, we did not add advanced features to records, such as inheritance; we reserve such features for host-language domain-specific data types. That said, the internal implementation infrastructure for records is based on interfaces. This way, it is very easy for extension developers to create their own record-like structures that, for example, use custom syntax or support things like inheritance. This extension hook has been used in several KernelF-based DSLs by now.

6.5 Range Qualifiers

A very common situation is to work with ranges of numbers. With the original scope of KernelF, for example, one could use an alt expression to compute a value r based on slices of another value t:

val r = alt | t < 10 => A |
            | t >= 10 && t <= 20 => B | // or t.range[10..20]
            | t > 20 => C |

However, as our users told us, this is perceived as unintuitive. The situation gets worse once one uses range checks as part of decision tables, where many more such conditions have to be used. Our solution to this problem was to create explicit range qualifiers, so one could write the following code:

val r = split t | < 10 => A |
                | 10..20 => B |
                | > 20 => C |

Note that these are not really expressions because, for example in < 10, there is no argument given on which the check has to be performed. That argument is implicit from the context. This is why these range qualifiers can only be used in surrounding expressions that have been built specifically for use with range qualifiers. The split expression is an example. We decided to make this part of the core KernelF language instead of an extension because these constructs are used regularly.

6.6 Enhanced Effects Tracking

Originally, there was only one effect flag: an expression either had an effect or it did not. However, when extending KernelF with mutable data, it quickly became clear that we have to distinguish between read and modify effects because, for example, a precondition or a condition in an if is allowed to contain expressions that have read effects, but it is an error for them to have write effects. Interpreting “has effect” as “has modify effect” also does not work because, even for expressions with read effects, caching is invalid.

So far we have decided not to distinguish further between different kinds of effects (IO, for example), because this distinction is irrelevant for our main use of effect tracking, namely caching in the interpreter.

7. Case Studies

7.1 The Utilities Extension

Context Our first case study is an extension of the core KernelF languages with more end-user-friendly ways of writing complex expressions: decision tables, decision trees and mathematical notations. Figures 16, 17 and 18 show examples.
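A decision table is complete if at least one cell applies for every input, and overlap-free if no input makes two cells apply at once. KernelF checks these properties by translating tables into Z3 formulas, as the following paragraphs describe. The Python sketch below swaps in plain enumeration over a small bounded domain instead, which only works when the input ranges are finite and small; check_table and the representation of rows and columns as predicates over x and y are hypothetical, chosen for illustration.

```python
from itertools import product
from typing import Callable, List, Tuple

Cond = Callable[[int], bool]
Cell = Tuple[int, int]


def check_table(rows: List[Cond], cols: List[Cond], xs: range, ys: range):
    """Return (gaps, overlaps) by enumerating every (x, y) input pair.

    Cell (i, j) applies when rows[i](x) and cols[j](y) both hold.
    """
    gaps = []      # inputs for which no cell applies (incompleteness)
    overlaps = []  # inputs for which more than one cell applies
    for x, y in product(xs, ys):
        hits: List[Cell] = [(i, j)
                            for i, row in enumerate(rows) if row(x)
                            for j, col in enumerate(cols) if col(y)]
        if not hits:
            gaps.append((x, y))
        elif len(hits) > 1:
            overlaps.append((x, y, hits))
    return gaps, overlaps


cols = [lambda y: y < 5, lambda y: y >= 5]

good_rows = [lambda x: x < 10, lambda x: 10 <= x <= 20, lambda x: x > 20]
gaps, overlaps = check_table(good_rows, cols, range(0, 31), range(0, 10))
print(len(gaps), len(overlaps))  # 0 0

# Rows overlap for x < 10, and nothing covers x > 20.
bad_rows = [lambda x: x < 10, lambda x: x <= 20]
gaps, overlaps = check_table(bad_rows, cols, range(0, 31), range(0, 10))
print(len(gaps), len(overlaps))  # 100 100
print(gaps[0], overlaps[0])      # (21, 0) (0, 0, [(0, 0), (1, 0)])
```

The cost of enumeration grows with the product of the input ranges, which is one reason a solver-based encoding is preferable for realistic tables.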
Figure 16. A decision table makes a decision over two dimensions, plus an optional default value.

Figure 17. A decision tree directly captures a step-wise decision-making procedure found in many technical and scientific domains.

Figure 18. The mathematical notation helps capture mathematical calculations in a way a domain expert might write them down on paper.

Notations and Abstractions The abstractions used should be fairly obvious. Their natural notations are extremely helpful when building languages for non-programmers, since the same notations would be used in the proverbial Word document that is often the basis for capturing knowledge (informally) in non-programmer organizations. The fact that first-class logical and mathematical abstractions are used has, however, additional benefits: for example, for decision tables their completeness⁷ and overlap-freedom can be checked. In our particular implementation, we do this by translating the table to the corresponding logical formulae in the Z3 solver. Errors are highlighted directly in the table.

For a table with n rows ($r_i$) and m columns ($c_j$), we detect incompleteness if the following formula is satisfiable:

$$\neg \bigvee_{i,j=1}^{n,m} (r_i \wedge c_j)$$

Similarly, an overlap between conditions can be found by checking the following conjunctions:

$$\forall i,k = 1..n,\; j,l = 1..m:\quad i \neq k \wedge j \neq l \Rightarrow r_i \wedge c_j \wedge r_k \wedge c_l$$

If nested if expressions were used instead of the table, no assumption about completeness could be made, and the checks could not be performed (unless the user annotated the set of nested if expressions with some must be complete annotation).

7 Assuming the range of the type is defined.

Implementation Structurally, all the new language concepts (decision tree, decision table, fraction bar, square root symbol and sum symbol) extend kernelf.base.Expression so they can be used wherever an expression is expected, in particular as the implementation of functions. Some concepts are wrappers around functions; for example, the content cells of decision tables are instances of DectionTableContent, which in turn contain the value expression, but also point to their respective row and column headers to define their position in the table.

In terms of notation, we reuse existing notational primitives we have developed over the years for tables, trees and mathematical symbols. Once these are available, the definition of the concrete syntax is straightforward. Fig. 19 shows the editor definition for the sum symbol. The editors for the table and the tree are a little bit more complicated, since they dynamically construct the tree and table structures. The integration with the solver is the subject of the next subsection.

Figure 19. The definition of the editor for the Σ expression essentially maps the structural members (body, lower, upper) to predefined slots in the notational primitive for math loops.

7.2 Solver Integration

Context In this case study we take a closer look at the integration of the solver: this explains more details about the architecture of the solver integration hinted at above, and it is also a case study in the use of KernelF itself.

Working with the solver, we have found a set of recurring “questions” that one asks of the solver: is a given set of expressions complete, is it overlap-free, does it contradict itself, is one expression a subset of another, or are two expressions identical (while having a different structure; think De Morgan's laws)? Answering many of these questions requires an often initially
Figure 21. An example of natural language syntax for
Figure 20. The integration of the solver, Z3 in our case,
extension function calls: in the test case, the function
into end user-facing DSLs.
is called with a multi-word text string instead of a
positional argument list.
unintuitive encoding of the expressions in the solver
(e.g., using negations).
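The decision-table checks described above can be illustrated without a solver. The following Python sketch is not the Z3-based implementation. It assumes a small, bounded input domain and simply enumerates it, looking for two kinds of witness:
- an input that no cell covers (a model of the incompleteness formula);
- an input to which two distinct cells apply (an overlap).

Rows and columns are modeled as predicates over a hypothetical input pair (a, b). The functions find_gap and find_overlap and the example tables are our own illustrations.

```python
from itertools import product
from typing import Callable, List, Optional, Sequence, Tuple

# A condition over one input point; here an input is a pair (a, b):
# row conditions test a, column conditions test b (illustration only).
Cond = Callable[[int, int], bool]
Point = Tuple[int, int]
Cell = Tuple[int, int]  # (row index i, column index j)


def find_gap(rows: Sequence[Cond], cols: Sequence[Cond],
             domain: Sequence[Point]) -> Optional[Point]:
    """Witness for  not OR_{i,j} (r_i and c_j): a point that no cell covers."""
    for a, b in domain:
        if not any(r(a, b) and c(a, b) for r in rows for c in cols):
            return (a, b)
    return None  # complete on this (bounded) domain


def find_overlap(rows: Sequence[Cond], cols: Sequence[Cond],
                 domain: Sequence[Point]) -> Optional[Tuple[Point, Cell, Cell]]:
    """Witness for r_i and c_j and r_k and c_l with (i, j) != (k, l)."""
    for a, b in domain:
        active: List[Cell] = [
            (i, j)
            for i, r in enumerate(rows)
            for j, c in enumerate(cols)
            if r(a, b) and c(a, b)
        ]
        if len(active) > 1:  # two distinct cells apply to the same input
            return (a, b), active[0], active[1]
    return None  # overlap-free on this (bounded) domain


# Hypothetical bounded domain: a in 0..99, b in 0..2000 in steps of 50.
DOMAIN = list(product(range(100), range(0, 2001, 50)))

ok_rows = [lambda a, b: a < 18, lambda a, b: a >= 18]
ok_cols = [lambda a, b: b < 1000, lambda a, b: b >= 1000]
gap_rows = [lambda a, b: a < 18, lambda a, b: a > 18]             # misses a == 18
overlap_cols = [lambda a, b: b <= 1000, lambda a, b: b >= 1000]   # both hold at 1000

if __name__ == "__main__":
    print(find_gap(ok_rows, ok_cols, DOMAIN))           # None
    print(find_overlap(ok_rows, ok_cols, DOMAIN))       # None
    print(find_gap(gap_rows, ok_cols, DOMAIN))          # (18, 0)
    print(find_overlap(ok_rows, overlap_cols, DOMAIN))  # ((0, 1000), (0, 0), (0, 1))
```

Note that the two overlapping cells in the last example share row 0, so the check has to compare every pair of distinct cells. A solver performs the same search symbolically and is therefore not restricted to a bounded domain.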
Figure 22. Associating a natural language function call syntax with an existing extension function.

Notations and Abstractions To spare users from implementing such encodings over and over again, we have developed a set of solver tasks that represent these questions. As shown in Fig. 20, a problem that should be addressed with the solver must be translated into one or more suitable solver tasks. These are then mapped to the solver, taking into account the unintuitive encodings.
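The idea of solver tasks can be sketched in a few lines. In the following Python sketch, all names are hypothetical and are not the actual Solver DSL. Each task is mapped to satisfiability queries, and a model of any query signals a problem:
- Completeness becomes "is ¬(e1 ∨ … ∨ en) satisfiable?".
- Overlap-freedom becomes "is ei ∧ ej satisfiable for some i < j?".
- Equivalence becomes "is there a value on which the two expressions differ?".

The backend sits behind a small interface. A brute-force search over an assumed integer range stands in for Z3.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Callable, List, Optional, Protocol, Sequence, Tuple, Union

# A Boolean expression over a single int variable a (illustration only).
Expr = Callable[[int], bool]


class Backend(Protocol):
    def find_model(self, formula: Expr) -> Optional[int]:
        """Return a value that satisfies formula, or None if there is none."""
        ...


@dataclass
class BruteForceBackend:
    """Stand-in for a real solver: enumerates a bounded integer range."""
    lo: int = -100
    hi: int = 100

    def find_model(self, formula: Expr) -> Optional[int]:
        return next((v for v in range(self.lo, self.hi + 1) if formula(v)), None)


@dataclass
class Completeness:
    exprs: Sequence[Expr]


@dataclass
class NonOverlapping:
    exprs: Sequence[Expr]


@dataclass
class Equivalence:
    left: Expr
    right: Expr


Task = Union[Completeness, NonOverlapping, Equivalence]


def encode(task: Task) -> List[Tuple[str, Expr]]:
    """Map a task to (description, query) pairs; a model of a query is a problem."""
    if isinstance(task, Completeness):
        exprs = list(task.exprs)
        # complete  <=>  not (e1 or ... or en) is unsatisfiable
        return [("not covered", lambda v: not any(e(v) for e in exprs))]
    if isinstance(task, NonOverlapping):
        # overlap-free  <=>  ei and ej is unsatisfiable for every i < j
        return [
            (f"options {i} and {j} overlap", lambda v, x=x, y=y: x(v) and y(v))
            for (i, x), (j, y) in combinations(enumerate(task.exprs), 2)
        ]
    if isinstance(task, Equivalence):
        left, right = task.left, task.right
        # equivalent  <=>  (left xor right) is unsatisfiable
        return [("not equivalent", lambda v: left(v) != right(v))]
    raise TypeError(f"unknown task: {task!r}")


def check(task: Task, backend: Backend) -> List[str]:
    """Run every query of a task and report those that have a model."""
    problems = []
    for description, query in encode(task):
        model = backend.find_model(query)
        if model is not None:
            problems.append(f"{description} (e.g. a = {model})")
    return problems


if __name__ == "__main__":
    backend = BruteForceBackend()
    decide = [lambda a: a < 0, lambda a: a == 0, lambda a: a > 0]
    print(check(Completeness(decide), backend))    # []
    print(check(NonOverlapping(decide), backend))  # []
    print(check(NonOverlapping([lambda a: a < 0, lambda a: a >= 0,
                                lambda a: a > 0]), backend))
    # ['options 1 and 2 overlap (e.g. a = 1)']
```

The responsibilities are split deliberately. Only encode knows the encodings, and only the backend knows how models are found. Swapping BruteForceBackend for an adapter around a real solver would therefore leave the task definitions, and the DSLs built on top of them, untouched.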
This simplifies the use of the solver (for typical problems) for the developer of a DSL. It also isolates the DSL developer from the actual solver API, which makes the solver exchangeable without any effect on the end-user DSLs: only the solver DSL with its tasks needs a new mapping to the new solver.

Consider the following simple alt expression:

  fun decide(a: int) = alt | a < 0  => 1 |
                           | a == 0 => 2 |
                           | a > 0  => 3 |

For this to be correct, the three conditions must satisfy two properties:
- they must be complete: there must be no value of a that is not covered by any option;
- they must be overlap-free: for any value of a, only one option may apply.

Below we show the encoding of these two problems in the solver DSL (layout changed to save space). These formulations are considerably simpler than the two mathematical formulae given earlier; the mapping to the solver API takes care of the mathematical encoding.

  variables:
    a: int
  relationships:
    <none>
  checks:
    completeness    { a < 0, a == 0, a > 0 }
    non-overlapping { a < 0, a == 0, a > 0 }

Implementation The Solver DSL embeds KernelF expressions. To do this, the checks (completeness, non-overlapping, etc.) have children of type kernelf.base.Expression, plus a type check that ensures these children are of Boolean type.

Note that the solver tasks must be self-contained, i.e., no external references are allowed. So the uses of the a variable in the expressions in the solver task are references to the a declared in the variables part, not to the argument of the decide function from which the checks are derived. This is an example of an extension required because of the embedding: a new expression, the SolverTaskVarRef, has been introduced as part of the Solver DSL. During the transformation from the end user-visible DSL (in this case, the alt expression of KernelF itself), the references to function arguments are replaced with SolverTaskVarRef. The mechanics of how this is done are outside the scope of this paper.

7.3 Natural Language Function Calls

Context KernelF should be usable for business DSLs whose users are non-programmers. While such users can easily deal with operators such as + or &&, the notion of function calls with parentheses and positional arguments can be hard to communicate. These users often want a more "natural language-like" syntax.

Notations and Abstractions We continue to use regular functions and extension functions, but provide an additional call syntax, as shown in Fig. 21. The syntax is associated with existing extension functions via an annotation, as shown in Fig. 22. In this annotation, a text template can be specified that embeds the arguments at the respective places. The notation also extends into the code completion menu: when you press Ctrl+Space after the dot on a Car instance, you get the proposal accelerate to @[to] with @[at]. Go to Definition on the function call also still works as expected.

Implementation The template is attached to the function definition using an annotation. Annotations are an MPS feature that makes it possible to attach annotation nodes A to other nodes N without the definition of N having to be aware of them.

The annotation contains a Text node that supports entering arbitrary, multi-line, unstructured text. Text nodes consist of a sequence of IWord nodes. Languages can define their own concepts that implement IWord, which is how the @[..] placeholders are built: the NatLangFunctionArgRef concept implements IWord and has a reference to a FunctionArgument, scoped to
Figure 23. An example component-based system with delegating connectors.

the arguments of the function to which the annotation is attached.

The implementation of the caller syntax is a bit more work: ca. 100 lines of custom MPS cell provider are required to render the custom cell. The downside is that this requires some detailed knowledge of how the MPS editor works; the upside is that it is possible at all to add this kind of natural language-influenced function call syntax. Once this cell provider is available, it is embedded into the function call expression's editor like any other cell.

7.4 Components Language

Context Component-based software development relies on composing systems from reusable components with well-defined interfaces. Components expose interfaces through ports, which are then connected hierarchically. One problem with this approach is that cross-cutting functionality, such as the diagnostics shown in Fig. 23, leads to many connectors, some of which may even have to be delegated through many layers of component assembly. This is tedious and error-prone.

Notations and Abstractions To solve this problem, some ports should be connected programmatically. This is done with expressions that enumerate instances and ports of specific types (e.g., the client port of the WheelController instances) and connect them to other ports (e.g., the server port of the DiagnosticsController instance). In our system, one can write expressions such as the following:

  component Car {
    connect many this.allinstances<WheelController>
                     .map(|it.ports<IDiagnostics>|)
              to DiagnosticsController.server
    // more component contents
  }

Notice the special-purpose expressions:
- allinstances<T> returns all recursively nested component instances of type T.
- ports<P> returns all ports with port type P of a given component.
- this represents the component in which we write the expression.

map is reused from kernelf.collections. An alternative formulation would have been to get all ports of type IDiagnostics directly and recursively (however, this would not illustrate the use of map):

  this.allports<IDiagnostics>

Implementation The Component concept owns the connect many clauses, which, in turn, embed kernelf.Expression. Several new expressions have been implemented for this language.

The this expression is used to refer to the component in which the connect many clause lives. It is typed to be a ComponentType that in turn refers to the surrounding Component.

allinstances is not an expression, but an IDotTarget, the concept that can be used on the right side of a DotExpression. A can-be-child constraint ensures that it can only be used if the context (the expression on the left side of the dot) is of type ComponentType:

  parentNode:DotExpression.expr.type.isInstanceOf(ComponentType);

Since this expression returns the list of all instances of the ComponentType specified as the argument, it is typed to be a list of this component type: <ListType(baseType: # allinstances.component.copy)>. This is an example of KernelF's existing collections being reused as the types of custom expressions; ListType is KernelF's regular type for lists. This way, all list operations from KernelF can be used on the collection of component instances returned by allinstances. The expression that returns all ports, allports, works basically the same way.

A type system rule verifies that the two expressions of the connect many clause are typed to PortType or collection<PortType>.

7.5 Variability Models

7.6 A DSL for Medical Applications

Context A language has been developed for specifying medical algorithms. The main abstractions are components (with input/output parameters and user interactions) for realizing modularity, as well as state machines to implement asynchronous, verifiable behavior inside the components. Expressions play a role in many contexts, such as invariants for data structures and transition guards.

As part of the behavior in the algorithms, interactions with databases are required. For example, averages of the last N measurements of certain medical parameters are needed. In addition, such parameters must be stored in the database in the first place. The expressions that achieve this have side effects; this allows us to illustrate how effect tracking works in KernelF.

The system has several record definitions to represent system data:
  record Patient { id: string }
  record BSLMeasurement {
    measuredAt: [date,time]
    quantity  : sugarLevel
  }
  record BloodPressureMeasurement {
    measuredAt: [date,time]
    systolic  : int
    diastolic : int
  } where systolic < 120 &&
          diastolic < 80 &&
          diastolic < systolic

The database is essentially an unstructured store that holds data for a patient. The data is tagged by kind; we use the name of the record we want to store as that kind. The special-purpose expression db-store performs storage: it takes the ID of the patient as a key and then associates the value val with the tag derived from the name of the record.

  ext fun storeBloodSugarValue*(this: Patient,
                                val: BSLMeasurement)
    = db-store*[this.id][BSLMeasurement => val]
  ext fun storeBP*(this: Patient,
                   val: BloodPressureMeasurement)
    = db-store*[this.id][BloodPressureMeasurement => val]

The db-store operation obviously has a side effect: storing the data in the database. To tell this to the type system, the db-store language concept implements IMayHaveEffect. Transitively, the two functions shown above also have an effect, as shown by the asterisk after their names. If a function calls one of these, that function also gets an effect (cur-patient, cur-date and cur-time are keyword expressions):

  fun userEnteredBSLMeasurement*(value: int)
    = cur-patient.storeBloodSugarValue*(
        #BSLMeasurement{[cur-date,cur-time], value})

Finally, if functions with side effects are called from a lambda, this lambda is also marked with a side effect:

  fun valueSubmitted*(v: int, action: (int =>* boolean)) {
    action.exec(v)*
    log("value submitted: " + v)
  }
  fun mainLoop*() {
    // ..run UI code..
    valueSubmitted*(10,
      |v: int => userEnteredBSLMeasurement*(v)|* )
  }

8. Related Work

8.1 Dynamic Languages

A widespread approach for building embedded DSLs is the use of dynamic languages that support reflection and flexible syntax. Prime examples are Groovy and Ruby. However, the approach is not suitable for our purposes, for several reasons. First, the implementation based on reflection prevents static analysis and (automatic) IDE support. Second, the syntax of extensions is limited to the freedom given by the grammars of the respective language.⁸ In addition, none of these languages is purely functional, and none provides support for explicit effects tracking. We discarded this option early and clearly.

⁸ Both of these points are clearly illustrated by a customer's (not very satisfying) attempt at building a whole range of business DSLs with Groovy.

8.2 Other Base Languages

mbeddr C mbeddr[] is an implementation of C in MPS. As we have shown in many publications, it is very extensible; it uses the same extension mechanisms as KernelF because it is built on MPS as well. However, it is unsuitable as a general-purpose embeddable base language for DSLs for three reasons:
(a) it implicitly relies on many C abstractions, such as the primitive data types and some operators;
(b) it has all the non-functional abstractions available in C;
(c) it comes with its own C-specific module system.

mbeddr C is implemented in a modular way, i.e., even the core of C is split into several languages. One of them, com.mbeddr.core.expressions, contains only the C expressions and primitive types. In particular, it does not have user-defined data types, pointers, statements, or a module system. The idea was to make this a kind of core expression language to be hosted in other DSLs. In practice, this works well as long as that DSL generates to C. However, even in this core language subset, there are many implicit assumptions about C, making it unsuitable as a generic, embeddable expression language; building an interpreter is also difficult. It also lacks many useful features, such as higher-order functions.

When we started seeing the need for a core expression language, we thought about generalizing the mbeddr expressions. However, we decided against it and started KernelF instead: the required changes would have been too great, making mbeddr C too complicated. The use cases are just too different.

MPS BaseLanguage MPS ships with a language called BaseLanguage; it wears its purpose clearly on its sleeve. It is fundamentally a slightly extended version of Java (for example, it had higher-order functions and closures long before they were standardized as part of Java 8). It also ships with a set of (modular) extensions for meta programming, supplying language constructs to, for example, create, navigate and query ASTs. BaseLanguage has been used successfully, by us and others, as the basis for DSLs. If those DSLs either extend Java or at least generate to Java, BaseLanguage is a great fit and the recommended way to go. Even though it is not built in a modular way, MPS' support for restricting languages (using constraints) is powerful enough to cut it down to a subset that is relevant for any particular DSL.
However, similar to mbeddr C, it suffers from its tight connection to Java in terms of data types, operators and assumptions about the context in which expressions are used. The fact that it is not a purely functional language and does not support effects tracking also makes it much harder to analyze. It also has several features, such as generics, that make it harder to extend. Finally, its long evolution in MPS means that it carries around a lot of baggage; we decided that it was worth the effort to build a new, clean base language.

Xbase/Xtend Xbase is a functional language that ships with Xtext. Similar to KernelF, its purpose is to be extended and embedded in the context of DSLs. Xtend is a full programming language (with classes, modules and effects) that embeds Xbase expressions. Similar to Kotlin and Ceylon, its goal is to be a better, cleaned-up Java, while not being as sophisticated/complex as Scala. For the purposes of an embeddable base language, Xtend's scope is too big (like Java or C), so we limit our discussion in this paragraph to Xbase.

In terms of its suitability as a base language, Xbase suffers from several problems. The most obvious one for our use case is that it is implemented in Xtext and is thus unusable for MPS-based languages. Of course, this does not say anything about its conceptual suitability as a core language. However, there are also two significant conceptual problems. First, because it is implemented in Xtext, its support for modular extension or embedding is limited: one cannot use several independently developed extensions in the same program in a modular way. Consequently, no such extensions are known to us or documented in the literature. Second, Xbase is very tightly coupled to Java: it uses Java classes, generates to Java, and even its IDE support is realized by maintaining Java shadow models in the background. While this is a great benefit for Java-based languages (the goal of Xbase), it is a drawback if that dependency is not desired.

In terms of its core abstractions, many of the ideas of Xbase are similar to those of KernelF:
- everything is an expression;
- it provides functional abstractions;
- it has no modules or statements (those are supplied by Xtend).

8.3 Lisp-Style Languages

Lisp-style languages have a long tradition of being extensible with new constructs and being used at the core of other systems, such as Emacs. Racket takes this to an extreme and allows significant syntactic flexibility for Lisp itself. We decided against this style of language for several reasons.

First, whether developers like or hate the syntax is generally a matter of taste (and of getting used to it). It is very clear, however, that (our) end users do not like it. Thus, adopting this syntactic style was out of the question.

Second, existing Lisp implementations are parser-based, and even the meta-programming facilities rely on integrated parsing through macros. This limits the syntactic freedom to textual notations in general, and to the capabilities of the macro system more specifically. We needed more flexibility, as shown throughout this paper.

Third, we wanted language extensions to be first-class: instead of defining them through meta programming, we wanted the power of a language workbench. Of course, we could have implemented (a version of) Lisp in MPS and then used MPS' extension mechanisms to build first-class extensions. However, then we would not make use of Lisp's inherent extensibility, while still getting a syntactic style that is unsuitable for end users. This is clearly not a good tradeoff.

Finally, Lisp language extensions only extend the language, not the IDE. However, for our use cases, the IDE is just as important as the language itself, so any language extension or embedding must also be known to the IDE. Lisp does not support this (at least not out of the box).

8.4 Embeddable Languages

Lua is a small, extensible and embeddable language. In contrast to KernelF, it is not functional: it has effects and statements. Also, the notion of extension relates to extending the C-based runtime system, not the front-end syntax. So, out of the box, Lua would not have been an alternative to the development of KernelF.

However, we could have reimplemented Lua in MPS and used MPS' language engineering facilities for syntactic extension. While possible, this would still mean that we would use a procedural language as opposed to a functional one, which was at odds with our design goals.

On the plus side is Lua's small and efficient runtime system. We did not perform any comparisons, but it is very likely faster than our MPS-integrated AST interpreter. However, performance is not a core requirement for the IDE-integrated interpreter. If fast execution is required, we would generate to Java or C, or implement an optimized interpreter in C.

9. Conclusion

We have built KernelF as a base language. This means that it must meet three requirements:
- it must be extensible, so new, domain-specific language constructs can be added;
- it must be embeddable, so it can be used as part of a variety of host languages;
- things users do not need must be removable or replaceable.

Our case studies show that we have achieved this goal. Since developing KernelF, we have used it
in all customer projects that required expressions or a
full-blown programming language.
Why were we successful? Two factors contribute.

The first is that we built KernelF after many years of building DSLs. So we had a good understanding of the features required for the language, and of what it takes to make it extensible and embeddable. In particular, the design that enables extensibility was based on our experience with mbeddr C, which has also proven to be extensible. We also had a good understanding of which features not to include, because they are typically contributed by the hosting DSL.

The second factor is MPS itself. As we have analyzed in [? ], MPS supports this kind of modular language engineering extremely well; it has literally been designed for this purpose. In conjunction with the team's experience, this leads to an extremely powerful tool.
Future Work Can a general monad system be built so
that extension developers don’t have to care (much)? A
generic logging system for expectations and constraints,
PPC?

References
[1] J. S. Foster, M. Fähndrich, and A. Aiken. A theory of
type qualifiers. ACM SIGPLAN Notices, 34(5):192–203,
1999.
[2] U. Shankar, K. Talwar, J. S. Foster, and D. Wagner.
Detecting format string vulnerabilities with type qualifiers.
In USENIX Security Symposium, pages 201–220, 2001.
