Cyclone Users Manual Version 0 1 3
User’s Manual
16 November 2001
Contents
1 Introduction
  1.1 Acknowledgements
3 Pointers
4 Tagged Unions
  4.1 tunion
  4.2 xtunion
5 Pattern Matching
  5.1 Let Declarations
  5.2 Pattern Forms
  5.3 Switch Statements
6 Type Inference
7 Polymorphism
8 Memory Management Via Regions
  8.1 Introduction
  8.2 Allocation
  8.3 Common Uses
  8.4 Type-Checking Regions
    8.4.1 Region Names
    8.4.2 Capabilities
    8.4.3 Assignment and Outlives
    8.4.4 Type Declarations
    8.4.5 Function Calls
    8.4.6 Explicit and Default Effects
9 Namespaces
10 Varargs
C Libraries
  C.1 C Libraries
  C.2 <array.h>
  C.3 <bitvec.h>
  C.4 <buffer.h>
  C.5 <core.h>
  C.6 <dict.h>
  C.7 <filename.h>
  C.8 <fn.h>
  C.9 <hashtable.h>
  C.10 <list.h>
  C.11 <pp.h>
  C.12 <queue.h>
  C.13 <rope.h>
  C.14 <set.h>
  C.15 <slowdict.h>
  C.16 <xarray.h>
D Grammar
F Tools
  F.1 The compiler
  F.2 The lexer generator
  F.3 The parser generator
  F.4 The allocation profiler, aprof
1 Introduction
Cyclone is a language for C programmers who want to write secure, ro-
bust programs. It’s a dialect of C designed to be safe: free of crashes, buffer
overflows, format string attacks, and so on. Careful C programmers can
produce safe C programs, but, in practice, many C programs are unsafe.
Our goal is to make all Cyclone programs safe, regardless of how care-
fully they were written. All Cyclone programs must pass a combination
of compile-time, link-time, and run-time checks designed to ensure safety.
There are other safe programming languages, including Java, ML, and
Scheme. Cyclone is novel because its syntax, types, and semantics are
based closely on C. This makes it easier to interface Cyclone with legacy
C code, or port C programs to Cyclone. And writing a new program in
Cyclone “feels” like programming in C: Cyclone tries to give program-
mers the same control over data representations, memory management,
and performance that C has.
Cyclone’s combination of performance, control, and safety makes it a
good language for writing systems and security software. Writing such
software in Cyclone will, in turn, motivate new research into safe, low-
level languages. For instance, originally, all heap-allocated data in Cyclone
were reclaimed via a conservative garbage collector. Though the garbage
collector ensures safety by preventing programs from accessing deallo-
cated objects, it also kept Cyclone from being used in latency-critical or
space-sensitive applications such as network protocols or device drivers.
To address this shortcoming, we have added a region-based memory man-
agement system based on the work of Tofte and Talpin. The region-based
memory manager allows you some real-time control over memory man-
agement and can significantly reduce space overheads when compared to
a conventional garbage collector. Furthermore, the region type system en-
sures the same safety properties as a collector: objects cannot be accessed
outside of their lifetimes.
This manual is meant to provide an informal introduction to Cyclone.
We have tried to write the manual from the perspective of a C programmer
who wishes either to port code from C to Cyclone, or develop a new sys-
tem using Cyclone. Therefore, we assume a fairly complete understanding
of C.
Obviously, Cyclone is a work in progress and we expect to make sub-
stantial changes to the design and implementation. Your feedback (and
patience) is greatly appreciated.
1.1 Acknowledgements
The people involved in the development of Cyclone are at Cornell and
AT&T. Dan Grossman, Trevor Jim, and Greg Morrisett worked out the
initial design and implementation, basing the language to some degree on
Popcorn, a safe-C-like language that was developed at Cornell as part of
the Typed Assembly Language (TAL) project. Mathieu Baudet contributed
the bulk of the code for the link-checker. Matthew Harris did much of the
hard work needed to wrap and import the necessary libraries. Yanling
Wang ported bison to Cyclone. All of these people have also contributed
by finding and fixing various bugs. A number of other people have also
helped to find bugs and/or contributed key design ideas including James
Cheney, Fred Smith, Nathan Lutchansky, Jeff Vinocur, and David Walker.
#include <stdio.h>
int main() {
  printf("hello, world\n");
  return 0;
}
It looks rather like a C program—in fact, a C compiler will happily
compile it. The program uses #include to tell the preprocessor to import
some standard definitions, it defines a distinguished function main that
serves as the entry point of the program, and it uses the familiar printf
function to handle the printing; all of this is just as in C.
To compile the program, put it into a file hello.cyc, and run the
command
cyclone hello.cyc -o hello
This tells the Cyclone compiler (cyclone) to compile the file hello.cyc;
the -o flag tells the compiler to leave the executable output in the file
hello (or, in Windows, hello.exe). If all goes well you can execute
the program by typing
hello
and it will print
hello, world
It’s interesting to compare our program with a version that omits the
return statement:
#include <stdio.h>
int main() {
  printf("hello, world\n");
}
A C compiler will compile and run this version. However, it’s not valid
Cyclone code: it will be rejected by the Cyclone compiler. Cyclone requires
a definite return: any function with a return type other than void must
explicitly return a value of the correct type. Since main is declared with
return type int, Cyclone requires that it explicitly return an integer.
Definite return reflects Cyclone’s concern with safety. The caller of
the function expects to receive a value of the return type; if the function
does not execute a return statement, the caller will receive some incor-
rect value instead. If the returned value is supposed to be a pointer, the
caller might try to dereference it, and dereferencing an arbitrary address
can cause the program to crash. So, Cyclone requires a return statement
(even if the return type is not a pointer type).
2.2 Pointers
Programs that use pointers properly in C can be both fast and elegant.
But when pointers are used improperly in C, they cause core dumps and
buffer overflows. To prevent this, Cyclone introduces different kinds of
pointers and either (a) puts some restrictions on how you can use pointers
of a given kind or (b) places no restrictions but may insert additional run-
time checks.
Nullable Pointers
The first kind of pointer is indicated with a *, as in C. For example, if we
declare
int x = 3;
int *y = &x;
*y = *y + 1;
Cyclone prevents this by inserting a null check whenever you dereference
a * pointer (that is, whenever you use the *, ->, or subscript
operation on a pointer).
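As a rough illustration only, the inserted check behaves much like the following plain C helper. The name checked_deref and the return-code convention are assumptions made for this sketch; Cyclone itself signals a failed check by throwing an exception (see Section 2.5) rather than by returning a code.

```c
#include <stddef.h>

/* Hypothetical helper (not part of Cyclone): models the null check that
   Cyclone inserts before dereferencing a * pointer. Returns 1 and stores
   the pointed-to value in *out on success, or 0 if p is NULL. */
int checked_deref(const int *p, int *out) {
    if (p == NULL) {
        return 0;   /* Cyclone would throw Null_Exception at this point */
    }
    *out = *p;      /* safe: p is known to be non-null here */
    return 1;
}
```

In C the programmer has to remember to call such a helper; in Cyclone the compiler inserts the check, so it cannot be forgotten.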
Fat Pointers
If you need to do pointer arithmetic in Cyclone, you must use a second
kind of pointer, called a fat pointer and indicated by ? (the question mark).
For example, here is a program that echoes its command-line arguments:
#include <stdio.h>
Except for the declaration of argv, which holds the command-line ar-
guments, the program looks just like you would write it in C: pointer arith-
metic (argv++) is used to move argv to point to each argument in turn,
so it can be printed.
In C, argv would typically be declared with type char **, a pointer
to a pointer to a character, which is thought of as an array of an array
of characters. In Cyclone, argv is instead declared with type char ??,
which is thought of in the same way: it is a (fat) pointer to a (fat) pointer
to characters. The difference between a * pointer and a ? pointer is that
a ? pointer comes with bounds information and is thus “fatter” than a
traditional pointer. Each time a fat pointer is dereferenced or its contents
are assigned to, Cyclone inserts not only a null check but a bounds check.
This guarantees that a ? pointer can never cause a buffer overflow.
Because of the bounds information contained in ? pointers, argc is
superfluous: you can get the size of argv by writing argv.size. We’ve
kept argc as an argument of main for backwards compatibility.
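The bounds check described above can be modelled in plain C. The sketch below is only an analogy: the struct layout and the names fat_int_ptr, fat_make, fat_add, and fat_read are assumptions for illustration, and Cyclone's actual representation of ? pointers may differ. The size field stands in for the information that argv.size exposes.

```c
#include <stddef.h>

/* Hypothetical C model of a fat pointer to int. */
struct fat_int_ptr {
    int *base;         /* start of the underlying array (may be NULL) */
    ptrdiff_t offset;  /* current position, relative to base */
    size_t size;       /* number of elements in the array */
};

/* Build a fat pointer covering arr[0..n-1]. */
struct fat_int_ptr fat_make(int *arr, size_t n) {
    struct fat_int_ptr p = { arr, 0, n };
    return p;
}

/* Pointer arithmetic only moves the offset; the bounds travel with it. */
struct fat_int_ptr fat_add(struct fat_int_ptr p, ptrdiff_t k) {
    p.offset += k;
    return p;
}

/* Dereference with a null check followed by a bounds check.
   Returns 1 and stores the element in *out, or 0 if either check fails. */
int fat_read(struct fat_int_ptr p, int *out) {
    if (p.base == NULL) {
        return 0;                                   /* null check */
    }
    if (p.offset < 0 || (size_t)p.offset >= p.size) {
        return 0;                                   /* bounds check */
    }
    *out = p.base[p.offset];
    return 1;
}
```

Moving the pointer out of range is allowed in this model; only a read through an out-of-range pointer is refused, which matches the text's statement that the check happens when the pointer is dereferenced.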
It’s worth remarking that you can always cast a * pointer to a ? pointer
(and vice-versa). So, it is possible to do pointer arithmetic on a value of
type *, but only when you insert the appropriate casts to convert from
one pointer type to another. Note that some of these casts can fail at run-
time. For instance, if you try to cast a fat pointer that points to an empty
sequence of characters to char *, then the cast will fail since the sequence
doesn’t contain at least one character.
Never-Null Pointers
There is one other kind of pointer in Cyclone: the never-null pointer. A
never-null pointer is indicated by @ (the at sign). An @ pointer is like a
* pointer, except that it is guaranteed not to be NULL. This means that
when you dereference an @ pointer or assign to its contents, a null check is
unnecessary.
@ pointers are useful in Cyclone both for efficiency and as documenta-
tion. This can be seen at work in the standard library, where many func-
tions take @ pointers as arguments, or return @ pointers as results. For
example, the getc function that reads a character from a file is declared,
int getc(FILE @);
This says that getc expects to be called with a non-null pointer to a FILE.
Cyclone guarantees that, in fact, when the getc function is entered, its
argument is not null. This means that getc does not have to test whether
it is null, or decide what to do if it is in fact NULL.
In C, the argument of getc is declared to have type FILE *, and pro-
grammers can call getc with NULL. So for safety, C’s getc ought to
check for NULL. In practice, many C implementations omit the check;
getc(NULL) is an easy way to crash a C program.
In Cyclone, you can still call getc with a possibly-null FILE pointer
(a FILE *). However, Cyclone insists that you insert a check before the
actual call:
FILE *f = fopen("/etc/passwd","r");
int c = getc((FILE @)f);
Initializing Pointers
Pointers must be initialized before they are used to ensure that random
stack garbage does not get used as a pointer. This requirement goes for
variables that have pointer type, as well as for arrays, elements of arrays,
and for fields in structures. Conversely, data that does not have pointer
type need not be initialized before it is used, since doing so cannot result
in a violation of safety. This decision adheres to the philosophy of C, but
diverges from that of traditional type-safe languages like Java and ML.
length information and when you write “int *” this is just short-hand
for “int *{1}”.
We explain pointers in more detail in Section 3.
2.3 Regions
Another potential way to crash a program or violate security is to deref-
erence a dangling pointer—a pointer to storage that has been deallocated.
These are particularly insidious bugs because the error might not manifest
itself immediately. For example, consider the following C code:
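struct Point {int x; int y;};
struct Point *newPoint(int x,int y) {
  struct Point result = {x,y};
  return &result;
}
void foo(struct Point *p) { p->y = 1234; }
void bar() {
  struct Point *p = newPoint(1,2);
  foo(p);
}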
The code has an obvious bug: the function newPoint returns a pointer to
a locally-defined variable (result), even though the storage for that vari-
able is deallocated upon exit from the function. That storage may be re-
used (e.g., by a subsequent procedure call) leading to subtle bugs or secu-
rity problems. For instance, in the code above, after bar calls newPoint,
the storage for the point is re-used to store information for the activation
record of the call to foo. This includes a copy of the pointer p and the
return address of foo. Therefore, it may be that p->y actually points to
the return address of foo. The assignment of the integer 1234 to that loca-
tion could then result in foo “returning” to an arbitrary hunk of code in
memory. Nevertheless, the C type-checker readily admits the code.
In Cyclone, this code would be rejected by the type-checker to avoid
the kind of problems mentioned above. The reason the code is rejected is
that Cyclone tracks the lifetime of every object and ensures that a pointer
to an object can only be dereferenced if that object has not been deallo-
cated.
The way that Cyclone achieves this is by assigning each object a sym-
bolic region that corresponds to the lexical block in which the object is
declared, and each pointer type reflects the region into which a pointer
points. For instance, the variable result lives within a region that corre-
sponds to the invocation of the function newPoint. We write the name of
the region explicitly using a back-quote as in `newPoint.
Because result lives in region `newPoint, the expression &result
is a pointer into region `newPoint. If we like, we can write the type of
&result with the explicit region as “struct Point * `newPoint”.
Note that the region name comes after the * (or ? or @).
When control flow exits a block, the storage (i.e., the region) for that
block is deallocated. Cyclone keeps track of the set of regions that are
allocated and deallocated at every control-flow point and ensures that you
only dereference pointers to allocated regions. For example, consider the
following fragment of (bad) Cyclone code:
1 int f() {
2   int x = 0;
3   int *`f y = &x;
4   L:{ int a = 0;
5     y = &a;
6   }
7   return *y;
8 }
In the function f above, the variables x and y live within the region `f
because they are declared in the outermost block of the function. The storage
for those variables will live as long as the invocation of the function. Note
that since y is a pointer to x, the type of y is int * `f, reflecting that y
points into region `f.
The variable a does not live in region `f because it is declared in an inner
block, which we have labeled with L. The storage for the inner block L
may be deallocated upon exit of the block. To be more precise, the storage
for a is deallocated when the block closes on line 6. Thus, it is an error to
try to access this storage in the rest of the computation, as is done on line 7.
Cyclone detects the error because it gives the expression &a the type
int * `L, reflecting the fact that the value is a pointer into region `L.
So, the assignment y = &a fails to type-check because y expects to hold
a pointer into region `f, not region `L. The restriction, compared to C, is
that a pointer’s type indicates one region instead of all regions.
Region Inference
As we will see, Cyclone often figures out the region of a pointer without
the programmer providing the information. This is called region inference.
For instance, we can re-write the function f above without any region an-
notations, and without labelling the blocks:
int f() {
  int x = 0;
  int *y = &x;
  { int a = 0;
    y = &a;
  }
  return *y;
}
and Cyclone can still figure out that y is a pointer into region `f, and &a
is a pointer into a different (now anonymous) region, so the code should
be rejected.
As we will show below, occasionally you will need to put explicit re-
gion annotations into the code to convince the type-checker that some-
thing points into a particular region, or that two things point into the same
region. In addition, it is sometimes useful to put in the region annotations
for documentation purposes, or to make type errors a little less cryptic.
You need to understand at least four more details about regions to be an
effective Cyclone programmer: the heap region, dynamic regions, region
polymorphism, and default region annotations for function parameters.
The following sections give a brief overview of these details.
The Heap Region
There is a special region for the heap, written `H, that holds all of the
storage for top-level variables, and for data allocated via new or malloc.
For instance, if we write the following declarations at the top-level:
then Cyclone figures out that ptr points into the heap region. To reflect
this explicitly, we can put the region in the type of ptr if we like:
Dynamic Regions
Storage on the stack is implicitly allocated and recycled when you enter
and leave a block. Storage in the heap is explicitly allocated via new or
malloc, but there is no support in Cyclone for explicitly freeing an ob-
ject in the heap. The reason is that Cyclone cannot accurately track the
lifetimes of individual objects within the heap, so it can’t be sure whether
dereferencing a pointer into the heap would cause problems. Instead, a
conservative garbage collector reclaims the data allocated in the heap.
Using a garbage collector to recycle memory is the right thing to do for
most applications. For instance, the Cyclone compiler uses heap-allocated
data and relies upon the collector to recycle most objects it creates when
compiling a program. But a garbage collector can introduce pauses in
the program, and as a general purpose memory manager, might not be
as space- or time-efficient as routines tailored to an application.
To address these applications, Cyclone provides support for dynamic
regions. A dynamic region is similar to the region associated with a code
block. In particular, when you execute:
region<`r> h {
  ...
}
this declares a new region `r along with a region handle h. The handle can
be used for dynamically allocating objects within the region. All of the
storage for the region is deallocated at the point of the closing brace. Un-
like block regions, the number (and size) of objects that you allocate into
the region is not fixed at compile time. In this respect, dynamic regions are
more like the heap. You can use the rnew(h) and rmalloc(h,...) op-
erations to allocate objects within a dynamic region, where h is the handle
for the region.
For instance, the following code takes an integer n, creates a new dy-
namic region and allocates an array of size n within the region using rnew.
int k(int n) {
  int result;
  region<`r> h {
    int ?arr = rnew(h) {for i < n : i};
    result = process(h, arr);
  }
  return result;
}
It then passes the handle for the region and the array to some processing
function. Note that the processing function is free to allocate objects into
the region `r using the supplied handle. After processing the array, we
exit the region which deallocates the array, and then return the calculated
result.
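For readers who know arena allocators, a dynamic region behaves much like one. The C sketch below is an analogy under stated assumptions, not Cyclone's implementation: the names region, region_init, region_alloc, region_free_all, and k_sketch are hypothetical, and summing the array is a stand-in for the unspecified process function. Unlike Cyclone, C does nothing to stop a pointer into the region from being used after the region is freed.

```c
#include <stddef.h>
#include <stdlib.h>

/* Header placed in front of every allocation; the max_align_t member keeps
   the payload that follows suitably aligned for any object type. */
union header {
    union header *next;
    max_align_t align;
};

/* Hypothetical region handle: a list of everything allocated so far. */
struct region {
    union header *blocks;
};

void region_init(struct region *r) { r->blocks = NULL; }

/* Rough analogue of rmalloc(h, n): allocate n bytes owned by region r. */
void *region_alloc(struct region *r, size_t n) {
    union header *h = malloc(sizeof *h + n);
    if (h == NULL) {
        return NULL;
    }
    h->next = r->blocks;    /* remember the block so it can be freed later */
    r->blocks = h;
    return h + 1;           /* payload starts just after the header */
}

/* Rough analogue of reaching the closing brace: free everything at once. */
void region_free_all(struct region *r) {
    while (r->blocks != NULL) {
        union header *next = r->blocks->next;
        free(r->blocks);
        r->blocks = next;
    }
}

/* Mirrors the shape of k from the text: build {0, 1, ..., n-1} inside a
   region, compute a result from it, release the region, return the result. */
int k_sketch(int n) {
    struct region r;
    int result = 0;
    if (n <= 0) {
        return 0;
    }
    region_init(&r);
    int *arr = region_alloc(&r, (size_t)n * sizeof *arr);
    if (arr != NULL) {
        for (int i = 0; i < n; i++) arr[i] = i;
        for (int i = 0; i < n; i++) result += arr[i];  /* stand-in for process */
    }
    region_free_all(&r);    /* arr must not be used after this point */
    return result;
}
```

The comment on the last call marks exactly the obligation that Cyclone's region types enforce at compile time and that C leaves to the programmer.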
It is worth remarking that the heap is really just a dynamic region with
global scope, and you can use the global variable heap_region as a han-
dle on the heap. Indeed, new and malloc(...) are just abbreviations for
rnew(heap_region) and rmalloc(heap_region,...) respectively.
Region Polymorphism
Another key concept you need to understand about regions is called re-
gion polymorphism. This is just a fancy way of saying that you can write
functions in Cyclone that don’t care which specific region a given object
lives in, as long as it’s still alive. For example, the function foo from the
beginning of this section is a region-polymorphic function. To make this
clear, let us re-write the function making the regions explicit:
void g() {
  struct Point p = {0,1};
  struct Point *`g ptr1 = &p;
  struct Point *`H ptr2 = new Point{2,3};
  foo(ptr1);
  foo(ptr2);
}
Note that in the first call to foo, we are passing a pointer into region `g,
and in the second call to foo, we are passing in a pointer into the heap. In
the first call, `r is implicitly instantiated with `g and in the second call,
with `H.
Cyclone automatically inserts region parameters for function arguments,
so you rarely have to write them. For instance, foo can be written simply
as:
void foo(struct Point * p) {
  p->y = 1234;
  return;
}
then Cyclone fills in the region parameters for you by assuming that the
points p1 and p2 can live in any two regions `r1 and `r2. To make this
explicit, we would write:
Now we can call h with pointers into any two regions, or even two point-
ers into the same region. This is because the code is type-correct for all
regions `r1 and `r2.
Occasionally, you will have to put region parameters in explicitly. This
happens when you need to assert that two pointers point into the same
region. Consider for instance the following function:
Cyclone will reject the code because it assumes that in general, p1 and
p2 might point into different regions. That is, Cyclone fills in the missing
regions as follows:
Now it is clear that the assignment does not type-check because the types
of p1 and p2 differ. In other words, `r1 and `r2 might be instantiated
with different regions, in which case the code would be incorrect. But you
can make them the same by putting in the same explicit region for each
pointer. Thus, the following code does type-check:
void j(struct Point *`r1 p1, struct Point *`r1 p2) {
  p1 = p2;
}
So, Cyclone assumes that each pointer argument to a function is in a
(potentially) different region unless you specify otherwise. The reason we
chose this as the default is that (a) it is often the right choice for code, and (b)
it is the most general type in the sense that if it does work out, clients will
have the most latitude in passing arguments from different regions or the
same region to the function.
What about the results? Here, there is no good answer because the
region of the result of a function cannot be easily determined without
looking at the body of the function, which defeats separate compilation of
function definitions from their prototypes. Therefore, we have arbitrarily
chosen the heap as the default region for function results. Consequently,
the following code:
struct Point * good_newPoint(int x,int y) {
  return new Point{x,y};
}
type-checks since the new operator returns a pointer to the heap, and the
default region for the return type is the heap.
This explains why the original bad code for allocating a new point does
not type-check:
struct Point *newPoint(int x,int y) {
  struct Point result = {x,y};
  return &result;
}
The value &result is a pointer into region `newPoint but the result type
of the function needs to be a pointer into the heap (region `H).
If you want to return a pointer that is not in the heap region, then you
need to put the region in explicitly. For instance, the following code:
int * id(int *x) {
  return x;
}
will not type-check immediately. To see why, let us rewrite the code with
the default region annotations filled in. The argument is assumed to be
in a region `r, and the result is assumed to be in the heap, so the fully
elaborated code is:
int *`H id(int *`r x) {
  return x;
}
Now the type-error is manifest. To fix the code, we must put in explicit
regions to connect the argument type with the result type. For instance,
we might write:
int *`r id(int *`r x) {
  return x;
}
Region Summary
In summary, each pointer in Cyclone points into a given region and this
region is reflected in the type of the pointer. Cyclone won’t let you deref-
erence a pointer into a deallocated region. The lexical blocks declared in
functions correspond to one type of region, and simply declaring a vari-
able within that block allocates storage within the region. The storage is
deallocated upon exit of the block. Dynamic regions are similar, except
that a dynamic number of objects can be allocated within the region using
the region’s handle. The heap is a special region that is garbage collected.
Region polymorphism makes it possible to omit many region annota-
tions on types. Cyclone assumes that pointers passed to functions may live
in distinct regions, and assumes that result pointers are in the heap. These
assumptions are not perfect, but (a) programmers can fix the assumptions
by providing explicit region annotations, and (b) the defaults permit Cyclone
files to be separately compiled.
The region-based type system of Cyclone is perhaps the most compli-
cated aspect of the language. In large part, this is because memory man-
agement is a difficult and tricky business. We have attempted to make
stack allocation and region polymorphic functions simple to use without
sacrificing programmer control over the lifetimes of objects and without
having to resort to garbage collection.
For more information about regions, see Section 8.
This declares a new type, tunion t, that can hold either an integer or a
string (remember, a string is a char ? in Cyclone). Integer and String
are tags for the two possibilities. The tags are used to build values of type
tunion t, as in the declarations of x and y.
Pattern matching is used to determine the tag of a value of type tunion
t, and to extract the underlying value. For example, here is a function that
will print either an integer or a string:
void print(tunion t a) {
  switch (a) {
  case &Integer(i): printf("%d",i); return;
  case &String(s): printf("%s",s); return;
  }
}
The argument a has type tunion t, so it is either built with tag Integer
or tag String. Cyclone extends switch statements with patterns that dis-
tinguish between the cases. The first case,
case &Integer(i): printf("%d",i); return;
contains a pattern, &Integer(i), that will only match values that have
been built with the Integer tag. The variable i is bound to the under-
lying integer, and it can be used in the body of the case. For example,
print(x) will print 3, since x was initialized by new Integer(3), and
print(y) will print hello, world.
The cases of a tunion can carry any number of values, including none,
and they can be recursive. For example, we can define a tree datatype as
follows.
tunion tree {
  Empty;
  Leaf(int);
  Node(tunion tree, tunion tree);
};
A tree can be empty, or it can be a single (leaf) node holding an integer,
or it can be an internal node with a left and a right subtree. In other
words, tunion tree is the type of possibly empty binary trees with in-
teger leaves.
Here’s a function, sum, that calculates the sum of the leaves of a tree:
int sum(tunion tree x) {
  switch (x) {
  case Empty: return 0;
  case &Leaf(i): return i;
  case &Node(y,z): return sum(y)+sum(z);
  }
}
It’s written in a straightforward way, with a case for each possible tag
in the type tunion tree. The Empty case is noticeably different from
the other two cases: the pattern does not use the & character. The reason
has to do with how tunion is implemented. Every value of tunion type
must have the same size; for example, the Node case recursively calls sum
on the subtrees y and z, without knowing whether they are empty, leaves,
or internal nodes. The only way that it can extract y and z from x without
knowing this is if all possible cases of tunion tree have the same size.
At the same time, each tag of a tunion can carry a different number
of values, so obviously each can require a different amount of space. To
make it all work, the value-carrying cases of a tunion are represented
as pointers to structures containing a distinguishing integer plus the val-
ues, and the non-value-carrying cases of a tunion are represented just as
distinguishing integers. Since integers and pointers have the same size in
Cyclone, this achieves the goal.
The data representation is reflected both in how tunion values are
constructed and in the patterns used to take them apart. Value-carrying
cases are built using the new keyword, which performs a heap allocation
and results in a pointer to the new storage. Non-value-carrying cases don’t
require any allocation, and so they don’t use new. For example,
new Node(Empty, new Leaf(5))
builds a tree consisting of an internal node with an empty left subtree, and
a right subtree consisting of a single leaf, 5. We use new for the value-
carrying cases, Node and Leaf, but not for Empty.
In pattern matching, we use the & character to match a pointer. So in
the function sum, since Leaf and Node are constructed as pointers, the &
is required to match them. Since Empty is not built as a pointer, the & must
not appear.
You might be wondering, “how does Cyclone tell whether a tunion
comes from a value-carrying case or a non-value-carrying case?” In partic-
ular, how can Cyclone tell the integers used for non-value-carrying cases
apart from the pointers used for the other cases? Here’s how we do it in
our current implementation: We reserve a space in the low part of mem-
ory where we will never allocate Cyclone objects using new. If a value of
a tunion is an address in this space, then it represents a tag without val-
ues, and if it is an address outside of this space, it represents a pointer to a
structure containing a tag plus the values that it carries.
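The representation just described can be mimicked in plain C. This is a sketch under stated assumptions rather than Cyclone's generated code: the names tree, leaf_cell, node_cell, make_empty, make_leaf, make_node, and the LOW_SPACE threshold of 1024 are all hypothetical, and the sketch assumes malloc never returns an address below that threshold. Cells are never freed here, whereas Cyclone would rely on its collector.

```c
#include <stdint.h>
#include <stdlib.h>

/* Assumed size of the reserved low address range; the value Cyclone
   actually uses is not given in the text. */
#define LOW_SPACE ((uintptr_t)1024)

enum { TAG_EMPTY = 0, TAG_LEAF = 1, TAG_NODE = 2 };

/* A tree is one machine word: a small tag or the address of a cell. */
typedef uintptr_t tree;

/* Value-carrying cases: a distinguishing integer plus the values. */
struct leaf_cell { int tag; int value; };
struct node_cell { int tag; tree left; tree right; };

/* Non-value-carrying case: just the distinguishing integer, no allocation. */
tree make_empty(void) { return (tree)TAG_EMPTY; }

tree make_leaf(int value) {
    struct leaf_cell *c = malloc(sizeof *c);
    if (c == NULL) abort();
    c->tag = TAG_LEAF;
    c->value = value;
    return (tree)c;
}

tree make_node(tree left, tree right) {
    struct node_cell *c = malloc(sizeof *c);
    if (c == NULL) abort();
    c->tag = TAG_NODE;
    c->left = left;
    c->right = right;
    return (tree)c;
}

/* Sum of the leaves, dispatching the way the text describes. */
int sum(tree t) {
    if (t < LOW_SPACE) {
        return 0;                      /* tag without values: Empty */
    }
    int tag = *(const int *)t;         /* tag is the first field of each cell */
    if (tag == TAG_LEAF) {
        return ((const struct leaf_cell *)t)->value;
    }
    const struct node_cell *n = (const struct node_cell *)t;
    return sum(n->left) + sum(n->right);
}
```

With these helpers, make_node(make_empty(), make_leaf(5)) builds a node with an empty left subtree and a single leaf on the right, using one word for every tree value regardless of its case.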
You can find out more about patterns in Section 5; for more about
tunion and memory management, see Section 8.
2.5 Exceptions
So far we’ve glossed over what happens when you try to dereference a null
pointer, or assign to an out-of-bounds ? pointer. We’ve said that Cyclone
inserts checks to make sure the operation is safe, but what if the checks
fail? For safety, it would be sufficient to halt the program and print an error
message—a big improvement over a core dump, or, worse, a program with
corrupted data that keeps running.
In fact, Cyclone does something a bit more general than halting with an
error message: it throws an exception. The advantage of exceptions is that
they can be caught by the programmer, who can then take corrective action
and perhaps continue with the program. If the exception is not caught, the
program halts and prints an error message. Consider our earlier example:
FILE *f = fopen("/etc/passwd","r");
int c = getc((FILE @)f);
Suppose that there is no file /etc/passwd; then fopen will return NULL,
and when f is cast to FILE @, the implied null check will fail. The pro-
gram will halt with an error message,
Uncaught exception Null_Exception
Null_Exception is one of a handful of standard exceptions used in Cy-
clone. Each exception is like a case of a tunion: it can carry along some
values with it. For example, the standard exception InvalidArg carries a
string. Exceptions can be handled in try-catch statements, using pattern
matching:
FILE *f = fopen("/etc/passwd","r");
int c;
try {
  c = getc((FILE @)f);
}
catch {
case Null_Exception:
  printf("Error: can't open /etc/passwd\n");
  exit(1);
case &InvalidArg(s):
  printf("Error: InvalidArg(%s)\n",s);
  exit(1);
}
Here we’ve “wrapped” the call to getc in a try-catch statement. If f
isn’t NULL and the getc succeeds, then execution just continues, ignoring
the catch. But if f is NULL, then the null check will fail and the exception
Null_Exception will be thrown; execution immediately continues with
the catch (the call to getc never happens). In the catch, the thrown
exception is pattern matched against the cases. Since the thrown exception
is Null_Exception, the first case is executed here.
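For readers who know C, the control flow of try and catch can be approximated with setjmp and longjmp. This is only a sketch of the behavior just described, not a description of how the Cyclone compiler implements exceptions; the names handler, EXN_NULL, check_not_null, and first_or_default are hypothetical.

```c
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical sketch: one global handler slot stands in for the
 * innermost enclosing try block. */
enum exn { EXN_NONE = 0, EXN_NULL = 1 };

jmp_buf handler;

/* Models the check inserted before a dereference of a possibly-NULL
 * pointer: on failure, control jumps to the handler instead of
 * returning to the caller. */
const int *check_not_null(const int *p) {
    if (p == NULL)
        longjmp(handler, EXN_NULL);       /* "throw Null_Exception" */
    return p;
}

/* Returns *p, or fallback if the null check fails. */
int first_or_default(const int *p, int fallback) {
    switch (setjmp(handler)) {            /* "try" */
    case EXN_NONE:
        /* The dereference never happens if the check fails. */
        return *check_not_null(p);
    case EXN_NULL:                        /* "case Null_Exception:" */
        return fallback;
    default:                              /* any other exception value */
        return fallback;
    }
}
```

Calling first_or_default with a valid pointer returns the value pointed to; calling it with NULL takes the EXN_NULL branch, just as the catch above takes the Null_Exception case.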
There is one important difference between an exception and a case of a
tunion: with tunion, all of the cases have to be declared at once, while a
new exception can be declared at any time. So, exceptions are an extensible
tunion, or xtunion. Here’s how to declare a new exception:
xtunion exn {
My_Exception(char ?);
};
The type xtunion exn is the type of exceptions, and this declaration in-
troduces a new case for the xtunion exn type: My_Exception, which
carries a single value (a string). Exception values are created just like
tunion values—using new for value-carrying tags only—and are thrown
with a throw statement. For example,
throw new My_Exception("some kind of error");
or
throw Null_Exception;
2.6 Additional Features of Cyclone
Thus far, we have mentioned a number of advanced features of Cyclone
that provide facilities needed to avoid common bugs or security holes in
C. But there are many other features in Cyclone that are aimed at making
it easier to write code, ranging from convenient expression forms to
advanced typing constructs. For instance, like GCC and C99, Cyclone allows
you to declare variables just about anywhere, instead of only at the top of a block.
As another example, like Java, Cyclone lets you declare variables within
the initializer of a for-statement.
In addition, Cyclone adds advanced typing support in the form of (a)
parametric polymorphism, (b) structural subtyping, and (c) some
unification-based, local type inference. These features are necessary to type-check or
port a number of (potentially) unsafe C idioms, usually involving “void*”
or the like. Similarly, tunion types can be used to code around many of
the uses for C’s union types – another potential source of unsoundness.
In what follows, we give a brief overview of these added features.
for (int x=0; x < n; x++) {
...
}
2.8 Tuples
Tuples are like lightweight structs. They need not be declared in advance,
and have member or field names that are implicitly 0, 1, 2, 3, etc. For
example, the following code declares x to be a 3-tuple of an integer, a
character, and a boolean, initialized with the values 42, ’z’, and true
respectively. It then checks to see whether the third component in the
tuple is true (it is) and if so, increments the first component in the tuple.
$(int,char,bool) x = $(42,'z',true);
if (x[2])
  x[0]++;
int foo[8] = {1,2,3,4,5,6,7,8};
char s[4] = "bar";
are both examples from C for creating arrays. Note that Cyclone follows
C’s conventions here, so that if you declare arrays as above within a func-
tion, then the lifetime of the array coincides with the activation record of
the enclosing scope. In other words, such arrays will be stack allocated.
To create heap-allocated arrays (or strings) within a Cyclone function,
you should use the “new” operator with either an array initializer or an
array comprehension. The following code demonstrates this:
2.10 Subtyping
Cyclone supports “extension on the right” and “covariant depth on const”
subtyping for pointers. This simply means that you can cast a value x from
having a type “pointer to a struct with 10 fields,” to “pointer to a struct
having only the first 5 fields.” For example, if we have the following defi-
nitions:
float xcoord(point p) {
return p->x;
}
then you can call xcoord with either a point or cpoint object. You can
also cast a pointer to a tuple having 3 fields (e.g., $(int,bool,double)*)
to a pointer to a tuple having only 2 fields (e.g., $(int,bool)*). In other
words, you can forget about the “tail” of the object. This allows a degree of
polymorphism that is useful when porting C code. In addition, you can do
“deep” casts on pointer fields that are const. (It is unsafe to allow deep
casts on non-const fields.) Also, you can cast a field from being non-const
to being const. You can also cast a constant-sized array to an equivalent
pointer to a struct or tuple. In short, Cyclone attempts to allow you to cast
one type to another as long as it is safe. Note, however, that these casts
must be explicit.
We expect to add more support for subtyping in the future (e.g.,
subtyping on function pointers and bounded subtyping).
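The idea of forgetting the “tail” of an object has a well-defined counterpart in plain C, which may help when porting. The sketch below is an assumption-laden illustration: the struct definitions are hypothetical stand-ins for point and cpoint, and forget_color is a hypothetical helper. It relies on the C rule that a pointer to a struct may be converted to a pointer to its first member.

```c
/* Hypothetical definitions: cpoint extends point "on the right". */
struct point {
    float x;
    float y;
};

struct cpoint {
    struct point base;   /* first member, so both share a layout prefix */
    int color;           /* the "tail" that a point does not have */
};

float xcoord(const struct point *p) {
    return p->x;
}

/* Converting a struct pointer to a pointer to its first member is
 * well defined in C, so dropping the color field is safe here. */
const struct point *forget_color(const struct cpoint *cp) {
    return (const struct point *)cp;
}
```

Here xcoord accepts either a pointer to a point or, through forget_color, a pointer to a cpoint. Unlike Cyclone, C does not check that the cast is safe; the programmer must know that the prefix matches.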
int foo(int x) {
  let y = x+3;
  let z = 3.14159;
  return (int)(y*z);
}
Here, we declared two variables y and z using “let.” When you use
let, you don’t have to write down the type of the variable. Rather, the
compiler infers the type from the expression that initializes the variable.
More generally, you can write “let pattern = exp;” to destructure
a value into a bunch of variables. For instance, if you pass a tuple to a
function, then you can extract the components as follows:
2.12 Polymorphic Functions
As mentioned above, Cyclone supports a limited amount of subtyping
polymorphism. It also supports a fairly powerful form of parametric poly-
morphism. Those of you coming from ML or Haskell will find this famil-
iar. Those of you coming from C++ will also find it somewhat familiar.
The basic idea is that you can write one function that abstracts the types
of some of the values it manipulates. For instance, consider the following
two functions:
$(string_t,int) swap1($(int,string_t) x) {
  return $(x[1], x[0]);
}
$(int,int) swap2($(int,int) x) {
  return $(x[1], x[0]);
}
The two functions are quite similar: They both take in a pair (i.e., a
2-tuple) and return a pair with the components swapped. At the machine-
level, the code for these two functions will be exactly the same, assuming
that ints and string_ts (char *) are represented the same way. So
it seems silly to write the code twice. Normally, a C programmer would
replace the definition with simply:
(assuming you added tuples to C). But of course, this isn’t type-safe be-
cause once I cast the values to void *, then I can’t be sure what type I’m
getting out. In Cyclone, you can instead write something like this:
$(`b,`a) swap($(`a,`b) x) {
  return $(x[1],x[0]);
}
The code is the same, but it abstracts what the types are. The types `a
and `b are type variables that can be instantiated with any word-sized,
general-purpose register type. So, for instance, you can call swap on pairs
of integers, pairs of pointers, pairs of an integer and a pointer, etc.:
let $(x,y) = swap($("hello",3)); // x is 3, y is hello
let $(w,z) = swap($(4,3)); // w is 3, z is 4
Note that when calling a polymorphic function, you need not tell it
what types you’re using to instantiate the type variables. Rather, Cyclone
figures this out through unification.
C++ supports similar functionality with templates. However, C++ and
Cyclone differ considerably in their implementation strategies. Cyclone
produces only one copy of the code, whereas a C++ template is
specialized and duplicated at each type at which it is used. The C++ approach
requires that you include definitions of templates in interfaces and thus
defeats separate compilation. However, the approach used by Cyclone
does have its drawbacks: in particular, the only types that can instantiate
type variables are those that can be treated uniformly. This ensures that
we can use the same code for different types. The general rule is that val-
ues of the types that instantiate a type variable must fit into a machine
word and must be passed in general-purpose (as opposed to floating-
point) registers. Examples of such types include int, pointers, tunion,
and xtunion types. Other types, including char, short, long long,
float, double, struct, and tuple types violate this rule and thus val-
ues of these types cannot be passed to a function like swap in place of the
type variables. In practice, this means that you tend to manipulate a lot of
pointers in Cyclone code.
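The single-copy strategy can be pictured in plain C. The sketch below is an illustration under stated assumptions, not Cyclone's generated code: struct word_pair and swap_pair are hypothetical names, and each slot is assumed to be exactly one machine word (uintptr_t). Because every instantiation has the same layout, one compiled function serves them all.

```c
#include <stdint.h>

/* Hypothetical sketch: a pair whose two slots are each one machine
 * word. An int or a pointer fits in a slot; a struct does not fit
 * without first being placed behind a pointer. */
struct word_pair {
    uintptr_t fst;
    uintptr_t snd;
};

/* One copy of the code handles every word-sized instantiation. */
struct word_pair swap_pair(struct word_pair x) {
    struct word_pair r = { x.snd, x.fst };
    return r;
}
```

A caller would pack, say, a string pointer and the integer 3 into the two slots, call swap_pair, and cast the results back. In C those casts are unchecked; in Cyclone the type variables let the compiler verify that each component comes back at the type it went in with.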
The combination of parametric polymorphism and sub-typing means
that you can cover a lot of C idioms where void* or unsafe casts were
used without sacrificing type-safety. We use polymorphism a lot when
coding in Cyclone. For instance, the standard library includes many con-
tainer abstractions (lists, sets, queues, etc.) that are all polymorphic in the
element type. This allows us to re-use a lot of code. In addition, unlike
C++, those libraries can be compiled once and need not be specialized. On
the downside, this style of polymorphism does not allow you to do any
type-specific things (e.g., overloading or ad-hoc polymorphism.) Some-
day, we may add support for this, but in the short run, we’re happy not to
have it.
2.13 Polymorphic Data Structures
Just as function definitions can be parameterized by types, so can struct
definitions, tunion definitions, and even typedefs. For instance, the
following struct definition is similar to the one used in the standard
library for lists:
Queue, but will not be made available to the outside world. (This will
be enforced by a link-time type-checker that we are currently putting to-
gether.) Typically, the provider of the Queue abstraction would write in
an interface file:
2.15 Restrictions
Though Cyclone adds many new features to C, there are also a number of
restrictions that it must enforce to ensure code does not crash. Here is a
list of the major restrictions:
• Cyclone does not permit some of the casts that are allowed in C be-
cause incorrect casts can lead to crashes, and it is not always possible
for us to determine what is safe. In general, you should be able to cast
something from one type to another as long as the underlying rep-
resentations are compatible. Note that Cyclone is very conservative
about “compatible” because it does not know the size or alignment
constraints of your C compiler.
• Cyclone does not support pointer arithmetic on * or @ pointers. Pointer
arithmetic is not unsafe in itself, but it can lead to unsafe code when
the resulting pointer is assigned or dereferenced. You can cast the *
or @ value to a ? value and then do the pointer arithmetic instead.
• Cyclone inserts a NULL check when a * pointer is dereferenced and
it cannot determine statically that the pointer is not NULL.
• Cyclone requires any function that is supposed to return a non-void
value to execute a return statement (or throw an exception) on every
possible execution path. This is needed to ensure that the value re-
turned from the function has the right type, and is not just a random
value left in a register or on the stack.
• Unions in Cyclone can only hold “bits.” In particular, they can hold
combinations of chars, ints, shorts, longs, floats, doubles, structs of
bits, or tuples of bits. Pointer types are not supported. This avoids
the situation where an arbitrary bit pattern is cast to a pointer and
then dereferenced. If you want to use multiple types, then use tagged
unions (tunions).
• Cyclone only supports a limited form of malloc which is baked in.
Tuples and structs can be allocated via malloc but this requires writ-
ing explicitly: malloc(sizeof(t)) where t is the type of the value
that you are allocating. You cannot use malloc to allocate an array.
• Cyclone performs a static analysis to ensure that every variable and
every struct field is initialized before it is used. This prevents a
random stack value from being used improperly. The analysis is
somewhat conservative so you may need to initialize things earlier
than you would do in C. For instance, currently, Cyclone does not
support initializing a struct in a procedure separate from the one that
does the allocation.
• Cyclone does not permit gotos from one scope into another. C warns
against this practice, as it can cause crashes; Cyclone rules it out en-
tirely.
• Cyclone places some limitations on the form of switch statements
that rule out crashes like those caused by unrestricted goto. Fur-
thermore, Cyclone prevents you from accidentally falling through
from one case to another. To fall through, you must explicitly use
the fallthru keyword. Otherwise, you must explicitly break,
goto, continue, return, or throw an exception. However, ad-
jacent cases for a switch statement (with no intervening statement)
do not require an explicit fallthru.
• In the near future, Cyclone will place some restrictions on linking
for safety reasons. In particular, if you import a variable or function
with one type, then it must be exported by another file with that type.
In addition, access to C code will be restricted based on a notion of
security roles.
• Cyclone has some new keywords (let, abstract, region, etc.)
that can no longer be used as identifiers.
• Cyclone prevents you from using pointers to stack-allocated objects
as freely as in C to avoid security holes. The reason is that each dec-
laration block is placed in a conceptual “region” and the type system
tracks the region into which a pointer points.
• Cyclone does not allow you to explicitly free a heap-allocated object.
Instead, you can either use the region mechanism or rely upon the
conservative garbage collector to reclaim the space.
In addition, there are a number of shortcomings of the current imple-
mentation that we hope to correct in the near future. For instance:
• Cyclone currently does not support nested type declarations within a
function. All struct, union, enum, tunion, xtunion, and typedef
definitions must be at the top-level.
• Cyclone does not allow you to use a struct, tunion, union, xtunion,
or enum type without first declaring it. We do support one special
case of this where you embed a declaration within a typedef as in:
3 Pointers
As in C, Cyclone pointers are just addresses. Operations on pointers, such
as *x, x->f, and x[e], behave the same as in C, with the exception that
run-time checks sometimes precede memory accesses. (Exactly when and
where these checks occur is described below.) However, Cyclone prevents
memory errors such as dereferencing dangling pointers, so it may reject
legal C operations on pointers.
In order to enforce memory safety, Cyclone pointer types contain more
information than their C counterparts. In addition to the type of the object
pointed to, pointer types indicate:
Subtyping For any type t, the type t@ is a subtype of t*. The type of
malloc(sizeof(t)) is t@, as is new e where e has type t. Hence in the
declaration, “int *x = malloc(sizeof(int))”, there is an implicit
legal cast from t@ to t*. Note that even when t1 is a subtype of t2,
the type t1* is not necessarily a subtype of t2*, nor is t1@ necessarily
a subtype of t2@. For example, int@@ is not a subtype of int*@. This
illegal code shows why:
void f(int @@ x) {
  int *@ y = x; // would be legal were int @@ a subtype of int *@
  *y = NULL;    // legal because *y has type int *
  **x;          // seg faults even though the type of *x is int @
}
You can explicitly cast a value of type t* to t@. Doing so will perform a
run-time check. The cast can be omitted, but the compiler emits a warning
and performs the run-time check. Because the current implementation
does not consider tests to change a t* to t@, such casts are sometimes
necessary to avoid spurious warnings, such as in this code:
void g(int * x) {
  if (x != NULL)
    f((int @)x);
}
Note that &e->f and &e[e2] check (if necessary) that e is not NULL
even though these constructs do not read through e.
Future
Future In the future, the bounds information on a pointer will not have
to be a compile-time constant. For example, you will be able to write
void f(int n) {
  int *{n} arr = new {for i < n : 37};
  ...
}
for pointers into different regions. Note that there is no way for a global
variable to hold a stack pointer.
Functions are implicitly polymorphic over the regions of their arguments.
For example, void f(int *`r); is a prototype that can be passed
a pointer into any accessible region. That is, it can be passed a stack pointer
or a heap pointer, so long as it is not passed a dangling pointer. Note
that our example function f could not possibly assign its argument to a
global, whereas void g(int *`H); could. On the other hand, g cannot
be passed a stack pointer.
The rules the compiler uses for filling in regions when they are omitted
from pointer types are numerous, but they are designed to avoid clutter in
the common case:
In the future, we intend to change the rule for typedef so that the mean-
ing can be different at each use of the typedef, as dictated by the other
rules. Until then, be warned that
is different than
Also, note that these rules are exactly the same as the rules for omitted
regions in instantiations of parameterized types.
Implementation A pointer’s region is not stored with the pointer at run-
time. So there is no way to ask for the region into which a pointer points.
For stack regions there is no region object at run-time per se, just the stack
space for the objects. As is normal with region-based systems, Cyclone
does not prevent dangling pointers. Rather, it prevents dereferencing dan-
gling pointers. But this is a subtle point.
subtraction. The result of pointer subtraction has type unsigned int, so
there is no bounds information.
Even though t ? types are implemented as multi-word values, comparison
operations (e.g., ==) are defined on them; the comparison is
performed on the access pointers.
Conversions to/from t ? `r types from/to t*{n}`r and t@{n}`r
types exist. Converting to a t? type just uses the t* or t@’s static type in-
formation to initialize the bounds information. The cast may be implicit;
no warning is given. Converting to a t* or t@ type incurs a run-time check
that the access pointer has a value such that the target type’s bounds in-
formation is sound. If so, the access pointer is returned, else the exception
Null_Exception is thrown. Implicit casts of this form cause the com-
piler to give a warning.
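The two conversions can be pictured with a small C model. This is a sketch under stated assumptions, not the actual representation: the struct layout, the field names, and the functions to_fat, fat_add, and to_thin are all hypothetical, and the model reports failure with a return code where Cyclone would throw an exception.

```c
#include <stddef.h>

/* Hypothetical model of an int ? value: an access position plus the
 * bounds of the underlying array. */
struct fat_int_ptr {
    int *base;       /* start of the underlying array, or NULL */
    size_t len;      /* number of elements in the array */
    ptrdiff_t off;   /* access position relative to base; may be out of range */
};

/* int *{n} to int ?: bounds come from the static type; no check. */
struct fat_int_ptr to_fat(int *p, size_t n) {
    struct fat_int_ptr f = { p, p == NULL ? 0 : n, 0 };
    return f;
}

/* Pointer arithmetic only moves the access position. */
struct fat_int_ptr fat_add(struct fat_int_ptr f, ptrdiff_t k) {
    f.off += k;
    return f;
}

/* int ? to int @{n}: succeeds only if n elements starting at the
 * access position lie inside the array. Returns 1 and stores the
 * thin pointer in *out on success; returns 0 where Cyclone would
 * throw an exception. */
int to_thin(struct fat_int_ptr f, size_t n, int **out) {
    if (f.base == NULL || f.off < 0)
        return 0;
    if ((size_t)f.off > f.len || n > f.len - (size_t)f.off)
        return 0;
    *out = f.base + f.off;
    return 1;
}
```

For a four-element array, moving the access position forward by two and then asking for a two-element view succeeds, while asking for three elements from the same position fails the check.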
Future We may add a “cannot be NULL” version of these types for sake
of completeness. More significantly, we intend to allow user-defined types
to have certain fields describe the bounds information for other fields,
rather than relying on types built into the language.
Of course, using t? delays errors until run-time and is less efficient.
Using t@ is the most efficient and guarantees that Null_Exception will
not be thrown.
Currently, code performing pointer arithmetic must use t?.
4 Tagged Unions
In addition to struct, enum, and union, Cyclone has tunion (for “tagged
union”) and xtunion (for “extensible tagged union”) as ways to construct
new aggregate types. Like a union type, each tunion and xtunion has
a number of variants (or members). Unlike with union, an object of a
tunion or xtunion type is exactly one variant, we can detect (or dis-
criminate) that variant at run-time, and the language prevents using an
object as though it had a different variant.
The difference between tunion and xtunion is that tunion is closed—
a definition lists all possible variants. It is like the algebraic datatypes in
ML. With xtunion, separately compiled files can add variants, so no code
can be sure that it knows all the variants. There is a rough analogy with
not knowing all the subclasses of a class in an object-oriented language.
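For contrast, here is how a tagged union is usually written by hand in plain C. The sketch uses hypothetical names (shape_kind, struct shape, shape_area) and is only meant to show what tunion automates: in C the tag and the payload are kept in sync purely by programmer discipline, and nothing stops code from reading the wrong member.

```c
/* Hand-rolled tagged union in C: the compiler does not check that
 * the member being read agrees with the tag. */
enum shape_kind { SHAPE_POINT, SHAPE_CIRCLE, SHAPE_ELLIPSE };

struct shape {
    enum shape_kind kind;
    union {
        float radius;                       /* valid for SHAPE_CIRCLE */
        struct { float r1, r2; } ellipse;   /* valid for SHAPE_ELLIPSE */
    } u;
};

float shape_area(const struct shape *s) {
    switch (s->kind) {
    case SHAPE_POINT:
        return 0.0f;
    case SHAPE_CIRCLE:
        return 3.14f * s->u.radius * s->u.radius;
    case SHAPE_ELLIPSE:
        return 3.14f * s->u.ellipse.r1 * s->u.ellipse.r2;
    }
    /* Reached only if kind holds an unexpected value; C does not
     * rule that out. */
    return 0.0f;
}
```

With tunion, the equivalent of the kind field is maintained by the language, and the payload can only be reached by pattern matching on the variant.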
For sake of specificity, we first explain how to create and use tunion
types. We then explain xtunion by way of contrast with tunion. Be-
cause the only way to read parts of tunion and xtunion types is pattern-
matching, it is hard to understand tunion without pattern-matching, but
for sake of motivation and completeness, some of the examples in the ex-
planation of pattern-matching use tunion! To resolve this circular depen-
dency, we will informally explain pattern-matching as we use it here and
we stick to its simplest uses.
4.1 tunion
Basic Type Declarations and Subtyping [Warning: For expository pur-
poses, this section contains a white lie that is exposed in the later section called
“regions for tunion”.]
A tunion type declaration lists all of its variants. At its simplest, it
looks just like an enum declaration. For example, we could say:
As with enum, the declaration creates a type (called tunion Color)
and three constants Red, Green, and Blue. Unlike enum, these con-
stants do not have type tunion Color. Instead, each variant has its own
type, namely tunion Color.Red, tunion Color.Green, and tunion
Color.Blue. Fortunately these are all subtypes of tunion Color and
no explicit cast is necessary. So you can write, as expected:
In this simple example, we are splitting hairs, but we will soon find all
these distinctions useful. Unlike enum, tunion variants may carry any
fixed number of values, as in this example:
tunion Shape {
Point,
Circle(float),
Ellipse(float,float),
Polygon(int,float),
};
Rather, non-null pointers to the value-carrying variant types are (e.g., tunion
Shape.Circle @`H is a subtype of tunion Shape). So the following
are correct initializations that use implicit subtyping:
case &Circle(r): return true;
default: return false;
}
}
extern float area_of_ellipse(float,float);
extern float area_of_poly(int,float);
float area(tunion Shape s) {
  float ans;
  switch(s) {
  case Point:
    ans = 0;
    break;
  case &Circle(r):
    ans = 3.14*r*r;
    break;
  case &Ellipse(r1,r2):
    ans = area_of_ellipse(r1,r2);
    break;
  case &Polygon(sides,r):
    ans = area_of_poly(sides,r);
    break;
  }
  return ans;
}
The cases are compared in order against s. The following are compile-
time errors:
The _ is necessary because we did not give an explicit name to the stack
region.
We can now correct the white lie from the “basic type declarations and
subtyping” section. A declaration tunion Foo {...} creates a type
constructor which given a region creates a type. For any region `r, tunion
Foo.Bar @`r is a subtype of tunion `r Foo if tunion Foo.Bar carries
values. If tunion Foo.Bar does not carry values, then it is a subtype
of tunion `r Foo for all `r.
In the future, we may make the implied region for tunion Foo depend
on context, as we do with pointer types. For now, tunion Foo is
always shorthand for tunion `H Foo.
In the above example, the root may be in any region, but all children
will be in the heap. This version allows the children to be in any region,
but they must all be in the same region. (The root can still be in a different
region.)
The constructors for variants with existential types are used the same
way, for example Foo("hi","mom",3), Foo(8,9,3), and Bar("hello",17)
are all well-typed. The compiler checks that the type variables are used
consistently—in our example, the first two arguments to Foo must have
the same type. There is no need (and currently no way) to explicitly spec-
ify the types being used.
Once a value of an existential variant is created, there is no way to de-
termine the types at which it was used. For example, Foo("hi","mom",3)
and Foo(8,9,3) both have type, “there exists some `a such that the type
is Foo<`a>”. When pattern-matching an existential variant, you must
give an explicit name to the type variables; the name can be different from
the name in the type definition. Continuing our useless example, we can
write:
void f(tunion T t) {
  switch(t) {
  case Foo<`a>(x,y,z): return;
  case Bar<`b,`c>(x,y): return;
  case Baz(x): return;
  }
}
The scope of the type variables is the body of the case clause. So in
the first clause we could create a local variable of type `a and assign x
or y to it. Our example is fairly “useless” because there is no way for
code to use the values of existentially quantified types. In other words,
given Foo("hi","mom",3), no code will ever be able to use the strings
"hi" or "mom". Useful examples invariably use function pointers. For a
realistic library, see fn.cyc in the distribution. Here is a smaller (and sillier)
example; see the section on regions and effects for an explanation of why
the `e stuff is necessary.
  switch(t) {
  case Foo<`a>(arg,fun):
    `a x = arg;
    int (*f)(`a,int;{}) = fun;
    f(arg,19);
    break;
  }
}
The case clause could have just been fun(arg,19); the compiler would
figure out all the types for us. Similarly, all of the explicit types above
are for sake of explanation; in practice, we tend to rely heavily on type
inference when using these advanced typing constructs.
Future
4.2 xtunion
We now explain how an xtunion type differs from a tunion type. The
main difference is that later declarations may continue to add variants. Ex-
tensible datatypes are useful for allowing clients to extend data structures
in unforeseen ways. For example:
xtunion Food;
xtunion Food { Banana; Grape; Pizza(list_t<xtunion Food>) };
xtunion Food { Candy; Broccoli };
If multiple declarations include the same variants, the variants must
have the same declaration (the number of values, types for the values, and
the same existential type variables).
Because different files may add different variants and Cyclone com-
piles files separately, no code can know (for sure) all the variants of an
xtunion. Hence all pattern-matches against a value of an xtunion type
must end with a case that matches everything, typically default.
There is one built-in xtunion type: xtunion exn is the type of ex-
ceptions. Therefore, you declare new xtunion exn types like this:
5 Pattern Matching
Pattern matching provides a concise, convenient way to bind parts of large
objects to new local variables. Two Cyclone constructs use pattern matching:
let declarations and switch statements. Although the latter are more
common, we first explain patterns with let declarations because they have
fewer complications. Then we describe all the pattern forms. Then we
describe switch statements.
You must use patterns to access values carried by tagged unions, in-
cluding exceptions. In other situations, patterns make code more readable
and less verbose.
let x = e;
much more powerful because they can bind several variables to different
parts of an aggregate object. Here is an example:
struct Pair { int x; int y; };
void f(struct Pair pr) {
  let Pair(fst,snd) = pr;
  ...
}
The pattern has the same structure as a struct Pair with parts being
variables. Hence the pattern is a match for pr and the variables are
initialized with the appropriate parts of pr. In other words, “let Pair(fst,snd)
= pr;” is equivalent to “int fst = pr.x; int snd = pr.y;”. A
let-declaration’s initializer is evaluated only once.
Patterns may be as structured as the expressions against which they
match. For example, given type
struct Quad { struct Pair p1; struct Pair p2; };
patterns for matching against an expression of type struct Quad could be
any of the following (and many more because of constants and wildcards—
see below):
• Quad(Pair(a,b),Pair(c,d))
• Quad(p1, Pair(c,d))
• Quad(Pair(a,b), p2)
• Quad(p1,p2)
• q
pr;” would match only when pr.x is 17. Otherwise the exception
Match_Exception is thrown. Patterns that may fail are rarely useful and poor
style in let-declarations; the compiler emits a warning when you use them.
In switch statements, possibly-failing patterns are the norm—as we ex-
plain below, the whole point is that one of the cases’ patterns should match.
• The syntax
• The types of expressions it can match against (to avoid a compile-
time error)
• The expressions the pattern matches against (other expressions cause
a match failure)
• The bindings the pattern introduces, if any.
There is one compile-time rule that is the same for all forms: All vari-
ables (and type variables) in a pattern must be distinct. For example, “let
Pair(fst,fst) = pr;” is not allowed.
You may want to read the descriptions for variable and struct patterns
first because we have already explained their use informally.
• Variable patterns
– Syntax: an identifier
– Types for match: all types
– Expressions matched: all expressions
– Bindings introduced: the identifier is bound to the expression
being matched
• Wildcard patterns
– Syntax: _ (underscore, note this use is completely independent
of _ for type inference)
– Types for match: all types
– Expressions matched: all expressions
– Bindings introduced: none. Hence it is like a variable pattern
that uses a fresh identifier. Using _ is better style because it
indicates the value matched is not used. Notice that “let _ =
e;” is equivalent to e.
• Reference patterns
– Syntax: NULL
– Types for match: nullable pointer types, including ? types
– Expressions matched: NULL
– Bindings introduced: none
• enum patterns
• Tuple patterns
• Struct patterns
• Pointer patterns
– Types for match: pointer types, including ? types. Also tunion
Foo (or instantiations of it) when the pattern is &Bar(p1,...,pn)
and Bar is a value-carrying variant of tunion Foo and pi matches
the type of the ith value carried by Bar.
– Expressions matched: non-null pointers where the value pointed
to matches p. Note this explanation includes the case where the
expression has type tunion Foo and the pattern is &Bar(p1,...,pn)
and the current variant of the expression is “pointer to Bar”.
– Bindings introduced: bindings introduced by p
Restrictions
• You cannot implicitly “fall-through” to the next case. Instead, you must
use the fallthru; statement, which has the effect of transferring
control to the beginning of the next case. There are two exceptions
to this restriction: First, adjacent cases with no intervening statement
do not require a fall-through. Second, the last case of a switch does
not require a fall-through or break.
int f(int i) {
  switch(i) {
  case 0: f(34); return 17;
  case 1: return 17;
  default: return i;
  }
}
Much More Powerful Because Cyclone case labels are patterns, a switch
statement can match against any expression and bind parts of the expres-
sion to variables. Also, fallthru can (in fact, must) bind values to the
next case’s pattern variables. This silly example demonstrates all of these
features:
case $(a,b): return a*b;
}
}
• It evaluates the pair $(f(x), f(y)) and stores the result on the
stack.
• If f(x) returned 0, the first case matches, control jumps to the second
case, and 0 is returned.
• Else if f(x) returned 1, the third case matches, b is assigned the value
f(y) returned, control jumps to the fourth case after assigning b+1-1
to a, and a (i.e., b + 1 - 1, i.e., b, i.e., f(y)) is returned.
• Else if f(y) returned 1, the fourth case matches, a is assigned the value
f(x) returned, and a is returned.
• Else the last case matches, a is assigned the value f(x) returned, b is
assigned the value f(y) returned, and a*b is returned.
fallthru is not allowed in the last case of a switch, not even if there is
an enclosing switch.
We repeat that fallthru may appear anywhere in a case body, but it is
usually used at the end, where its name makes the most sense. ML pro-
grammers may notice that fallthru with bindings is strictly more expres-
sive than or-patterns, but more verbose.
Case Guards We have withheld the full form of Cyclone case labels. In
addition to case p: where p is a pattern, you may write case p && e:
where p is a pattern and e is an expression of type int. (And since e1 &&
e2 is an expression, you can write case p && e1 && e2: and so on.)
Let’s call e the case’s guard.
The case matches if p matches the expression in the switch and e eval-
uates to a non-zero value. e is evaluated only if p matches and only after
the bindings caused by the match have been properly initialized. Here is
a silly example:
Exhaustiveness and Useless-Case Checking As mentioned before, it is
a compile-time error for the type of the switch expression to have values
that none of the case patterns match or for a pattern not to match any
values that earlier patterns do not already match. Rather than explain the
precise rules, we currently rely on your intuition. But there are two rules
to guide your intuition:
The general intuition is that there must be a break, continue, goto, re-
turn, or throw along all control-flow paths. The value of expressions is not
considered except for numeric constants and logical combinations (using
&&, ||, and ? :) of such constants. The statement try s catch . . . is checked
as though an exception might be thrown at any point while s executes.
6 Type Inference
Cyclone allows many explicit types to be elided. In short, you write _
(underscore) where a type should be and the compiler tries to figure out
the type for you. Type inference can make C-like Cyclone code easier to
write and more readable. For example,
_ x = malloc(sizeof(sometype_t));
is a fine substitute for
sometype_t @ x = malloc(sizeof(sometype_t));
Of course, explicit types can make code more readable, so it is often better
style not to use inference.
Inference is even more useful because of Cyclone’s advanced typing
constructs. For example, it is much easier to write down _ than a type for
a function pointer.
We now give a rough idea of when you can elide types and how types
get inferred. In practice, you tend to develop a sense of which idioms
succeed, and, if there’s a strange compiler-error message about a variable’s
type, you give more explicit information about the variable’s type.
Note that _ can be used for part of a type. A silly example is $(_,int)
= $(3,4); a more useful example is an explicit cast to a non-nullable
pointer (to avoid a compiler warning). For example:
void f(some_big_type * x, some_big_type @ y) {
  if(x != NULL) {
    y = (_ @) x;
  }
}
Semantics Except for the subtleties discussed below, using _ should not
change the meaning of programs. However, it may cause a program not
to type-check because the compiler no longer has the type information it
needs at some point in the program. For example, the compiler rejects
x->f if it does not know the type of x because different struct types
can have members named f.
The compiler infers the types of expressions based on uses. For exam-
ple, consider:
_ x = NULL;
x = g();
x->f;
int f() {
  char c = 1000;
  return c;
}
int g() {
  _ c = 1000; // compiler infers int
  return c;
}
int main() {
  printf("%d %d", f(), g());
  return 0;
}
7 Polymorphism
Use ‘a instead of void *.
considered locals—when a function is called, its actual parameters
are placed in the same stack region as the variables declared at the
start of the function.
Dynamic regions Cyclone also has dynamic regions, which are regions that
you can add objects to over time. You create a dynamic region in
Cyclone with a statement,
region identifier {
statement1
...
statementn
}
The heap Cyclone has a special region called the heap. There is only one
heap, and it is never deallocated. New objects can be added to the
heap at any time (the heap can grow). Cyclone uses a garbage collec-
tor to automatically remove objects from the heap when they are no
longer needed. You can think of garbage collection as an optimiza-
tion that tries to keep the size of the heap small.
Objects outside of the heap live until their region is deallocated; there
is no way to free such an object earlier. Objects in the heap can be garbage
collected once they are unreachable (i.e., they cannot be reached by travers-
ing pointers) from the program’s variables. Objects in live non-heap re-
gions always appear reachable to the garbage collector (so everything reach-
able from them appears reachable as well).
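
The lifetime rule for dynamic regions (objects are added over time and
all of them die together) can be sketched in plain C. This is only an
illustration under stated assumptions: the names region, header,
region_init, region_alloc and region_free_all are hypothetical and are
not part of Cyclone or its runtime, and unlike Cyclone this C code does
nothing to stop a pointer from being followed after the region is gone.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-object header; the union pads it so that the payload
   that follows is suitably aligned for any object type (C11 max_align_t). */
union header {
    union header *next;   /* previously allocated object, or NULL */
    max_align_t align;
};

/* Hypothetical region: a linked list of everything allocated in it. */
struct region {
    union header *objects;
    size_t live_objects;  /* bookkeeping for this example only */
};

void region_init(struct region *r) {
    assert(r != NULL);
    r->objects = NULL;
    r->live_objects = 0;
}

/* Add an object to the region. There is deliberately no per-object free. */
void *region_alloc(struct region *r, size_t size) {
    assert(r != NULL);
    union header *h = malloc(sizeof(union header) + size);
    if (h == NULL)
        return NULL;
    h->next = r->objects;
    r->objects = h;
    r->live_objects++;
    return h + 1;         /* payload starts right after the header */
}

/* Deallocate the whole region at once. */
void region_free_all(struct region *r) {
    assert(r != NULL);
    while (r->objects != NULL) {
        union header *next = r->objects->next;
        free(r->objects);
        r->objects = next;
    }
    r->live_objects = 0;
}
```

In Cyclone the call to region_free_all corresponds to leaving the
region block, and the type system rejects any program that might use a
pointer into the region afterwards.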
Cyclone forbids following dangling pointers. This restriction is part of
the type system: it’s a compile-time error if a dangling pointer (a pointer
into a deallocated region) might be followed. There are no run-time checks
of the form, “is this pointing into a live region?” As explained below, each
pointer type has a region and objects of the type may only point into that
region.
8.2 Allocation
You can create a new object on the heap using one of three kinds of expres-
sion:
• new expr evaluates expr, places the result into the heap, and returns
a pointer to the result. It is roughly equivalent to
For example, new 17 allocates space for an integer on the heap, ini-
tializes it to 17, and returns a pointer to the space. For another exam-
ple, if we have declared
then new Pair(7,9) allocates space for two integers on the heap,
initializes the first to 7 and the second to 9, and returns a pointer to
the first.
let x = new { 3, 4, 5 };
is roughly equivalent to
unsigned int sz = expr1 ;
t @ temp = malloc(sz * sizeof(t2 )); // where t is the type of ex
for (int identifier = 0; identifier < sz; identifier++)
temp[ identifier] = expr2 ;
That is, expr1 is evaluated first to get the size of the new array, the
array is allocated, and each element of the array is initialized by the
result of evaluating expr2 . expr2 may use identifier, which holds the
index of the element currently being initialized.
For example, this function returns an array containing the first n pos-
itive even numbers:
int ? n_evens(int n) {
return new {for next < n : 2*(next+1)};
}
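
For readers more comfortable with C, the expansion described above can
be written out by hand for this function. It is a sketch, not the
compiler's actual output: the struct int_array and the function
n_evens_c are hypothetical names, and a pointer-plus-length pair is
only an approximation of a Cyclone int ? value.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for an "int ?" result: a pointer plus its length. */
struct int_array {
    int *elems;
    unsigned int size;
};

/* C sketch of: return new {for next < n : 2*(next+1)}; */
struct int_array n_evens_c(int n) {
    assert(n >= 0);                        /* the size must be non-negative */
    struct int_array result = { NULL, 0 };
    unsigned int sz = (unsigned int)n;     /* expr1 is evaluated first */
    if (sz == 0)
        return result;
    int *temp = malloc(sz * sizeof(int));  /* then the array is allocated */
    if (temp == NULL)
        abort();                           /* this sketch gives up on failure */
    for (unsigned int next = 0; next < sz; next++)
        temp[next] = 2 * ((int)next + 1);  /* expr2 may mention the index */
    result.elems = temp;
    result.size = sz;
    return result;
}
```

The caller of this C version must free the array itself; in Cyclone the
array lives in the heap and is reclaimed by the garbage collector.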
Note that:
On the plus side, the type of malloc(sizeof(type)) is type @ (a
subtype of type *), so there is no need to cast the result from char
*.
Objects can be created in a dynamic region using the following analo-
gous expressions.
• rnew(identifier) expr
• rnew(identifier) array-initializer
• rmalloc(identifier,sizeof(type))
rnew and rmalloc are keywords.
The Cyclone library has a global variable Core::heap_region which
contains a handle for the heap region, so, for example, new expr is just
rnew(heap_region) expr.
The only way to create an object in a stack region is declaring it as a
local variable. Cyclone does not currently support salloc; use a dynamic
region instead.
the compiler uses y’s initializer to decide that y’s type is int @ ‘H. Hence
the assignment is illegal: the parameter’s region (called ‘f1) does not outlive
the heap. On the other hand, this function type-checks:
void f2(int x) {
int @ y = &x;
y = new 42;
}
because y’s type is inferred to be int @ ‘f2 and the assignment makes
y point into a region that outlives ‘f2. We can fix our first function by
being more explicit:
void f1(int x) {
int @‘f1 y = new 42;
y = &x;
}
Function bodies are the only places where the compiler tries to infer the
region by how a pointer is used. In function prototypes, type declarations,
and top-level global declarations, the rules for the meaning of omitted re-
gion annotations are fixed. This is necessary for separate compilation: we
often have no information other than the prototype or declaration.
In the absence of region annotations, function-parameter pointers are
assumed to point into any possible region. Hence, given
Finally, we may need to refer to the region for x or y in the function body.
If we omit the names (relying on the compiler to make up names), then we
obviously won’t be able to do so.
Formally, omitted regions in function parameters are filled in by fresh
region names and the function is “region polymorphic” over these names
(as well as all explicit regions).
In the absence of region annotations, function-return pointers are as-
sumed to point into the heap. Hence the following function will not type-
check:
Both of these functions will type-check; the second one is more useful:
Notice that we used the same ‘r for the handle and the return type. We
could have also passed the object back through a pointer parameter like
this:
void f2(region_t<‘r> r,int x,int y,$(int,int)*‘r *‘s p){
*p = rnew(r) $(7,9);
}
Notice that we have been careful to indicate that the region where *p
lives (corresponding to ‘s) may be different from the region for which r
is the handle (corresponding to ‘r). Here’s how to use f2:
region rgn {
$(int,int) *‘rgn x = NULL;
f2(rgn,3,4,&x);
}
The ‘s and ‘rgn in our example are unnecessary because they would be
inferred.
typedef, struct, tunion, and xtunion declarations can all be pa-
rameterized by regions, just as they can be parameterized by types. For
example, here is part of the list library. Note that the “::R” is necessary.
// Return the length of a list.
int length(list_t x) {
  int i = 0;
  while (x != NULL) {
    ++i;
    x = x->tl;
  }
  return i;
}
The type list_t<type,rgn> describes pointers to lists whose elements
have type type and whose “spines” are in rgn.
The functions are interesting for what they don’t say. Specifically, when
types and regions are omitted from a type instantiation, the compiler uses
rules similar to those used for omitted regions on pointer types. More
explicit versions of the functions would look like this:
list_t<‘a,‘r2> rcopy(region_t<‘r2> r2, list_t<‘a,‘r1> x) {
list_t<‘a,‘r2> result, prev;
...
}
list_t<‘a,‘H> copy(list_t<‘a,‘r> x) { ... }
int length(list_t<‘a,‘r> x) { ... }
• Use compile-time region names. Syntactically these are just type vari-
ables, but they are used differently.
• Decorate each pointer type and handle type with one region name.
use a handle for allocation, the handle type’s region name must be
in the capability.
This strategy is probably too vague to make sense at this point, but
it may help to refer back to it as we explain specific aspects of the type
system.
Note that in the rest of the documentation (and in common parlance)
we abuse the word “region” to refer both to region names and to run-time
collections of objects. Similarly, we confuse a block of declarations, its
region-name, and the run-time space allocated for the block. (With loops
and recursive functions, “the space allocated” for the block is really any
number of distinct regions.) But in the rest of this section, we painstak-
ingly distinguish region names, regions, etc.
• If a block (blocks create stack regions) has label L, then the region-
name for the block is ‘L.
The region name for the heap is ‘H. Region names associated with pro-
gram points within a function should be distinct from each other, distinct
from any region names appearing in the function’s prototype, and should
not be ‘H. (So you cannot use H as a label name.) Because the function’s
return type cannot mention a region name for a block or region-construct
in the function, it is impossible to return a pointer to deallocated storage.
In region r <‘r> s and region r s, the type of r is region_t<‘r>.
In other words, the handle is decorated with the region name for the con-
struct. Pointer types’ region names are explicit, although you generally
rely on inference to put in the correct one for you.
8.4.2 Capabilities
In the absence of explicit effects (see below), the capability for a program
point includes exactly:
• ‘H
• The region names for the blocks and “region r s” statements that
contain the program point
• Every region outlives itself.
• Region names for inner blocks outlive region names for outer blocks.
void f(int *‘r1*‘r2 x,int *‘r3 y; ‘r2 < ‘r1, ‘r3 < ‘r2);
This says that ‘r1 outlives ‘r2 and ‘r2 outlives ‘r3. The body will
be checked under these assumptions. Calls to f will type-check only
if the compiler knows that the region names of the actual arguments
obey the outlives assumptions.
the type struct List<int,‘H> is for a list of ints in the heap. Notice
that all of the “cons cells” of the List will be in the same region (the type
of the tl field uses the same region name ‘r that is used to instantiate
the recursive instance of struct List<‘a,‘r>). However, we could
instantiate ‘a with a pointer type that has a different region name.
tunion and xtunion declarations must also be instantiated with an
additional region name. If an object of type tunion ‘r Foo turns out to
be a value-carrying variant, then the object is treated (capability-wise) as a
pointer with region name ‘r. If the region name is omitted from a use of
a tunion declaration, it is implicitly ‘H.
8.4.5 Function Calls
If a function parameter or result has type int *‘r or region_t<‘r>,
the function is polymorphic over the region name ‘r. That is, the caller
can instantiate ‘r with any region in the caller’s current capability. This
instantiation is usually implicit, so the caller just calls the function and the
compiler uses the types of the actual arguments to infer the instantiation
of the region names (just like it infers the instantiation of type variables).
The callee is checked knowing nothing about ‘r except that it is in
its capability (plus whatever can be determined from explicit outlives as-
sumptions). For example, it will be impossible to assign a parameter of
type int*‘r to a global variable. Why? Because the global would have to
have a type that allowed it to point into any region. There is no such type
because we could never safely follow such a pointer (since it could point
into a deallocated region).
9 Namespaces
As in C++, namespaces are used to avoid name clashes in code. For exam-
ple:
namespace Foo {
  int x = 0;
  int f() { return x; }
}
namespace Bar {
  using Foo {
    int g() { return f(); }
  }
  int h() { return Foo::f(); }
}
• The current implementation translates qualified Cyclone variables
to C identifiers very naively: each :: is translated to _ (underscore).
This translation is wrong because it can introduce clashes that are
not clashes in Cyclone, such as in the following:
the Cyclone code refers to the global variable as Foo::x, but the
translation to C will convert all uses to just x. The following code
will therefore get compiled incorrectly (f will return 4):
10 Varargs
C functions that take a variable number of arguments (vararg functions)
are syntactically convenient for the caller, but C makes it very difficult to
ensure safety. The callee has no fool-proof way to determine the number
of arguments or even their types. Also, there is no type information for
the compiler to use at call-sites to reject bad calls.
Cyclone provides three styles of vararg functions that provide different
trade-offs for safety, efficiency, and convenience.
First, you can call C vararg functions just as you would in C:
extern "C" void foo(int x, ...);
void g() {
foo(3, 7, "hi", 'x');
}
However, for the reasons described above, foo is almost surely unsafe.
All the Cyclone compiler will do is ensure that the vararg arguments at
the call site have some legal Cyclone type.
Actually, you can declare a Cyclone function to take C-style varargs,
but Cyclone provides no way to access the vararg arguments for this style.
That is why the example refers to a C function. (In the future, function sub-
typing could make this style less than completely silly for Cyclone func-
tions.)
The second style is for a variable number of arguments of one type:
void foo(int x, ...string_t args);
void g() {
foo(17, "hi", "mom");
}
The syntax is a type and identifier after the “...”. (The identifier is optional
in prototypes, as with other parameters.) You can use any identifier;
args is not special. At the call-site, Cyclone will ensure that each vararg
has the correct type, in this case string_t.
Accessing the varargs is simpler than in C. Continuing our example,
args has type string_t ? ‘foo in the body of foo. You retrieve the
first argument ("hi") with args[0], the second argument ("mom") with
args[1], and so on. Of course, args.size tells you how many argu-
ments there are.
This style is implemented as follows: At the call-site, the compiler gen-
erates a stack-allocated array with the array elements. It then passes a “fat
pointer” to the callee with bounds indicating the number of elements in
the array. Compared to C-style varargs, this style is less efficient because
there is a bounds-check and an extra level of indirection for each vararg
access. But we get safety and using vararg functions is just as convenient.
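
The calling convention just described can be approximated in plain C.
The following is a sketch under stated assumptions, not the code the
Cyclone compiler emits: string_args, arg_at, total_length and
fat_call_site are hypothetical names, and the pointer-plus-count struct
only models the bounds carried by a fat pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical "fat pointer": the array start plus the number of elements. */
struct string_args {
    const char *const *elems;
    size_t size;
};

/* Bounds-checked access, standing in for args[i] in the callee. */
const char *arg_at(struct string_args args, size_t i) {
    assert(i < args.size);  /* Cyclone would throw an exception instead */
    return args.elems[i];
}

/* Callee in the second style: sums the lengths of all vararg strings. */
size_t total_length(int x, struct string_args args) {
    (void)x;                /* the fixed parameter is unused in this sketch */
    size_t total = 0;
    for (size_t i = 0; i < args.size; i++)
        total += strlen(arg_at(args, i));
    return total;
}

/* Caller side: roughly what a call such as foo(17, "hi", "mom") involves. */
size_t fat_call_site(void) {
    const char *packed[] = { "hi", "mom" };  /* stack-allocated vararg array */
    struct string_args args = { packed, sizeof packed / sizeof packed[0] };
    return total_length(17, args);
}
```

Each access pays for one comparison and one extra load through elems,
which is the overhead the paragraph above refers to.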
A very useful example of this style is in the list library:
The special syntax “inject” is the syntactic distinction for the third
style. The type must be a tunion type. In the body of the vararg func-
tion, the array holding the vararg elements has this tunion type, with
the function’s region. (That is, the wrappers are stack-allocated just as the
vararg array is.)
At the call-site, the compiler implicitly wraps each vararg by finding
a tunion variant that has the expression’s type and using it. The exact
rules for finding the variant are as follows: Look in order for a variant
that carries exactly the type of the expression. Use the first variant that
matches. If none, make a second pass and find the first variant that carries
a type to which the expression can be coerced. If none, it is a compile-time
error.
In practice, the tunion types used for this style of vararg tend to be
quite specialized and used only for vararg purposes.
Compared to the other styles, the third style is less efficient because the
caller must wrap and the callee unwrap each argument. But everything is
allocated on the stack and call sites do everything implicitly. A testament
to the style’s power is the library’s implementation of printf and scanf
entirely in Cyclone (except for the actual I/O system calls, of course).
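
To make the wrapping and unwrapping concrete, here is a plain C sketch
of the third style. Everything in it is an assumption made for
illustration: arg_tag, arg, format_args and inject_call_site are
hypothetical names, the tag-plus-union struct merely imitates a tunion
with two variants, and in Cyclone the caller-side wrapping is inserted
by the compiler rather than written by hand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical tagged wrapper imitating a tunion with two variants. */
enum arg_tag { ARG_INT, ARG_STRING };

struct arg {
    enum arg_tag tag;
    union {
        int i;
        const char *s;
    } value;
};

/* Callee: unwraps each argument by its tag and appends it to buf.
   Returns the number of characters written, or -1 if buf is too small. */
int format_args(char *buf, size_t buflen, const struct arg *args, size_t nargs) {
    assert(buf != NULL);
    if (buflen == 0)
        return -1;
    size_t used = 0;
    buf[0] = '\0';
    for (size_t i = 0; i < nargs; i++) {
        int n;
        switch (args[i].tag) {
        case ARG_INT:
            n = snprintf(buf + used, buflen - used, "%d", args[i].value.i);
            break;
        case ARG_STRING:
            n = snprintf(buf + used, buflen - used, "%s", args[i].value.s);
            break;
        default:
            return -1;      /* unknown variant */
        }
        if (n < 0 || (size_t)n >= buflen - used)
            return -1;      /* output did not fit */
        used += (size_t)n;
    }
    return (int)used;
}

/* Caller side: the wrappers live on the stack, like the vararg array. */
int inject_call_site(char *buf, size_t buflen) {
    struct arg wrapped[] = {
        { ARG_STRING, { .s = "x=" } },
        { ARG_INT,    { .i = 42 } },
    };
    return format_args(buf, buflen, wrapped, sizeof wrapped / sizeof wrapped[0]);
}
```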
• Use tunions for unions with pointers.
• Initialize variables.
Even when you follow these suggestions, you’ll still need to test and
debug your code carefully. By far, the most common run-time errors
you will get are uncaught exceptions for null-pointer dereference
or array out-of-bounds. Under Linux, you should get a stack backtrace
when you have an uncaught exception which will help narrow down
where and why the exception occurred. On other architectures, you
can use gdb to find the problem. The most effective way
to do this is to set a breakpoint on the routines _throw_null()
and _throw_arraybounds() which are defined in the
runtime and used whenever a null-check or array-bounds-check fails.
Then you can use gdb’s backtrace facility to see where
the problem occurred. Of course, you’ll be debugging at the C
level, so you’ll want to use the -save-c and -g
options when compiling your code.
Use NULL instead of 0. Use NULL instead of 0 for null
pointers.
Change pointer types to fat pointer types where necessary.
Ideally, you should examine the code and use thin pointers
(e.g., int* or better int@) wherever possible as these require fewer
run-time checks and less storage. However, recall that thin pointers
do not support pointer arithmetic. In those situations, you’ll need
to use fat pointers (e.g., int?). A particularly simple strategy when
porting C code is to just change all pointers to fat pointers. The code
is then more likely to compile, but will have greater overhead. After
changing to use all fat pointers, you may wish to profile or reexamine
your code and figure out where you can profitably use thin pointers.
Use tunions for unions with pointers. Cyclone only accepts unions that
contain “bits” (i.e., ints; chars;
shorts; floats; doubles; or tuples, structs, unions, or arrays of bits.)
So if you have a C union with a pointer type in it, you’ll have to
code around it. One way is to simply use a tagged union (tunion).
Note that this adds a level of indirection and requires pattern
matching to ensure type-safety.
so that you can construct an appropriate initial value. For instance,
suppose you have the following declarations at top-level:
struct DICT;
void init() {
d = new_dict();
struct DICT;
void init() {
d = new_dict();
char ? temp;
temp = x;
bar(temp);
temp = y;
bar(temp);
}
char ? temp1;
char ? temp2;
temp1 = x;
bar(temp1);
temp2 = y;
bar(temp2);
Now Cyclone can figure out that temp1 is a pointer into
the region #0 whereas temp2 is a pointer into
region #1.
Connect argument and result pointers with the same region. Remember
that Cyclone assumes that pointer inputs to a function might
point into distinct regions, and that output pointers, by default point
into the heap. Obviously, this won’t always be the case. Consider
the following code:
if (b)
return x;
else
return y;
reflecting the fact that neither x nor y is a pointer
into the heap. You can fix this problem by putting in explicit regions
to connect the arguments and the result. For instance, we might
write:
if (b)
return x;
else
return y;
and then the code will compile. Of course, any caller to this function
must now ensure that the arguments are in the same region.
void foo(int b) {
(2:39-2:40): implicit cast to shorter array
b is fals
void foo(int b) {
}
Alternatively, you can declare a temp with the right type and use
it:
void foo(int b) {
The point is that by giving Cyclone more type information, you can
get it to do the right sorts of promotions.
void foo() {
char ?x = "howdy";
x[0] = 'a';
The problem is that the string "howdy" will be placed in
the read-only text segment, and thus trying to write to it will
cause a fault. Fortunately, Cyclone complains that you’re trying
to initialize a non-const variable with a const value so this
problem doesn’t occur in Cyclone. If you really want to initialize
x with this value, then you’ll need to copy the string,
say using the dup function from the string library:
void foo() {
char ?x = dup("howdy");
x[0] = 'a';
int base);
char ?x = dup("howdy");
return strtoul(x,e,0);
Get rid of calls to free, realloc, memset, memcpy, etc. There are many stan-
dard functions that Cyclone can’t support
and still maintain type-safety. An obvious one is free()
which releases memory. Let the garbage collector free the object
for you, or use region-allocation if you’re scared of the collector.
Other operations, such as memset and memcpy
are also not supported
by Cyclone. You’ll need to write code to manually copy one data
structure to another. Fortunately, this isn’t so bad since Cyclone
supports structure assignment.
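
As a small illustration of replacing memcpy with structure assignment,
here is a sketch in plain C, where the same idiom is also legal. The
struct point and the function copy_points are hypothetical examples;
in Cyclone the two parameters would more likely be fat pointers so
that each index is bounds-checked.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical data structure used only for this example. */
struct point {
    int x;
    int y;
};

/* Copy n elements one at a time with structure assignment, not memcpy. */
void copy_points(struct point *dst, const struct point *src, size_t n) {
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];  /* copies every field of the struct */
}
```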
Use polymorphism or tunions to get rid of void*. Often you’ll find C code
that uses void* to simulate
polymorphism. A typical example is something like swap:
void swap(void **x, void **y) {
  void *t = *x;
  *x = *y;
  *y = t;
}
‘a t = *x;
*x = *y;
*y = t;
Now the code can (safely) be called with any two (compatible)
pointer types. This trick works well as long as you only need
to “cast up” from a fixed type to an abstract one. It doesn’t
work when you need to “cast down” again. For example, consider
the following:
int foo(int x, void *y) {
if (x)
else {
printf("%s\n",(char *)y);
return -1;
int foo(i_or_s y) {
switch (y) {
case String(s):
printf("%s\n",s);
return -1;
Rewrite the bodies of vararg functions. See the section on varargs for more
details.
A.2 Interfacing to C
When porting any large code from C to Cyclone, or even when writing
a Cyclone program from scratch, you’ll want to be able to access
legacy libraries. To do so, you must understand how Cyclone
represents data structures, how it compiles certain features,
and how to write wrappers to make up for representation mismatches.
Sometimes, interfacing to C code is as simple as writing
an appropriate interface. For instance, if you want to
call the acos function which is defined in the C
Math library, you can simply write the following:
extern "C" {
double acos(double);
float acosf(float);
double acosh(double);
float acoshf(float);
double asin(double);
char *int_to_string(int i);
void foo() {
int i = 12345;
printf(int_to_string(i));
and we’ll get the right behavior. However, this obviously isn’t
going to work if the size of the buffer might be different for
different calls.
Another solution is to somehow convert the “C string” to a “Cyclone
string” before handing it back to Cyclone. This is fundamentally
an unsafe operation because we must rely upon the “C string” being
properly zero-terminated. So, your best bet is to write a little
wrapper function in C which can convert the C string to a Cyclone
string and then use that as follows:
void foo() {
int i = 12345;
printf(Cstring_to_string(int_to_string(i)));
struct _tagged_arr {
};
struct _tagged_arr Cstring_to_string(char *s) {
if (s == NULL) {
else {
_throw_badalloc();
str.curr[--sz] = '\0';
while(--sz>=0)
str.curr[sz]=s[sz];
}
if (s == NULL) {
// return Cyclone fat NULL
else {
str.base = str.curr = s;
stderr, various macros, and various function prototypes.
A typical example function is the one to remove a file which
has the following prototype:
namespace Cstdio {
extern "C" {
int fclose(__sFILE);
...
namespace Std;
abstract struct __sFILE {
Cstdio::__sFILE *file;
};
return Cstdio::remove(string_to_Cstring(filename));
if (r == 0) {
f->file = NULL;
return r;
...
At the top of the file, we have declared the external types
and functions that C uses. Notice that these definitions
are wrapped in their own namespace (Cstdio) so that
we can “redefine” them within the Std namespace.
Also notice that they are wrapped with an extern-C so that
when compiled, their names won’t get mangled.
The Cyclone wrapper code starts after the namespace Std
declaration. The first thing we do is define a “wrapper”
type for C files. The wrapper includes a possibly null pointer
to a C file. We use this level of indirection to keep someone
from closing a file twice, or from reading or writing to a file
that has been closed. Of course, any operations on files will
need wrappers to strip off the level of indirection and check
that the file has not been closed already.
The wrapper function for remove calls the string_to_Cstring
function (defined in runtime_cyc.c) to convert the
argument to a C string and then passes the C string to the
real remove function, returning the error code.
The wrapper function for fclose checks to make sure that
the file has not already been closed. If so, it returns -1.
Otherwise, it pulls out the real C file and passes it to the
real fclose function. It then checks the return code
(to ensure that the close actually happened) and if it’s 0,
sets the C file pointer to NULL, ensuring that we
don’t call C’s fclose on the file again.
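
The close-once logic described above can be sketched in plain C as
follows. This is an illustration, not the library's source: the names
wrapped_file and wrapped_fclose are hypothetical, and the sketch
follows the description literally, clearing the pointer only when the
close reports success.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical wrapper type: file becomes NULL once the stream is closed. */
struct wrapped_file {
    FILE *file;
};

/* Close the underlying stream at most once. */
int wrapped_fclose(struct wrapped_file *f) {
    assert(f != NULL);
    if (f->file == NULL)
        return -1;          /* already closed: report an error */
    int r = fclose(f->file);
    if (r == 0)
        f->file = NULL;     /* remember that the close happened */
    return r;
}
```

Wrappers for reading and writing would perform the same NULL test
before passing f->file on to the corresponding C function.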
What does int @ mean? In Cyclone, @ is a pointer that is guaranteed not
to be NULL. The Cyclone compiler guarantees this through static or dynamic
checks. For example,
int *x = NULL;
is fine, but
int @x = NULL;
is an error.
What does int *{37} mean? This is the type of pointers to a sequence
of at least 37 integers. The extra length information is used by Cy-
clone to prevent buffer overflows. For example, Cyclone will com-
pile x[expr] into code that will evaluate expr, and check that the
result is less than 37 before accessing the element. Note that int
* is just shorthand for int *{1}. Currently, the expression in the
braces must be a compile-time constant.
What does int *‘r mean? This is the type of a pointer to an int in re-
gion ‘r. A region is just a group of objects with the same lifetime—
all objects in a region are freed at once. Cyclone uses this region in-
formation to prevent dereferencing a pointer into a previously freed
region. Regions can have a “nested” structure, for example, if the
region for a function parameter is a variable, then the function may
assume that the parameter points into a region whose lifetime in-
cludes the lifetime of the function.
What does ‘H mean? This is Cyclone’s heap region: objects in this region
cannot be explicitly freed, only garbage-collected. Effectively, this
means that pointers into the heap region can always be safely deref-
erenced; conceptually, objects in the heap last “forever,” since they
are always available if needed; garbage collection is like an optimiza-
tion that frees objects after they are no longer needed.
What does int @{37}‘r mean? A pointer can come with all or none of
the nullity, bound, and region annotations. This type is the type of
non-null pointers to at least 37 consecutive integers in region ‘r.
When the bound is omitted, it defaults to 1.
What is a pointer type’s region when it’s omitted? Every pointer type has
a region; if you omit it, the compiler puts it in for you implicitly. The
region added depends on where the pointer type occurs. In function
arguments, a new region variable is used. In function results and
type definitions (including typedef), the heap region (‘H) is used.
In function bodies, the compiler looks at the uses (using unification)
to try to determine a region.
What does int ? mean? The ? is a special kind of pointer that carries along
bounds information. It is a “questionable” pointer: it might be NULL
or pointing out of bounds. An int ? is a pointer to an integer,
along with some information that allows Cyclone to check whether
the pointer is in bounds at run-time. These are the only kinds of
pointers that you can use for pointer arithmetic in Cyclone.
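
One way to picture a questionable pointer is as a small C struct that
carries its bounds along. The sketch below is an assumption made for
illustration, not Cyclone's actual representation: fat_int_ptr,
fat_add and fat_read are hypothetical names, and where this code
returns 0 a Cyclone program would throw an exception.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout for an "int ?" value. */
struct fat_int_ptr {
    int *base;         /* first element of the underlying array, or NULL */
    size_t count;      /* number of elements starting at base */
    ptrdiff_t offset;  /* current position; may be out of range */
};

/* Pointer arithmetic only moves the position; nothing is checked yet. */
struct fat_int_ptr fat_add(struct fat_int_ptr p, ptrdiff_t n) {
    p.offset += n;
    return p;
}

/* Checked dereference: returns 1 and stores the value in *out on success,
   or 0 if the pointer is null or out of bounds. */
int fat_read(struct fat_int_ptr p, int *out) {
    assert(out != NULL);
    if (p.base == NULL || p.offset < 0 || (size_t)p.offset >= p.count)
        return 0;
    *out = p.base[p.offset];
    return 1;
}
```

The check happens at the dereference rather than at the arithmetic,
which is why such a pointer may legitimately be NULL or out of bounds
while it is only being moved around.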
What does ‘a mean? ‘a is a type variable. Type variables are typically
used in polymorphic functions. For example, if a function takes a
parameter of type ‘a, then the function can be called with a value
of any suitable type. If there are two arguments of type ‘a, then any
call will have to give values of the same type for those parameters.
And if the function returns a type ‘a, then it must return a result of
the same type as the argument. Syntactically, a type variable is
any identifier beginning with ‘ (backquote).
What is a “suitable” type for a type variable? The last question said that
a type variable can stand for a “suitable” type. Unfortunately, not
all types are “suitable.” Briefly, the “suitable” types are those that
fit into a general-purpose machine register, typically including int,
pointers, tunion types, and xtunion types. Non-suitable types
include float, struct types (which can be of arbitrary size), tu-
ples, and questionable pointers. Technically, the suitable types are
the types of “box kind,” described below.
How do I cast from void *? You can’t do this in Cyclone. A void * in
C really does not point to void, it points to a value of some type.
However, when you cast from a void * in C, there is no guarantee
that the pointer actually points to a value of the expected type. This
can lead to crashes, so Cyclone doesn’t permit it. Cyclone’s polymor-
phism and tagged unions can often be used in places where C needs
to use void *, and they are safe.
What does _ (underscore) mean in types? Underscore is a “wildcard” type.
It stands for some type that the programmer doesn’t want to bother
writing out; the compiler is expected to fill in the type for the pro-
grammer. Sometimes, the compiler isn’t smart enough to figure out
the type (you will get an error message if so), but usually there is
enough contextual information for the compiler to succeed. For ex-
ample, if you write
_ x = new Pair(3,4);
the compiler can easily infer that the wildcard stands for struct
Pair @. In fact, if x is later assigned NULL, the compiler will infer
that x has type struct Pair * instead.
Note that _ is not allowed as part of top-level declarations.
What do ‘a::B, ‘a::M, ‘a::A, ‘a::R, and ‘a::E mean? Types are di-
vided into different groups, which we call kinds. There are five dif-
ferent kinds: B (for Box), M (for Memory), A (for Any), R (for Re-
gion), and E (for Effect). The notation typevar::kind says that a
type variable belongs to a kind. A type variable can only be instanti-
ated by types that belong to its kind.
Box types include int, pointers (except for questionable pointers),
tagged unions, and extensible tagged unions. Memory types include
all box types, tuples, tunion and xtunion variants, questionable
pointers, and non-abstract structs. Any types include all types that
don’t have kind Region or Effect. Region types are regions, i.e., the
heap and stack regions. Effect types are sets of regions (these are
explained elsewhere).
What does it mean when type variables don’t have explicit kinds? Every
type variable has a kind, but usually the programmer doesn’t have
to write it down. In function prototypes, the compiler will infer the
most permissive kind. For example,
is shorthand for
void f(‘a::B *‘b::R x, ‘c::M * y, ‘a::B z)
is shorthand for
but
is not.
What are tunion and xtunion? These are Cyclone’s tagged union and
extensible tagged union types. In C, when a value has union type,
you know that in fact it has one of the types of the union’s fields,
but there is no guarantee which one. This can lead to crashes in
C. Cyclone’s tagged unions are like C unions with some additional
information that lets the Cyclone compiler determine what type the
underlying value actually has, thus helping to ensure safety.
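To make the contrast with C concrete, here is a minimal sketch of the idea in plain C (not Cyclone syntax): an explicit tag records which member of the union is live, and the accessor consults the tag before reading. The names value_tag, value, make_int, make_double, and value_as_double are hypothetical and exist only for this illustration. In Cyclone the compiler maintains and checks the tag for you; in C the discipline is purely manual.

```c
#include <assert.h>

/* Hypothetical example: the tag says which union member is valid. */
enum value_tag { VALUE_INT, VALUE_DOUBLE };

struct value {
    enum value_tag tag;            /* the "additional information" */
    union { int i; double d; } u;  /* only one member is live at a time */
};

struct value make_int(int i) {
    struct value v;
    v.tag = VALUE_INT;
    v.u.i = i;
    return v;
}

struct value make_double(double d) {
    struct value v;
    v.tag = VALUE_DOUBLE;
    v.u.d = d;
    return v;
}

/* Reads only the member that the tag says is live. */
double value_as_double(struct value v) {
    switch (v.tag) {
    case VALUE_INT:    return (double)v.u.i;
    case VALUE_DOUBLE: return v.u.d;
    }
    assert(0 && "corrupt tag");
    return 0.0;
}
```

Nothing in C stops a caller from setting tag and then reading the wrong member directly; that gap is what tunion and pattern matching close.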
What are the Cyclone keywords? In addition to the C keywords, the fol-
lowing have special meaning and cannot be used as identifiers: abstract,
catch, codegen, cut, fallthru, fill, let, malloc, namespace,
new, NULL, region_t, regions, rmalloc, rnew, splice, throw,
try, tunion, using, xtunion. As in gcc, __attribute__ is re-
served as well.
What is new? new expr allocates space in the heap region, initializes it
with the result of evaluating expr, and returns a pointer to the space.
It is roughly equivalent to
to heap-allocate an array of size expr1 with the ith element initialized
to expr2 (which may mention i).
throw MyExn;
If statement1 throws a MyExn and no inner catch handles it, control
transfers to statement2.
The catch body can have any number of case clauses. If none
match, the exception is re-thrown.
Exceptions can carry values with them. For example, here’s how to
declare an exception that carries an integer:
try statement1
catch {
case &MyIntExn(x): statement2
}
What does let mean? In Cyclone, let is used to declare variables. For
example,
let x,y,z;
declares the three variables x, y, and z. The types of the variables
do not need to be filled in by the programmer; they are filled in by
the compiler’s type inference algorithm. The let declaration above
is equivalent to
_ x;
_ y;
_ z;
let x = 3;
What is a pattern and how do I use it? Cyclone’s patterns are a convenient
way to destructure aggregate objects, such as structs and tuples. They
are also the only way to destructure tagged unions. Patterns are used
in Cyclone’s let declarations, switch statements, and try/catch
statements.
is a way to extract the second element of the pair and bind it to a new
variable y.
What does it mean when a function has an argument with type ‘a? Any
type that looks like ‘ (backquote) followed (without whitespace) by
an identifier is a type variable. If a function parameter has a type
variable for its type, it means the function can be called with any
pointer or with an int. However, if two parameters have the same
type variable, they must be instantiated with the same type. If all
occurrences of ‘a appear directly under pointers (e.g., ‘a *), then an
actual parameter can have any type, but the restrictions about using
the same type still apply. In general, Cyclone has parametric poly-
morphism as a safe alternative to casts and void *.
Do functions with type variables get duplicated like C++ template functions? Is there run-tim
No and no. Each Cyclone function gives rise to one function in the
output, and types are not present at run-time. When a function is
called, it does not need to know the types with which the caller is
instantiating the type variables, so no instantiation actually occurs—
the types are not present at run-time. We do not have to duplicate the
code because we either know the size of the type or the size does not
matter. This is why we don’t allow type variables of memory kind
as parameters—doing so would require code duplication or run-time
types.
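The "one function in the output" point can be sketched in plain C. Because every pointer has the same size, a single compiled function can serve callers that pass pointers to different types. The function name choose is hypothetical. Note the important difference: C's void * gives up the check that both arguments have the same type, whereas a Cyclone type variable keeps that check at compile time.

```c
/* One compiled function serves every pointer type: each argument is a
   pointer-sized value, so the code never needs to know the pointee type.
   With a Cyclone type variable in place of void *, the compiler would
   also verify that x and y are instantiated with the same type. */
void *choose(int take_first, void *x, void *y) {
    return take_first ? x : y;
}
```

A caller can pass two int pointers or two string pointers to the same choose, and no second copy of the code is generated for either use.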
What casts are allowed? Cyclone doesn’t support all of the casts that C
does, because incorrect casts can lead to crashes. Instead, Cyclone
supports a safe subset of C’s casts. Here are some examples.
All of C’s numeric casts, conversions, and promotions are unchanged.
You can always cast between type@{const}, type*{const}, and type?.
A cast from type? to one of the other types includes a run-time check
that the pointer points to a sequence of at least const objects. A cast
to type@{const} from one of the other types includes a run-time check
that the pointer is not NULL. No other casts between these types have
run-time checks. A failed run-time check throws Null_Exception.
A pointer into the heap can be cast to a pointer into another region.
A pointer to a struct or tuple can be cast to a pointer to another
struct or tuple provided the “target type” is narrower (it has fewer
fields after “flattening out” nested structs and tuples) and each
(flattened out) field of the target type could be the target of a cast
from the corresponding field of the source type. A pointer can be cast
to int. The type type*{const1} can be cast to type*{const2} provided
const2 < const1, and similarly for type@{const1} and type@{const2}.
An object of type tunion T.A can be cast to tunion T if A does not
carry values. An object of type tunion T.A@ can be cast to tunion
T if A does carry values. The current implementation isn’t quite as
lenient as it should be. For example, it rejects a cast from int *{4}
to $(int,int)*{2}, but this cast is safe.
For all non-pointer-containing types type, you can cast from a type
? to a char ?. This allows you to make frequent use of memcpy,
memset, etc.
int x[37] = { for i < expr1 : expr2 }
Are there threads? Cyclone does not yet have a threads library and some
of the libraries are not re-entrant. In addition, because Cyclone uses
unboxed structs of three words to represent fat pointers, and updat-
ing them is not an atomic operation, it’s possible to introduce un-
soundnesses by adding concurrent threads. However, in the future,
we plan to provide support for threads and a static analysis for pre-
venting these and other forms of data races.
Can I use setjmp and longjmp? No. However, Cyclone has exceptions,
which can be used for non-local control flow. The problem with
setjmp and longjmp is that safety demands we prohibit a longjmp
to a place no longer on the stack. A future release may have more
support for non-local control flow.
What types are allowed for union members? Currently, union members
cannot contain pointers. You can have numeric types (including
bit fields and enumerations), structs and tuples of allowable union-
member types, and other unions.
What is aprintf? The aprintf function is just like printf, but the
output is placed in a new string allocated on the heap.
int main(int argc, char ?? argv);
?, that is, they carry bounds information. In Cyclone a string ends
when a null character is found, or when the bounds are exceeded.
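The rule that a string ends at a null character or at its bounds, whichever comes first, can be sketched in plain C as follows. The struct fat_string and the function fat_strlen are hypothetical names, and the two-field layout is a simplification chosen for this example; it is not a description of how Cyclone actually lays out ? pointers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fat pointer: a base address plus the number of chars
   that may legally be read from it. */
struct fat_string {
    const char *base;
    size_t bound;
};

/* Length of the string: stops at the first null character or at the
   bound, whichever comes first, so it never reads past the object. */
size_t fat_strlen(struct fat_string s) {
    size_t n = 0;
    assert(s.base != NULL || s.bound == 0);
    while (n < s.bound && s.base[n] != '\0')
        n++;
    return n;
}
```

With this convention an array of characters that lacks a terminating null is still a well-defined string, which is not the case for a plain C char *.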
Can I call free? Yes and no. Individual memory objects may not be freed.
In future versions, we may support freeing objects for which you
can prove that there are no other pointers to the object. Until then,
you must rely on a garbage collector to reclaim heap objects or use
regions (similar to “arenas” or “zones”) for managing collections of
objects.
For porting code, we have defined a free function that behaves as a
no-op, having type
calling malloc behind the collector’s back. Instead, you should call
GC_malloc. See the collector’s documentation for more informa-
tion.
Note that if you allocate all objects on the stack, garbage collection
will never occur. If you allocate all objects on the stack or in regions,
it is very unlikely collection will occur and nothing will actually get
collected.
int x[256];
int y[] = { 0, 1, 2, 3 };
int z[] = { for i < 256 : i };
To pass (a pointer to) the array to another function, the function must
have a type indicating it can accept stack pointers, as explained else-
where.
Can I use salloc or realloc? No, neither of these functions is
currently provided and it is not possible to write them in Cyclone. Both
features are hard to provide in a way that is guaranteed safe.
Why do I have to cast from * to @ if I’ve already tested for NULL? Our com-
piler is not as smart as you are. It does not realize that you have
tested for NULL, and it insists on a check (the cast) just to be sure.
You can leave the cast implicit, but the compiler will emit a warning.
We are currently working to incorporate a flow analysis to omit
spurious warnings. Because of aliasing, threads, and undefined
evaluation order, a sound analysis is non-trivial.
Why can’t a function parameter or struct field have type ‘a::M? Type vari-
ables of memory kind can be instantiated with types of any size.
There is no straightforward way to compile a function with an ar-
gument of arbitrary size. The obvious way to write such a function
is to manipulate a pointer to the arbitrary size value instead. So your
parameter should have type ‘a::M* or ‘a::M@.
Can I see how Cyclone compiles the code? The easiest way to do this is
to pass the flags -save-c and -pp to the compiler. This instructs
the compiler to save the C code that it builds and passes to GCC, and
print it out using the pretty-printer. You will have to work to make
some sense out of the C code, though. It will likely contain many
extern declarations (because the code has already gone through
the preprocessor) and generated type definitions (because of tuples,
tagged unions, and questionable pointers). Pattern-matching code
gets translated to a mess of temporary variables and goto state-
ments. Array-bounds checks and NULL checks can clutter array-
intensive and pointer-intensive code. And all typedefs are expanded
away before printing the output.
Can I use gdb on the output? You can run gdb, but debugging support
is not all the way there yet. By default, source-level debugging op-
erations within gdb will reference the C code generated by the Cy-
clone compiler, not the Cyclone source itself. In this case, you need
to be aware of three things. First, you have to know how Cyclone
translates top-level identifiers to C identifiers (it prepends Cyc_ and
separates namespaces by _ instead of ::) so you can set breakpoints at
functions. Second, it can be hard to print values because many Cy-
clone types get translated to void *. Third, we do not yet have source
correlation, so if you step through code, you’re stepping through C
code, not Cyclone code.
To improve this situation somewhat, you can compile your files with
the option --lineno. This will insert line directives in the gener-
ated C code that refer to the original Cyclone code. This will allow
you to step through the program and view the Cyclone source rather
than the generated C. However, doing this has some drawbacks. First,
it may occlude some operation in the generated C code that is caus-
ing your bug. Second, compilation with --lineno is significantly
slower than without. Finally, the result is not bug-free; sometimes
the debugger will fall behind the actual program point and print the
wrong source lines; we hope to fix this problem soon.
Two more hints: First, on some architectures, the first memory alloca-
tion appears to seg fault in GC_findlimit. This is correct and doc-
umented garbage-collector behavior (it handles the signal but gdb
doesn’t know that); simply continue execution. Second, a common
use of gdb is to find the location of an uncaught exception. To do
this, set a breakpoint at throw (a function in the Cyclone runtime).
Can I use gprof on the output? Yes, just use the -pg flag. You should
also rebuild the Cyclone libraries and the garbage collector with the
-pg flag. The results of gprof make sense because a Cyclone func-
tion is compiled to a C function.
Notes for Cygwin users: First, the versions of libgmon.a we have
downloaded from cygnus are wrong (every call gets counted as a
self-call). We have modified libgmon.a to fix this bug, so download
our version and put it in your cygwin/lib directory. Second, tim-
ing information should be ignored because gprof is only sampling
100 or 1000 times a second (because it is launching threads instead
of using native Windows profiling). Neither of these problems is
Cyclone-specific.
Is there an Emacs mode for Cyclone? Sort of. In the doc/ directory of
the distribution you will find a font-lock.el file and elisp code
(in cyclone_dot_emacs.el) suitable for inclusion in your .emacs
file. However, these files change C++ mode and use it for Cyclone
rather than creating a new Cyclone mode. Of course, we intend to
make our own mode rather than destroy C++-mode’s ability to be
good for C++. Note that we have not changed the C++ indentation
rules at all; our elisp code is useful only for syntax highlighting.
ation, but we have reserved the keywords codegen, splice, cut,
and fill in case we get a chance to add it.
What platforms are supported? You need a platform that has gcc 2.9, GNU
make, ar, sed, either bash or ksh, and the ability to build the Boehm-
Demers-Weiser garbage collector. Furthermore, the size of int and
all C pointers must be the same. We actively develop Cyclone
on cygwin (Windows 98, NT, 2K), and Linux. We have code for ver-
sions on Solaris, OpenBSD, FreeBSD, and Mac OS X. The platform-
specific parts of these non-development distributions, particularly
system call interfaces, may not be correct. We are in the process of
developing a tool to automatically generate system-dependent code
that should be part of future releases.
Why aren’t there more libraries? We are eager to have a wider code base,
but we are compiler writers with limited resources. Let us know of
useful code you write.
If Cyclone is safe, why does my program crash? There are certain classes
of errors that Cyclone does not attempt to prevent. Two examples are
stack overflow and various numeric traps, such as division-by-zero.
It is also possible to run out of memory. Other crashes could be due
to compiler bugs or linking against buggy C code (or linking incor-
rectly against C code).
Note that when using gdb, it may appear there is a seg fault in
GC_findlimit(). This behavior is correct; simply continue execution.
What are compile-time constants? Compile-time constants are NULL, in-
teger and character constants, and arithmetic operations over compile-
time constants. Constructs requiring compile-time constants are: tuple-
subscript (e.g., x[3] for tuple x), case argument for switch "C"
argument has a numeric type (e.g., case 3+4:), sizes in array
declarations (e.g., int y[37]), and sizes in pointer bounds (e.g., int
* x{124}). Unlike in C, sizeof(t) is not an integral constant ex-
pression because the Cyclone compiler does not know the actual size
of aggregate types.
How can I get the size of an array? If expr is an array, then expr.size re-
turns the array’s size. Note that for ? types, the size is retrieved at
runtime from the object. For other array types, the size is determined
at compile-time.
C Libraries
C.1 C Libraries
Cyclone provides partial support for the following C library headers:
#include <cXXX.h>
using Std;
C.2 <array.h>
Defines namespace Array, implementing utility functions on arrays.
void qsort(cmpfn_t<‘a, ‘r, ‘r>, ‘a ?‘r x, int len);
qsort(cmp,x,len) sorts the first len elements of array x into as-
cending order (according to the comparison function cmp) by the Quick-
Sort algorithm. cmp(a,b) should return a number less than, equal to,
or greater than 0 according to whether a is less than, equal to, or greater
than b. qsort throws Core::InvalidArg("Array::qsort") if
len is negative or specifies a segment outside the bounds of x.
qsort is not a stable sort.
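The comparison-function convention described above is the same one used by the C library's qsort, so it can be illustrated in plain C. This is a sketch only: cmp_int and sort_prefix are hypothetical names, and the sketch reports a bad len with a return code where the Cyclone function throws Core::InvalidArg.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Three-way comparison: negative, zero, or positive. */
static int cmp_int(const void *a, const void *b) {
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);   /* avoids the overflow risk of x - y */
}

/* Sorts the first len elements of x (which holds n elements) into
   ascending order. Returns -1 if len is negative or exceeds n,
   0 on success. */
int sort_prefix(int *x, size_t n, int len) {
    assert(x != NULL);
    if (len < 0 || (size_t)len > n)
        return -1;
    qsort(x, (size_t)len, sizeof x[0], cmp_int);
    return 0;
}
```

Elements beyond the first len are left untouched, which matches the "first len elements" wording of the description.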
void msort(cmpfn_t<‘a, , >, ‘a ?‘H x, int len);
msort(cmp,x,len) sorts the first len elements of array x into as-
cending order (according to the comparison function cmp), by the Merge-
Sort algorithm. msort throws Core::InvalidArg("Array::msort")
if len is negative or specifies a segment outside the bounds of x.
msort is a stable sort.
‘a ?from_list(List::list_t<‘a> l);
from_list(l) returns a heap-allocated array with the same elements
as the list l.
List::list_t<‘a> to_list(‘a ?x);
to_list(x) returns a new heap-allocated list with the same elements
as the array x.
‘a ?copy(‘a ?x);
copy(x) returns a fresh copy of array x, allocated on the heap.
void imp_map(‘a(@‘H f)(‘a), ‘a ?x);
imp_map(f,x) replaces each element xi of x with f(xi).
void app2_c(‘d(@‘H f)(‘a, ‘b, ‘c), ‘a env, ‘b ?x, ‘c ?y);
app2_c is a version of app where the function argument requires a clo-
sure as its first argument.
‘a ?rev_copy(‘a ?x);
rev_copy(x) returns a new heap-allocated array whose elements are
the elements of x in reverse order.
void imp_rev(‘a ?x);
imp_rev(x) reverses the elements of array x.
bool forall_c(bool (@‘H pred)(‘a, ‘b), ‘a env, ‘b ?x);
forall_c is a version of forall where the predicate argument re-
quires a closure as its first argument.
bool exists(bool (@‘H pred)(‘a), ‘a ?x);
exists(pred,x) returns true if pred returns true when applied to
some element of x, and returns false otherwise.
bool exists_c(bool (@‘H pred)(‘a, ‘b), ‘a env, ‘b ?x);
exists_c is a version of exists where the predicate argument re-
quires a closure as its first argument.
$(‘a, ‘b)?zip(‘a ?‘r1 x, ‘b ?y);
If x has elements x1 through xn, and y has elements y1 through yn,
then zip(x,y) returns a new heap-allocated array with elements $(x1,y1)
through $(xn,yn). If x and y don’t have the same number of ele-
ments, Array_mismatch is thrown.
$(‘a ?, ‘b ?)split($(‘a, ‘b)?x);
If x has elements $(a1,b1) through $(an,bn), then split(x) re-
turns a pair of new heap-allocated arrays with elements a1 through
an, and b1 through bn.
bool memq(‘a ?l, ‘a x);
memq(l,x) returns true if x is == an element of array l, and returns
false otherwise.
bool mem(int(@‘H cmp)(‘a, ‘a), ‘a ?l, ‘a x);
mem(cmp,l,x) is like memq(l,x) except that the comparison func-
tion cmp is used to determine if x is an element of l. cmp(a,b) should
return 0 if a is equal to b, and return a non-zero number otherwise.
‘a ?extract(‘a ?x, int start, int *len_opt);
extract(x,start,len_opt) returns a new array whose elements
are the elements of x beginning at index start, and continuing to the
end of x if len_opt is NULL; if len_opt points to an integer n, then
n elements are extracted. If n<0 or there are fewer than n elements in x
starting at start, then Core::InvalidArg("Array::extract")
is thrown.
C.3 <bitvec.h>
Defines namespace Bitvec, which implements bit vectors. Bit vectors are
useful for representing sets of numbers from 0 to n, where n is not too
large.
typedef int ?‘r bitvec_t<‘r>;
bitvec_t is the type of bit vectors.
bitvec_t new_empty(int);
new_empty(n) returns a bit vector containing n bits, all set to 0.
bitvec_t new_full(int);
new_full(n) returns a bit vector containing n bits, all set to 1.
bitvec_t new_copy(bitvec_t );
new_copy(v) returns a copy of bit vector v.
void set_all(bitvec_t );
set_all(v) sets every bit in v to 1.
bool all_set(bitvec_t bvec, int sz);
all_set(v) returns true if every bit in v is set to 1, and returns false
otherwise.
void union_two(bitvec_t dest, bitvec_t src1, bitvec_t src2);
union_two(dest,src1,src2) sets dest to be the union of src1
and src2: a bit of dest is 1 if either of the corresponding bits of src1
or src2 is 1, and is 0 otherwise.
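For readers unfamiliar with the representation, the following plain C sketch shows how a bit vector stored as an array of machine words supports setting, testing, and union. All names here (WORD_BITS, bv_words, bv_set, bv_get, bv_union_two) are hypothetical, and the use of unsigned words and an explicit nbits argument are assumptions of this sketch rather than details of the Bitvec implementation.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define WORD_BITS (sizeof(unsigned) * CHAR_BIT)

/* Words needed to hold nbits bits. */
size_t bv_words(size_t nbits) {
    return (nbits + WORD_BITS - 1) / WORD_BITS;
}

/* Sets bit i to 1. */
void bv_set(unsigned *v, size_t nbits, size_t i) {
    assert(i < nbits);
    v[i / WORD_BITS] |= 1u << (i % WORD_BITS);
}

/* Returns bit i as 0 or 1. */
int bv_get(const unsigned *v, size_t nbits, size_t i) {
    assert(i < nbits);
    return (int)((v[i / WORD_BITS] >> (i % WORD_BITS)) & 1u);
}

/* dest = src1 | src2, processing one whole word per step. */
void bv_union_two(unsigned *dest, const unsigned *src1,
                  const unsigned *src2, size_t nbits) {
    size_t w, n = bv_words(nbits);
    for (w = 0; w < n; w++)
        dest[w] = src1[w] | src2[w];
}
```

The union touches each word once, which is why bit vectors suit sets of small integers: the cost grows with n divided by the word size, not with n.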
C.4 <buffer.h>
Defines namespace Buffer, which implements extensible character arrays.
It was ported from Objective Caml.
typedef struct t @T ;
T is the type of buffers.
mstring_t contents(T );
contents(b) heap allocates and returns a string whose contents are
the contents of buffer b.
size_t length(T );
length(b) returns the number of characters in buffer b.
void clear(T );
clear(b) makes b have zero characters. Internal storage is not re-
leased.
void reset(T );
reset(b) sets the number of characters in b to zero, and sets the in-
ternal storage to the initial string. This means that any storage used to
grow the buffer since the last create or reset can be reclaimed by the
garbage collector.
C.5 <core.h>
The file <core.h> defines some types and functions outside of any names-
pace, and also defines a namespace Core. These declarations are made
outside of any namespace.
typedef const unsigned char ?‘r string_t<‘r>;
A string_t<‘r> is a constant array of characters allocated in region
‘r.
typedef unsigned char ?‘r mstring_t<‘r>;
An mstring_t<‘r> is a non-const (mutable) array of characters allo-
cated in region ‘r.
C.6 <dict.h>
Defines namespace Dict, which implements dictionaries: mappings from
keys to values. The dictionaries are implemented functionally: adding
a mapping to an existing dictionary produces a new dictionary, without
affecting the existing dictionary. To enable an efficient implementation,
you are required to provide a total order on keys (a comparison function).
We follow the conventions of the Objective Caml Dict library as much
as possible.
Namespace Dict implements a superset of namespace SlowDict, except
that delete_present is not supported.
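What "implemented functionally" means can be sketched in plain C with an unbalanced binary search tree that copies only the nodes on the search path. This is an illustration under stated assumptions, not the Dict implementation: the names node, mk, dict_insert, and dict_lookup are hypothetical, keys and values are fixed to int, and nodes are never freed (Cyclone would leave reclamation to the garbage collector or a region).

```c
#include <assert.h>
#include <stdlib.h>

struct node {
    int key, val;
    const struct node *left, *right;
};

static const struct node *mk(int k, int v, const struct node *l,
                             const struct node *r) {
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        abort();
    n->key = k; n->val = v; n->left = l; n->right = r;
    return n;
}

/* Returns a new tree that maps k to v. The tree d is never modified:
   nodes on the search path are copied, everything else is shared. */
const struct node *dict_insert(const struct node *d, int k, int v) {
    if (d == NULL)
        return mk(k, v, NULL, NULL);
    if (k < d->key)
        return mk(d->key, d->val, dict_insert(d->left, k, v), d->right);
    if (k > d->key)
        return mk(d->key, d->val, d->left, dict_insert(d->right, k, v));
    return mk(k, v, d->left, d->right);
}

/* Stores the value for k in *out and returns 1, or returns 0 if absent. */
int dict_lookup(const struct node *d, int k, int *out) {
    assert(out != NULL);
    while (d != NULL) {
        if (k < d->key)      d = d->left;
        else if (k > d->key) d = d->right;
        else { *out = d->val; return 1; }
    }
    return 0;
}
```

After d2 = dict_insert(d1, 2, 20), the old dictionary d1 still answers lookups exactly as before. The ordering on keys is what lets an insert copy a single path rather than the whole structure.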
typedef struct Dict<‘a, ‘b, ‘r> @‘r dict_t<‘a,‘b,‘r>;
A value of type dict_t<‘a,‘b,‘r> is a dictionary that maps keys of
type ‘a to values of type ‘b; the dictionary datatypes live in region ‘r.
xtunion exn {
Present
};
Present is thrown when a key is present but not expected.
xtunion exn {
Absent
};
Absent is thrown when a key is absent but should be present.
dict_t<‘a, ‘b, ‘r> inserts(dict_t<‘a, ‘b, ‘r> d, list_t<$(‘a, ‘b)@> l)
inserts(d,l) inserts each key, value pair into d, returning the result-
ing dictionary.
void app(‘c(@‘H f)(‘a, ‘b), dict_t<‘a, ‘b> d);
app(f,d) applies f to every key/value pair in d; the results of the
applications are discarded. Note that f cannot return void.
dict_t<‘a, ‘c> map(‘c(@‘H f)(‘b), dict_t<‘a, ‘b> d);
map(f,d) applies f to each value in d, and returns a new dictionary
with the results as values: for every binding of a key k to a value v in
d, the result binds k to f(v). The returned dictionary is allocated on
the heap.
bool forall_c(bool (@‘H f)(‘c, ‘a, ‘b), ‘c env, dict_t<‘a, ‘b> d);
forall_c(f,env,d) returns true if f(env,k,v) returns true for ev-
ery key k and associated value v in d, and returns false otherwise.
bool forall_intersect(bool (@‘H f)(‘a, ‘b, ‘b), dict_t<‘a, ‘b> d1, dic
forall_intersect(f,d1,d2) returns true if f(k,v1,v2) returns
true for every key k appearing in both d1 and d2, where v1 is the value
of k in d1, and v2 is the value of k in d2; and it returns false otherwise.
dict_t<‘a, ‘b, ‘r> rfilter(‘r, bool (@‘H f)(‘a, ‘b), dict_t<‘a, ‘b> d)
rfilter(r,f,d) is like filter(f,d), except that the resulting dic-
tionary is allocated in the region with handle r.
dict_t<‘a, ‘b, ‘r> rfilter_c(‘r, bool (@‘H f)(‘c, ‘a, ‘b), ‘c env, dic
rfilter_c(r,f,env,d) is like filter_c(f,env,d), except that
the resulting dictionary is allocated in the region with handle r.
C.7 <filename.h>
Defines a namespace Filename, which implements some useful operations
on file names that are represented as strings.
mstring_t concat(string_t , string_t );
Assuming that s1 is a directory name and s2 is a file name, concat(s1,s2)
returns a new (heap-allocated) file name for the child s2 of directory
s1.
mstring_t chop_extension(string_t );
chop_extension(s) returns a copy of s with any file extension re-
moved. A file extension is a period (‘.’) followed by a sequence of
non-period characters. If s does not have a file extension,
chop_extension(s) throws Core::Invalid_argument("chop_extension").
mstring_t dirname(string_t );
dirname(s) returns the directory part of s. For example, if s is "foo/bar/baz",
dirname(s) returns "foo/bar".
mstring_t basename(string_t );
basename(s) returns the file part of s. For example, if s is "foo/bar/baz",
basename(s) returns "baz".
bool check_suffix(string_t , string_t );
check_suffix(filename,suffix) returns true if filename ends
in suffix, and returns false otherwise.
mstring_t gnuify(string_t );
gnuify(s) forces s to follow Unix file name conventions: any Win-
dows separator characters (backslashes) are converted to Unix separa-
tor characters (forward slashes).
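The separator conversion performed by gnuify is simple enough to sketch in plain C. The function name gnuify_copy is hypothetical, and returning a freshly allocated copy is an assumption suggested by the mstring_t return type; the description above does not say whether the Cyclone function allocates.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns a heap-allocated copy of s in which every backslash is
   replaced by a forward slash, or NULL if allocation fails.
   The caller frees the result. */
char *gnuify_copy(const char *s) {
    size_t i, n;
    char *out;
    assert(s != NULL);
    n = strlen(s);
    out = malloc(n + 1);
    if (out == NULL)
        return NULL;
    for (i = 0; i <= n; i++)   /* <= also copies the terminating null */
        out[i] = (s[i] == '\\') ? '/' : s[i];
    return out;
}
```

For example, the input foo\bar\baz yields foo/bar/baz, after which dirname and basename behave as in the examples above.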
C.8 <fn.h>
Defines namespace Fn, which implements closures: a way to package up
a function with some hidden data (an environment). Many of the library
functions taking function arguments have versions for functions that
require an explicit environment; the closures of namespace Fn are different:
they combine the function and environment, and the environment is hid-
den. They are useful when two functions need environments of different
type, but you need them to have the same type; you can do this by hiding
the environment from the type of the pair.
typedef tunion
A value of type fn_t<‘arg,‘res,‘eff> is a function and its closure;
‘arg is the argument type of the function, ‘res is the result type, and
‘eff is the effect.
fn_t<‘arg, ‘res, ‘e1> make_fn(‘res(@‘H f)(‘env, ‘arg;‘e1+{}), ‘env x;‘
make_fn(f,env) builds a closure out of a function and an environ-
ment.
fn_t<‘arg, ‘res, ‘e1> fp2fn(‘res(@‘H f)(‘arg;‘e1+{}));
fp2fn(f) converts a function pointer to a closure.
fn_t<‘a, fn_t<‘b, ‘c, ‘e1>, > curry(fn_t<$(‘a, ‘b)@‘H, ‘c, ‘e1> f);
curry(f) curries a closure that takes a pair as argument: if x points to
a pair $(x1,x2), then apply(f,x) has the same effect as apply(apply(curry(f),x1),
fn_t<$(‘a, ‘b)@, ‘c, > uncurry(fn_t<‘a, fn_t<‘b, ‘c, ‘e1>, ‘e2> f);
uncurry(f) converts a closure that takes two arguments in sequence
into a closure that takes the two arguments as a pair: if x points to a
pair $(x1,x2), then apply(uncurry(f),x) has the same effect as
apply(apply(f,x1),x2).
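The idea of hiding the environment so that closures with different environments share one type can be sketched in plain C. The names closure, apply, make_adder, and make_scaler are hypothetical and fixed to int arguments and results for brevity. The void * used here is unchecked, whereas the Fn namespace hides the environment without giving up type safety.

```c
#include <assert.h>
#include <stddef.h>

/* A function packaged with its environment. The environment's real
   type is known only to the code pointer, not to the closure's users. */
struct closure {
    int (*code)(void *env, int arg);
    void *env;
};

int apply(struct closure c, int arg) {
    assert(c.code != NULL);
    return c.code(c.env, arg);
}

/* Two functions whose environments have different types. */
static int add_env(void *env, int arg)   { return *(int *)env + arg; }
static int scale_env(void *env, int arg) { return (int)(*(double *)env * arg); }

struct closure make_adder(int *n) {
    struct closure c = { add_env, n };
    return c;
}

struct closure make_scaler(double *k) {
    struct closure c = { scale_env, k };
    return c;
}
```

Both make_adder and make_scaler return the same struct closure type, so the results can be stored in one array and invoked uniformly through apply.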
C.9 <hashtable.h>
Defines namespace Hashtable, which implements mappings from keys to
values. These hashtables are imperative—values are added and deleted
destructively. (Use namespace Dict or SlowDict if you need functional
(non-destructive) mappings.) To enable an efficient implementation, you
are required to provide a total order on keys (a comparison function).
typedef struct Table<‘a, ‘b> @table_t<‘a,‘b>;
A table_t<‘a,‘b> is a hash table with keys of type ‘a and values
of type ‘b.
table_t<‘a, ‘b> create(int sz, int(@‘H cmp)(‘a, ‘a), int(@‘H hash)(‘a)
create(sz,cmp,hash) returns a new hash table that starts out with
sz buckets. cmp should be a comparison function on keys: cmp(k1,k2)
should return a number less than, equal to, or greater than 0 according
to whether k1 is less than, equal to, or greater than k2. hash should
be a hash function on keys. cmp and hash should satisfy the following
property: if cmp(k1,k2) is 0, then hash(k1) must equal hash(k2).
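The consistency requirement between cmp and hash is easy to violate when the comparison ignores some aspect of the key. The plain C sketch below uses case-insensitive string keys: because the comparison folds case, the hash must fold case in the same way. The names ci_cmp and ci_hash are hypothetical, and the particular hash function is an arbitrary choice made for this example.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Case-insensitive three-way comparison on string keys. */
int ci_cmp(const char *a, const char *b) {
    assert(a != NULL && b != NULL);
    for (;; a++, b++) {
        int x = tolower((unsigned char)*a);
        int y = tolower((unsigned char)*b);
        if (x != y)
            return (x > y) - (x < y);
        if (x == 0)
            return 0;
    }
}

/* Hash that folds case exactly as ci_cmp does, so keys that compare
   equal always land in the same bucket. */
int ci_hash(const char *s) {
    unsigned h = 5381u;
    assert(s != NULL);
    for (; *s != '\0'; s++)
        h = h * 33u + (unsigned)tolower((unsigned char)*s);
    return (int)(h & 0x7fffu);   /* non-negative result, fits any int */
}
```

Had ci_hash hashed the raw characters, "Key" and "kEY" would compare equal but could fall into different buckets, and a lookup for one would miss an entry stored under the other.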
void iter(void(@‘H f)(‘a, ‘b), table_t<‘a, ‘b> t);
iter(f,t) applies f to each key/value pair in t.
C.10 <list.h>
Defines namespace List, which implements generic lists and various op-
erations over them, following the conventions of the Objective Caml list
library as much as possible.
struct List<‘a,‘r> {
‘a hd;
struct List<‘a, ‘r> *‘r tl;
};
A struct List is a memory cell with a head field containing an ele-
ment and a tail field that points to the rest of the list. Such a structure is
traditionally called a cons cell. Note that every element of the list must
have the same type ‘a, and every cons cell in the list must be allocated
in the same region ‘r.
typedef struct List<‘a, ‘r> *‘r list_t<‘a,‘r>;
A list_t is a possibly-NULL pointer to a struct List. Most of the
functions in namespace List operate on values of type list_t rather
than struct List. Note that a list_t can be empty (NULL) but a
struct List cannot.
typedef struct List<‘a, ‘r> @List_t<‘a,‘r>;
A List_t is a non-NULL pointer to a struct List. This is used
much less often than list_t; however, it may be useful when you
want to emphasize that a list has at least one element.
list_t<‘a> list(...‘a);
list(x1,...,xn) builds a heap-allocated list with elements x1 through
xn.
list_t<‘a, ‘r> rlist(‘r, ...‘a);
rlist(r, x1,...,xn) builds a list with elements x1 through xn,
allocated in the region with handle r.
‘a hd(list_t<‘a> x);
hd(x) returns the first element of list x, if there is one, and throws
Failure("hd") if x is NULL.
list_t<‘a, ‘r> tl(list_t<‘a, ‘r> x);
tl(x) returns the tail of list x, if there is one, and throws Failure("tl")
if x is NULL.
list_t<‘a> copy(list_t<‘a> x);
copy(x) returns a new heap-allocated copy of list x.
xtunion exn {
List_mismatch
};
List_mismatch is thrown when two lists don’t have the same length.
void iter_c(void(@‘H f)(‘b, ‘a), ‘b env, list_t<‘a> x);
iter_c is a version of iter where the function argument requires a
closure as its first argument.
list_t<‘a> rev(list_t<‘a> x);
rev(x) returns a new heap-allocated list whose elements are the ele-
ments of x in reverse.
list_t<‘a, ‘r> rrev(‘r, list_t<‘a> x);
rrev(r,x) is like rev(x), except that the result is allocated in the
region with handle r.
list_t<‘a, ‘r> rflatten(‘r, list_t<list_t<‘a, ‘r>> x);
rflatten(r,x) is like flatten(x), except that the result is allo-
cated in the region with handle r, and each element of x must be allo-
cated in r.
list_t<‘a> merge_sort(int(@‘H cmp)(‘a, ‘a), list_t<‘a> x);
merge_sort(cmp,x) returns a new heap-allocated list whose ele-
ments are the elements of x in ascending order (according to the com-
parison function cmp), by the MergeSort algorithm.
xtunion exn {
Nth
};
Nth is thrown when nth is applied to a list that doesn’t have enough elements.
‘a nth(list_t<‘a> x, int n);
If x has elements x0 through xm, and 0<=n<=m, then nth(x,n) re-
turns xn. If n is out of range, Nth is thrown. Note that the indexing is
0-based.
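The 0-based indexing and the out-of-range case can be sketched in plain C over a cons-cell list like the struct List described earlier in this section. The names list, list_length, and list_nth are hypothetical, elements are fixed to int, and the sketch returns 0 for an out-of-range index where the Cyclone function throws Nth.

```c
#include <assert.h>
#include <stddef.h>

/* A cons cell; NULL represents the empty list. */
struct list {
    int hd;
    struct list *tl;
};

/* Number of cells in x; accepts the empty list. */
size_t list_length(const struct list *x) {
    size_t n = 0;
    for (; x != NULL; x = x->tl)
        n++;
    return n;
}

/* Stores element n (0-based) in *out and returns 1, or returns 0 if
   n is negative or the list has too few elements. */
int list_nth(const struct list *x, int n, int *out) {
    assert(out != NULL);
    if (n < 0)
        return 0;
    for (; x != NULL; x = x->tl, n--) {
        if (n == 0) {
            *out = x->hd;
            return 1;
        }
    }
    return 0;
}
```

For a three-element list, indices 0 through 2 succeed and index 3 fails, matching the 0<=n<=m condition with m equal to 2.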
list_t<‘a, ‘r> nth_tail(list_t<‘a, ‘r> x, int i);
If x has elements x0 through xm, and 0<=i<=m, then nth_tail(x,i)
returns the list with elements xi through xm. If i is out of range, Nth is
thrown.
bool forall(bool (@‘H pred)(‘a), list_t<‘a> x);
forall(pred,x) returns true if pred returns true when applied to
every element of x, and returns false otherwise.
$(list_t<‘a>, list_t<‘b>)split(list_t<$(‘a, ‘b)@> x);
If x has elements &$(a1,b1) through &$(an,bn), then split(x)
returns a pair of new heap-allocated lists with elements a1 through
an, and b1 through bn.
‘b assoc_cmp(int(@‘H cmp)(‘a, ‘c), list_t<$(‘a, ‘b)@> l, ‘c x);
assoc_cmp(cmp,l,k) is like assoc(l,k) except that the compari-
son function cmp is used to decide if k is a key in l. cmp should return
0 if two keys are equal, and non-zero otherwise.
bool list_prefix(int(@‘H cmp)(‘a, ‘a), list_t<‘a> l1, list_t<‘a> l2);
list_prefix(cmp,l1,l2) returns true if l1 is a prefix of l2, using
cmp to compare the elements of l1 and l2.
C.11 <pp.h>
Defines a namespace PP that has functions for implementing pretty print-
ers. Internally, PP is an implementation of Kamin’s version of Wadler’s
pretty printing combinators, with some extensions for doing hyperlinks
in Tk text widgets.
All of the internal data structures used by PP are allocated on the heap.
typedef struct Doc @doc_t ;
A value of type doc_t is a “document” that can be combined with
other documents, formatted at different widths, and converted to strings
or files.
void file_of_doc(doc_t d, int w, FILE @f);
file_of_doc(d,w,f) formats d to width w, and prints the formatted
output to f.
string_t string_of_doc(doc_t d, int w);
string_of_doc(d,w) formats d to width w, and returns the format-
ted output in a heap-allocated string.
doc_t nil_doc();
nil_doc() returns an empty document.
doc_t blank_doc();
blank_doc() returns a document consisting of a single space charac-
ter.
doc_t line_doc();
line_doc() returns a document consisting of a single line break.
doc_t oline_doc();
oline_doc() returns a document consisting of an optional line break;
when the document is formatted, the pretty printer will decide whether
to break the line.
doc_t text(string_t<> s);
text(s) returns a document containing exactly the string s.
doc_t hyperlink(string_t<> shrt, string_t<> full);
hyperlink(shrt,full) returns a document that will be formatted
as the string shrt linked to the string full.
doc_t cat(...doc_t );
cat(d1, d2, ..., dn) returns a document consisting of document
d1 followed by d2, and so on up to dn.
doc_t seql(string_t<> sep, list_t<doc_t , > l0);
seql is like seq, except that the resulting document has line breaks
after each separator.
C.12 <queue.h>
Defines namespace Queue, which implements generic imperative queues
and various operations following the conventions of the Objective Caml
queue library as much as possible.
typedef struct Queue<‘a, ‘r> @‘r queue_t<‘a,‘r>;
A value of type queue_t<‘a,‘r> is a first-in, first-out queue of el-
ements of type ‘a; the queue data structures are allocated in region
‘r.
bool is_empty(queue_t );
is_empty(q) returns true if q contains no elements, and returns false
otherwise.
queue_t create();
create() allocates a new, empty queue on the heap and returns it.
void add(queue_t<‘a, >, ‘a x);
add(q,x) adds x to the end of q (by side effect).
xtunion exn {
Empty
};
Empty is an exception raised by take and peek.
‘a take(queue_t<‘a>);
take(q) removes the element from the front of q and returns it; if q
is empty, exception Empty is thrown.
‘a peek(queue_t<‘a>);
peek(q) returns the element at the front of q, without removing it
from q. If q is empty, exception Empty is thrown.
void clear(queue_t<‘a>);
clear(q) removes all elements from q.
int length(queue_t<‘a>);
length(q) returns the number of elements in q.
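The operations above follow the usual first-in, first-out discipline. The following is a minimal sketch of those semantics in plain C rather than Cyclone; it is not the library's implementation. The names fifo, fifo_add, fifo_take, and so on are hypothetical, the elements are fixed to int, and an empty queue is reported by a false return value where the Cyclone functions would throw Empty.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical C model of a FIFO queue of ints; not the Cyclone Queue code. */
struct fifo_node { int value; struct fifo_node *next; };
struct fifo { struct fifo_node *front, *back; int len; };

/* Like create(): a new, empty queue. */
struct fifo fifo_create(void) {
    struct fifo q = { NULL, NULL, 0 };
    return q;
}

/* Like is_empty(q). */
bool fifo_is_empty(const struct fifo *q) { return q->len == 0; }

/* Like length(q). */
int fifo_length(const struct fifo *q) { return q->len; }

/* Like add(q,x): append x at the back. Returns false if allocation fails. */
bool fifo_add(struct fifo *q, int x) {
    struct fifo_node *n = malloc(sizeof *n);
    if (n == NULL) return false;
    n->value = x;
    n->next = NULL;
    if (q->back != NULL) q->back->next = n; else q->front = n;
    q->back = n;
    q->len++;
    return true;
}

/* Like peek(q): read the front element without removing it.
   Returns false where the Cyclone version would throw Empty. */
bool fifo_peek(const struct fifo *q, int *out) {
    if (q->front == NULL) return false;
    *out = q->front->value;
    return true;
}

/* Like take(q): remove the front element and hand it back through out.
   Returns false where the Cyclone version would throw Empty. */
bool fifo_take(struct fifo *q, int *out) {
    struct fifo_node *n = q->front;
    if (n == NULL) return false;
    assert(q->len > 0);              /* invariant: len counts the nodes */
    *out = n->value;
    q->front = n->next;
    if (q->front == NULL) q->back = NULL;
    free(n);
    q->len--;
    return true;
}

/* Like clear(q): remove all elements. */
void fifo_clear(struct fifo *q) {
    int ignored;
    while (fifo_take(q, &ignored)) { }
}
```

Adding 1, 2, 3 and then calling fifo_take twice yields 1 and then 2; fifo_peek never changes the length.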
C.13 <rope.h>
Defines namespace Rope, which implements character arrays that can be
concatenated in constant time.
typedef struct Rope_node @rope_t ;
A value of type rope_t is a character array that can be efficiently con-
catenated.
rope_t from_string(string_t<>);
from_string(s) returns a rope that has the same characters as string
s. Note that s must be heap-allocated.
mstring_t to_string(rope_t );
to_string(r) returns a new, heap-allocated string with the same
characters as rope r.
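One common way to get constant-time concatenation is to record each concatenation as a tree node instead of copying characters. The sketch below illustrates that idea in plain C; it is an assumption about the general technique, not a description of the Cyclone Rope implementation. All names are hypothetical, including rope_concat, since the concatenation function itself is not shown in this excerpt.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical C model of a rope; not the Cyclone Rope implementation.
   A node is either a leaf holding a string or a concatenation of two ropes. */
struct rope {
    const char *leaf;            /* non-NULL for a leaf node */
    struct rope *left, *right;   /* non-NULL for a concatenation node */
    size_t len;                  /* total number of characters */
};

/* Like from_string(s). The sketch keeps a pointer to s instead of
   copying it, so s must stay alive as long as the rope does. */
struct rope *rope_from_string(const char *s) {
    struct rope *r = malloc(sizeof *r);
    assert(r != NULL);
    r->leaf = s;
    r->left = r->right = NULL;
    r->len = strlen(s);
    return r;
}

/* Concatenation allocates one node and copies no characters, so its
   cost does not depend on the lengths of a and b. */
struct rope *rope_concat(struct rope *a, struct rope *b) {
    struct rope *r = malloc(sizeof *r);
    assert(r != NULL);
    r->leaf = NULL;
    r->left = a;
    r->right = b;
    r->len = a->len + b->len;
    return r;
}

/* Copy the characters of r into dst, left to right; returns the end position. */
static char *rope_flatten(const struct rope *r, char *dst) {
    if (r->leaf != NULL) {
        memcpy(dst, r->leaf, r->len);
        return dst + r->len;
    }
    dst = rope_flatten(r->left, dst);
    return rope_flatten(r->right, dst);
}

/* Like to_string(r): a new heap-allocated string; time linear in the length. */
char *rope_to_string(const struct rope *r) {
    char *s = malloc(r->len + 1);
    assert(s != NULL);
    *rope_flatten(r, s) = '\0';
    return s;
}
```

The cost moves to rope_to_string, which walks the whole tree once. The sketch never frees rope nodes; a real implementation would rely on the garbage collector or an explicit release function.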
C.14 <set.h>
Defines namespace Set, which implements polymorphic, functional, finite
sets over elements with a total order, following the conventions of the Ob-
jective Caml set library as much as possible.
typedef struct Set<‘a, ‘r> @‘r set_t<‘a,‘r>;
A value of type set_t<‘a,‘r> is a set with elements of type ‘a. The
data structures used to implement the set (not the elements of the set!)
are in region ‘r.
The set creation functions require a comparison function as an argument.
The comparison function should return a number less than, equal to, or
greater than 0 according to whether its first argument is less than, equal
to, or greater than its second argument.
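This is the same three-way convention used by strcmp and qsort in C. As a minimal sketch in plain C (the names cmp_int, cmp_int_qsort, and is_sorted are hypothetical and not part of the Set interface), a comparison on integers that satisfies the convention could look like this:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical three-way comparison on ints, following the convention above:
   negative if a < b, zero if a == b, positive if a > b.
   Written without subtraction so it cannot overflow. */
int cmp_int(int a, int b) {
    return (a > b) - (a < b);
}

/* Adapter so the same ordering can drive the C library's qsort. */
int cmp_int_qsort(const void *pa, const void *pb) {
    return cmp_int(*(const int *)pa, *(const int *)pb);
}

/* Returns 1 if xs[0..n) is in nondecreasing order according to cmp, else 0. */
int is_sorted(const int *xs, size_t n, int (*cmp)(int, int)) {
    assert(xs != NULL || n == 0);
    for (size_t i = 1; i < n; i++)
        if (cmp(xs[i - 1], xs[i]) > 0) return 0;
    return 1;
}
```

The tempting shortcut `return a - b;` also gives the right sign for small values, but it overflows when a and b are far apart, which is why the sketch compares instead of subtracting.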
set_t<‘a> empty(int(@‘H cmp)(‘a, ‘a));
empty(cmp) creates an empty set given comparison function cmp.
The set is heap-allocated.
set_t<‘a, ‘r> rempty(‘r r, int(@‘H cmp)(‘a, ‘a));
rempty(r,cmp) creates an empty set in the region with handle r.
set_t<‘a> union_two(set_t<‘a, > s1, set_t<‘a, > s2);
union_two(s1,s2) returns a set whose elements are the union of the
elements of s1 and s2. (We use the name union_two because union
is a keyword in Cyclone.)
bool equals(set_t<‘a> s1, set_t<‘a> s2);
equals(s1,s2) returns true if s1 and s2 have the same elements,
and returns false otherwise.
list_t<‘a, ‘r> elements(set_t<‘a, ‘r> s);
elements(s) returns a list of the elements of s, in no particular order.
Note that the returned list is allocated in the same region as the set s.
‘a choose(set_t<‘a> s);
choose(s) returns some element of the set s; if the set is empty,
choose throws Absent.
C.15 <slowdict.h>
Defines namespace SlowDict, which implements polymorphic, functional,
finite maps whose domain must have a total order. We follow the conven-
tions of the Objective Caml Dict library as much as possible.
The basic functionality is the same as Dict, except that SlowDict
supports delete_present. Region support still needs to be added, and
some functions are missing.
typedef struct Dict<‘a, ‘b> @dict_t<‘a,‘b>;
A value of type dict_t<‘a,‘b> is a dictionary that maps keys of type
‘a to values of type ‘b.
xtunion exn {
Present
};
Present is thrown when a key is present but not expected.
xtunion exn {
Absent
};
Absent is thrown when a key is absent but should be present.
dict_t<‘a, ‘b> insert_new(dict_t<‘a, ‘b> d, ‘a k, ‘b v);
insert_new(d,k,v) is like insert(d,k,v), except that it throws
Present if k is already mapped to some value in d.
void app(‘c(@‘H f)(‘a, ‘b), dict_t<‘a, ‘b> d);
app(f,d) applies f to every key/value pair in d; the results of the
applications are discarded. Note that f cannot return void.
C.16 <xarray.h>
Defines namespace Xarray, which implements a datatype of extensible ar-
rays.
typedef struct Xarray<‘a> @xarray_t<‘a>;
An xarray_t is an extensible array.
int length(xarray_t<‘a>);
length(a) returns the length of extensible array a.
‘a get(xarray_t<‘a>, int);
get(a,n) returns the nth element of a, or throws Invalid_argument
if n is out of range.
‘a ?to_array(xarray_t<‘a>);
to_array(a) returns a normal (non-extensible) array with the same
elements as a.
xarray_t<‘a> from_array(‘a ?arr);
from_array(a) returns an extensible array with the same elements
as the normal (non-extensible) array a.
xarray_t<‘a> append(xarray_t<‘a>, xarray_t<‘a>);
append(a1,a2) returns a new extensible array whose elements are
the elements of a1 followed by the elements of a2. The inputs a1 and
a2 are not modified.
void app(‘b(@‘H f)(‘a), xarray_t<‘a>);
app(f,a) applies f to each element of a, in order from lowest to high-
est. Note that f returns a value of type ‘b, unlike with iter.
void app_c(‘b(@‘H f)(‘c, ‘a), ‘c, xarray_t<‘a>);
app_c(f,e,a) applies f to e and each element of a, in order from
lowest to highest.
void iter(void(@‘H f)(‘a), xarray_t<‘a>);
iter(f,a) applies f to each element of a, in order from lowest to
highest. Note that f returns void, unlike with app.
void iter_c(void(@‘H f)(‘b, ‘a), ‘b, xarray_t<‘a>);
iter_c(f,e,a) applies f to e and each element of a, in order from
lowest to highest.
xarray_t<‘b> map(‘b(@‘H f)(‘a), xarray_t<‘a>);
map(f,a) returns a new extensible array whose elements are obtained
by applying f to each element of a.
xarray_t<‘b> map_c(‘b(@‘H f)(‘c, ‘a), ‘c, xarray_t<‘a>);
map_c(f,e,a) returns a new extensible array whose elements are ob-
tained by applying f to e and each element of a.
void reuse(xarray_t<‘a> xarr);
reuse(a) sets the number of elements of a to zero, but does not free
the underlying array.
void delete(xarray_t<‘a> xarr, int num);
delete(a,n) deletes the last n elements of a.
void remove(xarray_t<‘a> xarr, int i);
remove(a,i) removes the element at position i from a; elements at
positions greater than i are moved down one position.
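The difference between reuse, delete, and remove is easiest to see on a concrete layout. The following plain C sketch models an extensible array of int as a backing array plus an element count. It is not the Xarray implementation: the names xarr, xarr_add, and so on are hypothetical, and reporting an out-of-range argument with a false return value is a choice made for the sketch, not documented behavior.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical C model of an extensible array of ints; not the Xarray code. */
struct xarr { int *elts; int num; int cap; };

/* Helper for the sketch: append x, doubling the capacity when the
   backing array is full. Returns false if allocation fails. */
bool xarr_add(struct xarr *a, int x) {
    if (a->num == a->cap) {
        int cap = a->cap == 0 ? 4 : 2 * a->cap;
        int *p = realloc(a->elts, (size_t)cap * sizeof *p);
        if (p == NULL) return false;
        a->elts = p;
        a->cap = cap;
    }
    a->elts[a->num++] = x;
    return true;
}

/* Like reuse(a): forget the elements but keep the backing array. */
void xarr_reuse(struct xarr *a) { a->num = 0; }

/* Like delete(a,n): drop the last n elements. */
bool xarr_delete(struct xarr *a, int n) {
    if (n < 0 || n > a->num) return false;
    a->num -= n;
    return true;
}

/* Like remove(a,i): drop position i and shift later elements down by one. */
bool xarr_remove(struct xarr *a, int i) {
    if (i < 0 || i >= a->num) return false;
    for (int j = i; j + 1 < a->num; j++)
        a->elts[j] = a->elts[j + 1];
    a->num--;
    assert(a->num >= 0);
    return true;
}
```

In this model delete and reuse only adjust the count, so they take constant time, while remove has to move every element after position i.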
D Grammar
The grammar of Cyclone is derived from ISO C99. It has the following ad-
ditional keywords: abstract, catch, codegen, cut, fallthru, fill,
let, malloc, namespace, new, NULL, region_t, regions, rmalloc,
rnew, splice, throw, try, tunion, using, xtunion. As in gcc,
__attribute__ is reserved as well.
The non-terminals character-constant, floating-constant, identifier, integer-
constant, string, type-var, and typedef-name are defined lexically as in C.
The start symbol is translation-unit.
translation-unit:
(empty)
external-declaration translation-unitopt
using identifier ; translation-unit
namespace identifier ; translation-unit
using identifier { translation-unit } translation-unit
namespace identifier { translation-unit } translation-unit
extern string { translation-unit } translation-unit
external-declaration:
function-definition
declaration
function-definition:
declaration-specifiersopt declarator declaration-listopt compound-statement
declaration:
declaration-specifiers init-declarator-listopt ;
let pattern = expression ;
let identifier-list ;
declaration-list:
declaration
declaration-list declaration
declaration-specifiers:
storage-class-specifier declaration-specifiersopt
type-specifier declaration-specifiersopt
type-qualifier declaration-specifiersopt
function-specifier declaration-specifiersopt
storage-class-specifier: one of
auto register static extern typedef abstract
type-specifier:
_
void
char
short
int
long
float
double
signed
unsigned
enum-specifier
struct-or-union-specifier
tunion-specifier
typedef-name type-paramsopt
type-var
type-var :: kind
$( parameter-list )
region_t < any-type-name >
kind:
identifier
typedef-name
type-qualifier: one of
const restrict volatile
enum-specifier:
enum identifier { enum-declaration-list }
enum identifier
enum-field:
identifier
identifier = constant-expression
enum-declaration-list:
enum-field
enum-field , enum-declaration-list
function-specifier:
inline
struct-or-union-specifier:
struct-or-union { struct-declaration-list }
struct-or-union identifier type-paramsopt { struct-declaration-list }
struct-or-union identifier type-paramsopt
type-params:
< type-name-list >
struct-or-union: one of
struct union
struct-declaration-list:
struct-declaration
struct-declaration-list struct-declaration
init-declarator-list:
init-declarator
init-declarator-list , init-declarator
init-declarator:
declarator
declarator = initializer
struct-declaration:
specifier-qualifier-list struct-declarator-list ;
specifier-qualifier-list:
type-specifier specifier-qualifier-listopt
type-qualifier specifier-qualifier-listopt
struct-declarator-list:
struct-declarator
struct-declarator-list , struct-declarator
struct-declarator:
declarator
declaratoropt : constant-expression
tunion-specifier:
tunion-or-xtunion identifier type-paramsopt { tunionfield-list }
tunion-or-xtunion regionopt identifier type-paramsopt
tunion-or-xtunion identifier . identifier type-paramsopt
tunion-or-xtunion: one of
tunion xtunion
tunionfield-list:
tunionfield
tunionfield ;
tunionfield , tunionfield-list
tunionfield ; tunionfield-list
tunionfield-scope: one of
extern static
tunionfield:
tunionfield-scope identifier
tunionfield-scope identifier type-paramsopt ( parameter-list )
declarator:
pointeropt direct-declarator
direct-declarator:
identifier
( declarator )
direct-declarator [ assignment-expressionopt ]
direct-declarator ( parameter-type-list )
direct-declarator ( ; effect-set )
direct-declarator ( identifier-listopt )
direct-declarator < type-name-list >
pointer:
* rangeopt regionopt type-qualifier-listopt pointeropt
@ rangeopt regionopt type-qualifier-listopt pointeropt
? regionopt type-qualifier-listopt pointeropt
range:
{ assignment-expression }
region:
_
‘H
type-var
type-var :: kind
type-qualifier-list:
type-qualifier
type-qualifier-list type-qualifier
parameter-type-list:
parameter-list
parameter-list , ...
optional-effect:
(empty)
; effect-set
optional-inject:
(empty)
identifier
effect-set:
atomic-effect
atomic-effect + effect-set
atomic-effect:
{}
{ region-set }
type-var
type-var :: kind
region-set:
type-var
type-var , region-set
type-var :: kind
type-var :: kind , region-set
parameter-list:
parameter-declaration
parameter-list , parameter-declaration
parameter-declaration:
specifier-qualifier-list declarator
specifier-qualifier-list abstract-declaratoropt
identifier-list:
identifier
identifier-list , identifier
initializer:
assignment-expression
array-initializer
array-initializer:
{ initializer-listopt }
{ initializer-list , }
{ for identifier < expression : expression }
initializer-list:
designationopt initializer
initializer-list , designationopt initializer
designation:
designator-list =
designator-list:
designator
designator-list designator
designator:
[ constant-expression ]
. identifier
type-name:
specifier-qualifier-list abstract-declaratoropt
any-type-name:
type-name
{}
{ region-set }
any-type-name + atomic-effect
type-name-list:
type-name
type-name-list , type-name
abstract-declarator:
pointer
pointeropt direct-abstract-declarator
direct-abstract-declarator:
( abstract-declarator )
direct-abstract-declaratoropt [ assignment-expressionopt ]
direct-abstract-declaratoropt ( parameter-type-listopt )
direct-abstract-declaratoropt ( ; effect-set )
direct-abstract-declaratoropt [ ? ]
direct-abstract-declarator < type-name-list >
statement:
labeled-statement
expression-statement
compound-statement
selection-statement
iteration-statement
jump-statement
region identifier statement
region < type-var > identifier statement
cut statement
splice statement
labeled-statement:
identifier : statement
expression-statement:
expressionopt ;
compound-statement:
{ block-item-listopt }
block-item-list:
block-item
block-item block-item-list
block-item:
declaration
statement
selection-statement:
if ( expression ) statement
if ( expression ) statement else statement
switch ( expression ) { switch-clauses }
try statement catch { switch-clauses }
switch-clauses:
(empty)
default : block-item-list
case pattern : block-item-listopt switch-clauses
case pattern && expression : block-item-listopt switch-clauses
iteration-statement:
while ( expression ) statement
do statement while ( expression ) ;
for ( expressionopt ; expressionopt ; expressionopt ) statement
for ( declaration expressionopt ; expressionopt ) statement
jump-statement:
goto identifier ;
continue ;
break ;
return ;
return expression ;
fallthru ;
fallthru ( argument-expression-listopt ) ;
pattern:
_
( pattern )
integer-constant
- integer-constant
floating-constant
character-constant
NULL
identifier
identifier type-paramsopt ( tuple-pattern-list )
$( tuple-pattern-list )
identifier type-paramsopt { }
identifier type-paramsopt { field-pattern-list }
& pattern
* identifier
tuple-pattern-list:
(empty)
pattern
tuple-pattern-list , pattern
field-pattern:
pattern
designation pattern
field-pattern-list:
field-pattern
field-pattern-list , field-pattern
expression:
assignment-expression
expression , assignment-expression
assignment-expression:
conditional-expression
unary-expression assignment-operator assignment-expression
assignment-operator: one of
= *= /= %= += -= <<= >>= &= ^= |=
conditional-expression:
logical-or-expression
logical-or-expression ? expression : conditional-expression
throw conditional-expression
new array-initializer
new logical-or-expression
rnew ( expression ) array-initializer
rnew ( expression ) logical-or-expression
constant-expression:
conditional-expression
logical-or-expression:
logical-and-expression
logical-or-expression || logical-and-expression
logical-and-expression:
inclusive-or-expression
logical-and-expression && inclusive-or-expression
inclusive-or-expression:
exclusive-or-expression
inclusive-or-expression | exclusive-or-expression
exclusive-or-expression:
and-expression
exclusive-or-expression ^ and-expression
and-expression:
equality-expression
and-expression & equality-expression
equality-expression:
relational-expression
equality-expression == relational-expression
equality-expression != relational-expression
relational-expression:
shift-expression
relational-expression < shift-expression
relational-expression > shift-expression
relational-expression <= shift-expression
relational-expression >= shift-expression
shift-expression:
additive-expression
shift-expression << additive-expression
shift-expression >> additive-expression
additive-expression:
multiplicative-expression
additive-expression + multiplicative-expression
additive-expression - multiplicative-expression
multiplicative-expression:
cast-expression
multiplicative-expression * cast-expression
multiplicative-expression / cast-expression
multiplicative-expression % cast-expression
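As in C, the layering of these productions is what encodes operator precedence and left associativity: additive-expression is built from multiplicative-expression, so * / % bind tighter than + and -. The plain C sketch below illustrates this for a tiny subset of the grammar (decimal digits, parentheses, and the five binary operators). It is a hypothetical evaluator written for illustration, not part of the Cyclone compiler; each grammar level becomes one function, and each left-recursive production becomes a loop. It does not guard against division by zero.

```c
#include <assert.h>
#include <ctype.h>

/* Hypothetical evaluator for a tiny subset of the expression grammar. */
static const char *cur;   /* next unread character of the input */

static void skip_spaces(void) { while (isspace((unsigned char)*cur)) cur++; }

static long additive(void);

/* Stands in for cast-expression and below:
   an integer constant or a parenthesized expression. */
static long operand(void) {
    skip_spaces();
    if (*cur == '(') {
        cur++;
        long v = additive();
        skip_spaces();
        assert(*cur == ')');
        cur++;
        return v;
    }
    assert(isdigit((unsigned char)*cur));
    long v = 0;
    while (isdigit((unsigned char)*cur)) v = 10 * v + (*cur++ - '0');
    return v;
}

/* multiplicative-expression: binds tighter because additive() calls it. */
static long multiplicative(void) {
    long v = operand();
    for (;;) {
        skip_spaces();
        char op = *cur;
        if (op != '*' && op != '/' && op != '%') return v;
        cur++;
        long rhs = operand();
        if (op == '*') v *= rhs;
        else if (op == '/') v /= rhs;
        else v %= rhs;
    }
}

/* additive-expression: left-associative, so 8 - 3 - 2 is (8 - 3) - 2. */
static long additive(void) {
    long v = multiplicative();
    for (;;) {
        skip_spaces();
        char op = *cur;
        if (op != '+' && op != '-') return v;
        cur++;
        long rhs = multiplicative();
        v = (op == '+') ? v + rhs : v - rhs;
    }
}

/* Evaluate a whole input string; the input must be a complete expression. */
long eval(const char *s) {
    cur = s;
    long v = additive();
    skip_spaces();
    assert(*cur == '\0');
    return v;
}
```

The same pattern extends upward through shift-expression, relational-expression, and the remaining levels: each new level calls the one below it and loops on its own operators.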
cast-expression:
unary-expression
( type-name ) cast-expression
unary-expression:
postfix-expression
++ unary-expression
-- unary-expression
unary-operator cast-expression
sizeof unary-expression
sizeof ( type-name )
expression . size
unary-operator: one of
& * + - ~ !
postfix-expression:
primary-expression
postfix-expression [ expression ]
postfix-expression ( )
postfix-expression ( argument-expression-list )
postfix-expression . identifier
postfix-expression -> identifier
postfix-expression ++
postfix-expression --
( type-name ) { initializer-list }
( type-name ) { initializer-list , }
fill ( expression )
codegen ( function-definition )
primary-expression:
identifier
constant
string
( expression )
identifier <>
identifier @ < type-name-list >
$( argument-expression-list )
identifier { initializer-list }
( { block-item-list } )
argument-expression-list:
assignment-expression
argument-expression-list , assignment-expression
constant:
integer-constant
character-constant
floating-constant
NULL
E Installing Cyclone
Cyclone currently only runs on 32-bit machines, and has only been tested
on Win32 (Cygnus) and Linux (Red Hat 6.2) platforms. Other platforms
might or might not work. Right now, there are a few 32-bit dependencies
in the compiler, so the system will probably not work on a 64-bit machine
without some changes.
To install and use Cyclone, you’ll need to use the GNU utilities, includ-
ing GCC (the GNU C compiler) and GNU Make. For Win32, you should first
install the latest version of the Cygwin utilities to do the build, and make
sure that the Cygwin bin directory is on your path. We use some features
of GCC extensively, so Cyclone definitely will not build with another C
compiler.
Cyclone is distributed as a compressed archive (a .tar.gz file). Unpack
the distribution into a directory; if you are installing Cyclone on a Win-
dows system, we suggest you choose c:/cyclone.
From here, follow the instructions in the INSTALL file included in the
distribution.
F Tools
F.1 The compiler
General options
The Cyclone compiler has the following command-line options:
-Bdir Add dir to the list of directories to search for special compiler files.
-Idir Add dir to the list of directories to search for include files.
-c Produce an object (.o) file instead of an executable; do not link.
-s Remove all symbol table and relocation information from the executable.
-O Optimize.
-MT file Make file be the target of any dependencies generated using the
-M flag.
Developer options
In addition, the compiler has some options that are primarily of use to its
developers:
-g Compile for debugging. This is currently only useful for compiler de-
velopers, as the debugging information reflects the C code that the
Cyclone code is compiled to, and not the Cyclone code itself.
-ic Activate the link-checker.
-nocyc Don’t add the implicit namespace Cyc to variable names in the C
output.
1. Compile the program with the flag -pa. The resulting executable
will be compiled to record allocation behavior. It will also be linked
with a version of the standard library that records its allocation be-
havior. (If you get the message, “can’t find internal compiler file
libcyc_a.a,” then ask your system administrator to install the spe-
cial version of the library.)
2. Execute the program as normal. As it executes, it will write to a file
amon.out in the current working directory; if the file exists before
execution, it will be overwritten.
3. Run the program aprof. This will examine amon.out and print a
report on the allocation behavior of the program.