A compiler for the π-Calculus: the backend
Departamento de Matemática
Tiago Cogumbreiro
Supervisor: Doutor Francisco Cipriano da Cunha Martins
Abstract
Our work covers the creation of the backend of a compiler for the π-calculus. The backend consists of the translation of the source language into an abstract assembly language. The abstract assembly language used in our compiler is MIL, a multithreaded typed assembly language.
MIL features types and locks. It adopts the shared-memory concurrency model, where tuples are protected by locks and may be accessed by multiple threads, each of which must acquire the exclusive right to alter the data. The π-calculus, on the other hand, adopts the message-passing concurrency model, where processes communicate by passing information to each other through channels.
We start by describing the target language's (MIL) syntax, semantics, and type discipline in Chapter 1. Afterwards we show a few usage examples that illustrate the basic operations of MIL. Finally, we present a runtime library that implements π-calculus communication in MIL.
Compilers need to perform various operations over an abstract syntax tree. The Visitor pattern is the usual solution to this problem. During our work on the compiler we devised an extension to the classic Visitor pattern. Chapter 2 explains the advantages of the Extended Visitor pattern over the usual approach.
The final chapter (Chapter 3) describes the translation process. It starts by explaining the framing step, where variables are arranged and structured according to their scope. Afterwards we give an overview of the translation to MIL, followed by an in-depth architectural analysis of the implementation.
Contents

3 The Backend
  3.1 Framing
    3.1.1 Introduction
    3.1.2 Implementation
  3.2 Translation
    3.2.1 Overview
    3.2.2 Frame Indexing
    3.2.3 Type Adapting
    3.2.4 Code Generation Helper
4 Conclusion
5 Appendix
Chapter 1
1.1 Architecture
MIL envisages an abstract CMP with a shared main memory. Each processor
core owns a number of registers and an instruction cache. The main memory
is divided into a heap (for storing data and code blocks) and a run pool
(for storing suspended threads). Data blocks, kept in the main memory, are
represented by tuples and are each protected by a lock. Code blocks declare the registers they need (including the type of each register), a list of required locks, and a sequence of instructions. The run pool contains all the idle threads; there may be more threads ready to run than there are processors.
1.2 Syntax
The syntax of our language is given by the grammars in Figures 1.1, 1.2, and 1.7. We postpone the presentation of types to Section 1.4. We rely on a set of heap labels ranged over by l, and a disjoint set of type variables ranged over by α, β.
Most of the proposed instructions, represented in Figure 1.1, are standard
in assembly languages. Instructions are organised in sequences, ending with
a jump or with a yield. The instruction yield frees the processor to execute
another thread from the thread pool.
The abstract machine is defined by the number of processors available (N)
and the number of registers (R), as depicted in Figure 1.2.
An abstract machine can be in two possible states: halted or running. A
running machine comprises a heap, a thread pool, and an array of processors.
Heaps are maps from labels into heap values, which are either tuples or code blocks. Tuples are vectors of values protected by some lock. Code blocks comprise a signature and a body. The signature of a code block, enforced by the type system to be of the form ∀[~α].(Γ requires Λ), describes a universal operator ∀[~α] that abstracts the types used in the signature and in the body, a register file type Γ that gives the type of each register, and the locks Λ held by the processor when jumping to this code block. The body is a sequence of instructions to be executed by a processor.
A thread pool is a multiset of pairs, each of which contains a pointer
(i.e., a label) to a code block and a register file. A processor array contains
N processors, where each is composed of a register file, a set of locks, and a
sequence of instructions.
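To make the scheduling model concrete, here is a small Java sketch of the run pool described above; all names (RunPool, Suspended, fork, schedule) are ours and purely illustrative, not part of MIL. Suspended threads are pairs of a code label and a saved register file, and a processor that yields picks up the next pair.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Illustrative model of the MIL run pool; all names are our own.
final class RunPool {
    // a suspended thread pairs a pointer to a code block (a label)
    // with the register file to restore when the thread is scheduled
    static final class Suspended {
        final String label;
        final int[] registers;
        Suspended(String label, int[] registers) {
            this.label = label;
            this.registers = registers;
        }
    }

    private final Queue<Suspended> pool = new ArrayDeque<>();

    // fork: add a new suspended thread to the pool
    void fork(String label, int[] registers) {
        pool.add(new Suspended(label, registers));
    }

    // yield: the processor gives up its thread and takes the next one;
    // null means the pool is empty (the machine may halt, as in R-halt)
    Suspended schedule() {
        return pool.poll();
    }
}
```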
registers             r ::= r1 | ... | rR
integer values        n ::= ... | -1 | 0 | 1 | ...
lock values           b ::= -1 | 0 | 1 | ...
values                v ::= r | n | b | l | pack τ, v as τ | packL α, v as τ | v[τ] | ?τ
instructions          ι ::=
  control flow            r := v | r := r + v | if r = v jump v |
  memory                  r := malloc [~τ] guarded by α | r := v[n] | r[n] := v |
  unpack                  α, r := unpack v |
  lock                    α, r := newLock b | α := newLockLinear |
                          r := tslE v | r := tslS v | unlockE v | unlockS v |
  fork                    fork v
instruction sequences I ::= ι; I | jump v | yield
lock sets        λ ::= α1, ..., αn
permissions      Λ ::= (λ, λ, λ)
register files   R ::= {r1: v1, ..., rR: vR}
processor        p ::= ⟨R; Λ; I⟩
processors array P ::= {1: p1, ..., N: pN}
thread pool      T ::= {⟨l1, R1⟩, ..., ⟨ln, Rn⟩}
heap values      h ::= ⟨v1 ... vn⟩^α | τ{I}
heaps            H ::= {l1: h1, ..., ln: hn}
states           S ::= ⟨H; T; P⟩ | halt
When a lock is created in the exclusive lock state, the new lock variable β is added to the set of exclusive locks held by the processor. Similarly, when the lock is created in the shared lock state, the new lock variable β is added to the set of shared locks held by the processor.
Linear locks are created by newLockLinear. They are initialised in the locked state, and the new lock variable β is added to the set of linear locks.
The test-and-set-lock instruction, present in many machines designed with multiple processors in mind, is an atomic operation that loads the contents of a word into a register and then stores another value in that word. These two operations (the load and the store) are indivisible. There are two variations of test-and-set in our language: tslE and tslS. When tslE is applied to an unlocked lock, the type variable α is added to the set of exclusive locks and the value becomes ⟨-1⟩^α. Multiple threads may read values from a tuple locked in the shared state, hence when tslS is applied to a shared or to an unlocked lock, the value contained in the tuple is incremented, reflecting the number of readers holding the shared lock, and the type variable α is added to the set of held shared locks; a 0 is then placed in the target register, signalling success.
Shared locks are unlocked with unlockS, which decrements the number of readers; the running processor must hold the shared lock. Exclusive locks are unlocked with unlockE, provided the running processor holds the exclusive lock.
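The behaviour of these lock operations can be mimicked in a high-level language; the Java sketch below is purely illustrative (the class LockWord and its methods are our own invention, not part of MIL). An AtomicInteger stands for the lock word: -1 means exclusively locked, 0 unlocked, and a positive value counts the readers.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical Java model of MIL's lock word: -1 = exclusively locked,
// 0 = unlocked, n > 0 = n readers holding the shared lock.
final class LockWord {
    private final AtomicInteger word = new AtomicInteger(0);

    // tslE: succeeds (returns 0) only when the word is 0 (unlocked),
    // storing -1; otherwise the observed non-zero value is returned.
    int tslE() {
        if (word.compareAndSet(0, -1)) return 0;
        return word.get();
    }

    // tslS: succeeds when the word is >= 0 (unlocked or shared),
    // incrementing the reader count; fails (returns -1) when exclusive.
    int tslS() {
        while (true) {
            int b = word.get();
            if (b < 0) return -1;
            if (word.compareAndSet(b, b + 1)) return 0;
        }
    }

    void unlockE() { word.set(0); }            // release the exclusive lock
    void unlockS() { word.decrementAndGet(); } // one reader leaves
}
```

The compareAndSet calls make the load and the store indivisible, which is exactly the property the text attributes to test-and-set.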
∀i. P(i) = ⟨_; _; yield⟩
──────────────────────── (R-halt)
⟨H; ∅; P⟩ → halt

H(l) = ∀[].(_ requires Λ){I}
────────────────────────────────────────────────────────────── (R-schedule)
⟨H; T ⊎ {⟨l, R⟩}; P{i: ⟨_; _; yield⟩}⟩ → ⟨H; T; P{i: ⟨R; Λ; I⟩}⟩

R̂(v) = l    H(l) = ∀[].(_ requires Λ){_}
─────────────────────────────────────────────────────────────────────── (R-fork)
⟨H; T; P{i: ⟨R; Λ ⊎ Λ′; (fork v; I)⟩}⟩ → ⟨H; T ∪ {⟨l, R⟩}; P{i: ⟨R; Λ′; I⟩}⟩
The type ∀[~α].(Γ requires Λ) describes a code block; a thread jumping into such a block must instantiate all the universal variables ~α, and it must hold a register file of type Γ as well as the locks in Λ. The types lock(α), lockE(α), and lockS(α) are singleton types: respectively the lock type, the exclusive lock type, and the shared lock type. The types ∃α.τ and ∃ₗα.τ define the existential operators of [2]. The recursive type µα.τ allows the definition of recursive data structures.
The type system is presented in Figures 1.8 to 1.11. Typing for values is illustrated in Figure 1.8. Heap values are distinguished from operands (which also include registers) by the form of the sequent. Notice that lock values (-1, 0, and 1) have any lock type. Also, the uninitialised value ?τ has type ?τ; we use the same syntax for an uninitialised value (at the left of the colon) and its type (at the right of the colon). A formula σ <: σ′ allows one to "forget" initialisations.
Instructions are checked against a typing environment Ψ (mapping labels to types, and type variables to the kind Lock, the kind of singleton lock types), a register file type Γ holding the current types of the registers, and a tuple Λ that comprises three sets of lock variables (the permission of the code block), which are, respectively, exclusive, shared, and linear.
Rule T-yield requires that all shared and all exclusive locks be released before the thread ends; only the thread that acquired a lock may release it.
Rule T-fork splits the permission into two tuples, Λ and Λ′: one goes with the forked thread, the other remains with the current thread, according to the permissions required by the target code block.
Rules T-new-lock 1, T-new-lock -1, and T-new-lockL each add the new type variable to the respective set of locks. Rules T-new-lock 0, T-new-lock 1, and T-new-lock -1 assign a lock type to the register. Rules T-tslE and T-tslS require that the value under test hold a lock, disallowing testing a lock already held by the thread. Rules T-unlockE and T-unlockS make sure that only held locks are unlocked. Finally, rules T-criticalE and T-criticalS ensure that the current thread holds the exact locks required by the target code block; each of these rules also adds the lock under test to the respective set of locks of the thread. A thread is guaranteed to hold the lock only after (conditionally) jumping to a critical region: a previous test-and-set-lock instruction may have obtained the lock, but as far as the type system goes, the thread holds the lock only after the conditional jump.
The typing rules for memory and control flow are depicted in Figure 1.10. The rule for malloc makes sure that the lock α is in scope, meaning that the instruction must be preceded by a newLock in the same code block, or that the type variable must be abstracted by the universal operator. Values can be loaded from a tuple if the guarding type variable is in one of the sets of locks. Values can be stored in a tuple if the guarding type variable is in the set of exclusive locks or in the set of linear locks.
The rules for typing machine states are illustrated in Figure 1.11 and should be easy to understand. The only remark concerns heap tuples, where we make sure that all locks protecting a tuple are in the domain of the typing environment.
1.5 Examples
To exemplify how MIL works, we show interprocessor communication: we create a tuple in shared memory and then create two threads that try to write to it concurrently. The lock α is passed to each code block, because it is not otherwise in the scope of the forked threads.
main () {
    α, r1 := newLock 0
    r2 := malloc [int] guarded by α
    fork thread1[α]
    fork thread2[α]
}
Each thread tries to acquire lock α using a different strategy. The first thread (thread1) uses a technique called spin lock:
thread1 ∀[α] (r1: ⟨lock(α)⟩^α, r2: ⟨?int⟩^α) {
    r3 := tslE r1    -- exclusive, because we want to write
    if r3 = 0 jump criticalRegion1[α]
    jump thread1[α]
}
The code block loops actively, never releasing the processor, until it eventually grabs the lock.
criticalRegion1 ∀[α] (r1: ⟨lock(α)⟩^α, r2: ⟨?int⟩^α)
requires (α;;) {
    r2[0] := 1
    unlockE r1
    yield
}
The second thread (thread2) uses a different technique, called a sleep lock:

thread2 ∀[α] (r1: ⟨lock(α)⟩^α, r2: ⟨?int⟩^α) {
    r3 := tslE r1    -- exclusive, because we want to write
    if r3 = 0 jump criticalRegion2[α]
    fork thread2[α]
    yield
}
This strategy features a cooperative approach. Instead of actively trying
to grab the lock, it forks a copy of that thread and yields the processor to
another thread in the pool.
criticalRegion2 ∀[α] (r1: ⟨lock(α)⟩^α, r2: ⟨?int⟩^α)
requires (α;;) {
    r2[0] := 2
    unlockE r1
    yield
}
Each of these techniques has advantages over the other. A spin lock is faster; it should be used when there is a reasonable expectation that the lock will become available within a short period of time. A shortcoming of the spin lock is demonstrated in this example:
main () {
    α, r1 := newLock -1
    fork release[α]
    jump spinLock[α]
}
The code block main creates a lock, already in the locked state, and forks a thread that unlocks it; the forked thread takes ownership of the lock when it is forked:
release ∀[α] (r1: ⟨lock(α)⟩^α) requires (α;;) {
    unlockE r1
    yield
}
The code block spinLock uses the spin lock technique to acquire the lock's permission, as in the first example. The problem with this program is that it only works on machines with more than one processor: because the spinning thread never relinquishes its processor, on a single-processor machine the forked thread that would unlock the lock never gets the chance to run. The sleep lock technique avoids this problem, but each retry involves a context switch, which is an expensive operation (i.e., it degrades performance).
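For readers more familiar with a high-level language, the same trade-off can be sketched in Java; this is illustrative only, and the names Acquire, spinAcquire, and sleepAcquire are ours.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// The same trade-off in Java: spinAcquire burns the processor until the
// flag is free; sleepAcquire relinquishes it, letting the releasing
// thread run even on a single-core machine.
final class Acquire {
    static void spinAcquire(AtomicBoolean lock) {
        while (!lock.compareAndSet(false, true)) {
            // busy wait: never yields the processor
        }
    }

    static void sleepAcquire(AtomicBoolean lock) {
        while (!lock.compareAndSet(false, true)) {
            Thread.yield(); // cooperative: let another thread progress
        }
    }
}
```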
Libraries written in MIL use continuation-passing style. In this model of programming, the user passes a continuation (a label pointing to a code block) to the library's procedure. When the computation is finished, the procedure runs the continuation (either by forking or by jumping to it).
In continuation-passing style it is useful to pass user data to the continuation code. With existential types it is possible to abstract the type of the user data: a data structure (a tuple) is created to hold the continuation label and the user data. Let ContinuationType stand for

∀[α].((r1: ⟨?int⟩^α) requires (;;α))

and let PackedUserData stand for

∃X.⟨∀[α].((r1: X) requires (;;α)), X⟩^α
A sketch of this usage is:
main () {
    α := newLockLinear
    r2 := malloc [int] guarded by α
    r1 := malloc [ContinuationType, ⟨?int⟩^α] guarded by α
    r1[0] := continuation
    r1[1] := r2
    r1 := pack r1, ⟨?int⟩^α as PackedUserData
    jump library[α]
}

continuation ContinuationType {
    -- do some work
}
The code block main allocates the user data ⟨?int⟩^α and places it into a tuple, along with the label pointing to the continuation. The tuple is then packed and passed to the library, which eventually calls the continuation by unpacking the packed data and jumping to the callback.
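The same pack/unpack discipline can be approximated in Java, where a generic class plays the role of the existential type: once a Packed value is built, its type parameter is hidden behind a wildcard, so the library cannot inspect the user data, only hand it back to the continuation. This sketch is our own analogy; the names Continuation, Packed, and Library.finish are hypothetical.

```java
// Java analogue of ∃X.⟨continuation, X⟩: a pair whose type parameter
// is hidden behind a wildcard once packed.
interface Continuation<X> {
    void run(X userData);
}

final class Packed<X> {
    final Continuation<X> cont;
    final X userData;
    Packed(Continuation<X> cont, X userData) {
        this.cont = cont;
        this.userData = userData;
    }
    // only the pair itself can re-associate continuation and data
    void unpackAndRun() { cont.run(userData); }
}

final class Library {
    // the library sees Packed<?>: the user data's type is abstracted away
    static void finish(Packed<?> p) { p.unpackAndRun(); }
}
```

The wildcard mirrors the existential: Library can run the continuation, but it cannot forge or examine a value of the hidden type X.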
    if (continuations.size() > 0) {
        InputContinuation cont = continuations.remove(0);
        cont.reduce(msg);
        reduce(cont, msg);
    } else {
        messages.add(msg);
    }
}
1.6.2 Implementation
A channel needs to keep two attributes: a list of messages and a list of input continuations. MIL does not allow values as indices in load operations (only literals), and it does not permit the allocation of dynamically sized memory, hence we cannot use a tuple of dynamic length to implement a list of elements. Our implementation instead uses locks to impose the queueing of messages and of continuations. A Channel is the type:

⟨int, Message, InputContinuation⟩^α
The first element of the tuple specifies the contents of the channel:

• 0 if it has no messages and no input continuations;
• 1 if it has at least one queued message;
• 2 if it has at least one queued input continuation.

The second element of the tuple is a queued message and the third element is a queued input continuation.
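The protocol behind this status field can be illustrated with a small Java model; it is a sketch under our own names (Channel, write, read), and it uses queues where the MIL tuple stores a single element, since the lock-imposed ordering plays the role of the queue there.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Consumer;

// Illustrative model of the channel protocol: status 0 = empty,
// 1 = message queued, 2 = input continuation queued.
final class Channel<M> {
    private final Deque<M> messages = new ArrayDeque<>();
    private final Deque<Consumer<M>> continuations = new ArrayDeque<>();

    synchronized int status() {
        if (!messages.isEmpty()) return 1;
        if (!continuations.isEmpty()) return 2;
        return 0;
    }

    // writeMessage: reduce immediately against a waiting reader,
    // otherwise queue the message
    synchronized void write(M msg) {
        if (!continuations.isEmpty()) continuations.remove().accept(msg);
        else messages.add(msg);
    }

    // readMessage: consume a queued message if one exists,
    // otherwise queue the input continuation
    synchronized void read(Consumer<M> cont) {
        if (!messages.isEmpty()) cont.accept(messages.remove());
        else continuations.add(cont);
    }
}
```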
MIL has no notion of classes, so we must adapt the code to the language. A Continuation class is simply a callback with user data attached to it (the implementation of the class). Using the continuation-passing style explained in Section 1.5, the input callback is the data structure InputContinuation:

∃X.⟨∀[α, Message].((r1: X, r2: Channel, r3: ⟨lock(α)⟩^α) requires (α;;)), X⟩
Because messages can be of any type, we abstract them with the universal operator in both procedures. Let the signature of readMessage stand for:

∀[α, Message].((r1: InputContinuation, r2: Channel, r3: ⟨lock(α)⟩^α) requires (α;;))

and let the signature of writeMessage stand for:

∀[α, Message].((r1: Message, r2: Channel, r3: ⟨lock(α)⟩^α) requires (α;;))
The initialisation of a channel is the responsibility of the client. Since no encapsulation is possible in an assembly language, the consistency of a Channel is the responsibility of the user of the library.
The label sink abstracts both the type of the message and the type of the environment, so it can be used to fill a dummy InputContinuation. The code at sink just unlocks the lock α and yields the processor. The user data we use is an integer, for convenience. Now we need to hide the user data with the existential operator:

r2 := pack r2, int as IntContinuation
Now, with the InputContinuation in register r2, we are able to create the channel:

r1 := malloc [
    int,                -- the status of the channel
    int,                -- the type of the message
    IntContinuation     -- the packed continuation
] guarded by α
r1[0] := 0     -- '0' marks an empty channel
r1[1] := 0     -- the dummy message
r1[2] := r2    -- the dummy input continuation
To send the literal 10 as a message, emulating the process x⟨10⟩, we use the channel initialised in r1:

r2 := r1    -- move the channel to the second parameter
r1 := 10    -- move the message to the first parameter
jump writeMessage

If we wish to read a message through channel x, emulating the process x(a).P, we again use the channel initialised in r1. Let PType stand for the type:

∀[α].((r1: ⟨⟩^α, r2: int, r3: ⟨lock(α)⟩^α) requires (α;;))
r1[0] := P     -- the continuation
r1[1] := r4    -- empty user data
r1 := pack r1, ⟨⟩^α as IntContinuation
jump readMessage
P(i) = ⟨R; Λ; (α, r := newLock 0; I)⟩    l ∉ dom(H)    β ∉ Λ
──────────────────────────────────────────────────────────── (R-new-lock 0)
⟨H; T; P⟩ → ⟨H{l: ⟨0⟩^β}; T; P{i: ⟨R{r: l}; Λ; I[β/α]⟩}⟩

P(i) = ⟨R; Λ; (α, r := newLock 1; I)⟩    l ∉ dom(H)    β ∉ Λ
──────────────────────────────────────────────────────────── (R-new-lock 1)
⟨H; T; P⟩ → ⟨H{l: ⟨1⟩^β}; T; P{i: ⟨R{r: l}; (λE, λS ⊎ {β}, λL); I[β/α]⟩}⟩

P(i) = ⟨R; Λ; (α, r := newLock -1; I)⟩    l ∉ dom(H)    β ∉ Λ
──────────────────────────────────────────────────────────── (R-new-lock -1)
⟨H; T; P⟩ → ⟨H{l: ⟨-1⟩^β}; T; P{i: ⟨R{r: l}; (λE ⊎ {β}, λS, λL); I[β/α]⟩}⟩

P(i) = ⟨R; Λ; (α := newLockLinear; I)⟩    β ∉ Λ
──────────────────────────────────────────────────────────── (R-new-lockL)
⟨H; T; P⟩ → ⟨H; T; P{i: ⟨R; (λE, λS, λL ⊎ {β}); I[β/α]⟩}⟩

P(i) = ⟨R; Λ; (r := tslS v; I)⟩    R̂(v) = l    H(l) = ⟨b⟩^α    b ≥ 0
──────────────────────────────────────────────────────────── (R-tslS-acq)
⟨H; T; P⟩ → ⟨H{l: ⟨b+1⟩^α}; T; P{i: ⟨R{r: 0}; (λE, λS ⊎ {α}, λL); I⟩}⟩

P(i) = ⟨R; Λ; (r := tslS v; I)⟩    H(R̂(v)) = ⟨-1⟩^α
──────────────────────────────────────────────────────────── (R-tslS-fail)
⟨H; T; P⟩ → ⟨H; T; P{i: ⟨R{r: -1}; Λ; I⟩}⟩

P(i) = ⟨R; Λ; (r := tslE v; I)⟩    R̂(v) = l    H(l) = ⟨0⟩^α
──────────────────────────────────────────────────────────── (R-tslE-acq)
⟨H; T; P⟩ → ⟨H{l: ⟨-1⟩^α}; T; P{i: ⟨R{r: 0}; (λE ⊎ {α}, λS, λL); I⟩}⟩

P(i) = ⟨R; Λ; (r := tslE v; I)⟩    H(R̂(v)) = ⟨b⟩^α    b ≠ 0
──────────────────────────────────────────────────────────── (R-tslE-fail)
⟨H; T; P⟩ → ⟨H; T; P{i: ⟨R{r: b}; Λ; I⟩}⟩

P(i) = ⟨R; (λE, λS ⊎ {α}, λL); (unlockS v; I)⟩    R̂(v) = l    H(l) = ⟨b⟩^α
──────────────────────────────────────────────────────────── (R-unlockS)
⟨H; T; P⟩ → ⟨H{l: ⟨b-1⟩^α}; T; P{i: ⟨R; (λE, λS, λL); I⟩}⟩

P(i) = ⟨R; (λE ⊎ {α}, λS, λL); (unlockE v; I)⟩    R̂(v) = l    H(l) = ⟨_⟩^α
──────────────────────────────────────────────────────────── (R-unlockE)
⟨H; T; P⟩ → ⟨H{l: ⟨0⟩^α}; T; P{i: ⟨R; (λE, λS, λL); I⟩}⟩
P(i) = ⟨R; Λ; (r := malloc [~τ] guarded by α; I)⟩    l ∉ dom(H)
──────────────────────────────────────────────────────────── (R-malloc)
⟨H; T; P⟩ → ⟨H{l: ⟨?~τ⟩^α}; T; P{i: ⟨R{r: l}; Λ; I⟩}⟩

P(i) = ⟨R; Λ; (r := v[n]; I)⟩    H(R̂(v)) = ⟨v1..vn..vn+m⟩^α
──────────────────────────────────────────────────────────── (R-load)
⟨H; T; P⟩ → ⟨H; T; P{i: ⟨R{r: vn}; Λ; I⟩}⟩

P(i) = ⟨R; Λ; (r[n] := v; I)⟩    R(r) = l    H(l) = ⟨v1..vn..vn+m⟩^α
──────────────────────────────────────────────────────────── (R-store)
⟨H; T; P⟩ → ⟨H{l: ⟨v1..R̂(v)..vn+m⟩^α}; T; P{i: ⟨R; Λ; I⟩}⟩
types                τ ::= int | ⟨~σ⟩^α | ∀[~α].(Γ requires Λ) | lock(α) |
                           lockE(α) | lockS(α) | ∃α.τ | ∃ₗα.τ | µα.τ | α
init types           σ ::= τ | ?τ
register file types  Γ ::= r1: τ1, ..., rn: τn
typing environment   Ψ ::= ∅ | Ψ, l: τ | Ψ, α :: Lock
Ψ ⊢ v : τ
──────────── (T-val)        Ψ; Γ ⊢ r : Γ(r)  (T-reg)
Ψ; Γ ⊢ v : τ

Ψ; Γ ⊢ v : ∀[α ~β].(Γ′ requires Λ)
───────────────────────────────────────────── (T-val-app)
Ψ; Γ ⊢ v[τ] : ∀[~β].(Γ′[τ/α] requires Λ[τ/α])
Ψ; Γ; (∅, ∅, λL) ⊢ yield  (T-yield)

Ψ; Γ ⊢ v : ∀[].(Γ′ requires Λ′)    Ψ; Γ; Λ ⊢ I    ⊢ Γ <: Γ′
─────────────────────────────────────────────────────────── (T-fork)
Ψ; Γ; Λ ⊎ Λ′ ⊢ fork v; I

Ψ, α :: Lock; Γ{r: ⟨lock(α)⟩^α}; Λ ⊢ I    α ∉ Ψ, Γ, Λ
───────────────────────────────────────────────────── (T-new-lock 0)
Ψ; Γ; Λ ⊢ α, r := newLock 0; I

Ψ, α :: Lock; Γ{r: ⟨lock(α)⟩^α}; (λE, λS ⊎ {α}, λL) ⊢ I    α ∉ Ψ, Γ, Λ
───────────────────────────────────────────────────── (T-new-lock 1)
Ψ; Γ; Λ ⊢ α, r := newLock 1; I

Ψ, α :: Lock; Γ{r: ⟨lock(α)⟩^α}; (λE ⊎ {α}, λS, λL) ⊢ I    α ∉ Ψ, Γ, Λ
───────────────────────────────────────────────────── (T-new-lock -1)
Ψ; Γ; Λ ⊢ α, r := newLock -1; I

Ψ, α :: Lock; Γ; (λE, λS, λL ⊎ {α}) ⊢ I    α ∉ Ψ, Γ, Λ
───────────────────────────────────────────────────── (T-new-lockL)
Ψ; Γ; Λ ⊢ α := newLockLinear; I

Ψ; Γ ⊢ v : ⟨lock(α)⟩^α    Ψ; Γ{r: lockS(α)}; Λ ⊢ I    α ∉ Λ
───────────────────────────────────────────────────── (T-tslS)
Ψ; Γ; Λ ⊢ r := tslS v; I

Ψ; Γ ⊢ v : ⟨lock(α)⟩^α    Ψ; Γ{r: lockE(α)}; Λ ⊢ I    α ∉ Λ
───────────────────────────────────────────────────── (T-tslE)
Ψ; Γ; Λ ⊢ r := tslE v; I

Ψ; Γ ⊢ v : ⟨lock(α)⟩^α    α ∈ λS    Ψ; Γ; (λE, λS \ {α}, λL) ⊢ I
───────────────────────────────────────────────────── (T-unlockS)
Ψ; Γ; (λE, λS, λL) ⊢ unlockS v; I

Ψ; Γ ⊢ v : ⟨lock(α)⟩^α    α ∈ λE    Ψ; Γ; (λE \ {α}, λS, λL) ⊢ I
───────────────────────────────────────────────────── (T-unlockE)
Ψ; Γ; (λE, λS, λL) ⊢ unlockE v; I

Ψ; Γ ⊢ r : lockS(α)    Ψ; Γ ⊢ v : ∀[].(Γ′ requires (λE, λS ⊎ {α}, λ′L))
Ψ; Γ; Λ ⊢ I    ⊢ Γ <: Γ′    λ′L ⊆ λL
───────────────────────────────────────────────────── (T-criticalS)
Ψ; Γ; (λE, λS, λL) ⊢ if r = 0 jump v; I

Ψ; Γ ⊢ r : lockE(α)    Ψ; Γ ⊢ v : ∀[].(Γ′ requires (λE ⊎ {α}, λS, λ′L))
Ψ; Γ; Λ ⊢ I    ⊢ Γ <: Γ′    λ′L ⊆ λL
───────────────────────────────────────────────────── (T-criticalE)
Ψ; Γ; (λE, λS, λL) ⊢ if r = 0 jump v; I

Figure 1.9: Typing rules for instructions (thread pool and locks), Ψ; Γ; Λ ⊢ I
Ψ, α :: Lock; Γ{r: ⟨?~τ⟩^α}; Λ ⊢ I    ~τ ≠ lock(_), lockS(_), lockE(_)
───────────────────────────────────────────────────── (T-malloc)
Ψ, α :: Lock; Γ; Λ ⊢ r := malloc [~τ] guarded by α; I

Ψ; Γ ⊢ v : ⟨σ1..τn..σn+m⟩^α    Ψ; Γ{r: τn}; Λ ⊢ I    τn ≠ lock(_)    α ∈ Λ
───────────────────────────────────────────────────── (T-load)
Ψ; Γ; Λ ⊢ r := v[n]; I

Ψ; Γ ⊢ v : τn    Ψ; Γ ⊢ r : ⟨σ1..σn..σn+m⟩^α    τn ≠ lock(_)
Ψ; Γ{r: ⟨σ1..type(σn)..σn+m⟩^α}; Λ ⊢ I    α ∈ λE ∪ λL
───────────────────────────────────────────────────── (T-store)
Ψ; Γ; Λ ⊢ r[n] := v; I

Ψ; Γ ⊢ v : τ    Ψ; Γ{r: τ}; Λ ⊢ I
───────────────────────────────── (T-move)
Ψ; Γ; Λ ⊢ r := v; I

Ψ; Γ ⊢ r′ : int    Ψ; Γ ⊢ v : int    Ψ; Γ{r: int}; Λ ⊢ I
──────────────────────────────────────────────────── (T-arith)
Ψ; Γ; Λ ⊢ r := r′ + v; I

Ψ; Γ ⊢ v : ∃α.τ    Ψ; Γ{r: τ}; Λ ⊢ I    α ∉ Ψ, Γ, Λ
─────────────────────────────────────────────────── (T-unpack)
Ψ; Γ; Λ ⊢ α, r := unpack v; I

Ψ; Γ ⊢ v : ∃ₗα.τ    Ψ, β :: Lock; Γ{r: τ}; Λ ⊢ I    α ∉ Ψ, Γ, Λ
─────────────────────────────────────────────────── (T-unpackL)
Ψ; Γ; Λ ⊢ α, r := unpack v; I

Ψ; Γ ⊢ r : int    Ψ; Γ ⊢ v : ∀[].(Γ′ requires (λE, λS, λ′L))
Ψ; Γ; Λ ⊢ I    ⊢ Γ <: Γ′    λ′L ⊆ λL
──────────────────────────────────────────────────── (T-branch)
Ψ; Γ; Λ ⊢ if r = 0 jump v; I

Ψ; Γ ⊢ v : ∀[].(Γ′ requires (λE, λS, λ′L))    ⊢ Γ <: Γ′    λ′L ⊆ λL
──────────────────────────────────────────────────── (T-jump)
Ψ; Γ; Λ ⊢ jump v

Figure 1.10: Typing rules for instructions (memory and control flow), Ψ; Γ; Λ ⊢ I
∀i. Ψ ⊢ R(ri) : Γ(ri)
───────────────────── (reg file, Ψ ⊢ R : Γ)
Ψ ⊢ R : Γ

∀i. Ψ ⊢ P(i)                           Ψ ⊢ R : Γ    Ψ; Γ; Λ ⊢ I
──────────── (processors, Ψ ⊢ P)       ────────────────────────
Ψ ⊢ P                                  Ψ ⊢ ⟨R; Λ; I⟩

∀i. Ψ ⊢ li : ∀[~αi].(Γi requires _){_}    Ψ ⊢ Ri : Γi[~βi/~αi]
──────────────────────────────────────── (thread pool, Ψ ⊢ T)
Ψ ⊢ {⟨l1, R1⟩, ..., ⟨ln, Rn⟩}

Ψ, ~α :: Lock; Γ; Λ ⊢ I
────────────────────────────────────────────────── (heap value, Ψ ⊢ h : τ)
Ψ ⊢ ∀[~α].(Γ requires Λ){I} : ∀[~α].(Γ requires Λ)

∀i. Ψ, α :: Lock ⊢ vi : σi
──────────────────────────────
Ψ, α :: Lock ⊢ ⟨~v⟩^α : ⟨~σ⟩^α

∀l. Ψ ⊢ H(l) : Ψ(l)
─────────────────── (heap, Ψ ⊢ H)
Ψ ⊢ H

                       Ψ ⊢ H    Ψ ⊢ T    Ψ ⊢ P
⊢ halt  (state, ⊢ S)   ───────────────────────
                       ⊢ ⟨H; T; P⟩
Channel
  +messages: List<Message>
  +continuations: List<InputContinuation>
  +readMessage(input:InputChannel)
  +writeMessage(msg:Message)

InputContinuation
  +reduce(msg:Message)
Chapter 2
2.1 Intent
The Visitor [3] encapsulates the application of an operation over an object
structure. This pattern facilitates the addition of new operations without
modifying the classes on which they operate. Our extension separates three
concerns: traversal, object structure, and operation.
2.2 Motivation
The object structure traversed by the visitor is usually defined by a class hierarchy, using inheritance and composition. Consider Figure 2.1: one object structure for this class hierarchy is a tree whose internal nodes are instances of Addition and whose leaves are instances of Number.
This is an ad hoc object structure: it cannot be predicted, nor inferred automatically. The convention is that the attributes of an object that share the same base class are the children of that object (as in Addition), but this holds only for a subset of problems. By relying on this convention, it is possible to create tools that generate visitors automatically. Conventions, however, have drawbacks, such as lack of flexibility and absence of introspection. Object structures that depart from the convention cannot be targeted by code generation tools.
An alternative approach to specifying the object structure is the implementation of a common interface that embodies the relation between objects, as in the Composite pattern [3].

Figure 2.1: the IntegerExpression class hierarchy, with subclasses Number and Addition (left: IntegerExpression, right: IntegerExpression).

This approach provides introspection, generalising the retrieval of the children of a class of objects, but it imposes the implementation of an interface on every object that needs to be navigated, which may not be possible.
Two common techniques for making the Visitor reusable are the Decorator pattern, where the decorator implements the traversal of the tree and the decorated object implements the code logic, and the Template Method, where inheritance separates the traversal from the code logic. Both techniques, however, are hindered by a hard-coded object structure, making the generic implementations brittle. The concepts of traversal and code logic are mixed at the interface level; the Decorator pattern or inheritance works around this by separating responsibilities, but the interface for both concerns remains the same (i.e., the Visitor).
The Guided Visitor [1] proposes the separation of navigation from computation code. With this kind of visitor, the object structure is obtained in the class where the traversal algorithm is performed: the Guide. By decoupling these two concepts it is possible to reuse traversal strategies and apply them to different object structures.
We propose breaking the visitor into three concerns, each represented by
an interface, solving an isolated problem: object structure, traversal, and
computation code.
2.3 Applicability
Use the Extended Visitor when:
• you are dealing with complex object structures: an object structure is itself an object, hence it can be composed or extended like any other class.
2.4 Structure
[Class diagram: a Client using the Extended Visitor over the IntegerExpression hierarchy (Number, Addition)]
Figure 2.3 (sequence diagram): the Traversal calls before(node) on the Visitor, asks the ObjectStructure for getChildren(node), recursively calls traverse(childNode) for each child, and finally calls after(node).
2.5 Participants
The Visitor has two methods that represent two events: before the children of a node are traversed and after they are traversed. The Visitor defines an abstract interface where the operation applied to the object structure is implemented. The ConcreteVisitor (Calculator) is an actual implementation of the Visitor interface. A Node is any element that may be traversed; it can be of any type. The ObjectStructure is an interface for retrieving the children of a node, when possible. A ConcreteObjectStructure (IntObjectStructure) is an implementation of the previous interface. The Traversal is the interface for the controller of the application of the operation (the visitor). A ConcreteTraversal (DepthFirst) is an implementation of the Traversal interface.
2.6 Collaborations
A client that uses the Extended Visitor pattern must create three objects: an object structure, a traversal strategy, and the visitor that applies the operation. When an element is visited, two methods, corresponding to two events, are called: one before its children are visited and one after. Figure 2.3 illustrates a depth-first traversal.
Figure 2.4 (class diagram): the Traversal interface, traverse(struct:ObjectStructure, visitor:Visitor, node:Object), guides the Visitor (+before(node:Object), +after(node:Object)) and queries the ObjectStructure (+getChildren(node:Object)).
2.7 Consequences
Adding new Nodes is easy. Contrary to the usual Visitor pattern, adding a new Node to the object structure is just a matter of updating the ObjectStructure's implementation.
Visiting a class is more verbose. In the classic Visitor, because all the concerns are mixed within the same class, a single parameter is needed: the Node to visit. The Extended Visitor requires the user to explicitly create a traversal object, an object structure object, and a visitor object.
Leverages encapsulation. The classic Visitor demands close coupling between the Visitor and the Nodes, since the Node calls methods of the Visitor and vice versa. With the Extended Visitor, Nodes are not aware, even at the interface level, of the visitor. The visitor may, or may not, be aware of the interface of the Node, since it receives instances cast to Object.
Code reuse. Because of the separation of concerns and cohesion of classes,
it is possible to reuse parts of the Extended Visitor in various situations, with
different classes of Nodes.
2.8 Implementation
We define three classes that embody separation of concerns. The class dia-
gram presented in Figure 2.4. The Traversal is the controller class responsible
for guiding the Visitor through the tree generated by the ObjectStructure . The
25
implementation of the ObjectStructure describes the children of a group of ob-
jects, that may share the same base class (or may not). The implementation
of the Visitor depends on the operation that needs to be applied over a
structure of objects.
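The three concerns can be rendered as Java interfaces. This is one possible sketch, not the compiler's actual source; the names follow the participants of Section 2.5 (DepthFirst, Calculator, IntObjectStructure), but the method bodies are our own.

```java
import java.util.List;

// the three separated concerns, one interface each
interface Visitor { void before(Object node); void after(Object node); }
interface ObjectStructure { List<Object> getChildren(Object node); }
interface Traversal { void traverse(ObjectStructure s, Visitor v, Object node); }

// a ConcreteTraversal: depth-first, firing before/after around the children
final class DepthFirst implements Traversal {
    public void traverse(ObjectStructure s, Visitor v, Object node) {
        v.before(node);
        for (Object child : s.getChildren(node)) traverse(s, v, child);
        v.after(node);
    }
}

// the node classes know nothing about visitors or traversals
final class Number {
    final int value;
    Number(int value) { this.value = value; }
}
final class Addition {
    final Object left, right;
    Addition(Object left, Object right) { this.left = left; this.right = right; }
}

// a ConcreteObjectStructure: the children of an Addition are its operands
final class IntObjectStructure implements ObjectStructure {
    public List<Object> getChildren(Object node) {
        return (node instanceof Addition)
                ? List.of(((Addition) node).left, ((Addition) node).right)
                : List.of();
    }
}

// a ConcreteVisitor: sums every Number encountered
final class Calculator implements Visitor {
    int total = 0;
    public void before(Object node) {
        if (node instanceof Number) total += ((Number) node).value;
    }
    public void after(Object node) { }
}
```

A client wires the three together, e.g. new DepthFirst().traverse(new IntObjectStructure(), new Calculator(), tree); swapping any one object leaves the other two untouched.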
        total += ((Node) obj).value;
    }
}
Chapter 3
The Backend
3.1 Framing
3.1.1 Introduction
A frame is a data structure that holds the variables available in a certain scope. Frames are used to specify the parameters, local variables, and global variables defined in a function. The frame of a function is allocated when the function is called and deallocated when the function exits.
Parameters and local variables are instantiated when a function is called. Local variables, parameters, and global variables should only exist while they are needed. If a local variable is only used to compute a temporary value, the memory associated with it should be freed when the computation is over or, at the latest, when the function returns. When a function (the nested function) is defined inside another function (the nesting function), it may reference variables defined in the nesting function. These variables, however, must not be freed when the nesting function exits; their values must be stored inside the nested function's frame.
In the π-calculus there are no functions or local variables. As far as framing is concerned, input processes are analogous to functions. Frames define the names known in a certain scope: a name used by a process must be present in an enclosing scope, defined as a global variable, or defined as a parameter of an input channel. Frames specify the free names available to a group of processes.
The rules for defining a variable in a frame are the following:
a(x).(a<x> | b(y).y<x>)

  Frame name: 'a'
  Parameters: 'x'
  Globals:    'a'
  Scope:      a<x> | b(y).y<x>

b(y).y<x>

  Frame name: 'b'
  Parameters: 'y'
  Globals:    'x'
  Scope:      y<x>
(F3) the global variables defined in a frame are variables in the parent frame;
3.1.2 Implementation
Our compiler creates a representation of frames in the same step it performs
semantic checking. A new frame is mapped to each input process. Frames
29
Frame
+parent: Frame globals
+name: Symbol Variable
+depth: int parameters
+type: PiType
+position: int
are named after the name of the channel. Figure 3.2 illustrates the class
diagram of frames.
Stacks are used in the visitors to allow communication between nodes at
different depths. The visitor navigating the AST holds a stack that defines
the current frame. Each time it visits an input process, a new frame is
created and the names of the arguments of the channel are defined as
parameters of that frame. This frame is pushed onto a stack of frames. The
top of the stack is the current frame enclosing a group of processes. When
the visitor leaves the input process, the current frame is removed from the
top of the stack of frames.
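The push/pop discipline can be sketched as follows (a simplified stand-in for the actual visitor, with frames reduced to their channel names):

```java
import java.util.ArrayDeque;
import java.util.Deque;

class FrameStack {
    private final Deque<String> frames = new ArrayDeque<>();

    // entering an input process: the new frame becomes the current one
    void enterInput(String channelName) {
        frames.push(channelName);
    }

    // leaving the input process: restore the enclosing frame
    void leaveInput() {
        frames.pop();
    }

    String currentFrame() {
        return frames.peek();
    }

    public static void main(String[] args) {
        FrameStack stack = new FrameStack();
        stack.enterInput("a");          // visiting a(x). ...
        stack.enterInput("b");          // visiting the nested b(y). ...
        System.out.println(stack.currentFrame()); // prints b
        stack.leaveInput();
        System.out.println(stack.currentFrame()); // prints a
    }
}
```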
When the visitor enters an output process, it adds all the instances of type
NameValue present in the arguments as variables to the current frame.
There is a stack in the visitor to hold Replication objects, allowing the
visitor to know whether an InputPrefix instance is contained in a Replication
instance. When the visitor enters an InputPrefix that is a child of a
Replication instance, it adds a variable (with the name of the channel) to the
current frame.
3.2 Translation
After semantic analysis we translate the AST to an abstract assembly lan-
guage. The translation could be performed in the same traversal as semantic
analysis, but this usually makes the code more complex. Translating directly
to a concrete assembly language reduces the portability of the compiler,
since it binds one compiler to one platform.
Abstract assembly languages generalise concepts that exist in various
platforms. Several compilers may target the same abstract assembly
language, enabling code generation for various architectures.
[Sequence diagram: when the SymbolTableProcessVisitor handles
inputPrefix(input), pushFrame(input) creates a new Frame, pushes it onto the
Stack<Frame>, and records it in the SymbolTable’s Map<InputPrefix,Frame>.]
We show an overview of the implementation of the translation step, fol-
lowed by an in-depth analysis of each component that is part of the backend.
3.2.1 Overview
Our compiler targets MIL, an abstract typed assembly language with the
concept of threads. In our compiler, code generation is performed after
semantic checking. The code generation traverses the AST twice: first to
associate a MIL Label to each process and then to translate the nodes. The
generated code uses the runtime library defined in Section 1.6.
By using composite visitors [9], through the decorator pattern [3], we
apply various operations separately, i.e., each in its own class, in the same
traversal. For example, the class FramePusherVisitor gives the decorated
visitor access to the current frame while traversing the AST. This visitor is
used both in the process-labelling traversal and in the code-generation
traversal. The ScopeVisitor keeps the symbol table coherent with the
decorated visitor. In this case, the class is used only once, but since the
code for handling the symbol table is kept separate from the code generation,
that is motivation enough to create a new visitor.
The basic idea of the translation is this: sequential operations are
performed in a single thread, and processes running in parallel each run in a
separate thread.
So, considering that each process has a label attached to it, the process
P | Q roughly translates to:
fork P[α]
fork Q[α]
The code generation for the nil process, 0, is straightforward:
yield
The process x⟨10⟩ has this translation sketch:
r1 := 10
r2 := x
jump writeMessage[α]
The input process x(a).P is a bit trickier, since we need to pack the user
data (which we address further on), but we can sketch it as:
r1 := malloc [ContinuationType, UserDataType] guarded by α
r1[0] := continuation
r1[1] := user data
r1 := pack UserDataType, r1 as PackedContinuation
r2 := x
jump readMessage[α]
where the continuation is something like:
a := r2 -- receive the value of the parameter
jump P[α]
[Diagram: the Translate step comprises tree preprocessing
(ProcessLabelerVisitor, FramePusherVisitor, ScopeVisitor) and code generation
(CodeHelper, EnvironmentCreator, RegisterPool, TypeAdapter,
FrameIndexerFactory, FrameIndexer, ChannelRepresentation).]
Finally, because we only allow replicated input processes, the translation
of the replication is similar to the input, but with a different continuation.
Consider the continuation of the process !x(a).P :
a := r1
fork P
jump grabLock[α]
The generated code fragment grabLock is something like:
r5 := tslE r1
if r5 = 0 jump tryAgain
fork grabLock[α] -- sleep spinlock
yield
The last block (tryAgain) tries to read the message again:
r1 := malloc [ContinuationType, UserDataType] guarded by α
r1[0] := continuation
r1[1] := user data
r1 := pack UserDataType, r1 as PackedContinuation
r2 := x
jump readMessage[α]
[Figure: the frame ’a’ (parameter a: Int; globals x: Str, y: Int) is indexed
into the sequence (a, x, y) with types (Int, Str, Int); the FrameIndexer
class holds names: Symbol[] and types: PiType[] and provides
getIndexFor(name: Symbol): int.]
3.2.4 Code Generation Helper
To aid the translation we use the class CodeHelper, a Façade [3] over type
adapting, frame indexing, register pooling, and environment creation.
Register pooling allows the reuse of registers. Environment creation is the
initialisation of a frame in MIL.
Register pooling is performed by the class RegisterPool . This class has two
methods: one to allocate a register and another to free one. When a register
is freed, it is placed in the pool. When a register is allocated and the pool
is empty, a fresh register is returned; if there is a register in the pool,
that one is returned instead, so the number of registers in use does not
grow. It is also possible to know how many registers are being used at the
same time.
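The pooling policy just described can be sketched like this (the method names are illustrative, not the actual RegisterPool interface):

```java
import java.util.ArrayDeque;
import java.util.Deque;

class RegisterPool {
    private final Deque<Integer> pool = new ArrayDeque<>(); // freed registers
    private int fresh = 0;  // next fresh register number
    private int inUse = 0;  // registers currently allocated

    int allocate() {
        inUse++;
        // reuse a pooled register when possible, otherwise create a fresh one
        return pool.isEmpty() ? fresh++ : pool.pop();
    }

    void release(int register) {
        inUse--;
        pool.push(register); // freed registers go back into the pool
    }

    int inUse() { return inUse; }
    int registersNeeded() { return fresh; } // distinct registers used so far

    public static void main(String[] args) {
        RegisterPool regs = new RegisterPool();
        int r0 = regs.allocate();  // fresh: 0
        int r1 = regs.allocate();  // fresh: 1
        regs.release(r0);
        int r2 = regs.allocate();  // reuses 0; register usage does not grow
        System.out.println(r2 + " " + regs.registersNeeded()); // prints 0 2
    }
}
```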
Environment creation is implemented in the class EnvironmentCreator. It
generates a MIL type for a given frame. First, it uses the FrameIndexerFactory
to index the variables. Afterwards, the EnvironmentCreator adapts each π-
calculus type to a MIL type. Finally, a tuple comprising the adapted types
(from the indexed frame) is created and bound to a type variable. The
environment is passed between processes and contains the values of the
existing variables. When communicating with the runtime, it is the user data
sent to the input continuation.
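The essence of the type adaption and tuple construction can be sketched as below (the Int/Str names and the textual MIL output are assumptions for illustration, not the compiler's actual code):

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

class EnvironmentSketch {
    // adapt a π-calculus type to its MIL counterpart (assumed mapping)
    static String adapt(String piType) {
        switch (piType) {
            case "Int": return "int";
            case "Str": return "string";
            default:    return piType; // e.g. channel representations
        }
    }

    // build the environment tuple type from an indexed frame, e.g.
    // {a=Int, x=Str, y=Int} with lock l becomes <int, string, int>^l
    static String environmentType(Map<String, String> indexedFrame, String lock) {
        String fields = indexedFrame.values().stream()
                .map(EnvironmentSketch::adapt)
                .collect(Collectors.joining(", "));
        return "<" + fields + ">^" + lock;
    }

    public static void main(String[] args) {
        Map<String, String> frame = new LinkedHashMap<>();
        frame.put("a", "Int");
        frame.put("x", "Str");
        frame.put("y", "Int");
        System.out.println(environmentType(frame, "l")); // <int, string, int>^l
    }
}
```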
The CodeHelper contains a group of methods to generate blocks of code
(code fragments). There is a notion of a current code block to which
instructions are appended. There is a method to close the current code block,
which frees the registers not freed explicitly and generates the code
signature. The closed code block is then added as a new code fragment to the
generated tree.
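The current-code-block mechanism can be sketched as follows (a simplification: register freeing and signature generation are reduced to a comment):

```java
import java.util.ArrayList;
import java.util.List;

class CodeBlocks {
    private final List<String> fragments = new ArrayList<>(); // generated tree
    private StringBuilder current; // the open code block, if any

    void openBlock(String label) {
        current = new StringBuilder(label + " {\n");
    }

    void append(String instruction) {
        current.append("  ").append(instruction).append("\n");
    }

    // closing a block would also free unfreed registers and generate
    // the code signature; here we only seal the fragment
    String closeBlock() {
        String fragment = current.append("}").toString();
        fragments.add(fragment);
        current = null;
        return fragment;
    }

    List<String> fragments() { return fragments; }

    public static void main(String[] args) {
        CodeBlocks helper = new CodeBlocks();
        helper.openBlock("nilProcess");
        helper.append("yield"); // the translation of the nil process
        System.out.println(helper.closeBlock());
    }
}
```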
There are two methods in the CodeHelper that are used to generate code
for a process. These were not placed in the Translate class because unit
testing was easier this way, and because the idea is that most code
generation is handled by the CodeHelper, not by Translate , which should act
more like a mediator between the traversal and the code generation.
Code generation for the output process is straightforward. If the param-
eter is a literal, its value is converted to a MIL value. If the parameter is
a name, then we use the frame indexing to know where the name is located in
the environment variable. The value is loaded from the environment into the
register that passes the message. Afterwards, a jump to writeMessage is
performed.
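The two cases can be sketched as follows (the register conventions r1/r5 and the lock name l are assumptions of this sketch, not the compiler's actual conventions):

```java
class OutputGen {
    // literal parameter: converted directly into a MIL value
    static String emitLiteral(String value) {
        return "r1 := " + value + "\njump writeMessage[l]";
    }

    // name parameter: loaded from the environment tuple at the index
    // reported by the frame indexer (environment assumed in r5)
    static String emitName(int frameIndex) {
        return "r1 := r5[" + frameIndex + "]\njump writeMessage[l]";
    }

    public static void main(String[] args) {
        System.out.println(emitLiteral("10")); // the x<10> case
        System.out.println(emitName(1));       // a name at frame index 1
    }
}
```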
When the input process is run, a new frame is created; this corresponds
to the allocation and initialisation of a new frame object. The initialisation
copies the global variables present in the new frame from the old
environment. After frame switching, we prepare the call to readMessage,
defined in the runtime.
Chapter 4
Conclusion
Chapter 5
Appendix
registers 7
packed: exists Env. <[l](r1: Env, r2: int, r3: <lock(l)>^l) requires (l;;), Env>^l
mainEnv: <channel>^l
r7[2] := r6 -- move the dummy input channel
r7[1] := 0 -- move the dummy output message
r5[0] := r7 -- initialize (int)
r1 := r5 -- Move the environment to the first register
jump main_new_a[l]
}
a_in_a [l](r1: mainEnv, r3: <lock(l)>^l) requires (l;;) {
r5 := malloc [channel, int] guarded by l -- alloc space for the new env ’a’
r7 := r1[0]
r5[0] := r7
r5[1] := 0 -- initialize ’x’
r2 := r1[0] -- move the channel a as 2nd arg
r7 := malloc [[l](r1: aEnv, r2: int, r3: <lock(l)>^l) requires (l;;), aEnv] guarded by l
r7[0] := read_replicate_a
r7[1] := r5
r1 := pack aEnv, r7 as exists Env. <[l](r1: Env, r2: int, r3: <lock(l)>^l) requires (l;;), Env>^l
jump inputMessage[l][int]
}
jump outputMessage[l][int]
}
InputContinuation: exists Env. <[l](r1: Env, r2: Message, r3: <lock(l)>^l) requires (l;;), Env>^l
outputMessage OutputCode {
r4 := r2[0] -- Grab the status of the channel.
if r4 = 0 -- Empty. Place our message into the channel.
jump outputFill[l][Message]
r4 := r4 - 1 -- if r4 = 1
if r4 = 0 -- Has a message already. Try again.
jump outputScheduleMessage[l][Message]
r4 := r4 - 1 -- if r4 = 2
if r4 = 0 -- Has an input channel. Reduce.
jump outputReduce[l][Message]
}
outputFill OutputCode {
r2[0] := 1
r2[1] := r1
unlockE r3
yield
}
outputScheduleMessage OutputCode {
unlockE r3
jump outputGrabLock[l][Message]
}
outputGrabLock OutputUnlockedCode {
r4 := tslE r3
if r4 = 0 -- we grabbed the lock; back to the beginning
jump outputMessage[l][Message]
-- try again:
fork outputGrabLock[l][Message]
yield
}
outputReduce OutputCode {
r2[1] := r1
jump reduce[l][Message]
}
reduce ReduceCode {
r2[0] := 0 -- flag it as empty
r4 := r2[2] -- the input
r2 := r2[1] -- the message
inputMessage InputCode {
r4 := r2[0]
if r4 = 0
jump inputFill[l][Message]
r4 := r4 - 1 -- if r4 = 1
if r4 = 0
jump inputReduce[l][Message]
r4 := r4 - 1 -- if r4 = 2
if r4 = 0
jump inputSchedule[l][Message]
-- should never reach this code
jump inputMessage[l][Message]
}
inputFill InputCode {
r2[0] := 2 -- has an input
r2[2] := r1 -- put the input into the pool
unlockE r3 -- we are done; yield
yield
}
inputReduce InputCode {
r2[2] := r1
jump reduce[l][Message]
}
inputSchedule InputCode {
unlockE r3
jump inputGrabLock[l][Message]
}
inputGrabLock InputUnlockedCode {
r4 := tslE r3
if r4 = 0 jump inputMessage[l][Message]
fork inputGrabLock[l][Message]
yield
}
Bibliography
[1] Martin Bravenboer and Eelco Visser. Guiding visitors: Separating
navigation from computation, November 2001.
[2] Cormac Flanagan and Martín Abadi. Types for safe locking, February
1999.
[3] Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides. Design
Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley,
1994.
[4] Greg Morrisett, David Walker, Karl Crary, and Neal Glew. From System F
to typed assembly language. ACM Transactions on Programming Languages and
Systems, 21(3):527–568, 1999.