Multi-task Ada Code Generation (preprint)
Abstract
1. Introduction
Safety-critical systems are those systems whose failure could result in loss of life, significant property damage, or damage to the environment. There are many well-known examples in application areas such as avionics and space systems. Currently, Model-Driven Development (MDD) is generally accepted as a key enabler for the design of safety-critical systems. For example, in the civil avionics software certification guidance DO-178C [1], MDD (DO-331) and formal methods (DO-333) are considered vital technology supplements. There are many MDD languages and approaches covering various modeling demands, such as UML for generic modeling, SysML for system-level modeling, AADL [2] for the architectural modeling and analysis of embedded systems, SCADE and Simulink for functional modeling, and Modelica for multi-disciplinary modeling.
Synchronous languages, which rely on the synchronous hypothesis, are widely adopted in the design and verification of safety-critical systems. There are several synchronous languages, such as LUSTRE [3], ESTEREL [4], SIGNAL [5], QUARTZ [6], PRELUDE [7], SCADE, and so on. SCADE is the industrial version of LUSTRE, commercialized by ANSYS/ESTEREL TECHNOLOGIES. SIGNAL is a polychronous language: it naturally considers a mathematical time model, in terms of a partial-order relation, to describe multi-clocked systems.
1 https://www.ansys.com/products/embedded-software/ansys-scade-suite
Safety-critical systems have evolved to use multi-core processors in order to obtain the higher computation performance required by advanced functionalities, such as autonomous driving and flight control. Several recent works focus on multi-task code generation and on the scheduling and mapping of tasks to multi-core processors with synchronous languages, for instance: the mapping of PRELUDE programs to many-core architectures [8], the extension of the SCADE code generator to support multi-core platforms [9][10], parallel code generation of LUSTRE synchronous programs for a many-core architecture [11], compilation of ESTEREL for multi-core execution [12], and the generation of OpenMP-based multi-threaded code from the intermediate representation of QUARTZ [13][14]. In our case, building on our previous works, such as the mechanized semantics of a subset of SIGNAL in Coq [15] and the sequential code generation of SIGNAL [16][17], we mainly focus on the SIGNAL language.
2 http://www.irisa.fr/espresso/Polychrony/
(fine-grained), and so on. The existing MiniSIGNAL code generation strategies mainly consider coarse-grained parallelism based on the Ada multi-task model. However, this code generation scheme has proved inefficient: architectural aspects of the target platform have to be taken into account to achieve fine-grained parallelism; for instance, reusing in-cache data is always desirable. Moreover, a task's execution time is sometimes very short, so creating tasks and context-switching between them incur significant overhead. To generate more efficient target code for industrial cases, this paper presents a new multi-task code generation method for MiniSIGNAL.
We select Ada as the target language because Ada is an explicitly concurrent, high-safety programming language which is very popular in safety-critical systems, especially in the aerospace industry (e.g. Airbus, ESA, NASA and China Aerospace). The Ada language includes support for concurrency as part of the language standard, by means of Tasks, which are entities that denote concurrent actions, and inter-task communication mechanisms such as protected objects or the rendezvous mechanism. This model targets the concurrent functionalities that the software should support, providing coarse-grained parallelism. Recently, two complementary research lines have tackled the extension of Ada to support fine-grained parallelism: 1) The next revision of the Ada standard (Ada 202x) [20] is currently considering a draft proposal of a parallel model. It specifies that an Ada task (a concurrent activity) can represent multiple logical threads of control which can proceed in parallel within the context of well-specified parallel regions: parallel blocks and parallel loops. However, it is not yet available. 2) Sara Royuela et al. [21] proposed the incorporation of the OpenMP parallel programming model into Ada. However, OpenMP enforces structured concurrency and we do not always have such a structure. JobQueue is an alternative way to exploit fine-grained parallelism. In this paper, we extend the multi-task code generation of MiniSIGNAL with concurrent JobQueues (i.e., several JobQueues with shared memory). For instance, one task is created per core at initialisation time, and a job is a set of data that is processed by a task. Thus the overhead of
creating/destroying tasks and context switching between them can be reduced. The jobs that belong to a task are stored in a job queue, and workers are employed by the job scheduler to process the jobs. Efficient job scheduling improves resource utilization by automatically load-balancing jobs across workers, thereby enhancing the overall performance of the computation. Inspired by the work of [22] and [23], this paper presents a lock-free implementation of the work-stealing JobQueue scheduler in Ada.
In addition, the front-end of our compiler prototype has been proven in the proof assistant Coq [16]. In this paper, the formal syntax and the operational semantics of VMT are also mechanized in Coq. Invariants are put forward and allow the proof of an important structural property: when a task is started, its required data have already been computed.
• The formal syntax and the operational semantics of VMT are mechanized in the proof assistant Coq. A VMT contains a set of tasks that communicate through shared data and synchronise through a wait/notify mechanism. The Coq formalisation allows us to establish an important property of the VMT structure: once a given number of notifications have been received, the needed data have been computed and the task can run until completion.
This paper is an extended version of our FTSCS 2019 conference paper [24]. The main extensions can be summarised as follows:
• In Section 3.1, the details of the task partitioning approach are given.
compilation phases step by step. The details of CASE B are given in Appendix B.
1.3. Outline
2. Preliminaries
In this section, we first introduce the basic concepts of SIGNAL, and then
give the definition of the intermediate language S-CGA.
2.1. SIGNAL
• undersampling y := x when b
The instantaneous function and the delay are monoclock operators, which means that all the signals involved have the same abstract clock, while the undersampling and the deterministic merging are multiclock operators, which means that the signals involved may have different clocks.
SIGNAL also provides several extended constructs to express control-related properties by specifying clock relations explicitly, for example set operators on clocks (union x1 ^+ x2, intersection x1 ^* x2, difference x1 ^- x2). Each extended construct can be equivalently transformed into a set of primitive constructs.
In the SIGNAL language, the relations between the values and between the abstract clocks of the signals are defined as equations, and a process consists of a set of equations. Two basic operators apply to processes: the first one is the composition of different processes, and the other one is the local declaration, in which the scope of a signal is restricted to a process.
Each of the extended constructs can be defined in terms of the primitive constructs [25], so we just consider the primitive constructs, that is, kernel SIGNAL (kSIGNAL for short). Its abstract syntax is presented as follows:

P ::= x := f(x1, ..., xn) (instantaneous function)
| x := x1 $ init c (delay)
| x := x1 when x2 (undersampling)
| x := x1 default x2 (deterministic merging)
| P | P (composition)
| P where x (local declaration)
attitude and orbit. The Eliminate Initial Deviation of Attitude Control subsystem eliminates the angular rate of attitude generated by the separation of the satellite from the launch vehicle by calling some three-axis attitude control algorithms of the spacecraft. Here we consider the Satellite Oriented to Earth function. A part of its SIGNAL model is shown below; the whole model can be found in Appendix A, and we preserve the line numbers of Appendix A:
This function receives two input parameters: the deviation of the attitude angle x (unit: °) and the attitude angular velocity y (unit: °/s). It returns three output values: the jet pulse width jet_DC (unit: ms), the total jet count count_DC, and the jet sign jet_sign.
The input variables determine a location in a two-dimensional coordinate system. Different regions of the coordinate system represent different jet pulse widths; for instance, the jet pulse width of region C1 is 500 (line 15) and the jet pulse width of the origin is zero. C1, C2, . . . , C6 are used to determine which region includes the location. If the location is in one of the six regions, i.e. the Boolean variable C1to6 is True, the total jet count count_DC is increased by 1 (line 26 - line 27) and the jet sign jet_sign is true.
One of the execution traces of the running example Satellite Oriented to Earth is shown in the following table.
Tick       0    1      2  3      4     5      6     7  8     9
x          0.0  -7.1   ⊥  6.5    2.2   -1.6   -2.5  ⊥  -5.0  -9.9
y          0.0  -1.0   ⊥  -0.01  0.03  -1.1   2.7   ⊥  0.05  -0.1
f          0.0  -1.355 ⊥  -0.335 0.14  -1.104 2.575 ⊥  -0.2  0.595
C1         F    T      ⊥  F      F     T      F     ⊥  F     T
C1_DC      ⊥    500    ⊥  ⊥      ⊥     500    ⊥     ⊥  ⊥     500
...
jet_DC     0    500    ⊥  -500   -10   500    0     ⊥  100   500
tmp_DC     0    1      ⊥  1      2     3      4     ⊥  4     5
add_DC     ⊥    1      ⊥  2      3     4      ⊥     ⊥  5     6
count_DC   0    1      ⊥  2      3     4      4     ⊥  5     6
jet_sign   F    T      ⊥  T      T     T      F     ⊥  T     T
Some signals in the table are synchronous, for instance x, y and f, because the clock synchronisation x ^= y explicitly sets the synchronisation (line 6) and the instantaneous function f := y + 0.05 * x implicitly expresses it (line 7). In addition, the trace of count_DC shows the semantics of the deterministic merging (line 28): it is the 'sum' of the traces of tmp_DC and add_DC, where add_DC has the higher priority.
2.2. S-CGA
We present the intermediate representation S-CGA, which is proposed in MiniSIGNAL. With the same purpose as [26][27], S-CGA provides a common intermediate format to integrate more synchronous languages, such as QUARTZ via its intermediate format AIF, into our compiler. Here we just present the syntax of S-CGA; its formal semantics can be found in [16][19].
Definition 1 (S-CGA) An S-CGA program is a set of guarded actions γ ⇒ A defined over a set of variables X. The Boolean condition γ is called the guard and A is called the action. Intuitively, the semantics of a guarded action is that A is executed if γ holds. Guarded actions can be of one of the following forms:

(1) γ ⇒ x = τ (immediate assignment)
(2) γ ⇒ next(x) = τ (delayed assignment)
(3) γ ⇒ assume(σ) (assumption)
(4) γ ⇒ Read x (input)
(5) γ ⇒ Write x (output)

where,
• γ and σ are Boolean conditions over the variables of X and their clocks. For a variable x ∈ X, we denote its clock by x̂;
• τ is an expression over X.

Form (1) immediately writes the value of τ to the variable x. Form (2) evaluates τ in the given instant but changes the value of the variable x at its next instant of presence. Form (3) defines a constraint which has to hold when γ is defined and true. Form (4) states that x gets a value provided by the environment, while form (5) states that the environment gets the value of x if γ is defined and true. Guarded actions are composed by the parallel operator ||.
S-CGA models can be structurally generated from kSIGNAL programs by generating each construct separately; the details are introduced in [16]. Here we show the S-CGA model generated from the running example:
1 || true ⇒ Read x
2 || true ⇒ Read y
3 || true ⇒ Write jet_DC
4 || true ⇒ Write count_DC
5 || true ⇒ Write jet_sign
6 || x̂ ⇒ f := y + 0.05 * x
7 || x̂ ⇒ C1 := (x < -0.5) && (f < -0.25) && (y < 0.15)
|| ...
14 || C1 && C1 ⇒ C1_DC := 500
|| ...
20 || true ⇒ jet_DC := C1_DC ? C1_DC : ... : 0
24 || C1to6 && C1to6 ⇒ add_DC := tmp_DC + 1
25 || true ⇒ count_DC := add_DC ? add_DC : tmp_DC
27 || init(true) ⇒ tmp_DC := 0
|| ...
29 || true ⇒ next(tmp_DC) := count_DC
3. Approach
• Clock Calculus: The clock calculus contains several steps [28], for in-
stance construction of an equation system over clocks and resolution of
the system of clock equations.
3.1.1. Dependency Analysis
We construct the DDG based on reads and writes occurring in guarded
actions. Notice that next(x) is considered as a new variable.
Definition 4 (Read and Write Dependencies) [29] Let FV(τ) denote the free variables occurring in the expression τ. The dependencies from guarded actions to variables are defined as follows:

RdVars(γ ⇒ x = τ) := FV(γ) ∪ FV(τ)
WrVars(γ ⇒ x = τ) := {x}
An action can only be executed if all read variables are known. Similarly, a variable is only known once all actions writing it in the current step have been evaluated. SIGNAL ensures that at most one write will be performed.
Definition 5 (Data Dependency Graph) Let GA be the set of guarded actions except assumptions, and let Var be the set of the variables of GA. A DDG is a directed acyclic graph ⟨GA, →D⟩, where →D ⊆ GA × Var × GA and ⟨ga, x, ga′⟩ ∈ →D iff x ∈ WrVars(ga) and x ∈ RdVars(ga′).
The DDG describes the execution order of guarded actions. We ignore the initialisation information (immediate actions containing the keyword init) and assumption actions when constructing the DDG, because the former only takes effect once while the latter is only used for constructing the clock tree.
The DDG can be constructed by simply traversing S-CGA programs twice to calculate all data-dependency relations and then optimising them. A direct dependency relation is removed if it can be implied by other relations. For instance, true ⇒ Read x (line 01), x̂ ⇒ f := y + 0.05 * x (line 06) and x̂ ⇒ C1 := (x < -0.5) && (f < -0.25) && (y < 0.15) (line 07) generate three direct relations ⟨01, x, 06⟩, ⟨01, x, 07⟩ and ⟨06, f, 07⟩, where the line numbers of the S-CGA model are used to denote the corresponding guarded actions. ⟨01, x, 07⟩ is implied by the relations ⟨01, x, 06⟩ and ⟨06, f, 07⟩ and can thus be omitted.
The DDG of the running example is shown in Fig. 2, where the labels represent the variables appearing on the edges. For instance, ⟨01, x, 06⟩ denotes that x ∈ WrVars(01) and x ∈ RdVars(06).
by the value of the variables associated with the next statements of the S-CGA which are present, must also be updated. Then, the next tick of the master clock will start a new cycle. Thus, a global synchronisation is introduced to wait for the completion of the computations of the current step. We can imagine three implementations:

• For the system to be correct, computations should complete before the occurrence of the next input. Thus, the next input or a timer can signal the end of the current step. This is efficient but requires the study of environment and platform timing assumptions, which are beyond the scope of this paper.

• A dependency between all the nodes of the dependency graph and the big-step task is added. It follows that all the tasks have to run, even if they are associated with absent variables. This is costly.

• The big step waits for the tasks linked to present variables to complete. This set of tasks is dynamic but can be much smaller. This solution makes the tasks associated with absent variables fully passive. We have retained this solution (a sketch is given after this list). Once the variables associated with clocks have been computed, we know how many tasks must be waited for. This fact will be used to implement the global synchronisation between the tasks attached to present variables.
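The retained solution can be illustrated by the following minimal Ada sketch, in which a protected object counts the completions of the tasks attached to present variables and the big step waits until the expected number has been reached. The names (Big_Step_Sync, Barrier, Set_Expected, Done, Wait_All) are illustrative only; the generated code described in Section 4 implements this synchronisation with a lock-free counter instead of a protected object.

--  Illustrative sketch only: a counting barrier for the retained solution.
--  Set_Expected is assumed to be called (with N >= 1) before Wait_All in
--  each big step; Section 4 uses a lock-free counter instead.
package Big_Step_Sync is
   protected Barrier is
      procedure Set_Expected (N : Natural);  --  number of present-variable tasks
      procedure Done;                        --  called by each completing task
      entry Wait_All;                        --  blocks until all have completed
   private
      Expected  : Natural := 0;
      Completed : Natural := 0;
   end Barrier;
end Big_Step_Sync;

package body Big_Step_Sync is
   protected body Barrier is
      procedure Set_Expected (N : Natural) is
      begin
         Expected  := N;   --  known once the clock variables have been computed
         Completed := 0;
      end Set_Expected;

      procedure Done is
      begin
         Completed := Completed + 1;
      end Done;

      entry Wait_All when Completed >= Expected is
      begin
         null;  --  every task attached to a present variable has completed
      end Wait_All;
   end Barrier;
end Big_Step_Sync;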
In the next section, we will present the task partitioning over the DDG, from which we define the parallelism through elementary tasks.
communication between tasks. Moreover, combination is a key step for task partitioning to achieve more efficiency. In this paper, three combination patterns are proposed to optimise the partitioning result. First, several preliminary functions are defined.
Definition 6 (Starting) Let ga be a node of a generated DDG G = ⟨GA, →D⟩. The function Starting(ga) := {ga′ | ⟨ga′, x, ga⟩ ∈ →D} maps ga to the set of nodes that have an edge pointing to ga.

Definition 7 (Ending) Let ga be a node of a generated DDG G = ⟨GA, →D⟩. The function Ending(ga) := {ga′ | ⟨ga, x, ga′⟩ ∈ →D} maps ga to the set of nodes that ga points to.

Definition 8 (Replacing) Let ga be a node of a generated DDG G = ⟨GA, →D⟩, and let n be a new node which does not appear in G. The function Replacing(ga, n, G) := ⟨GA_n, →D_n⟩ returns a new graph in which the occurrences of ga in G are replaced with n.
Merge Pattern. Let a and b be two nodes of the DDG. If a and b satisfy Ending(a) = {b} and Starting(b) = {a}, then a and b can be merged into one new node named a;b. As shown in Algorithm 1, the combination consists of first removing the edge ⟨a, x, b⟩ (line 4; here x represents the variable that is read by b and written by a), and then calling the Replacing function twice to replace a and b with a;b (line 5 - line 6).
Figure 3: Partition Combination Patterns.
Algorithm 2 Parents Sequentialization Pattern.
Input: ddg
Output: ddg
1: procedure Parents Sequentialization Pattern:
2:   for each node c ∈ ddg.GA do
3:     if Starting(c) = {a, b} and Ending(a) = {c} and Ending(b) = {c} then
4:       if cost(b) = LOW then
5:         (ddg.→D) ← (ddg.→D ∪ {⟨a, x, b⟩}) \ {⟨a, x, c⟩};
6:       else if cost(a) = LOW then
7:         (ddg.→D) ← (ddg.→D ∪ {⟨b, x, a⟩}) \ {⟨b, x, c⟩};
8:       end if
9:     end if
10:   end for
11:   return ddg;
12: end procedure
Sons Sequentialization Pattern. Let a, b and c be three nodes of the DDG. If Ending(a) = {b, c} and Starting(b) = {a}, then the dependency from a to c can be modified into a new dependency from b to c. The detailed description is given in Algorithm 3, where the other case, Starting(c) = {a}, is also considered (line 6 - line 7).
Algorithm 4 Task Partitioning.
Input: ddg
Output: ddg
1: procedure Task Partitioning:
2: ddg ← Parents Sequentialization Pattern(ddg);
3: ddg ← Sons Sequentialization Pattern(ddg);
4: ddg ← Merge Pattern(ddg);
5: return ddg;
6: end procedure
The partitioning result of the running example is shown in Fig. 4, where the labels are omitted. For instance, the edges 01 → 06 and 02 → 06 are replaced by 01 → 02 and 02 → 06 according to the parents sequentialization pattern, and then the new node "01;02;06" is constructed according to the merge pattern.
based on Synchronous Transition Systems (STS) [30]. A VMT is defined by a set of tasks synchronised by a wait-notify mechanism. Notifications could be associated with newly computed variables and sent to the reading tasks. However, to reduce the number of notifications, they signal task completion instead of single variable computations. Static properties make the link between the two viewpoints and ensure that once a task has received enough notifications, its required variables have been valued.

In the following, we first introduce tasks and their before-after semantics, and then VMTs and their STS-based semantics.
3.2.1. Tasks
A task could simply be defined as a guarded assignment as specified by an S-CGA statement. However, in order to make possible the composition of tasks required by the partitioning methods presented in Section 3.1.3, we have introduced a small action language.

Actions. Starting with the Cond and Assign constructors allowing the specification of elementary guarded actions, we have added sequence (Seq) and if-then-else (Ite), as well as a Load statement to make explicit the access to the memory storage of past values. Moreover, we have introduced the Notify statement to notify target tasks about the completion of the computation of some variables. Note that waits are not explicit: once a task is ready, its action part can execute without blocking.
The following Coq code defines the abstract syntax of the action language. The Action type is parameterized by the type Id of variable identifiers, which are supposed to have a decidable equality, the type Tid of task identifiers, which are supposed to be iterable (i.e. they can all be put in a list), and the type M of identifier-data mappings.
Inductive Action `{Id: EqDec} {Tid: Iterable} `{M: Mem Id}: Type :=
Skip (* does nothing *)
| Load (v:Var Id) (m:Var Id) (ism:isM M m) (* loads v from memory location m *)
| Notify (tid: Tid) (* notifies target task tid *)
| Assign (v: Var Id) (e: Exp (VarDec Id)) (* assigns expression e to v *)
| Seq (a1: Action) (a2: Action) (* sequential composition *)
| Cond (c: Exp (VarDec Id)) (a: Action) (* conditional execution of action *)
| Ite (c: Exp (VarDec Id)) (ift: Action) (iff: Action). (* if then else *)
Tasks. A task is defined in the context of a VMT, which is made of a set of tasks communicating through shared variables and synchronised by notifications. A task is a tuple ⟨Inputs, Counter, Body⟩ where:

• Inputs is the set of input variables whose values are read by the task.

• Counter is the number of notifications that the task waits for before starting its execution. It should be ensured that once the number of received notifications reaches the value of the counter, all input variables are known.
• Body is an action defining the behavior of the task, which consists in
computing variables and performing notifications.
The Coq definition of a task is shown below. Several auxiliary definitions are attached to tasks, derived from action observers. They provide helpers for the definition of well-formedness conditions. The last section defines the run-time task semantics with the help of the act_run function, which takes as parameters the memory contents (sM), the environment of currently known signal variables and the action of the task. It returns the updated environment and, for each task identifier, the set of variables known when notified.
As an example, we define in Coq the task t24 of Figure 5. The body of the task is obtained by using the Cond action constructor to associate the action with its guard:
Program Definition t24: Task TID_it M := {|
inputs := SV.list2set (VarDec VID_dec) [vId m_DC; vId c_C1to6; vId C1to6];
counter := 1;
body := Cond t24_guard t24_action
|}.
This Coq declaration should be completed by the proof of the three properties attached to tasks, which guarantee its well-formedness. For example, we prove that the knowledge of the given inputs is sufficient to run the body. It has to be noted that the value given for the counter cannot be checked here: the graph of tasks is needed for that, and this static check should be done at the VMT level.
Several important well-formedness conditions apply to a VMT. They should be ensured by the translation from the data dependency graph and thus be guaranteed by the static analysis of the source (SIGNAL) model:
Inductive vmt_acyclic `{Id: EqDec} (vmt: VMT Id) (tid: TaskId vmt)
  (d: Exp (VarDec Id)) : Prop :=
vmt_isReachable: (isSat d -> forall (pid: TaskId vmt) v,
  vmt_acyclic vmt pid (eAnd (tk_notifyVar (M:=vmt_mem vmt)
  (task pid) tid v) d)) -> vmt_acyclic vmt tid d.
It has to be noted that this acyclicity condition differs from the one derived for other synchronous languages such as LUSTRE, where the arcs of the dependency graph are unconditional. In SIGNAL, since the arcs are conditional, the direction of data flows may change during system execution. This hypothesis has consequences on the acceptability of the SIGNAL source code: it should be rejected if it contains some cyclic conditional dependencies. As a consequence, this property relies on a decidable sufficient condition. We have proved its decidability when arc labels are ignored. Thus, the static test is for the moment stricter than necessary.
• There should exist at most one writer for each variable of the system. More precisely, the conjunction of the writing conditions of the same variable by two distinct tasks should be unsatisfiable. It is thus possible for two guarded actions to update the same variable if their guards are exclusive. This can be the case for guarded actions derived from a default construct in SIGNAL, or in the translation of synchronous automata where assignments would be state dependent.
These properties are decidable because the set of tasks is finite (declared Iterable in Coq) and clock conditions are abstracted as propositional formulas.
• vmt_env: the environment containing the values of the currently known variables, which will eventually constitute the STS reaction: once all tasks are completed, the environment contains the system reaction and the values of the memorised variables.

• vmt_prev: associates a task with the set of tasks from which it has received a notification.

• vmt_wrt: associates a variable of the environment with the task that has produced its value.
Several invariant properties are associated with this structure. They are ensured by the initial empty environment (tasks should first read from memory) and preserved by each task execution.
• (vmt_dreq) the input variables of terminated tasks are known by the environment;

• (vmt_dsub) running a terminated task would not create new variable-value mappings;

• (vmt_prev) the sources of notifications are in the set of terminated tasks.
The fields defining a VMT run-time state together with their invariant prop-
erties are formally defined in Coq as follows:
Record vmt_state `{Id: EqDec} (vmt: VMT Id) (wf: VMT_WF vmt) (sM: vmt_smem vmt): Type := {
vmt_min: SV.set (VarDec Id); (* needed variables *)
vmt_env: Env vmt_min; (* value of known variables *)
vmt_dom := dom vmt_env; (* valued variables *)
vmt_dsub: forall t (h: SV.set_In t vmt_done),
isSubEnv (as_env (tk_run (task t) sM (updEnv vmt_env (vmt_dreq t h))))
vmt_env;
vmt_prev: TaskId vmt -> SV.set (TaskId vmt); (* notify sources *)
vmt_pdone: forall t, SV.subset (vmt_prev t) vmt_done;
vmt_cnd: forall t p, SV.set_In p (vmt_prev t) ->
forall h, isTrue (eSem (tk_notifyCond (task p) t) (updEnv vmt_env h));
vmt_count tid := SV.card (vmt_prev tid);
A micro-step of the VMT selects a ready task and makes it update the environment. Notifications and writes to variables are taken into account to update the corresponding fields. Then the proof obligations associated with the state invariants must be proved. It comes down to establishing that when a task is launched, i.e. when its declared counter has been reached, its input variables are known by the environment. This is the main result related to the VMT semantics. It is expressed in Coq as the ability to define the function vmt_step, which computes the next state after a micro-step when the precondition VMT_enabled is fulfilled (the task has not yet run and has received enough notifications). The following Coq fragment only contains the header of the function. Several auxiliary variables are introduced before defining the next state. Then, thanks to the Program construct, proof obligations are generated. They require proving that all the stated invariants are preserved. The statement of the invariants, together with the completion of these proofs, constitutes the main challenge of the VMT definition.
Program Definition vmt_step `{Id: EqDec} (vmt: VMT Id) (wf: VMT_WF vmt)
  (sM: vmt_smem vmt) (st: vmt_state wf sM) (en: VMT_enabled st)
  : vmt_state wf sM := ...
The VMT runs while some ready task exists, which defines a macro-step
(named vmt_steps) in the following Coq code:
Inductive vmt_steps `{Id: EqDec} (vmt: VMT Id) (wf: VMT_WF vmt) {sM: vmt_smem
vmt} (st: vmt_state wf sM) : vmt_state wf sM -> Prop :=
vmt_end: (VMT_enabled st -> False) -> vmt_steps st st
| vmt_one: forall (h: VMT_enabled st) st’, vmt_steps (vmt_step h) st’
-> vmt_steps st st’.
The semantics of a VMT as an STS can now be given. The STS state is defined as the set of valued memory locations. For each macro-step, a VMT runtime state is initialised. It contains an empty environment from which a maximal sequence of micro-steps is run. Then, the memory contents are updated and the reaction label is built from two projections of the runtime state, which contains the values of all the variables making up the reaction as well as the values of the memory variables.
Definition VMT_sem `{Id: EqDec} (vmt: VMT Id) (wf: VMT_WF vmt): sts _ :=
{|
State := vmt_smem vmt; (* memory structure *)
Init := vmt_init vmt; (* memory initialisation *)
Next st r st’ := (* transitions labelled by reactions *)
exists vst’, vmt_steps (vmt_init_step wf st) vst’ /\
r = env2reaction (vmt_env vst’) /\ (* projection to reaction *)
st’ = env2state (vmt_env vst’) st (* projection to memory *)
|}.
a guarded action), the corresponding taskId is derived from the variable name (line 07); the Action field, including most of the task body, is generated from the guarded action (line 08); the Inputs field is generated from the Action (line 09); the Counter and Notify fields are generated according to two rules: for each edge whose ending vertex is the current vertex, its starting vertex is added to the Counter (line 11 - line 12); likewise, for each edge whose starting vertex is the current vertex, its ending vertex is added to the Notify (line 13 - line 14). Then, the generated task is added to the Task field of the VMT (line 17).
Figure 5: The VMT model of the running example (part).

statements (e.g. declared by t7 and t20 in Fig. 5). The Cond of t7 is an if-structure, while the condition of the Cond of t20 is omitted because its value is always true. In addition, the prefix "c_" + x represents the clock of the variable x (x̂ in S-CGA). According to the intuitive semantics of guarded actions, the clock "c_" + x is assigned true before the variable x is computed; otherwise, the clock is set to false.
We could associate one Ada task to each DDG node and use the Ada rendezvous mechanism or protected objects to control race conditions. However, the generated code would be inefficient, as it would contain too many tasks. In addition, as mentioned before, the init data and the next updates generated from the delay construct x := x1 $ init c are dealt with outside of the multi-task partition. The current data, before the next update, are always reused by the tasks, i.e., reusing in-cache data is expected. Moreover, a task's execution time is sometimes very short, so creating tasks and context-switching between them incur significant overhead.
In this paper, we adopt a concurrent JobQueue to support fine-grained parallelism in Ada. For instance, one task is created per core at initialisation time, and a job is a set of data that is processed by a task. Thus the overhead of creating/destroying tasks and context switching between them can be reduced.
Figure 6: Lock-free work-stealing deque.
The jobs that belong to a task are stored in a job queue, and workers are employed by the job scheduler to process the jobs. Efficient job scheduling improves resource utilization by automatically load-balancing jobs across workers, thereby enhancing the overall performance of the computation. In order to guarantee load balancing, we have chosen the lock-free work-stealing deque [22][23] to implement the parallel computation of the DDG (Fig. 6): each job corresponds to one procedure in Ada, and each worker is bound to a specific core with one local deque. The deque's owner worker pushes and pops local jobs to and from the deque's bottom, and steals a job from another worker's deque if its own deque becomes empty.
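The scheduling behaviour of a worker can be summarised by the loop sketched below. This is only an illustrative fragment with hypothetical names (Worker_Task, My_Deque, Other_Deques, Step_Finished); the generic packages actually used are given next.

--  Illustrative sketch of a worker's scheduling loop.  Hypothetical context:
--  task type Worker_Task (Id : TID); My_Deque and Other_Deques are the local
--  deques of this worker and of the other workers; Step_Finished tells
--  whether the current big step is over.
task body Worker_Task is
   Job : Object;
begin
   loop
      Job := PopBottom (My_Deque);             --  take the newest local job
      if Job = EMPTY then
         for W in Other_Deques'Range loop      --  work stealing
            Job := Steal (Other_Deques (W));
            exit when Job /= EMPTY;
         end loop;
      end if;
      if Job /= EMPTY then
         Run (Job, Id);                        --  execute the job on this core
      else
         exit when Step_Finished;              --  nothing left in this big step
      end if;
   end loop;
end Worker_Task;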
610 The type TID is used to specify the number of available cores provided by
execution platforms.
generic
   type TID is range <>;
   -- ...
   with procedure Run (O : Object; Id : TID);
package Worker is
   procedure submit (tsk : Object);
   -- ...
end Worker;

-- main.adb
type TID is new Integer range 1 .. N;  -- N workers
type job is access procedure (Id : TID);

procedure Run (A : job; Id : TID) is
begin
   A.all (Id);
end Run;

package Workers is new Worker (TID, job, null, Run);
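With this instantiation, a job is simply an access-to-procedure value handed to the workers, for example (A_Job is an illustrative placeholder for one generated procedure):

-- Submitting a job (A_Job is an illustrative placeholder for a generated procedure).
procedure A_Job (Id : TID) is
begin
   null;  -- body generated from one node of the DDG
end A_Job;

Workers.submit (A_Job'Access);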
• PopBottom: popping an object from the bottom of the deque if the deque is not empty, otherwise returning EMPTY;
generic
   type Object is private;
   EMPTY : Object;
package LocalQueue is
   type Deque is limited private;
   -- ...
   procedure PushBottom (P : in out Deque; Obj : in Object);
   function PopBottom (P : in out Deque) return Object;
   function Steal (P : in out Deque) return Object;
end LocalQueue;
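To make the intended semantics of the three operations concrete, the following is a simplified, lock-based stand-in built on a protected object and a bounded buffer (and using out parameters instead of functions). It is a sketch only: the deque actually used is the lock-free work-stealing deque of [22][23], which avoids this lock.

--  Simplified, lock-based stand-in for the deque interface above (sketch
--  only; the implementation used in the paper is the lock-free deque of [23]).
generic
   type Object is private;
   EMPTY : Object;
package SimpleDeque is
   Size : constant := 256;                       --  bounded capacity (sketch)
   type Buffer_Type is array (0 .. Size - 1) of Object;

   protected type Deque is
      procedure PushBottom (Obj : in Object);    --  owner adds at the bottom
      procedure PopBottom  (Obj : out Object);   --  owner removes at the bottom
      procedure Steal      (Obj : out Object);   --  thief removes at the top
   private
      Buffer : Buffer_Type;
      Top    : Natural := 0;                     --  index of the oldest job
      Bottom : Natural := 0;                     --  index of the next free slot
   end Deque;
end SimpleDeque;

package body SimpleDeque is
   protected body Deque is
      procedure PushBottom (Obj : in Object) is
      begin
         Buffer (Bottom mod Size) := Obj;
         Bottom := Bottom + 1;
      end PushBottom;

      procedure PopBottom (Obj : out Object) is
      begin
         if Bottom = Top then
            Obj := EMPTY;                        --  the deque is empty
         else
            Bottom := Bottom - 1;
            Obj := Buffer (Bottom mod Size);
         end if;
      end PopBottom;

      procedure Steal (Obj : out Object) is
      begin
         if Bottom = Top then
            Obj := EMPTY;                        --  nothing to steal
         else
            Obj := Buffer (Top mod Size);
            Top := Top + 1;
         end if;
      end Steal;
   end Deque;
end SimpleDeque;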
To implement the Wait/Notify mechanism, a lock-free counter is defined using calls to Lock_Free_Try_Write_32 from the Ada library System.Atomic_Primitives, which atomically modifies a variable if it contains the expected value. Each job has one counter with an initial value, which is the number of jobs it depends on. When one of them completes, the value is decreased by 1 (i.e. the procedure decr is called once). If the output value z of decr is zero, then the job can be executed.
package LockFreeCounter is
   type Counter (Init : Integer) is tagged record
      Value : Integer := Init;
   end record;
   procedure decr (C : in out Counter; z : out Integer);
   -- ...
end LockFreeCounter;

with System.Atomic_Primitives; use System.Atomic_Primitives;
package body LockFreeCounter is
   -- ...
   -- atomically decrements C.Value; z returns the new value
   procedure decr (C : in out Counter; z : out Integer) is
      V : uint32 := uint32 (C.Value);
   begin
      loop
         exit when Lock_Free_Try_Write_32 (C.Value'Address, V, V - 1);
      end loop;
      z := Integer (V) - 1;
      if z = 0 then C.Value := C.Init; end if;  -- reset the counter for the next step
   end decr;
end LockFreeCounter;
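As a usage sketch, a completing job notifies its successors by decrementing their counters and submits those that become ready. Node_Id, Successors, Counters and Jobs are hypothetical tables derived from the VMT; decr and Workers.submit are the operations introduced above.

-- Usage sketch only: Node_Id, Successors, Counters and Jobs are hypothetical
-- tables derived from the VMT; decr and Workers.submit are defined above.
procedure Notify_Successors (Finished : Node_Id) is
   Z : Integer;
begin
   for Succ of Successors (Finished) loop
      decr (Counters (Succ), Z);        -- one more notification received
      if Z = 0 then
         Workers.submit (Jobs (Succ));  -- all dependencies satisfied: ready to run
      end if;
   end loop;
end Notify_Successors;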
The other transformations from VMT to Ada are straightforward: the init function generated from Init is defined in the body of the main program, and each task of the VMT is mapped to a procedure (or job). The procedure next generated from mem is fired when the global synchronisation happens; it updates the memory for the next big step. In addition, all variable declarations containing input/output/local variables are transformed into global variables in Ada.
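As an illustration (not the literal output of the generator), the task derived from the guarded action x̂ ⇒ f := y + 0.05 * x could map to a procedure of the following shape, where c_x and c_f are the clock variables and Notify_Successors stands for the counter-based notification sketched above (t06 and N06 are hypothetical names):

-- Illustrative shape of a generated job; t06 and N06 are hypothetical names.
-- x, y, f are global signal variables; c_x, c_f are their Boolean clock variables.
procedure t06 (Id : TID) is
begin
   if c_x then                  -- guard: the clock of x holds in this step
      c_f := True;              -- f is present in this step
      f   := y + 0.05 * x;      -- computation derived from the guarded action
   else
      c_f := False;             -- f is absent in this step
   end if;
   Notify_Successors (N06);     -- notify the tasks waiting for this one
end t06;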
For instance, the Ada code generated from the running example is shown below. Firstly, initialised variables are declared in the structure "begin ... end Main". Secondly, all the jobs corresponding to tasks of the VMT with an empty counter value are put into the lock-free work-stealing deque. Thirdly, the number of workers is set to the number of available CPUs of the target platform to achieve the fastest execution speed. Finally, when the counter value of c_next is zero, the memory is updated, the deque is reinitialised and the values of the three outputs are recorded.
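A minimal sketch of that overall structure is given below; Read_Inputs, Initial_Jobs, Wait_End_Of_Step, Next, Write_Outputs and Reset_Step are hypothetical names standing for the corresponding generated pieces.

-- Illustrative shape of the generated main program (hypothetical names only).
procedure Main is
begin
   -- initialised variables are declared/assigned here (from Init)
   loop
      Read_Inputs;                 -- x, y and their clocks for this step
      for J of Initial_Jobs loop
         Workers.submit (J);       -- jobs whose counter value is already zero
      end loop;
      Wait_End_Of_Step;            -- the counter of c_next reaches zero
      Next;                        -- update memory for the next big step
      Write_Outputs;               -- jet_DC, count_DC, jet_sign
      Reset_Step;                  -- reinitialise the deque and the counters
   end loop;
end Main;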
As mentioned in Fig. 1, the MTCodeGen prototype tool also adopts a modular architecture, and it is implemented in the functional programming language OCaml. The size of the OCaml code of each module is given in Table 1. The architecture of the MTCodeGen tool consists of three layers: infrastructure, compilation and application, as shown in Fig. 7.
Table 1: Main Modules of the MTCodeGen prototype tool.
The infrastructure layer specifies that the tool is developed on the OCaml Eclipse plug-in OcaIDE.

The compilation layer focuses on the compilation process from the source OCaml project to the MTCodeGen plug-in. Firstly, the whole project is compiled into an executable, i.e. the MTCodeGen compiler, using the OcaIDE environment; then the target plug-in is generated from the executable according to the instantiation mechanism of Eclipse.
The application layer includes two particular applications of the MTCodeGen compiler. Firstly, the compiler can take SIGNAL models together with a configuration file as input and generate multi-task Ada code. Secondly, the compiler has already been integrated with the AADL modelling environment OSATE, to support co-modelling with AADL and SIGNAL, and code generation.
5. Evaluation
We have conducted three case studies to evaluate our approach. The case studies have been selected to address and balance several considerations.
The Guidance, Navigation and Control (GNC) system is a core system supporting the orbiting operations of spacecraft; it undertakes the tasks of determining and controlling the spacecraft attitude and orbit. GNC is composed of navigation sensors (such as navigation cameras, star sensors, gyroscopes, and accelerometers), actuators (such as reaction flywheels, nozzles, orbit-controlled engines), and control computers (AOCS) which process the guidance and control tasks of the various sensors and perform orbit determination, orbit control, attitude determination and attitude control. In addition, a data processing unit (DPU) is usually added between the navigation sensors and the AOCS to pre-process the data sent by the
7 http://www.algo-prog.info/ocaide/
8 https://osate.org/
navigation sensors according to engineering guidelines. A simplified architecture
of the GNC system is given in Fig.8.
Table 2: Statistical data of the GNC model.

GNC component                              Language    Size (lines)
sensors
  navigation cameras                       AADL        100+
  star sensors                             AADL        100+
  gyroscopes                               AADL        100+
  ...
actuators
  reaction flywheels                       AADL        100+
  nozzles                                  AADL        200+
  orbit-controlled engines                 AADL        100+
  ...
AOCS
  AD   AD's Architecture                   AADL        4000+
       DPSS                                BA/SIGNAL   200+/200+
       Shadow Region Detection             BA          300+
       ...
  OCn  OCn's Architecture                  AADL        3500+
       COE                                 BA/SIGNAL   300+/300+
       Argument of Periapsis               BA/SIGNAL   150+/100+
       ...
  AC   AC's Architecture                   AADL        4200+
       EID                                 SIGNAL      200+
       Capture Earth                       BA          200+
       ...
  OCl  OCl's Architecture                  AADL        2000+
       ...
Total                                      AADL        20000+
                                           BA          2400+
                                           SIGNAL      2000+
Table 3: Statistical data of generated code of three cases.
The statistical data of the generated Ada code for the three case studies is shown in Table 3. Here, we use CASE A to illustrate the whole compilation process of the Ada code generation. For CASE B, the data dependency graph is given in Appendix B. The details of CASE C have already been shown in the running example.
CASE A involves two kinds of hardware devices: three sun sensors of the satellite (Sa, Sb, Sc) and a sun sensor of the solar array (SA); each sun sensor has four batteries. The system receives the input data from the hardware devices, performs the data processing (including 4 parallel sub-processes) and sends the results to other subsystems (e.g. Data Processing of Star Sensor). The main requirements of CASE A consist of:
• Req1.1: Converting the source data of the sensors (Sa, Sb, Sc) to the
corresponding voltage value.
• Req1.2: Computing the voltage values of the four batteries of each sensor; if a sensor does not satisfy the related constraint, resetting the solar angle to zero, otherwise calculating the solar angle.

• Req1.3: Computing the filter of each solar angle by the filter algorithms.

• Req1.4: Using the data from two sensors (Sb and Sc) to calculate the projection of the sun vector in the satellite celestial coordinate system.

• Req2.1: Converting the source data of the sensor (SA) to the corresponding voltage value.
The three cases are also used to compare various code generation strategies for SIGNAL on a specific multi-core platform. The experiment covers purpose, environment, strategies, process, results, analysis and conclusion.
Experiment Purpose: We envision providing an experiment framework for industry engineers. Three modules (CASE A, CASE B and CASE C), i.e., parts of real code, are used in the experiment framework. In the experiment, the goal is to compare the code generation strategies and to test the validity of the aforementioned combination patterns. Without loss of generality, industry engineers can put all of their real code into this framework to exploit the concurrency.

Figure 9: The compilation process of CASE A.
Experiment Environment: The environment in our laboratory includes an 8-core i7-7700 CPU at 3.60 GHz, 16 GB RAM, Ada 2012 and the Ada IDE (GNAT 7.3.0).
Experiment Strategies: Four strategies are listed below:

• Coarse-grained: multi-task code generation adopting the typical Ada rendezvous mechanism.

• Schneider: multi-task code using the vertical task partition method of [13].
Figure 10: The experiment results of CASE A/B/C on multi-core
Figure 11: The experiment results of CASE A
JobQueue, which indeed reduces the execution time of the target code. Fig. 11 shows three experiment results for CASE A: the blue line is the execution time of the target Ada code using the JobQueue on different numbers of cores; the red line shows the result of the target code adopting both the JobQueue and the combination patterns; the green one records the result of the target code using both the concurrent JobQueue and the combination patterns.

Comparing the blue line with the red one shows that the combination patterns reduce the execution time, because these patterns reduce the number of tasks of the target code and cut down communication costs by merging tasks that are suitable for sequential execution.

Comparing the red one with the green one shows that the concurrent JobQueue method further reduces the execution time: the concurrent JobQueue adopts the work-stealing deque method to achieve better load balancing and replaces the Ada rendezvous mechanism with lock-free calls.
Note that all three strategies suffer from a higher execution time when the number of cores is 6 or 8. One potential reason is that each node in Fig. 9 (b) involves little computation (few equations/statements), so the cost of task administration can become greater than the cost of task computation as the number of CPUs increases. To validate this, we made each node perform heavy computation, and Fig. 12 shows a positive correlation between the number of cores and the execution efficiency.
6. Related Work
Several compilers for synchronous languages have been proposed, such as the commercial SCADE KCG code generator [32], and the academic LUSTRE V6 [33], Heptagon [34], ESTEREL V5_92, Averest for QUARTZ, Polychrony for SIGNAL, and so on. With the advent of multi-core processors, the automated synthesis of multi-task code from synchronous languages has gradually become a hot research topic.

Here, we classify the related work based on the different synchronous languages. For a synchronous program, several levels of parallelization are possible, such
9 http://www-sop.inria.fr/esterel.org/files/Html/Downloads/Downloads.htm
10 http://www.averest.org/
as inter-block parallelization (coarse-grained), intra-block parallelization (fine-grained), etc. Moreover, task partitioning, synchronization, mapping and scheduling are the main topics in multi-task code generation for synchronous languages.
(1) LUSTRE
Graillat et al. [11] consider the top-level node of a LUSTRE application as a software architecture description where each sub-node corresponds to a potential parallel task. Given a mapping (tasks to cores), they automatically generate code suitable for the targeted many-core architecture. However, they focus on a minimal case where only the direct sub-nodes of the main node are implemented as parallel tasks.
Souyris et al. [35] propose a solution for automatic parallel code generation from LUSTRE/Heptagon models with non-functional specifications (e.g. periods). It is formed of two parts: the specification of each sequential task as a synchronous program (nodes), and the integration specification. Each task specification is compiled into sequential C code using a classical LUSTRE/Heptagon compiler. The integration specification describes how tasks communicate and synchronize; it is taken as input by the parallelization tool. So, they mainly consider node-level parallelization.
(2) PRELUDE
Pagetti et al. [7] introduce a real-time software architecture description language, named PRELUDE, which is built upon the synchronous language LUSTRE and which provides a high level of abstraction for describing the functional and real-time architecture of a multi-periodic control system. They have given a compilation from PRELUDE to a multi-task execution on a mono-processor real-time platform with an on-line priority-based scheduler such as Deadline-Monotonic or Earliest-Deadline-First. [8] describes a static mapping of dependent real-time task sets, specified in PRELUDE, onto a many-core platform. Furthermore, it gives a lightweight run-time environment for the scheduling and execution of the resulting real-time system. Thus, their main concern is the mapping and scheduling of multiple tasks on a platform.
(3) SCADE
In [9], ANSYS presents a first step towards, and an overview of, the generation of parallel code from SCADE applications. Its principle is to rely on parallelism annotations on the model that do not affect the semantics but tell the compiler to generate independent tasks that communicate through channels. The generated set of tasks forms a Kahn Process Network (KPN). The actual implementation of the generated set of tasks on the final platform, as well as its timing analysis, is done afterwards and outside of the language.

The work [9] mainly focuses on the structure of the generated code. Based on it, ANSYS gives a detailed extension of SCADE to generate parallel code that targets execution on Infineon's latest-generation AURIX multi-core processor [10].
However, the ANSYS solution requires the user to specify how to partition the model for parallel execution by annotating parallel subsets.
(4) SIGNAL
In terms of multi-task code generation for SIGNAL, the report [36] describes the multi-task code generation strategies available in the Polychrony toolset, including clustered code generation with static and dynamic scheduling, and distributed code generation. Jose et al. [18] propose a process-oriented and non-invasive multi-task code generation that uses the sequential code generators of Polychrony and separately synthesises some programming glue. Our previous works [16][19] present a sequential/multi-task code generator for SIGNAL.

Compared with the existing work on multi-task code generation for SIGNAL, this paper focuses on improving the efficiency of the target code when applied to real-world aerospace industrial cases, by supporting fine-grained parallelism with the concurrent JobQueue pattern.
(5) ESTEREL
Li et al. [37] present a multi-threaded processor, the KEP3a, which allows the efficient execution of concurrent ESTEREL programs.
Yuan et al. [38] propose two distinct approaches that distribute ESTEREL threads evenly across multi-core architectures. The first approach statically distributes threads based on the computation intensity approximated by the number of instructions generated from each thread. The second approach distributes threads dynamically using a thread queue that dispatches a thread whenever a core becomes idle.
In general, compared with data-flow synchronous languages such as LUSTRE, SCADE, PRELUDE, SIGNAL, and so on, ESTEREL offers control-flow primitives to express reactive behaviors. As the threads within an ESTEREL program are tightly coupled, the distribution techniques introduced in these works depend on the number of concurrent execution paths without data dependencies.
(6) QUARTZ
Baudisch et al. [13] propose two synthesis procedures generating multi-threaded OpenMP-based C code from QUARTZ, by vertical and horizontal partitioning respectively.
Furthermore, in [14], they show an automatic synthesis procedure that translates synchronous programs into software pipelines. The original system does not need to be divided into threads: the threads are automatically generated by cutting the original system into pipeline stages. The approach is based on pipelining these programs before turning them into OpenMP-based C code. By connecting all parts of the implementation with FIFO buffers, the execution of the stages can be desynchronised.
Compared to our approach, their work also considers fine-grained parallelism. However, our target language is Ada, and we introduce the concurrent JobQueue to support fine-grained parallelism in the Ada multi-task model; OpenMP enforces structured concurrency and we do not always have such a structure.
(7) Other variants of synchronous languages
Li et al. [39] present the transformation from synchronous SystemJ code to implementations on two types of time-predictable cores; an evolutionary algorithm is used to evaluate multi-core scheduling solutions for finding guaranteed reaction times of real-time synchronous programs on multi-core targets. It aims at finding the mapping and schedule of synchronous programs that statically guarantees reaction times when mapped onto a multi-core platform.
Yip et al. [40] introduce the ForeC language, which enables the deterministic parallel programming of multi-cores. ForeC inherits the benefits of the synchronous language ESTEREL, such as determinism and reactivity, along with the benefits and power of the C language, such as control and data structures. The ForeC compiler generates statically scheduled code for direct execution on a predictable parallel architecture. The aim is to generate code that is amenable to static timing analysis.
7. Conclusion and Future Work

Synchronous languages are widely adopted for the design and verification of safety-critical systems. With the advent of multi-core processors, multi-task code generation for synchronous languages has become a trend. MiniSIGNAL is a multi-task code generation tool for SIGNAL. The existing MiniSIGNAL code generation strategies mainly consider coarse-grained parallelism based on the Ada multi-task model. However, the generated code is still inefficient when we apply the tool to real-world aerospace industrial cases. Therefore, this paper presents a new multi-task code generation method for MiniSIGNAL which supports fine-grained parallelism. Our method first generates a platform-independent multi-task structure (VMT) from the intermediate representation S-CGA, and then generates target Ada code with the concurrent JobQueue pattern from the VMT. Moreover, the formal syntax and the operational semantics of VMT are mechanised in the proof assistant Coq, to support, in the future, the semantics preservation proof of the new multi-task code generation strategy proposed in this paper. Finally, the industrial case studies have shown that the approach is feasible.
We will consider introducing the new Ada parallel model proposed for Ada 202x. The widespread advent of multi-core processors further aggravates the complexity of timing analysis. For instance, the FAA has published the CAST-32A document [41] together with some recommendations for time-predictability on multi-core platforms, namely that the timing behavior of a system must be analyzable and validatable off-line. An interesting direction is to estimate the worst-case execution time (WCET) of SIGNAL programs running on multiprocessors. The separate compilation of synchronous programs is also an important issue [42]. The constructive semantics [27] of SIGNAL provides a basis for the separate compilation of SIGNAL programs, and we can implement it with several technologies such as interface theory. In addition, we are currently working on the full proof of semantics preservation of MiniSIGNAL in Coq.
References
[6] K. Schneider, J. Brandt, Quartz: A Synchronous Language for Model-Based Design of Reactive Embedded Systems, Springer Netherlands, Dordrecht, 2017, pp. 29–58.
[10] B. Pagano, C. Pasteur, G. Siegel, A Model Based Safety Critical Flow for the AURIX Multi-core Platform, in: ERTS 2018, 9th European Congress on Embedded Real Time Software and Systems (ERTS 2018), Toulouse, France, 2018. URL https://hal.archives-ouvertes.fr/hal-02156195
[13] D. Baudisch, J. Brandt, K. Schneider, Multithreaded code from synchronous programs: Extracting independent threads for OpenMP, in: Design, Automation & Test in Europe Conference & Exhibition (DATE 2010), IEEE, 2010, pp. 949–952.
[16] Z. Yang, J. Bodeveix, M. Filali, K. Hu, Y. Zhao, D. Ma, Towards a verified compiler prototype for the synchronous language SIGNAL, Frontiers Comput. Sci. 10 (1) (2016) 37–53.
[22] I. Shams, S. Vivek, Load balancing prioritized tasks via work-stealing, in: J. L. Träff, S. Hunold, F. Versaci (Eds.), Euro-Par 2015: Parallel Processing, Springer Berlin Heidelberg, Berlin, Heidelberg, 2015, pp. 222–234.
[23] D. Chase, Y. Lev, Dynamic circular work-stealing deque, in: P. B. Gibbons, P. G. Spirakis (Eds.), SPAA 2005: Proceedings of the 17th Annual ACM Symposium on Parallelism in Algorithms and Architectures, July 18-20, 2005, Las Vegas, Nevada, USA, ACM, 2005, pp. 21–28.
[28] P. Le Guernic, J.-P. Talpin, J.-C. Le Lann, Polychrony for system design,
Journal of Circuits, Systems, and Computers 12 (03) (2003) 261–303.
currency to System Design (ACSD 2004), 16-18 June 2004, Hamilton,
Canada, IEEE Computer Society, 2004, pp. 67–78.
[32] J. Colaço, B. Pagano, M. Pouzet, SCADE 6: A formal language for embedded critical software development (invited paper), in: F. Mallet, M. Zhang, E. Madelaine (Eds.), 11th International Symposium on Theoretical Aspects of Software Engineering, TASE 2017, Sophia Antipolis, France, IEEE Computer Society, 2017, pp. 1–11.
[38] S. Yuan, L. H. Yoong, P. S. Roop, Efficient Compilation of Esterel for
Multi-core Execution, Research Report RR-8056, INRIA (Sep. 2012).
[39] Z. Li, H. Park, A. Malik, K. I.-K. Wang, Z. Salcic, B. Kuzmin, M. Glaß, J. Teich, Using design space exploration for finding schedules with guaranteed reaction times of synchronous programs on multi-core architecture, Journal of Systems Architecture 74 (2017) 30–45.
Appendix A. The SIGNAL model of the running example (CASE C)
27.   | add_DC := (tmp_DC + 1) when C1to6
28.   | count_DC := add_DC default tmp_DC
29.   |)
30. where
31.   integer C1_DC, C2_DC, C3_DC, C4_DC, C5_DC, C6_DC;
32.   integer tmp_DC, add_DC;
33.   boolean C1, C2, C3, C4, C5, C6, C1to6;
34.   boolean jet_sign_T, jet_sign_F;
35.   integer djet_DC;
36.   real f;
37. end;
Appendix B. The data dependency graph of CASE B