Logical Execution Time Implementation in AUTOSAR
Abstract—The 2017 FMTV challenge has been extended to consider with better precision the details of the HW platform and the need for synthesis and optimization methods, and it also introduces for consideration the Logical Execution Time (LET) model. In this paper we highlight some of the problems and issues that relate to the implementation of the LET model and then we present and compare two approaches for optimizing the placement of the labels in memory, including the time analysis methods that will be used for the system. The paper concludes with a discussion on the next steps and other fundamental issues that are related to the general problem of optimizing the placement of computations in multicore platforms.

I. INTRODUCTION

The FMTV 2017 challenge consists of a timing analysis problem in which the AUTOSAR model of a set of cooperating tasks in a fuel injection application is deployed onto a 4-core platform. The objective of the challenge is to study the possible conditions for the implementation of the Logical Execution Time (LET) model in the runnables communication and to provide methods for the analysis of the memory allocation of the communication variables (labels) in the model. The variables need to be allocated in the available memory spaces (local and global) of the AURIX microcontroller.

The LET model was introduced as part of the Giotto programming paradigm [1] as a method to eliminate output jitter and provide time determinism in the code implementation of controls. In essence, the LET model delays the program output of a task (or any function executed inside the task) to the end of the task period, trading delay for output jitter.

The analysis of the LET implementation is performed under the assumption of the mechanisms and tools that are typical of an AUTOSAR process. In AUTOSAR, the computation functions are called runnables and the communication implementation is provided by a layer of code automatically generated by tools: the Run Time Environment or RTE. The consideration of the AUTOSAR process greatly influences the implementation options. For the LET implementation we discuss two possible options: one that is compatible with the current mechanisms and tools of the standard AUTOSAR process, the other with a simple extension to the AUTOSAR implicit communication implementation (providing for a much more efficient solution).

For the label placement optimization problem, we discuss a simple method to bound the worst-case latency when real-time tasks access a memory bank, possibly competing with other tasks. Using the provided bound for the memory latency, we developed two algorithms to solve the problem: a simple Genetic Algorithm solution and an MILP formulation. We provide the results of these two optimization methods with an additional discussion on how to tackle the runnable placement optimization problem, which is most likely the most relevant design issue for a system like this.

II. SYSTEM MODEL AND NOTATION

The challenge model is a case study from the Amalthea EU project and it is in large part compliant with the AUTOSAR metamodel. As such, the model adopts from AUTOSAR the definitions and most of the semantics for the activation and communication of functions (runnables in AUTOSAR). An attempt at the formal characterization of the challenge model is the following.

Task and runnable model. A task τi is composed of an ordered sequence of ni runnables ρi,1, ..., ρi,ni, each of which has its execution time Ci,j, defined as a truncated Weibull distribution. For the purpose of worst-case analysis, the worst-case execution time (WCET) Ci,j and a best-case execution time ci,j may be computed from the distribution Ci,j. Each runnable ρi,j may read or write labels from a set L = {l1, l2, ..., lp}. Each label lv is characterized by a type and a size (an integer number of bytes). Each task is defined by a tuple τi = {Ci, Ti, Li, Di}, where Ci is the execution time distribution of the task, simply computed as the convolution of the distributions of the task runnables (by extension Ci and ci are the worst- and best-case task execution times); Ti is the period or minimum inter-arrival time of the task activation event(s); Li denotes the set of labels accessed by τi; and Di is the relative (to the activation time) deadline. When applicable, relative deadlines are constrained to be smaller than or equal to periods, i.e., Di ≤ Ti. Ni,v denotes the number of times τi accesses label ℓv ∈ Li.

In the worst case (the reasoning also applies to other types of analysis, but we only discuss the worst-case analysis here), the execution time Ci of a task may be expressed as the sum of the runnable execution times in the task. The execution times provided with the challenge do not include the execution cost to read and write the memory labels.

The scheduling of each task is also controlled by its scheduling mode (cooperative or preemptive) and its priority πi, with preemptive tasks having higher priority than cooperative tasks, and cooperative tasks only preempting each other at runnable boundaries.
We denote as Ri,j the worst-case response time of the j-th runnable of task τi, while ri,j denotes its best-case response time. hpP(i) and hpC(i) denote the sets of preemptive and cooperative tasks, respectively, having priority greater than τi. We denote as hp(i) = hpP(i) ∪ hpC(i) the union of the two disjoint sets.

Platform model. There are m = 4 identical processors P1, ..., Pm. There are four local memories M1, ..., Mm (one for each core) and a global memory Mm+1. The platform disposes of a crossbar switch that provides point-to-point communication channels between each core and each memory. Concurrent accesses to memory are arbitrated with a FIFO queue.

Task and label allocation model. The allocation of the tasks is fixed and given in the provided Amalthea model. P(τi) denotes the processor to which τi is allocated; Γ(Pk) denotes the set of tasks allocated to processor Pk; and Γ(τi) denotes the set of tasks allocated to the same processor to which τi is allocated. An allocation of the labels is also provided in the Amalthea model. The following notation is used when discussing the label allocation. Mk denotes the set of labels allocated to memory Mk. λR denotes the delay suffered by a task during a conflict on a remote memory, while λL denotes the delay suffered by a task during a conflict on its local memory. The maximum time needed to access a word in memory Mx from processor Pk is denoted by

    ∆k,x = δL  if k = x (local memory);  δR  otherwise,

where δR denotes the time needed to access a remote memory (GRAM or local RAM of another processor), and δL denotes the time needed to access the local memory Mk. We assume λL = δL and λR = δR. Based on the challenge information, the memory access and conflict times are λL = 1 cycle (5 ns) and λR = 9 cycles (45 ns).

Finally, with respect to a given allocation of the labels, the time MAi,v needed to access label ℓv ∈ Li from task τi is defined as MAi,v = ∆k,x, where Pk = P(τi) and Mx is the memory in which ℓv is allocated. The same terminology applies to runnables.
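For readers who prefer to think in implementation terms, the model elements above map naturally onto a few C data structures. The sketch below is purely illustrative (all type and field names are ours, not part of the Amalthea or AUTOSAR metamodel) and only records the quantities used by the analysis in the following sections.

#include <stdint.h>

#define M_CORES     4                 /* m = 4 identical processors P1..Pm      */
#define N_MEMORIES  (M_CORES + 1)     /* local memories M1..Mm plus global Mm+1 */

/* A label l_v: only its size and current allocation matter for the analysis. */
typedef struct {
    uint32_t size_bytes;
    uint8_t  memory_id;               /* x such that l_v is allocated to M_x    */
} label_t;

/* A runnable rho_{i,j} with its execution-time bounds and label accesses.     */
typedef struct {
    uint32_t wcet_ns;                 /* C_{i,j} (worst case)                   */
    uint32_t bcet_ns;                 /* c_{i,j} (best case)                    */
    uint32_t n_labels;                /* number of distinct labels accessed     */
    const uint16_t *label_id;         /* which labels are read or written       */
    const uint16_t *n_accesses;       /* contribution to N_{i,v}                */
} runnable_t;

/* A task tau_i = {C_i, T_i, L_i, D_i} plus allocation and scheduling data.    */
typedef struct {
    uint32_t period_ns;               /* T_i                                    */
    uint32_t deadline_ns;             /* D_i <= T_i                             */
    uint32_t priority;                /* pi_i                                   */
    uint8_t  cooperative;             /* scheduling mode                        */
    uint8_t  core_id;                 /* P(tau_i)                               */
    uint32_t n_runnables;             /* n_i                                    */
    const runnable_t *runnable;       /* ordered sequence rho_{i,1..n_i}        */
} task_t;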
A. The LET model of execution

The Logical Execution Time model was probably first presented as part of the Giotto project [1]. The objective of the LET model is to add time determinism to periodic computations by eliminating the output jitter.

Fig. 1. The LET model of execution.

The LET execution model can be summarized as depicted in Figure 1. In the figure, the output of task τ2 (denoted by the upward arrow at the end of the box representing the task execution) has a significant jitter. Because of variable interference from τ1, it occurs late in the first task instance and much earlier in the second. The LET solution is shown in the bottom timeline for task τ3 (taken as an example). The input of the task data is performed at the task activation, and the output is performed at the end of the task execution period. All task inputs are stored in local variables at the task activation. Similarly, all outputs need to be stored in local variables and will be actually output only by the LET code at the end of the cycle. This requires allocating memory for local variables mirroring all input and output variables.

Several mechanisms can be used to enforce the LET synchronization of input and output operations, as a hardware or software implementation. In essence, LET is a sample-and-hold mechanism with synchronized execution of the input and output parts. As such, it is not too dissimilar from the mechanisms used to enforce flow preservation in the implementation of synchronous models [2], [3]. When LET is implemented in SW (the HW implementation would not affect the design analysis or the challenge goals), assuming a typical AUTOSAR development model (for more information on the related assumptions please refer to [4], [5]), there are two main options:

• LET is implemented as part of the Run-Time Environment (RTE) with support from the basic SW;
• LET is implemented at the application level by a set of dedicated runnables.

In both cases, since it requires a dedicated set of tasks (and the corresponding scheduling configuration), the LET implementation will most likely be modeled as a set of RTE or application-level input and output tasks. Since it is required that the input and output operations of these tasks are executed as close as possible to the start and end of period instants, these tasks should be characterized by a very short WCET and a very high priority level. This has several implications that are further discussed in the implementation section.

• There may be more than one task dedicated to the input and output sampling for LET execution. If this is the case, then these tasks will internally preempt each other and the design of this additional set of tasks may be a subproblem in its own right.
• The execution of the output task (or action) at the end of the period may be very difficult to obtain with conventional scheduling strategies (“as late as possible execution” is typically not supported). In this case the output task needs to be actually executed at the beginning of the next cycle, possibly in conjunction with the corresponding input task (in a back-to-back fashion).

III. TIMING ANALYSIS WITH MEMORY CONTENTION

This section presents a response-time analysis for tasks under partitioned fixed-priority scheduling that explicitly accounts for the delay introduced by memory accesses and their corresponding memory contention. The same analysis can be extended to runnables in a seamless manner.
Under the assumption of constrained deadlines, the worst-case response time of a task τi is bounded by the least positive fixed point of the following recurrent equation:

    Ri = Wi + Σ_{τj ∈ hp(τi) ∩ Γ(τi)} ⌈Ri / Tj⌉ Wj + MCi(Ri)    (1)

where Wi = Ci + Σ_{ℓv ∈ Li} Ni,v · MAi,v (i.e., the worst-case execution time of the task plus the cost for accessing its labels) and MCi(Ri) represents the delay due to memory contention incurred by τi and all the high-priority tasks, which transitively affect the response time of the task under analysis.

Since memory contention is resolved according to the FIFO policy, a safe bound on the term MCi(Ri) can be obtained by simply inflating the terms Wi to account for m−1 contentions for each memory access. However, this approach may lead to excessive pessimism, thus resulting in very coarse upper bounds on the response times.

In this work, we use the inflation-free analysis [6], [7] proposed to bound the blocking times of synchronization protocols for multiprocessor systems. The inflation-free analysis explicitly accounts for each memory access that may originate a contention while task τi (under analysis) is pending. To this end, we proceed by bounding the maximum number of accesses NRAk,x(t) issued by tasks executing on the remote processors Pk ≠ P(τi) to each memory Mx in an arbitrary time window of length t, that is

    NRAk,x(t) = Σ_{τj ∈ Γ(Pk)} Σ_{ℓv ∈ Lj ∩ Mx} ⌈(t + Rj) / Tj⌉ Nj,v .    (2)

Note that the above equation considers the sum over all the tasks allocated to Pk, as they can produce memory contention independently of their priority (FIFO arbitration). The term ⌈(t + Rj)/Tj⌉ is a safe bound on the maximum number of jobs of τj ∈ Γ(Pk) in any time window of length t [6], [7].

Similarly, we also bound the number of accesses NLAi,x(t) issued by the local processor P(τi) to each memory Mx in an arbitrary time window of length t while τi is pending, that is

    NLAi,x(t) = Σ_{ℓv ∈ Li ∩ Mx} Ni,v + Σ_{τj ∈ hp(τi) ∩ Γ(τi)} Σ_{ℓv ∈ Lj ∩ Mx} ⌈t / Tj⌉ Nj,v .    (3)

Due to the FIFO arbitration and the fact that memory accesses are non-interruptible, it follows that (i) each memory access issued by a remote processor can delay at most one access issued by the local processor and (ii) each access issued by the local processor can be delayed by at most one remote access per processor; hence the following bound holds:

    MCi(t) = Σ_{Pk ≠ P(τi)} Σ_{x=1}^{m+1} min{NRAk,x(t), NLAi,x(t)} · Λk,x ,    (4)

where the term Λk,x is provided to distinguish the delay introduced by the memory contentions as a function of each pair (Pk, Mx), and is defined as

    Λk,x = λL  if k = x (local conflict);  λR  otherwise.

Equation (4) can be used in Equation (1) to bound the response times of the tasks. The term NRAk,x(t) depends on the response times of the tasks allocated to the remote processors: this additional recursive dependency can be addressed with an iterative loop in which Equation (1) is solved for all the tasks until all the response-time bounds Ri converge. Such an iterative loop starts with Ri = Ci for all tasks τi.
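The structure of this computation is easy to sketch in code. The fragment below is only illustrative (array names, the task count and the helper mc_i() are ours): mc_i() is assumed to implement Equations (2)-(4) from the current response-time estimates, the inner loop searches the fixed point of Equation (1), and the outer loop iterates until all bounds converge, starting from Ri = Ci.

#include <math.h>
#include <stdbool.h>

#define N_TASKS 21                        /* illustrative; set to the number of tasks */

extern double W[N_TASKS];                 /* W_i = C_i + sum_v N_{i,v} * MA_{i,v}     */
extern double C[N_TASKS];                 /* worst-case execution times C_i           */
extern double T[N_TASKS];                 /* periods T_i                              */
extern double D[N_TASKS];                 /* deadlines D_i                            */
extern bool   local_hp[N_TASKS][N_TASKS]; /* tau_j in hp(tau_i) and in Gamma(tau_i)   */

/* Memory-contention bound MC_i(t) of Equation (4); it uses the current
 * response-time estimates R[] of the remote tasks through Equation (2). */
extern double mc_i(int i, double t, const double R[]);

/* Least positive fixed point of Equation (1) for task i. */
static double fixed_point(int i, const double R[])
{
    double r = W[i], prev;
    do {
        prev = r;
        r = W[i] + mc_i(i, prev, R);
        for (int j = 0; j < N_TASKS; j++)
            if (local_hp[i][j])
                r += ceil(prev / T[j]) * W[j];
    } while (r > prev && r <= D[i]);      /* stop on convergence or deadline miss */
    return r;
}

/* Outer loop: re-evaluate Equation (1) for all tasks until every R_i converges. */
void compute_response_times(double R[])
{
    bool changed;
    for (int i = 0; i < N_TASKS; i++) R[i] = C[i];   /* initial estimates R_i = C_i */
    do {
        changed = false;
        for (int i = 0; i < N_TASKS; i++) {
            double r = fixed_point(i, R);
            if (r > R[i]) { R[i] = r; changed = true; }
        }
    } while (changed);
}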
IV. IMPLEMENTING AND ANALYZING THE LOGICAL EXECUTION MODEL IN AUTOSAR

This section discusses solutions for the implementation and the analysis of the LET model in AUTOSAR.

A. LET implementation as part of the RTE generation

The discussion on the implementation of the LET model cannot be undertaken without the joint consideration of the typical AUTOSAR model for the generation of task code and the execution of runnables. AUTOSAR has two models of communication. In the explicit model (top of Figure 2) the copy of the data in the communication variables is performed at the time each runnable invokes the communication API function. The implementation of the LET model in this case would require the definition of two LET runnables that act as proxies for the read and write operations. The reader and writer runnables should execute according to the pattern defined in the following section.

In the implicit model, even if a read or write operation is invoked by the runnable in the middle of its execution, the actual code implementing the read from and write into the shared variables is automatically generated as part of the RTE code at the beginning and at the end of the runnable code. The result of the read operation is sampled at the beginning of the runnable execution and then stored in a local variable for the duration of the runnable execution. Similarly, the write value is locally stored in a variable and then output by RTE code after the runnable execution (shown in the middle of Figure 2, where the darker rectangles before and after the runnable execution represent the RTE code). If the RTE generation tools are not modified, the LET implementation in this case would require yet another set of runnables and an additional set of variables, which is clearly a source of additional memory and time overhead.

Fig. 2. Code implementation of the LET model of execution with explicit or implicit communication.

However, it is relatively straightforward to see how a simple modification of the RTE generation process for the implicit communication model would be the best solution. A simple RTE generation option could result in moving the input and output copies to the LET tasks rather than to the runnable boundaries. The RTE generator could generate the LET input and output tasks together with the other RTE-generated code.
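The difference between the two generation strategies can be sketched as follows. The code is only illustrative (label, runnable and function names are ours, and the copy-in/copy-out is shown with plain assignments rather than the actual Rte_IRead/Rte_IWrite accessors): in the standard implicit model the RTE copies the labels at the runnable boundaries, while the modified generation performs the copies once per LET period inside the LET input and output tasks.

/* Shared labels, allocated to LRAM/GRAM as decided by the optimizer. */
extern int Label_A;                       /* read by the runnable             */
extern int Label_B;                       /* written by the runnable          */

extern void Runnable_10ms(int in_A, int *out_B);

/* --- Standard implicit communication: copies at runnable boundaries ----- */
void Rte_Runnable_10ms(void)
{
    int copy_A = Label_A;                 /* RTE copy-in at runnable start    */
    int copy_B;
    Runnable_10ms(copy_A, &copy_B);       /* application code uses the copies */
    Label_B = copy_B;                     /* RTE copy-out at runnable end     */
}

/* --- Modified generation: copies moved to the LET task boundaries ------- */
int LET_in_A;                             /* sampled once per LET period      */
int LET_out_B;                            /* held until the end of the period */

void LET_Input_10ms(void)  { LET_in_A = Label_A; }    /* at period start      */
void LET_Output_10ms(void) { Label_B = LET_out_B; }   /* at period end        */

void Task_10ms_body(void)
{
    /* in between, runnables only read and write the task-local copies       */
    Runnable_10ms(LET_in_A, &LET_out_B);
}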
B. Reader and Writer Tasks, Definition and Analysis

The general architecture and scheduling of the input and output tasks in a LET model is discussed at length in [8]. In their work, input and output tasks are scheduled together with mode change tasks assuming a time-triggered schedule with jitter constraints for the input and output operations.

In case the tasks implement AUTOSAR runnables, the input and output tasks can serve all the tasks executing at the same rate. Of course, if a task has runnables executing at multiples of the task period, the corresponding input and output sections can be skipped when unnecessary. The input and output LET tasks may be scheduled using the AUTOSAR time-triggered mode when available, in order to ensure the output task is executed right before the end of the period of the tasks it serves. In a priority-based schedule, like the one assumed in the challenge, the output and input tasks may be joined and executed back to back at the beginning of each period. To arbitrate among the input and output operations for the tasks executing at different rates there are several options between two extremes: one is to have a single LET task executing at the greatest common divisor of the task periods (most likely inefficient for the challenge model); the other extreme is to have a LET task for each period. Of course partial groupings may be possible and may be more efficient in some cases.

In our challenge model we assume a LET task for each period. LET tasks have higher priorities than the other tasks (to enforce the precedence constraints) and we assign their relative priorities according to Rate-Monotonic.
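Under these assumptions, each LET task reduces to a very short, high-priority function released at every period boundary, performing the output and input copies back to back, as in the following sketch (names are ours; LET_Output_*/LET_Input_* are the per-rate copy routines of the previous subsection).

extern void LET_Input_10ms(void);  extern void LET_Output_10ms(void);
extern void LET_Input_2ms(void);   extern void LET_Output_2ms(void);

/* Released at every multiple of its period, at a priority higher than all
 * application tasks on the core.  The outputs produced in the previous LET
 * interval are published first ("as late as possible" is approximated by
 * "at the very beginning of the next cycle"), then the inputs for the new
 * interval are sampled, back to back. */
void LET_Task_10ms(void)
{
    LET_Output_10ms();    /* publish outputs of the period that just ended */
    LET_Input_10ms();     /* sample inputs for the period that starts now  */
}

void LET_Task_2ms(void)
{
    LET_Output_2ms();
    LET_Input_2ms();
}

/* ...one such task for each distinct period; with Rate-Monotonic priorities
 * among the LET tasks, the 2ms one preempts the 10ms one when both are
 * released together. */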
V. THE CHALLENGE MODEL

The provided challenge model has a set of special characteristics that affect the analysis and optimization methods and strongly characterize the obtained results.

First and foremost, the tasks are allocated on the cores according to a specific strategy. The first core only executes interrupt service routines. The same is true for the fourth core, which also executes a 10ms periodic task. The second core only executes (most likely; details are not provided) a variable-rate task (triggered at predefined angles in the engine rotation), and a very high rate 1ms periodic task. All the other periodic tasks are executed by the third core.

Using the worst-case execution times provided in the model, the system is definitely in overload, with the following per-core utilizations: 0.97, 1.336, 1.068 and 1.179. We attribute the large overload in the second core to the modeling strategy adopted for the Angle_sync task. We deem such a task to be an engine-triggered task with variable activation rate and speed-dependent adaptive behavior. The provided model most likely considers a minimum inter-arrival time for the maximum engine speed, and a WCET computed for the most time-consuming operating mode. This is pessimistic and explains the overload. The explicit consideration of the adaptive variable-rate (AVR) task model [9] would improve the analysis precision.

To optimize the system configuration based on the worst-case behavior, we need to restore feasibility and to define a suitable cost function. To restore feasibility, we consider the mean execution times in place of the worst-case ones.

As a cost function, after the discussion on the forum and based on the recommendations of the organizers, we selected the maximum normalized (with respect to the deadline) response time of the tasks, as in the function

    C = max_{τi ∈ Γ(Pk), ∀Pk} Ri / Di .    (5)

VI. END-TO-END LATENCY

We adopted the analysis provided in [9] to compute the end-to-end latency of the effect chains. However, in order to consider the influence of sporadic computational activities, the best-case response time computation for a runnable ρi,j must be corrected as follows:

    ri,j = Σ_{h=1}^{j} ci,h + Σ_{k ∈ hp(i)} Nk^act ck    (6)

where Nk^act is defined as:

    Nk^act = ⌈ri,j / Tk⌉ − 1  for periodic tasks;  0  for sporadic tasks.    (7)

The ISRs have been considered as sporadic tasks: this choice has been adopted because the maximum inter-arrival time of the ISRs provided in the Amalthea model seems too close to the minimum one.
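For completeness, Equations (6) and (7) translate directly into a small routine such as the sketch below (illustrative only; the arrays and the encoding of hp(i) are assumed to be filled from the model, and the self-reference of ri,j in Equation (7) is resolved by iterating to a fixed point starting from the sum of the best-case execution times).

#include <math.h>
#include <stdbool.h>

/* Best-case response time r_{i,j} of the j-th runnable of task i.
 * bcet_i[h] holds c_{i,h} (h = 0..j-1), c_task[k] holds c_k, T[k] holds T_k. */
double best_case_rt(int j, int n_tasks,
                    const double *bcet_i,
                    const double *c_task,
                    const double *T,
                    const bool   *is_sporadic,   /* true for the ISRs        */
                    const bool   *in_hp_i)       /* true if k is in hp(i)    */
{
    double base = 0.0;
    for (int h = 0; h < j; h++)                  /* first term of Eq. (6)    */
        base += bcet_i[h];

    double r = base, prev;
    do {
        prev = r;
        double interference = 0.0;
        for (int k = 0; k < n_tasks; k++) {
            if (!in_hp_i[k]) continue;
            double n_act = is_sporadic[k]        /* Eq. (7)                  */
                         ? 0.0
                         : ceil(prev / T[k]) - 1.0;
            interference += n_act * c_task[k];
        }
        r = base + interference;                 /* Eq. (6)                  */
    } while (r > prev);
    return r;
}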
In this paper, the computation of end-to-end latencies is provided only for the case of explicit communication. The case of LET-based communication is straightforward (modulo some minimal interference caused by the high-priority LET tasks).

1) Effect Chain 1: In effect chain 1, all runnables belong to the same task (Task 10ms, allocated to core 3). As there is backward communication between the third and the fourth runnable, this adds a one-cycle delay until the last datum is read. Therefore, the worst-case end-to-end latency of this effect chain by the L2F semantics can be computed as:

    L1^L2F = T10ms + R10ms,107 .    (8)

This result is valid also when considering the L2L semantics. As for the F2F semantics, the analysis needs to consider a one-cycle delay for the first runnable, that is:

    L1^F2F = 2 T10ms + R10ms,107 .    (9)
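As a quick numerical cross-check against the results reported later in Tables I and II, instantiating Equations (8) and (9) with T10ms = 10000 µs and the worst-case response time R10ms,107 = 2745.87 µs of Runnable10ms107 gives

    L1^L2F = 10000 + 2745.87 ≈ 12746 µs,    L1^F2F = 2 · 10000 + 2745.87 ≈ 22746 µs,

which match the two chain-1 entries of Table II.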
2) Effect Chain 2: Runnables in this chain belong to different tasks with different rates. In this case, the end-to-end latency calculation should also consider the over-sampling effect between pairs of consecutive runnables. By the L2F semantics, we obtain:

    L2^L2F = R100ms,7 + min(T10ms − r10ms,19, T100ms) + R10ms,19 + min(T2ms − r2ms,8, T10ms) + R2ms,8 .

As for the F2F semantics, due to the over-sampling effect there are no input overwritings, hence the end-to-end latency is simply given by:

    L2^F2F = L2^L2F + T100ms .

Finally, the end-to-end latency computation for the L2L semantics requires verifying Condition (8) from [9] for any pair of consecutive runnables:

    L2^L2L = R100ms,7 + n̂1 · T10ms − r10ms,19 + R10ms,19 + n̂2 · T2ms − r2ms,8 + R2ms,8 .

3) Effect Chain 3: Also in this case, runnables belong to different tasks with different rates. Task periods have increasing values, leading to an under-sampling effect. By the L2F semantics we obtain:

    L3^L2F = R700/800us,3 + min(T2ms − r2ms,3, T700/800us) + R2ms,3 + min(T50ms − r50ms,36, T2ms) + R50ms,36 .

Due to the sporadic nature of the first runnable, we assume T700/800us = 800 µs in order to maximize the latency.

The end-to-end latency by the F2F semantics requires adding a one-cycle delay with respect to L2F and verifying Condition (8) from [9] for any pair of consecutive runnables:

    L3^F2F = T700/800us + n1 · T700/800us + R700/800us,3 + n2 · T2ms + R2ms,3 + R50ms,36 = 75559 µs.

Finally, the end-to-end latency for the L2L semantics is equal to the L2F case, because no output is overwritten due to the under-sampling effect.

VII. OPTIMIZING THE PLACEMENT OF MEMORY LABELS

This section discusses possible approaches to compute the optimal placement of labels and label copies (for LET) in memory. We tried two possible solutions (MILP and Genetic Algorithm) for the case of explicit communication and LET-based communication.

A. Genetic algorithm

Due to the extremely large set of labels to be positioned, a metaheuristic has been chosen to find a sufficiently good solution. A Genetic Algorithm (GA) approach has been found to be the most suitable candidate for this problem. Hereafter the structure of the algorithm is briefly presented.

Using the common nomenclature for GAs, we define a possible label placement as an individual I. Each individual is encoded as an ordered string of 10000 RAM ids (genes), representing the position of each label in the memories. The set of individuals (called population) is first initialized randomly. At every step we evaluate each individual with a fitness function that is the cost function identified for the challenge: F(I) = C(I), i.e., the maximum normalized response time among all tasks with the labels positioned as in I.

At every iteration, the solutions are reordered according to their fitness values F(I) (the smaller the better) and divided into three subsets: (i) reproductive survivors (elite), (ii) non-reproductive survivors, and (iii) extinct individuals. The next generation is created by selecting random couples of parents from the elite group, which generate new individuals using a crossover function: this strategy swaps randomly selected blocks of genes between the parents and saves the resulting solutions as new individuals. At the same time, extinct individuals are removed from the population. The exploration of new individuals in the solution space is guaranteed by also using a certain number of mutation functions, which randomly change a limited (and randomly chosen) number of genes in the population. Each function has a different activation probability; the mutations consist in randomly changing label positions, moving labels from one memory to another one, and spreading labels from one memory to all the others. On the other hand, in order to maintain a sort of elitism, a (small) number of clones of the best solutions are copied into the next generation without mutating.
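A condensed sketch of one GA iteration is reported below for illustration. It is not the actual implementation (which is the C++ program described later in the results section): population size, elite size and mutation rate are placeholders, evaluate() stands for the analysis of Section III applied to the placement encoded by the individual, and only one of the mutation functions is shown.

#include <stdlib.h>
#include <string.h>

#define N_LABELS  10000               /* genes: one memory id per label         */
#define POP_SIZE  200
#define N_ELITE   20                  /* reproductive survivors (placeholder)   */
#define N_MEMS    5                   /* 4 local memories plus the global one   */

typedef struct {
    unsigned char gene[N_LABELS];     /* memory id (0..N_MEMS-1) of each label  */
    double fitness;                   /* F(I) = C(I), the smaller the better    */
} individual_t;

extern double evaluate(const individual_t *ind);   /* analysis of Section III   */

static int by_fitness(const void *a, const void *b)
{
    double fa = ((const individual_t *)a)->fitness;
    double fb = ((const individual_t *)b)->fitness;
    return (fa > fb) - (fa < fb);     /* ascending: best individuals first      */
}

/* Crossover: start from parent a and overwrite a random block with genes of b. */
static void crossover(const individual_t *a, const individual_t *b,
                      individual_t *child)
{
    *child = *a;
    int start = rand() % N_LABELS;
    int len   = rand() % (N_LABELS - start);
    memcpy(&child->gene[start], &b->gene[start], (size_t)len);
}

/* One mutation function: reassign a few random labels to random memories.      */
static void mutate(individual_t *ind, int n_genes)
{
    while (n_genes-- > 0)
        ind->gene[rand() % N_LABELS] = (unsigned char)(rand() % N_MEMS);
}

void ga_iteration(individual_t pop[POP_SIZE])
{
    qsort(pop, POP_SIZE, sizeof(individual_t), by_fitness);  /* elite first      */
    /* the elite is kept unchanged (elitism); the rest of the population is      */
    /* rebuilt from random couples of elite parents, replacing extinct solutions */
    for (int i = N_ELITE; i < POP_SIZE; i++) {
        const individual_t *p1 = &pop[rand() % N_ELITE];
        const individual_t *p2 = &pop[rand() % N_ELITE];
        crossover(p1, p2, &pop[i]);
        mutate(&pop[i], 1 + rand() % 8);
        pop[i].fitness = evaluate(&pop[i]);
    }
}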
B. MILP formulation

The formulation of the problem as a mixed-integer linear program (MILP) required facing several challenges that cannot be discussed here due to lack of space. For the same reason, the complete MILP formulation, with the corresponding proofs of the constraints, is omitted. However, it is worth discussing two approximations that have been applied to the analysis of Section III in order to express the response-time bounds in a linear form.

First, instead of searching for the least positive fixed point of Equation (1), we adopted the approximate response-time analysis proposed by Park and Park in [10]. Under rate-monotonic scheduling, the authors showed (with an experimental evaluation) that their approximation introduces an extremely limited error (≤ 1%) with respect to the exact response-time analysis. By extending Theorem 4 in [10] to cope with the analysis presented in Section III, the response time of a task τi (if schedulable with a constrained deadline) is bounded by

    Ri = min_{t ∈ Si} { ri = Wi + Σ_{τj ∈ hp(τi) ∩ Γ(τi)} ⌈t / Tj⌉ Wj + MCi(t) : ri ≤ t }    (10)

where Si = ∪_{τj ∈ hp(τi) ∩ Γ(τi)} { ⌊Ti / Tj⌋ · Tj , Ti }.

This approximation proves very useful for encoding the response-time bound into a MILP as (i) it allows getting rid of the typical integer variables that are needed to model the term with the ceiling in Equation (1) (note that the elements of the set Si are all constants, hence that term is in turn a constant); and (ii) it allows avoiding the need for quadratic constraints, which would otherwise be implied by the fact that the terms Wj must be optimization variables (note that their values depend on the label placement).

Second, to avoid requiring additional integer variables, the term NRAk,x(t) of Equation (2) has been over-approximated by replacing Rj with Dj.

Finally, it is worth mentioning that we leveraged lower bounds on the response times to reduce the number of MILP variables (and the corresponding constraints) that must be provided to encode Equation (10). The lower bounds have been computed by accounting for one clock cycle for each access to a label, which corresponds to the best case where labels are allocated in local memory and no contention is possible. Such bounds allow reducing the number of elements in the set Si.

A taste of the MILP formulation. A binary variable Av,x has been associated with each pair of label ℓv and memory Mx, with the interpretation that Av,x = 1 iff ℓv is allocated to Mx. Such variables have been constrained such that Σ_{x=1}^{m+1} Av,x = 1 holds for each label ℓv ∈ L.
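As an example of how these variables enter the analysis (a sketch under our interpretation, not the complete formulation), the label-access cost embedded in the term Wi of Section III becomes linear in the allocation variables:

    Σ_{x=1}^{m+1} Av,x = 1  for each ℓv ∈ L,
    Wi = Ci + Σ_{ℓv ∈ Li} Σ_{x=1}^{m+1} Ni,v · ∆k,x · Av,x   with Pk = P(τi),

where Ni,v and ∆k,x are constants, so each Wi is a linear expression of the Av,x; the contention terms of Equation (4) require a more involved encoding, which is omitted here.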
VIII. EXPERIMENTAL RESULTS

A. LET model implementation

For the purposes of this challenge, the LET model has been implemented only for the runnables involved in the effect chains, the only jitter-sensitive parts of the system.

The effect chains are composed of 10 runnables and 7 labels. The approach proposed in this paper for the LET implementation requires adding high-priority LET tasks dedicated to copying and writing the data involved in the effect chains. Runnables belonging to the same task need only one collective task, thus only 5 LET tasks must be added to the system. As every label needs two local copies (one for reading, one for writing), the total number of labels is increased by 14.

B. Optimal label placement: MILP formulation

The MILP formulation has been solved with IBM CPLEX on a machine equipped with an 8-core Intel(R) Xeon(R) E5-2609v2 processor running at 2.50GHz. The solver is able to find a feasible solution for the label placement immediately (in the very first iterations).

For the case of explicit communication, the optimal placement is computed in about 1 hour and 20 minutes, but the solver is able to provide a sub-optimal solution with a guaranteed gap to the optimum lower than 1% in less than two minutes. The value of the objective function for the optimal solution is 0.8505. Recomputing the objective function with the analysis presented in Section III we obtain 0.849555: this result confirms the effectiveness of the approximation adopted in the MILP formulation. Using the label placement provided in the Amalthea model of the challenge, the objective function is 1.32634: hence, our solution provides an improvement that is larger than 35%.

The label placement for the optimal solution is illustrated in Figure 3 (dark bars). As shown in the graph, most of the labels are allocated in the local memory of the first core (LRAM0).

For the case of LET communication, the optimal placement is computed in about 1 hour and 50 minutes. Similarly to the first case, the solver is able to provide a sub-optimal solution with a guaranteed gap lower than 1% in less than seven minutes. Using the label placement provided in the Amalthea model of the challenge, and placing the 14 labels required for implementing the LET communication as in our solution (as they do not exist in the challenge data), the objective function is the same as for the case of explicit communication.

Surprisingly, the label placement is completely different from the one computed for the case of explicit communication (see Figure 3, light bars), as most of the labels are allocated in the local memory of core 2 (LRAM2). This result is attributed to the fact that the analysis is dominated by the Angle_sync task and the 1ms periodic task on core 1: as a consequence, the optimization algorithms will mostly optimize the labels used by these tasks, while the placement of the other labels is almost indifferent (with possibly few exceptions due to memory conflicts).

Fig. 3. Placement of the labels for the case of explicit (dark bars) and LET-based (light bars) communication.

C. Optimal label placement: Genetic Algorithm

The Genetic Algorithm approach has been implemented in C++ and executed on an Intel(R) i7 4790K running at 4GHz. The population has been set to 200 individuals I, all initialized randomly, and the termination condition of the algorithm has been defined as the completion of 30000 iterations. A feasible solution (with objective function < 1) is usually reached within the first 2000 iterations. The algorithm takes approximately 5 seconds per iteration, with a completion time of less than 40 hours. For each communication semantics we produced approximately 20 distinct simulations.

For the case of explicit communication, the best result obtained with the Genetic Algorithm is an objective function of 0.85161, while for the LET communication the value is 0.85173. The solutions obtained are only slightly worse than the ones found with the MILP formulation, but required a larger running time to be computed.

D. End-to-end latencies

End-to-end latencies of the effect chains have been computed using the optimal label placement that has been obtained with the MILP formulation. The best- and worst-case response times of the runnables under explicit communication are reported in Table I, while the corresponding latencies of the effect chains are reported in Table II.

Under LET-based communication, once a label placement that guarantees the task schedulability is found (as done by the proposed optimization algorithms), the end-to-end latencies can be computed in a straightforward manner. Further investigation may target the integration of the end-to-end latencies as constraints in the MILP formulation, with the objective of computing label placements that guarantee specific timing constraints related to the effect chains.
TABLE I
RESPONSE TIMES (µs) FOR THE RUNNABLES IN THE EFFECT CHAINS UNDER EXPLICIT COMMUNICATION.

    Runnable name                  Best RT    Worst RT   Period (µs)
    Runnable10ms149                2077.68    4118.33    10000
    Runnable10ms243                3315.99    6328.08    10000
    Runnable10ms272                3644       7175.57    10000
    Runnable10ms107                1367.16    2745.87    10000
    Runnable100ms7                 152.335    13820.5    100000
    Runnable10ms19                 298.715    650.74     10000
    Runnable2ms8                   55.465     117.12     2000
    RunnableSporadic700us800us3    17.815     27.11      700
    Runnable2ms3                   20.025     42.385     2000
    Runnable50ms36                 1070.3     13089.6    50000

TABLE II
END-TO-END LATENCIES FOR THE EFFECT CHAINS UNDER EXPLICIT COMMUNICATION.

    Chain index   Latency type   Latency value (µs)
    1             L2F            12746
    1             F2F            22746
    2             L2F            26234
    2             F2F            126234
    2             L2L            154234 (n̂1 = 11 and n̂2 = 5)
    3             L2F            15959
    3             F2F            75559 (n1 = 2 and n2 = 30)

REFERENCES

[1] T. A. Henzinger, C. M. Kirsch, M. A. A. Sanvido, and W. Pree, “From control models to real-time code using Giotto,” in Control Systems Magazine, IEEE, 2003.
[2] G. Wang, M. Di Natale, and A. Sangiovanni-Vincentelli, “Improving the size of communication buffers in synchronous models with time constraints,” in IEEE Transactions on Industrial Informatics, vol. 5 (3), 2009, pp. 229–240.
[3] H. Zeng and M. Di Natale, “Mechanisms for guaranteeing data consistency and flow preservation in AUTOSAR software on multi-core platforms,” in 6th IEEE International Symposium on Industrial Embedded Systems (SIES), Vasteras, Sweden, June 2011.
[4] M. Di Natale and A. Sangiovanni-Vincentelli, “Moving from federated to integrated architectures in automotive: The role of standards, methods and tools,” in Proceedings of the IEEE, vol. 98 (4), 2010, pp. 603–620.
[5] A. Ferrari, M. Di Natale, G. Gentile, G. Reggiani, and P. Gai, “Time and memory tradeoffs in the implementation of AUTOSAR components,” in Design, Automation and Test in Europe Conference, DATE’09, April 2009.
[6] A. Wieder and B. Brandenburg, “On spin locks in AUTOSAR: blocking analysis of FIFO, unordered, and priority-ordered spin locks,” in RTSS’13.
[7] A. Biondi and B. Brandenburg, “Lightweight real-time synchronization under P-EDF on symmetric and asymmetric multiprocessors,” in ECRTS’16.
[8] T. Henzinger, B. Horowitz, and C. Kirsch, “Giotto: A time-triggered language for embedded programming,” in Proc. International Workshop on Embedded Software (EMSOFT), volume 2211 of LNCS, Springer, 2001, pp. 166–184.
[9] A. Biondi, M. Di Natale, and G. Buttazzo, “Response-time analysis for real-time tasks in engine control applications,” in Proceedings of the 6th International Conference on Cyber-Physical Systems (ICCPS 2015), Seattle, Washington, USA, April 14-16, 2015.
[10] M. Park and H. Park, “An efficient test method for rate monotonic schedulability,” IEEE Transactions on Computers, vol. 63, no. 5, 2014.
[11] A. Biondi, M. Di Natale, Y. Sun, and S. Botta, “Moving from single-core to multicore: initial findings on a fuel injection case study,” in SAE Technical Paper, SAE Conference, Detroit, USA, April 2016.