10 Worst Case Execution Time
Lothar Thiele
Example: side airbag in a car, reaction required in < 10 ms
[Figure: distribution of execution times, marking the best-case execution time, the worst-case execution time and a safe upper bound; execution time measurement is marked as unsafe.]
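Constituents of A: A1 and A2
Sequence $A \equiv A_1; A_2$:
$$\mathrm{ub}(A) = \mathrm{ub}(A_1) + \mathrm{ub}(A_2)$$
Conditional $A \equiv$ if $B$ then $A_1$ else $A_2$:
$$\mathrm{ub}(A) = \mathrm{ub}(B) + \max\big(\mathrm{ub}(A_1), \mathrm{ub}(A_2)\big)$$
Loop $A \equiv$ $i \leftarrow 1$; while $i \le 100$ do $A_1$:
$$\mathrm{ub}(A) = \mathrm{ub}(i \leftarrow 1) + 100 \times \big(\mathrm{ub}(i \le 100) + \mathrm{ub}(A_1)\big) + \mathrm{ub}(i \le 100)$$
Elementary statement:
$$\mathrm{ub}(x \leftarrow a + b) = \mathrm{cycles}(\text{load } a) + \mathrm{cycles}(\text{load } b) + \mathrm{cycles}(\text{add}) + \mathrm{cycles}(\text{store } x)$$
Instruction cycles: add 4, load m 12, store m 14, move 1
Not applicable to modern processors!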
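The compositional rules above can be sketched as a recursive function over a small AST. The node types `Assign`, `Seq`, `If` and `Loop` are hypothetical names chosen for illustration, not taken from the source; the cycle counts are the ones listed above, while the costs of evaluating a condition or loop test are passed in as assumed inputs (5 cycles in the usage line is an arbitrary choice). Like the rules themselves, the sketch ignores caches and pipelines.

```python
from dataclasses import dataclass
from typing import Union

# Cycle counts from the table above (load m / store m), fixed per instruction.
CYCLES = {"add": 4, "load": 12, "store": 14, "move": 1}


@dataclass
class Assign:
    """x <- a + b, compiled to: load a; load b; add; store x."""
    dest: str
    lhs: str
    rhs: str


@dataclass
class Seq:
    first: "Stmt"
    second: "Stmt"


@dataclass
class If:
    cond_cost: int          # ub(B): cost of evaluating the condition (assumed)
    then_branch: "Stmt"
    else_branch: "Stmt"


@dataclass
class Loop:
    init_cost: int          # ub(i <- 1)
    test_cost: int          # ub(i <= 100) (assumed)
    bound: int              # maximal number of iterations, e.g. 100
    body: "Stmt"


Stmt = Union[Assign, Seq, If, Loop]


def ub(a: Stmt) -> int:
    """Upper bound on the execution time of construct a, in cycles."""
    if isinstance(a, Assign):
        return 2 * CYCLES["load"] + CYCLES["add"] + CYCLES["store"]
    if isinstance(a, Seq):
        return ub(a.first) + ub(a.second)
    if isinstance(a, If):
        return a.cond_cost + max(ub(a.then_branch), ub(a.else_branch))
    if isinstance(a, Loop):
        # initialisation, `bound` iterations of (test + body), final failing test
        return a.init_cost + a.bound * (a.test_cost + ub(a.body)) + a.test_cost
    raise TypeError(f"unknown construct: {a!r}")


x_ab = Assign("x", "a", "b")
print(ub(x_ab))                                  # 42 = 12 + 12 + 4 + 14
print(ub(Loop(CYCLES["move"], 5, 100, x_ab)))    # 4706
```

Each rule only needs the bounds of the constituents, which is why the approach is cheap; it breaks down on modern processors because an instruction's cost then depends on the execution history (cache and pipeline state), not on the instruction alone.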
Swiss Federal Institute of Technology, Computer Engineering and Networks Laboratory, 10 - 11
Modern Hardware Features
Modern processors increase performance by using:
Caches, Pipelines, Branch Prediction, Speculation
[Figure: PPC 755, execution time in clock cycles, best case vs. worst case]
[Figure: WCET tool architecture. Micro-architecture analysis: CFG Builder, Loop Unfolding, Value Analyzer, Micro-Architecture Cache/Pipeline Analyzer, producing timing information. Worst-case path determination: Loop Bounds, ILP-Generator, LP-Solver, WCET Evaluation, Visualization.]
Contents
Introduction
problem statement, tool architecture
Program Path Analysis
Value Analysis
Caches
must, may analysis
Pipelines
Abstract pipeline models
Integrated analyses
Program Path Analysis
Which sequence of instructions is executed in the worst case (longest runtime)?
Problem: the number of possible program paths grows exponentially with the program length.
Model
fixed number of cycles for each basic block (from static analysis)
loops must be bounded
Concept
Transform the structure of the CFG into a set of (integer) linear equations.
The solution of the integer linear program (ILP) yields a bound on the WCET.
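A small illustration of the path explosion, assuming a hypothetical program made of n consecutive if-then-else statements: every statement doubles the number of paths, so n statements give 2^n paths while the CFG grows only linearly (3n + 1 nodes here). The sketch counts paths with memoised recursion over the acyclic CFG; the node names are invented for the example, and the code only counts paths, it does not bound execution time.

```python
from functools import lru_cache


def diamond_chain(n: int) -> dict[str, list[str]]:
    """CFG (successor lists) of n >= 1 sequential if-then-else statements.

    Hypothetical node names: c{i} = condition, t{i}/e{i} = then/else branch.
    """
    succ: dict[str, list[str]] = {"exit": []}
    for i in range(n):
        nxt = f"c{i + 1}" if i + 1 < n else "exit"
        succ[f"c{i}"] = [f"t{i}", f"e{i}"]
        succ[f"t{i}"] = [nxt]
        succ[f"e{i}"] = [nxt]
    return succ


def count_paths(succ: dict[str, list[str]], start: str) -> int:
    """Number of distinct paths from start to a node without successors."""

    @lru_cache(maxsize=None)
    def paths_from(node: str) -> int:
        if not succ[node]:
            return 1
        return sum(paths_from(s) for s in succ[node])

    return paths_from(start)


for n in (1, 10, 40):
    cfg = diamond_chain(n)
    print(n, len(cfg), count_paths(cfg, "c0"))
```

For n = 40 the CFG has only 121 nodes but 2^40 (about 1.1 × 10^12) paths, so enumerating and timing each path is infeasible; the ILP formulation avoids explicit enumeration.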
Basic Block
Definition: A basic block is a sequence of instructions
where the control flow enters at the beginning and exits at
the end, without stopping in-between or branching (except at
the end).
t1 := c - d
t2 := e * t1
t3 := b * t1
t4 := t2 + t3
if t4 < 10 goto L
i := 0
t2 := 0
L: t2 := t2 + i
i := i + 1
if i < 10 goto L
x := t2
(back edge to L if i < 10, fall-through to x := t2 if i >= 10)
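The examples above can be split into basic blocks with the usual leader rule: the first instruction, every jump target and every instruction directly after a jump start a new block. The sketch assumes the label syntax `L:` at the start of a line and jumps written as `goto L`; both conventions, and the function name `basic_blocks`, are assumptions for illustration.

```python
import re

LABEL = re.compile(r"^(\w+):")       # assumed label syntax, e.g. "L: t2 := t2 + i"
JUMP = re.compile(r"\bgoto (\w+)")   # assumed jump syntax, e.g. "if i < 10 goto L"


def basic_blocks(code: list[str]) -> list[list[str]]:
    """Split three-address code into basic blocks using leaders.

    Leaders are: the first instruction, every jump target inside `code`,
    and every instruction directly following a (conditional) jump.
    """
    labels = {}
    for idx, line in enumerate(code):
        if m := LABEL.match(line):
            labels[m.group(1)] = idx
    leaders = {0} if code else set()
    for idx, line in enumerate(code):
        if m := JUMP.search(line):
            if m.group(1) in labels:      # targets outside the fragment are ignored
                leaders.add(labels[m.group(1)])
            if idx + 1 < len(code):
                leaders.add(idx + 1)
    starts = sorted(leaders)
    ends = starts[1:] + [len(code)]
    return [code[s:e] for s, e in zip(starts, ends)]


loop_example = ["i := 0", "t2 := 0", "L: t2 := t2 + i",
                "i := i + 1", "if i < 10 goto L", "x := t2"]
for block in basic_blocks(loop_example):
    print(block)
```

The loop example yields three blocks: the initialisation, the loop body ending in the conditional jump, and `x := t2`. The first example (ending in `if t4 < 10 goto L` with L outside the fragment) is a single basic block.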
Calculation of the WCET
Definition: A program consists of $N$ basic blocks, where each basic block $B_i$ has a worst-case execution time $c_i$ and is executed exactly $x_i$ times. Then the WCET is given by
$$\mathrm{WCET} = \sum_{i=1}^{N} c_i \cdot x_i$$
The $c_i$ values are determined by static analysis.
How to determine the $x_i$?
• structural constraints given by the program structure
• additional constraints provided by the programmer (bounds for
loop counters, etc.; based on knowledge of the program context)
Additional Constraints
[Figure: CFG of the example program with edges d1 to d10 and basic blocks B1 to B7]
B1: s = k;
B2: WHILE (k<10)
B3: if (ok)
B4: j++;
B5: j = 0; ok = true;
B6: k++;
B7: r = j;
The loop is executed at most 10 times: $x_3 \le 10 \cdot x_1$
B5 is executed at most once: $x_5 \le 1 \cdot x_1$
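These additional constraints can be sanity-checked by executing the example program and counting how often each block runs. The sketch is a Python transcription of the blocks above; the initial value of j and the assumption that k starts in the range 0 to 10 are hypothetical program-context knowledge, not given in the source.

```python
def run(k: int, ok: bool) -> dict[int, int]:
    """Execute the example program; return x_i = executions of block B_i."""
    x = {i: 0 for i in range(1, 8)}
    j = 0                       # assumed initial value of j
    x[1] += 1                   # B1: s = k;
    s = k                       # kept to mirror the program (unused)
    while True:
        x[2] += 1               # B2: WHILE (k<10)
        if not k < 10:
            break
        x[3] += 1               # B3: if (ok)
        if ok:
            x[4] += 1           # B4: j++;
            j += 1
        else:
            x[5] += 1           # B5: j = 0; ok = true;
            j = 0
            ok = True
        x[6] += 1               # B6: k++;
        k += 1
    x[7] += 1                   # B7: r = j;
    r = j                       # kept to mirror the program (unused)
    return x


# Assuming 0 <= k <= 10 initially, both additional constraints hold.
for k0 in range(0, 11):
    for ok0 in (True, False):
        counts = run(k0, ok0)
        assert counts[3] <= 10 * counts[1] and counts[5] <= 1 * counts[1]

print(run(0, False))     # {1: 1, 2: 11, 3: 10, 4: 9, 5: 1, 6: 10, 7: 1}
print(run(-1, True)[3])  # 11: without the context assumption the loop bound fails
```

The last line shows why such bounds come from the programmer: the program text alone does not guarantee 10 iterations, only knowledge that k starts non-negative does.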
WCET - ILP
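ILP with structural and additional constraints:
$$\mathrm{WCET} = \max\Big\{ \sum_{i=1}^{N} c_i \cdot x_i \;\Big|\; d_1 = 1 \;\wedge\; \sum_{j \in \mathrm{in}(B_i)} d_j = \sum_{k \in \mathrm{out}(B_i)} d_k = x_i,\ i = 1, \dots, N \;\wedge\; \text{additional constraints} \Big\}$$
$d_1 = 1$: the program is executed once; the flow equations for each $B_i$ are the structural constraints.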
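For the example program above, the ILP is small enough to solve by brute force, which makes the role of each constraint visible. The sketch enumerates the free edge counts d4, d5 and d9 instead of calling an (I)LP solver (a swapped-in technique that only works for tiny instances). The block costs in `C` are hypothetical, and the assignment of edge labels to CFG edges (listed in the docstring) is an assumption about the figure.

```python
from itertools import product

# Hypothetical worst-case cycle counts c_i of the basic blocks B1..B7.
C = {1: 3, 2: 2, 3: 2, 4: 4, 5: 6, 6: 2, 7: 3}


def wcet_bound(max_count: int = 12, use_b5_bound: bool = True) -> int:
    """Maximise sum(c_i * x_i) by enumerating the free edge counts d4, d5, d9.

    Assumed edges: d1 -> B1, d2 B1->B2, d3 B2->B3, d4 B3->B4, d5 B3->B5,
    d6 B4->B6, d7 B5->B6, d8 B6->B2, d9 B2->B7, d10 B7 -> exit.
    """
    best = -1
    for d4, d5, d9 in product(range(max_count), repeat=3):
        d1 = 1                       # program is executed once
        d2 = d1                      # B1: in = out
        d3 = d4 + d5                 # B3: in = out
        d6, d7 = d4, d5              # B4, B5: in = out
        d8 = d6 + d7                 # B6: in = out
        d10 = d9                     # B7: in = out
        if d2 + d8 != d3 + d9:       # B2: in = out
            continue
        x = {1: d1, 2: d2 + d8, 3: d3, 4: d4, 5: d5, 6: d8, 7: d10}
        if x[3] > 10 * x[1]:         # loop executed at most 10 times
            continue
        if use_b5_bound and x[5] > 1 * x[1]:   # B5 executed at most once
            continue
        best = max(best, sum(C[i] * x[i] for i in C))
    return best


print(wcet_bound())                    # 110
print(wcet_bound(use_b5_bound=False))  # 128
```

With these costs, dropping the programmer-supplied bound on B5 raises the WCET bound from 110 to 128 cycles: every additional constraint can only remove infeasible count vectors and thus tighten the bound.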
Abstract Interpretation (AI)
Semantics-based method for static program analysis
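[Figure: memory hierarchy for an instruction such as add D1,D0. Processor to cache: an access takes ~1 cycle; the cache is fast, small and expensive. Cache to memory over the bus: an access takes ~100 cycles; memory is (relatively) slow, large and cheap.]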
Abstraction
Changing the domain: sets of memory blocks in single cache
lines
Must Analysis:
For each program point (and calling context), find out which blocks are definitely in the cache.
Determines safe information about cache hits. Each predicted cache hit reduces the WCET bound.
May Analysis:
For each program point (and calling context), find out which blocks may be in the cache. The complement says which blocks are definitely not in the cache.
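A minimal sketch of the must-analysis update for one fully associative LRU cache set, assuming associativity 4. It follows a standard formulation of LRU must analysis: the abstract state maps each block that is definitely cached to an upper bound on its age (0 = youngest), and a block whose bound reaches the associativity may have been evicted. The names `ASSOC` and `must_update` are hypothetical.

```python
ASSOC = 4  # hypothetical associativity: ages 0 (youngest) .. 3 (oldest)


def must_update(must: dict[str, int], s: str) -> dict[str, int]:
    """Abstract LRU must-cache update for an access to memory block s.

    `must` maps each block that is definitely cached to an upper bound on
    its age. Blocks absent from the map may or may not be cached.
    """
    age_s = must.get(s, ASSOC)            # block not known to be cached: age A
    new = {}
    for block, age in must.items():
        if block == s:
            continue
        aged = age + 1 if age < age_s else age   # only younger blocks age
        if aged < ASSOC:                  # bound reaching A: possibly evicted
            new[block] = aged
    new[s] = 0                            # accessed block is now the youngest
    return new


must: dict[str, int] = {}
for block in ["a", "b", "c", "a", "d", "e"]:
    must = must_update(must, block)
print(sorted(must.items()))   # [('a', 2), ('c', 3), ('d', 1), ('e', 0)]
```

Block b drops out of the must-cache (it may have been evicted), while a survives because its reuse reset its age; a later access to a would therefore be classified as a guaranteed hit.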
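[Figure: must analysis. Concrete cache states over the blocks z, s, x, t, a are abstracted (α) into the abstract must-cache [{}, {}, {z,x}, {s}], which bounds the ages of z, x and s from above; concretization (γ) maps the abstract cache back to the set of concrete caches it represents.]
Resulting abstract must-cache after a join: [{}, {}, {a, c}, {d}]
Interpretation: memory block a is definitely in the (concrete) cache ⇒ always hit.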
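At a control-flow merge, the must analysis keeps only blocks cached on both incoming paths, each with its maximal age. The two incoming states below are hypothetical inputs chosen so that the join reproduces the result [{}, {}, {a, c}, {d}] shown in the figure; `must_join` and `as_lines` are illustrative names.

```python
def must_join(c1: dict[str, int], c2: dict[str, int]) -> dict[str, int]:
    """Must join: keep blocks present in both states, with the maximal age."""
    return {b: max(c1[b], c2[b]) for b in c1.keys() & c2.keys()}


def as_lines(cache: dict[str, int], assoc: int = 4) -> list[set[str]]:
    """Render block -> age as one set of blocks per cache line (age)."""
    lines: list[set[str]] = [set() for _ in range(assoc)]
    for block, age in cache.items():
        lines[age].add(block)
    return lines


# Hypothetical incoming states, written as block -> age.
c1 = {"a": 0, "c": 2, "f": 2, "d": 3}     # [{a}, {}, {c, f}, {d}]
c2 = {"c": 0, "e": 1, "a": 2, "d": 3}     # [{c}, {e}, {a}, {d}]
result = as_lines(must_join(c1, c2))
print(result == [set(), set(), {"a", "c"}, {"d"}])   # True
```

Taking the maximal age is what keeps the result safe: whichever path was taken, a and c are at most two positions from the youngest line, so both stay cached for at least one more access to a new block.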
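[Figure: may analysis. Concrete cache states are abstracted (α) into the abstract may-cache [{z,s,x}, {t}, {}, {a}], which bounds the ages of the blocks from below; concretization (γ) yields all concrete caches [m, n, o, p] with m ∈ {z,s,x}, n, o ∈ {z,s,x,t} and p ∈ {z,s,x,t,a}.]
Resulting abstract may-cache after a join: [{a, c}, {e}, {f}, {d}]
Interpretation: all blocks may be in the cache; none is definitely not in the cache.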
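The may join is the dual of the must join: it keeps every block cached on at least one incoming path, each with its minimal age. With the same hypothetical incoming states as in a must-join example ([{a}, {}, {c, f}, {d}] and [{c}, {e}, {a}, {d}]), this sketch reproduces the result [{a, c}, {e}, {f}, {d}] shown in the figure. Function names are illustrative.

```python
def may_join(c1: dict[str, int], c2: dict[str, int]) -> dict[str, int]:
    """May join: keep blocks present in either state, with the minimal age."""
    joined = dict(c1)
    for block, age in c2.items():
        joined[block] = min(age, joined.get(block, age))
    return joined


def as_lines(cache: dict[str, int], assoc: int = 4) -> list[set[str]]:
    """Render block -> age as one set of blocks per cache line (age)."""
    lines: list[set[str]] = [set() for _ in range(assoc)]
    for block, age in cache.items():
        lines[age].add(block)
    return lines


# Hypothetical incoming states, written as block -> age.
c1 = {"a": 0, "c": 2, "f": 2, "d": 3}     # [{a}, {}, {c, f}, {d}]
c2 = {"c": 0, "e": 1, "a": 2, "d": 3}     # [{c}, {e}, {a}, {d}]
result = as_lines(may_join(c1, c2))
print(result == [{"a", "c"}, {"e"}, {"f"}, {"d"}])   # True
```

Every block seen on either path may still be cached, so no access to a, c, d, e or f can be classified as a guaranteed miss at this point.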
Reference to s:
if s is in the must-cache: $t_{\mathrm{WCET}} = t_{\mathrm{hit}}$, otherwise $t_{\mathrm{WCET}} = t_{\mathrm{miss}}$
if s is in the may-cache: $t_{\mathrm{BCET}} = t_{\mathrm{hit}}$, otherwise $t_{\mathrm{BCET}} = t_{\mathrm{miss}}$
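The classification rule above can be sketched as one function per memory reference. The latencies are hypothetical values in the spirit of the ~1 cycle cache / ~100 cycle memory figures earlier; the must and may sets below are example inputs (the blocks of the join results in the figures), and `access_time_bounds` is an illustrative name.

```python
T_HIT, T_MISS = 1, 100   # hypothetical: ~1 cycle cache hit, ~100 cycles memory


def access_time_bounds(s: str, must: set[str], may: set[str]) -> tuple[int, int]:
    """(BCET, WCET) contribution of one reference to memory block s."""
    t_wcet = T_HIT if s in must else T_MISS   # only a guaranteed hit is cheap
    t_bcet = T_HIT if s in may else T_MISS    # only a guaranteed miss is expensive
    return t_bcet, t_wcet


must = {"a", "c", "d"}              # example must-cache contents
may = {"a", "c", "d", "e", "f"}     # example may-cache contents
for block in ["a", "e", "z"]:
    print(block, access_time_bounds(block, must, may))
# a (1, 1)      always hit
# e (1, 100)    unclassified: hit in the best case, miss in the worst case
# z (100, 100)  always miss
```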
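[Figure: execution of LW followed by SW. Single-cycle processing: one long clock cycle per instruction (T1, T2). Multiple-cycle processing: LW passes through IF, RF, EX, MEM, WB and SW through IF, RF, EX, MEM, one stage per cycle (T1 to T9). Pipelining: the stages IF, RF, EX, MEM, WB of LW and SW overlap, with SW starting one cycle after LW.]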
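The cycle counts in the figure follow from simple arithmetic, sketched here under the figure's idealised assumptions: one cycle per stage, and no hazards, stalls or cache misses. The stage lists mirror the figure (SW skips WB in the multiple-cycle version but passes through all five pipeline stages); the function names are illustrative.

```python
# Stages per instruction as drawn for multiple-cycle processing (SW has no WB).
STAGES = {"LW": ["IF", "RF", "EX", "MEM", "WB"], "SW": ["IF", "RF", "EX", "MEM"]}
PIPELINE_DEPTH = 5   # IF, RF, EX, MEM, WB; every instruction passes all stages


def multi_cycle(program: list[str]) -> int:
    """One stage per cycle; the next instruction starts after the previous ends."""
    return sum(len(STAGES[op]) for op in program)


def pipelined(program: list[str]) -> int:
    """Ideal pipeline: one instruction enters per cycle, no stalls or hazards."""
    return PIPELINE_DEPTH + len(program) - 1 if program else 0


print(multi_cycle(["LW", "SW"]), pipelined(["LW", "SW"]))   # 9 6
print(multi_cycle(["LW", "SW"] * 50), pipelined(["LW", "SW"] * 50))   # 450 104
```

The gain comes from overlapping instructions, which also means an instruction's execution time now depends on its neighbours; a static analysis therefore has to model the pipeline state rather than assign each instruction a fixed cost.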
What is different?
Abstract states may lack information, e.g. about cache
contents.
Assuming local worst cases is safe
(provided there are no timing anomalies).
Traces may be longer (but never shorter).
Pipeline analysis
assume cache hits where predicted,
assume cache misses where predicted or not excluded.
Only the “worst” result states of an instruction need to be
considered as input states for successor instructions!
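A deliberately tiny sketch of the "keep only the worst result state" rule, assuming a toy model in which the abstract pipeline state is just the elapsed cycle count and each memory access is classified as a predicted hit, a predicted miss, or unknown (a miss that is not excluded). The latencies and names are hypothetical. Because time is simply additive here, the model has no timing anomalies, so keeping only the worst state per instruction gives the same bound as exploring every trace; on processors with timing anomalies this shortcut would not be safe.

```python
from itertools import product

T_HIT, T_MISS = 1, 10   # hypothetical memory latencies in cycles
BASE = 1                # hypothetical non-memory cycles per instruction

# Outcomes that must be considered for each access classification.
OUTCOMES = {"hit": ["hit"], "miss": ["miss"], "unknown": ["hit", "miss"]}


def cost(outcome: str) -> int:
    return BASE + (T_HIT if outcome == "hit" else T_MISS)


def worst_state_analysis(classes: list[str]) -> int:
    """Keep only the worst successor state (here: elapsed time) per instruction."""
    time = 0
    for cls in classes:
        time = max(time + cost(o) for o in OUTCOMES[cls])
    return time


def all_traces(classes: list[str]) -> int:
    """Reference: explore every combination of outcomes (exponential)."""
    return max(
        sum(cost(o) for o in combo)
        for combo in product(*(OUTCOMES[c] for c in classes))
    )


accesses = ["hit", "unknown", "miss", "unknown"]
print(worst_state_analysis(accesses), all_traces(accesses))   # 35 35
```

The worst-state variant does work linear in the number of instructions, whereas `all_traces` doubles its work with every unknown access.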