Exercise8 - Solution - Introduction For Embedded Systems


Networked Embedded Systems Lab

Prof. Marco Zimmerling

Introduction to Embedded Systems – WS 2022/23


Sample Solution to Exercise 8: Architecture Synthesis II

Task 1: Scheduling with Pipeline Resources

Pipeline-resources process data in time intervals that are shorter than the actual execution time w. Once the
so-called pipeline-interval PI has elapsed after the start of a task v_1, the next task v_2 can be started on
the same resource (see Figure 1). Non-pipeline-resources are a special case of pipeline-resources with PI = w.

Figure 1: Tasks on pipeline-resource (tasks v_1 and v_2 on the time axis t, with execution time w and
pipeline-interval PI)

Figure 2: Sequence graph for Pipelining

a) Modify the LIST algorithm given in the lecture notes so that pipeline-resources are considered. Which
step has to be reformulated and how? (Explain your answer!)

b) Perform the scheduling for the sequence graph given in Figure 2 using the modified algorithm. You can
use Table 1. The multiplication (r2) lasts 4 time units and the length of the pipeline-interval is 2 time
units. The addition (r1) lasts 2 time units and cannot be executed as pipeline-operation. 1 adder and 1
multiplier are available. Use the number of successor nodes as priority criterion. What is the resulting
latency?

Solution to Task 1:

a) [...]
Determine candidates U_{t,k} to be scheduled;
Determine set of occupied resources O_{t,k};
Choose subset S_{t,k} ⊆ U_{t,k} with maximal priority and |S_{t,k}| + |O_{t,k}| ≤ α(v_k)
[...]

O_{t,k} is the set of resources of type k that are still occupied in time slot t, i.e., on which an operation
was started fewer than PI time units ago, so that they are not yet available for the following operation.
Each of these resources has started exactly one operation within the current pipeline-interval.

b) The resulting schedule is shown in Table 1.


The resulting latency is 12.

t    k    U_{t,k}     O_{t,k}   S_{t,k}
0    r1   v_3         -         v_3
0    r2   v_1, v_2    -         v_1
1    r1   -           v_3       -
1    r2   v_2         v_1       -
2    r1   -           -         -
2    r2   v_2, v_5    -         v_2
3    r1   -           -         -
3    r2   v_5         v_2       -
4    r1   -           -         -
4    r2   v_5         -         v_5
5    r1   -           -         -
5    r2   -           v_5       -
6    r1   v_4         -         v_4
6    r2   -           -         -
7    r1   -           v_4       -
7    r2   -           -         -
8    r1   -           -         -
8    r2   v_6         -         v_6
9    r1   -           -         -
9    r2   -           v_6       -
10   r1   -           -         -
10   r2   -           -         -
11   r1   -           -         -
11   r2   -           -         -
12   r1   -           -         -
12   r2   -           -         -

Table 1: Schedule for Task 1
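
The schedule in Table 1 can be cross-checked with a short Python sketch of the modified LIST algorithm. Several details are assumptions rather than part of the sample solution: the edge list (1, 2 to 4; 3 to 5; 4, 5 to 6) is inferred from the candidate sets in Table 1, the priority counts direct and indirect successors, and ties are broken by the lower node index. As in Table 1, O_{t,k} is represented by the operations that currently block a resource.

```python
# Sketch of list scheduling with pipeline-resources.
# ASSUMPTION: edges inferred from Table 1, not read from Figure 2.
succ = {1: [4], 2: [4], 3: [5], 4: [6], 5: [6], 6: []}
rtype = {1: "r2", 2: "r2", 3: "r1", 4: "r1", 5: "r2", 6: "r2"}  # r1 = adder, r2 = multiplier
w = {"r1": 2, "r2": 4}       # execution times
PI = {"r1": 2, "r2": 2}      # pipeline-intervals (adder is not pipelined: PI = w)
alpha = {"r1": 1, "r2": 1}   # allocated instances


def descendants(v):
    """All direct and indirect successors of v."""
    seen, stack = set(), list(succ[v])
    while stack:
        u = stack.pop()
        if u not in seen:
            seen.add(u)
            stack.extend(succ[u])
    return seen


def list_schedule():
    pred = {v: [u for u in succ if v in succ[u]] for v in succ}
    prio = {v: len(descendants(v)) for v in succ}
    start, t = {}, 0
    while len(start) < len(succ):
        for k in alpha:
            # U_{t,k}: unscheduled operations of type k whose predecessors have finished
            U = [v for v in succ if rtype[v] == k and v not in start
                 and all(u in start and start[u] + w[rtype[u]] <= t for u in pred[v])]
            # O_{t,k}: operations started less than PI ago, still blocking a resource
            O = [v for v in start if rtype[v] == k and start[v] + PI[k] > t]
            # S_{t,k}: highest priority first, ties broken by node index (assumption)
            U.sort(key=lambda v: (-prio[v], v))
            for v in U[:max(0, alpha[k] - len(O))]:
                start[v] = t
        t += 1
    latency = max(start[v] + w[rtype[v]] for v in start)
    return start, latency


if __name__ == "__main__":
    print(list_schedule())
```

The only change relative to a non-pipelined version is the test `start[v] + PI[k] > t` for occupied resources; with PI equal to the execution time it reduces to the usual check for running operations.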

Task 2: Integer Linear Programming

Given the sequence graph G_S = (V_S, E_S) in Fig. 3.

Figure 3: Sequence graph.

For the execution times of the operations assume: A multiplication operation (MULT) takes 2 time units and
all other (ALU) operations take 1 time unit each. Two units of the resource type r1 (multiplier) and two units
of the resource type r2 (ALU) are allocated.

(a) Apply the ASAP and ALAP algorithms to compute the earliest (l_i) and the latest (h_i) starting time of
all operations v_i ∈ V_S, i ∈ {1, ..., 11}. For ALAP, assume the maximum latency L = 7. Fill in the
starting times in Table 2.
(b) Formulate the problem of latency minimization with restricted resources as an integer linear program
(ILP). For this, you should introduce the binary variables x_{i,t} ∈ {0, 1} ∀v_i ∈ V_S and ∀t ∈ {t ∈
Z | l_i ≤ t ≤ h_i}. τ(v_i) denotes the starting time of operation v_i ∈ V_S and α(r_i) with
r_i ∈ V_R = {MULT, ALU} denotes the number of allocated resource instances. Given the above
notation, write down the following equations/inequalities without using the Σ symbol.
(i) Express the objective function of the ILP
(ii) Define τ(v_i) ∀i ∈ {1, ..., 11} as a function of x_{i,t}, where l_i ≤ t ≤ h_i
(iii) Express all data dependencies
(iv) Express all resource limitations
(c) In an analogous manner, try to formulate an ILP that solves the problem of cost minimization with
latency limitation. Hint: We assume that the cost of a realization is the sum of the costs c of the
multipliers with c(r1) = 2 per allocated unit, and of the ALUs with c(r2) = 1 per allocated unit. For
the latency bound, we choose L̄ = 6.

Solution to Task 2:

(a) The starting times are listed in Table 2. The corresponding ASAP/ALAP schedules are depicted in
Figure 4.

        l_i (ASAP)   h_i (ALAP)
v_1     1            2
v_2     1            2
v_3     3            4
v_4     5            6
v_5     6            7
v_6     1            3
v_7     3            5
v_8     1            5
v_9     3            7
v_10    1            6
v_11    2            7

Table 2: Earliest and latest starting times (Task 2a)

Start time   ASAP             ALAP
t = 1        1, 2, 6, 8, 10   -
t = 2        11               1, 2
t = 3        3, 7, 9          6
t = 4        -                3
t = 5        4                7, 8
t = 6        5                4, 10
t = 7        -                5, 9, 11

Figure 4: Schedule with ASAP and ALAP
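
The values in Table 2 can be reproduced with a small Python sketch. It assumes the edges and execution times that appear in the data dependencies of (b)(iii), and it assumes τ(v_0) = 1, so that a latency of L = 7 places the sink NOP at t = 8. The function names are illustrative.

```python
# Sequence graph of Task 2; edges and durations as used in the data dependencies (b)(iii).
edges = [(1, 3), (2, 3), (3, 4), (4, 5), (6, 7), (7, 5), (8, 9), (10, 11)]
mult = {1, 2, 3, 6, 7, 8}                          # operations on r1 (MULT), 2 time units
w = {v: (2 if v in mult else 1) for v in range(1, 12)}


def asap():
    """Earliest start times l_i; operations without predecessors start at t = 1."""
    l = {v: 1 for v in w}
    for _ in w:                                    # |V| relaxation passes suffice for a DAG
        for u, v in edges:
            l[v] = max(l[v], l[u] + w[u])
    return l


def alap(L=7):
    """Latest start times h_i such that the sink NOP can start at t = L + 1."""
    h = {v: L + 1 - w[v] for v in w}
    for _ in w:
        for u, v in edges:
            h[u] = min(h[u], h[v] - w[u])
    return h


if __name__ == "__main__":
    l, h = asap(), alap()
    for v in sorted(w):
        print(f"v_{v}: l = {l[v]}, h = {h[v]}")
```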

(b) (i) Objective function:

min. L = τ(v_n) − τ(v_0)

(ii) Introduction of binary variables:

x_{1,1} + x_{1,2} = 1                  1·x_{1,1} + 2·x_{1,2} = τ(v_1)
x_{2,1} + x_{2,2} = 1                  1·x_{2,1} + 2·x_{2,2} = τ(v_2)
x_{3,3} + x_{3,4} = 1                  3·x_{3,3} + 4·x_{3,4} = τ(v_3)
x_{4,5} + x_{4,6} = 1                  5·x_{4,5} + 6·x_{4,6} = τ(v_4)
x_{5,6} + x_{5,7} = 1                  6·x_{5,6} + 7·x_{5,7} = τ(v_5)
x_{6,1} + x_{6,2} + x_{6,3} = 1        1·x_{6,1} + 2·x_{6,2} + 3·x_{6,3} = τ(v_6)
x_{7,3} + x_{7,4} + x_{7,5} = 1        3·x_{7,3} + 4·x_{7,4} + 5·x_{7,5} = τ(v_7)
x_{8,1} + ... + x_{8,5} = 1            1·x_{8,1} + ... + 5·x_{8,5} = τ(v_8)
x_{9,3} + ... + x_{9,7} = 1            3·x_{9,3} + ... + 7·x_{9,7} = τ(v_9)
x_{10,1} + ... + x_{10,6} = 1          1·x_{10,1} + ... + 6·x_{10,6} = τ(v_10)
x_{11,2} + ... + x_{11,7} = 1          2·x_{11,2} + ... + 7·x_{11,7} = τ(v_11)

(iii) Data dependencies:

τ(v_3) − τ(v_1) ≥ 2        τ(v_3) − τ(v_2) ≥ 2
τ(v_4) − τ(v_3) ≥ 2        τ(v_5) − τ(v_4) ≥ 1
τ(v_7) − τ(v_6) ≥ 2        τ(v_5) − τ(v_7) ≥ 2
τ(v_9) − τ(v_8) ≥ 2        τ(v_11) − τ(v_10) ≥ 1
τ(v_n) − τ(v_5) ≥ 1        τ(v_n) − τ(v_9) ≥ 1
τ(v_n) − τ(v_11) ≥ 1

τ(v_1), τ(v_2), τ(v_6), τ(v_8), τ(v_10) ≥ τ(v_0) ≥ 1

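(iv) Resource limitations:

For each time step t, the first inequality covers the multipliers (r1) and the second the ALUs (r2). A
multiplication takes 2 time units, so a multiplication started at t − 1 still occupies its multiplier at t.

t = 1:
r1: x_{1,1} + x_{2,1} + x_{6,1} + x_{8,1} ≤ 2
r2: x_{10,1} ≤ 2
t = 2:
r1: x_{1,1} + x_{1,2} + x_{2,1} + x_{2,2} + x_{6,1} + x_{6,2} + x_{8,1} + x_{8,2} ≤ 2
r2: x_{10,2} + x_{11,2} ≤ 2
t = 3:
r1: x_{1,2} + x_{2,2} + x_{6,2} + x_{6,3} + x_{8,2} + x_{8,3} + x_{3,3} + x_{7,3} ≤ 2
r2: x_{10,3} + x_{11,3} + x_{9,3} ≤ 2
t = 4:
r1: x_{6,3} + x_{8,3} + x_{8,4} + x_{3,3} + x_{3,4} + x_{7,3} + x_{7,4} ≤ 2
r2: x_{10,4} + x_{11,4} + x_{9,4} ≤ 2
t = 5:
r1: x_{8,4} + x_{8,5} + x_{3,4} + x_{7,4} + x_{7,5} ≤ 2
r2: x_{10,5} + x_{11,5} + x_{9,5} + x_{4,5} ≤ 2
t = 6:
r1: x_{8,5} + x_{7,5} ≤ 2
r2: x_{10,6} + x_{11,6} + x_{9,6} + x_{4,6} + x_{5,6} ≤ 2
t = 7:
r1: (0 ≤ 2)
r2: x_{11,7} + x_{9,7} + x_{5,7} ≤ 2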
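
The resource limitations follow a mechanical rule: operation v_i occupies its resource in slot t if it started in one of the slots t − w_i + 1, ..., t that lie inside its window [l_i, h_i]. The following Python sketch generates the variable indices per slot under that rule. The windows are taken from Table 2; the function names `resource_rows` and `as_text` are illustrative, and terms are emitted in order of operation index rather than in the order printed above.

```python
mult = {1, 2, 3, 6, 7, 8}                          # operations on r1 (MULT), 2 time units
w = {v: (2 if v in mult else 1) for v in range(1, 12)}
window = {1: (1, 2), 2: (1, 2), 3: (3, 4), 4: (5, 6), 5: (6, 7), 6: (1, 3),
          7: (3, 5), 8: (1, 5), 9: (3, 7), 10: (1, 6), 11: (2, 7)}   # (l_i, h_i) from Table 2


def resource_rows(t):
    """Index pairs (i, s) of the variables x_{i,s} that load each resource type in slot t."""
    rows = {"MULT": [], "ALU": []}
    for i in sorted(w):
        lo, hi = window[i]
        kind = "MULT" if i in mult else "ALU"
        # v_i is running in slot t if it started in one of the slots t - w_i + 1 .. t
        for s in range(max(lo, t - w[i] + 1), min(hi, t) + 1):
            rows[kind].append((i, s))
    return rows


def as_text(terms, bound=2):
    """Render one inequality; an empty left-hand side is written as 0."""
    lhs = " + ".join(f"x{i},{s}" for i, s in terms) or "0"
    return f"{lhs} <= {bound}"


if __name__ == "__main__":
    for t in range(1, 8):
        rows = resource_rows(t)
        print(f"t = {t}:")
        print("  r1:", as_text(rows["MULT"]))
        print("  r2:", as_text(rows["ALU"]))
```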
(c) Restating the resource limitations, and introducing additional variables α(r1) and α(r2):
t = 1:
x_{1,1} + x_{2,1} + x_{6,1} + x_{8,1} − α(r1) ≤ 0
x_{10,1} − α(r2) ≤ 0
[...]

Latency limitation:
L = τ(v_n) − τ(v_0) ≤ L̄ = 6
New objective function:

min. C = α(r1) · c(r1) + α(r2) · c(r2) = 2 · α(r1) + α(r2)
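
For an instance of this size, the cost-minimization problem can also be explored without an ILP solver. The Python sketch below replaces the ILP by a plain backtracking search: it enumerates allocations (α(r1), α(r2)) and checks whether a schedule within the latency bound exists. It assumes τ(v_0) = 1, the windows from Table 2, and the edges and execution times from (b)(iii); all function names are illustrative.

```python
# Data of Task 2: edges and durations as in (b)(iii), windows (l_i, h_i) from Table 2.
edges = [(1, 3), (2, 3), (3, 4), (4, 5), (6, 7), (7, 5), (8, 9), (10, 11)]
mult = {1, 2, 3, 6, 7, 8}
w = {v: (2 if v in mult else 1) for v in range(1, 12)}
window = {1: (1, 2), 2: (1, 2), 3: (3, 4), 4: (5, 6), 5: (6, 7), 6: (1, 3),
          7: (3, 5), 8: (1, 5), 9: (3, 7), 10: (1, 6), 11: (2, 7)}
cost = {"MULT": 2, "ALU": 1}       # c(r1), c(r2)
L_BAR = 6                          # latency bound, assuming tau(v_0) = 1


def schedulable(a_mult, a_alu):
    """Backtracking search: does a schedule with latency <= L_BAR exist?"""
    order = [1, 2, 3, 4, 6, 7, 5, 8, 9, 10, 11]            # a topological order
    pred = {v: [u for u, x in edges if x == v] for v in w}
    tau, use = {}, {}                                       # start times, (type, slot) -> load

    def place(i):
        if i == len(order):
            return True
        v = order[i]
        kind, cap = ("MULT", a_mult) if v in mult else ("ALU", a_alu)
        lo = max([window[v][0]] + [tau[u] + w[u] for u in pred[v]])
        hi = min(window[v][1], L_BAR + 1 - w[v])            # must finish by tau(v_n) <= 1 + L_BAR
        for t in range(lo, hi + 1):
            slots = range(t, t + w[v])
            if all(use.get((kind, s), 0) < cap for s in slots):
                for s in slots:
                    use[(kind, s)] = use.get((kind, s), 0) + 1
                tau[v] = t
                if place(i + 1):
                    return True
                del tau[v]
                for s in slots:
                    use[(kind, s)] -= 1
        return False

    return place(0)


def min_cost():
    """Cheapest allocation (cost, alpha(r1), alpha(r2)) that meets the latency bound."""
    return min((cost["MULT"] * m + cost["ALU"] * a, m, a)
               for m in range(1, 7) for a in range(1, 6) if schedulable(m, a))


if __name__ == "__main__":
    print(min_cost())
```

Under these assumptions the search reports a minimum cost of 8 with three multipliers and two ALUs. The allocation from (b), two multipliers and two ALUs, cannot meet L̄ = 6, because v_1, v_2 and v_6 would all have to occupy a multiplier in slot 2.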

Task 3: Iterative Algorithms

Please answer the following questions considering the given video codec application specified as a marked
graph in Figure 5.

Figure 5: Video codec marked graph representation

          ν_1   ν_2   ν_3   ν_4   ν_5
w(ν_i)    10    10    10    5     5

Table 3: Execution time of each function

(a) Formulate all existing dependencies in Figure 5 from ν_i to ν_j in the form of

τ(ν_j) − τ(ν_i) ≥ w(ν_i) − d_{ij} · P,

where P is the minimum iteration interval. The execution time of each function is listed in Table 3.
(b) Assuming unlimited resources and only one token on the edge between ν_5 and ν_1, determine the minimum
iteration interval P and the latency L. To justify your answer, draw the scheduling on the timeline given
in Figure 6 with the dependency from ν_5 to ν_1 highlighted.

Figure 6: Scheduling result of the video codec

(c) The motion estimation function (ν_1) uses the result of the previous frame (see the dependency between
ν_1 and ν_5). Let us now suppose that an arbitrary number of tokens can be inserted to reduce P using
functional pipelining. Determine the minimum number of tokens that must be added on the
edge ν_5 → ν_1 to achieve P = 10. To justify your answer, draw the pipelined scheduling on the timeline
given in Figure 7 with the dependency from ν_5 to ν_1 highlighted and calculate the latency L of the
schedule.

Figure 7: Pipelined scheduling result of the video codec

Solution to Task 3:

(a) Dependencies:
τ(ν_2) − τ(ν_1) ≥ 10
τ(ν_3) − τ(ν_2) ≥ 10
τ(ν_4) − τ(ν_2) ≥ 10
τ(ν_5) − τ(ν_4) ≥ 5
τ(ν_1) − τ(ν_5) ≥ 5 − 1 · P

(b) We solve the system of inequalities from 3(a) for P.

⇒ P_min = 30
L = 30

Figure 8: Scheduling result of the video codec
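
One way to obtain the bound in (b), written out as a worked derivation: adding the four inequalities from 3(a) along the cycle ν_1 → ν_2 → ν_4 → ν_5 → ν_1 cancels all start times. The start times in the last line assume an as-soon-as-possible schedule with τ(ν_1) = 0, which is an assumption made here for illustration.

```latex
\begin{align*}
\bigl[\tau(\nu_2)-\tau(\nu_1)\bigr] + \bigl[\tau(\nu_4)-\tau(\nu_2)\bigr]
  + \bigl[\tau(\nu_5)-\tau(\nu_4)\bigr] + \bigl[\tau(\nu_1)-\tau(\nu_5)\bigr]
  &\ge 10 + 10 + 5 + (5 - P) \\
0 &\ge 30 - P \\
P &\ge 30 \quad\Rightarrow\quad P_{\min} = 30 \\[4pt]
\tau(\nu_1)=0,\ \tau(\nu_2)=10,\ \tau(\nu_3)=\tau(\nu_4)=20,\ \tau(\nu_5)=25
  \quad&\Rightarrow\quad L = \max(20+10,\ 25+5) = 30
\end{align*}
```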

(c) Now the iteration interval P is given (P = 10) and we are looking for the number of tokens n on the edge
ν_5 → ν_1. Therefore, we replace the last inequality in 3(a) by τ(ν_1) − τ(ν_5) ≥ 5 − n · 10 and solve the
new set of inequalities for n.
⇒ n_min = 3
Since one token is already on the edge, we have to add at least 2 tokens on the edge between ν_5 and ν_1.
L = 30

Figure 9: Pipelined scheduling result of the video codec
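
The results of (b) and (c) can be cross-checked numerically. The Python sketch below treats the inequalities from 3(a) as a longest-path problem and uses Bellman-Ford style relaxation: if the start times keep growing, the cycle has positive weight and no schedule exists for the given P and token count. The function names and the choice of τ ≥ 0 as reference point are assumptions for illustration.

```python
w = {1: 10, 2: 10, 3: 10, 4: 5, 5: 5}                 # execution times from Table 3
edges = [(1, 2), (2, 3), (2, 4), (4, 5), (5, 1)]      # one edge (i, j) per dependency in 3(a)


def start_times(P, tokens):
    """Earliest start times with all tau >= 0, or None if the inequalities are infeasible."""
    d = {e: 0 for e in edges}
    d[(5, 1)] = tokens                                # tokens only on the edge nu_5 -> nu_1
    tau = {v: 0 for v in w}
    for _ in range(len(w) + 1):                       # longest-path relaxation passes
        changed = False
        for i, j in edges:
            bound = tau[i] + w[i] - d[(i, j)] * P     # tau_j - tau_i >= w_i - d_ij * P
            if bound > tau[j]:
                tau[j], changed = bound, True
        if not changed:
            return tau
    return None                                       # still changing: positive cycle


def latency(tau):
    return max(tau[v] + w[v] for v in tau) - min(tau.values())


def min_interval(tokens=1):
    return next(P for P in range(1, sum(w.values()) + 1)
                if start_times(P, tokens) is not None)


def min_tokens(P):
    return next(n for n in range(1, sum(w.values()) + 1)
                if start_times(P, n) is not None)


if __name__ == "__main__":
    print(min_interval(1), latency(start_times(30, 1)))             # task (b)
    print(min_tokens(10), latency(start_times(10, min_tokens(10)))) # task (c)
```

Here `min_tokens(10)` is the total number of tokens on the edge, so the number of tokens to add is that value minus the one token already present.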
