2010 16th IEEE Real-Time and Embedded Technology and Applications Symposium

Improved Task Management Techniques for Enforcing EDF Scheduling on Recurring Tasks
Michael Short
Electronics & Control Group,
Teesside University,
Middlesbrough, UK
[email protected]

Abstract – The management of tasks is an essential requirement in most real-time and embedded systems, but invariably leads to unwanted CPU overheads. This paper is concerned with task management in real-time and embedded systems employing the Earliest Deadline First (EDF) scheduling algorithm. Currently, the best known techniques to manage EDF scheduling lead to overheads with complexity O(log n), where n is the number of recurring (periodic/sporadic) tasks. In this paper it will be shown that if both the ready and waiting queues are represented by either i) timing and indexed deadline wheels or ii) digital search trees, then all scheduling decisions may be made in time proportional to the logarithm of the largest time representation required by the system, p_m. In cases where p_m is relatively small, for example in some embedded systems, extremely efficient task management may then be achieved. Experimental results are then presented, and it is shown that on an ARM7 microcontroller, when the number of tasks is comparatively large for such a platform (> 250), the worst-case scheduling overheads remain effectively constant and below 20 µs. The results indicate that the techniques provide some improved performance over previous methods, and also seem to indicate that there is little discernible difference between the overheads incurred by employing a fixed- or dynamic-priority scheduler in a given system.

Keywords-Hashed Wheels; Timers; Deadline Scheduling; Implementation Models; Task Management; Digital Search Trees.

I. INTRODUCTION
The effective management of tasks is an essential requirement in many real-time and embedded systems. If task management overheads – the CPU time reserved for making task scheduling decisions – are not carefully managed, system performance can be negatively affected, leading to reduced real-time response and increased power consumption. This paper is concerned with the efficient management of such overheads for recurring (periodic / sporadic) task models, with applications to single-processor, real-time and embedded systems. In this context, the two main aspects (requirements) of task management can be stated as follows:

Maintaining a waiting queue: Many types of events are either periodic or quasi-periodic in nature; for example the signaling of repeated (recurring) events, or the enforcement of temporal isolation between system responses to sporadically occurring (external) events. In these situations, a waiting queue – in which a specific event is inserted, to be activated at some future point in time – is a typical means to achieve the desired behavior.

Maintaining a ready queue: Many systems are required to respond to different internal or external events and perform specific processing in a timely fashion; when multiple events occur simultaneously, then some form of scheduling algorithm is normally required to process the events in an appropriate order. A priority queue – into which an active event is inserted along with some notion of priority (e.g. deadline) – is a typical means to achieve the desired behavior.

Figure 1. Aspects of task management.

These two main aspects are illustrated in Fig. 1. The performance of algorithms to implement task management is an issue worthy of study – not only for resource-constrained and embedded systems – but also for wider applications such as general-purpose operating system implementations, network and discrete simulation environments, and also process planning applications [1][2][3][4]. This paper is concerned with systems implementing the Earliest Deadline First (EDF) scheduling algorithm; specifically, it will suggest new task management techniques to reduce the overheads in such systems. The main motivating factors for the current work are as follows. It is known that when preemption is allowed, EDF allows the full utilization of the CPU and is optimal on a single processor under a wide variety of different operating constraints ([3][5][6]). Although EDF generally allows greater utilization of the CPU over alternate fixed-priority schemes (and – arguably – has many other benefits [7]), it is generally seen as being more cumbersome, complex and harder to implement than fixed-priority schemes [7][8]. When preemption is not allowed, the scheduling problem (in general)

1080-1812/10 $26.00 © 2010 IEEE
DOI 10.1109/RTAS.2010.22
becomes NP-Complete [9]. However if attention is restricted to ‘efficient’ schedulers – i.e. those that do not allow the use of inserted idle-time – then non-preemptive EDF (npEDF) is known to be optimal for many types of task sets amongst this sub-class of schedulers [4][10][11]. Although this paper is primarily concerned with preemptive EDF, the results easily generalize to npEDF. The remainder of this paper is organized as follows. Section II describes the assumed task model, and reviews previous work in this area. Section III presents the proposed techniques for EDF task management. Section IV presents a small computational study designed to provide initial performance measures for the proposed techniques, evaluated using an ARM7 microcontroller. Section V concludes the paper.

II. PREVIOUS WORK
Several previous works have considered the problem of task management overheads, for both fixed-priority and deadline-driven systems; this Section will briefly review this material, along with related work on the management of timers and time representation in real-time and embedded systems. Prior to this, the assumed task model will first be discussed.

A. Task Model
This paper is concerned with implementing the recurring task model on a single processor. Such a system may be represented by a set τ of n tasks, where each task t_i ∈ τ is represented by a tuple:

$$t_i = (p_i, c_i, d_i) \tag{1}$$

in which p_i is the task period (minimum inter-arrival time), c_i is the (worst-case) computation requirement of the task and d_i is the task (relative) deadline. A similar model was introduced in this context by Liu & Layland [6] and has since been widely adopted – see, for example, [2][3][8][11]. This paper is primarily concerned with periodic tasks; however, extensions to sporadic task management are also described in Section III. Attention is restricted in this paper to implicit and constrained deadline tasks, i.e. those in which d_i ≤ p_i; such tasks are the most widely discussed in the literature (and used in practice).

B. Fixed Priority Task Management
When task indices and priorities are fixed – as opposed to dynamic – the problem of task management seems to be conceptually much simpler [7]. The ready queue may be represented as a simple priority queue; one simple approach to its implementation is to represent the ready tasks as a bit-field. In this approach, it is first assumed that the tasks are ordered such that for any two tasks t_i and t_j, i < j iff f_i > f_j, where f_i is the (fixed) priority assigned to task i, and the goal is to always schedule the ready task with the highest priority (lowest task index). Each task is then assigned a corresponding bit in the bit-field, the bit index corresponding to the task index; the individual bit is set to ‘1’ if the task is ready, and ‘0’ if it is not. The advantage of such an approach is that if the number of tasks is less than the bit-width of the underlying CPU, the ready queue can be represented as an unsigned integer; determining the task with the highest priority (i.e. lowest index) can be performed by simply finding the least set bit in this integer. This can be performed with a single machine instruction if supported by the underlying hardware (e.g. on x86-based CPUs) or with 4 machine instructions if not directly supported (e.g. on an ARM or other RISC CPU), using a technique based on the de Bruijn sequence [12].

Figure 2. Using a hierarchical bit-field index to encode fixed task priorities.

When the number of tasks exceeds the bit-width of the CPU, then a hierarchical approach may be used, as illustrated in Fig. 2 for an 8-bit machine. The queue may be separated into several prioritized sub-queues, and a master index employed to indicate which sub-queue contains one (or more) set bits (ready tasks). First the master index is tested to find the lowest indexed sub-queue with a set bit; then this sub-queue is examined to find the corresponding task. On a 32-bit machine, a simple two-level hierarchy can be used to represent up to 1,024 tasks; if a third level is added, this increases to 32,768 tasks, and the highest priority task may be determined with only 3 quick iterations. The solution requires storage of O(n) bits to represent an indexed queue of n tasks.

When the number of priority levels is fixed (e.g. 1024), then such an approach – in conjunction with a timing wheel (to be discussed in Section 2D) – may be used to create an O(1) solution [4]. Typically, an index can be placed over an array of FIFO queues, and used to indicate which of the queues contain one or more elements; finding the task with the highest priority can effectively be performed in O(1). An approach similar to this is employed in the so-called ‘O(1) scheduler’ used in the Linux kernel [13], primarily aimed at x86 systems. However when optimal priority assignments such as rate or deadline monotonic are employed [6][8], it is easy to observe that the worst-case number of task priorities required will be directly proportional to the largest task period p_m as n increases, leading to solutions that are effectively O(log p_m). However, since priorities are dynamic in the case of the EDF scheduler, this solution does not seem to directly translate, and alternate schemes are required.
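As an illustration of the hierarchical bit-field ready queue described in this Section, a minimal two-level sketch is given below. It is an assumption-laden example and not code from the paper: the class and method names (`BitfieldReadyQueue`, `set_ready`, `clear_ready`, `highest_priority`) are hypothetical, Python integers stand in for fixed-width machine words, and the least set bit is found with ordinary integer arithmetic rather than a dedicated machine instruction or a de Bruijn lookup.

```python
class BitfieldReadyQueue:
    """Two-level bit-field ready queue; a lower task index means a higher priority."""

    def __init__(self, n_tasks, word_bits=32):
        # Assumption: two levels of word_bits-wide words, e.g. 32 * 32 = 1024 tasks.
        if n_tasks > word_bits * word_bits:
            raise ValueError("two levels cover at most word_bits**2 tasks")
        self.word_bits = word_bits
        self.master = 0                # bit k set <=> sub-queue k holds a ready task
        self.sub = [0] * word_bits     # one bit-field word per sub-queue

    @staticmethod
    def lowest_set_bit(word):
        # Isolate the least significant set bit, then convert it to a bit position.
        return (word & -word).bit_length() - 1

    def set_ready(self, task):
        k, bit = divmod(task, self.word_bits)
        self.sub[k] |= 1 << bit
        self.master |= 1 << k

    def clear_ready(self, task):
        k, bit = divmod(task, self.word_bits)
        self.sub[k] &= ~(1 << bit)
        if self.sub[k] == 0:           # sub-queue drained: clear its master bit
            self.master &= ~(1 << k)

    def highest_priority(self):
        """Return the lowest ready task index, or None if no task is ready."""
        if self.master == 0:
            return None
        k = self.lowest_set_bit(self.master)            # first test the master index
        return k * self.word_bits + self.lowest_set_bit(self.sub[k])
```

Each lookup inspects exactly one master word and one sub-queue word, which is why the cost depends on the number of levels and not on the number of ready tasks.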

C. EDF Task Management
In the case of EDF scheduling, task priorities are dynamically assigned inversely proportional to their absolute deadlines. A basic solution is to sort the tasks – with every new task arrival – in order of increasing deadlines, giving a solution of complexity O(n^2), O(n log n) or O(n) depending on the implementation. At first glance, it seems a simple way to reduce this complexity to O(log n) would be to employ a priority queue based around a binary heap [14]. A binary heap consists of a number of nodes, one for each task, where each node has a maximum of two children. A heap represents a partially sorted list; the partial sorting is achieved via the basic heap operations, which reorder nodes (through swapping) when performing insert or delete operations in order to respect the heap invariant. In a min-heap, the heap invariant is simply that the key of a node must be either identical to or lower than the keys of both its children.

Accordingly, the root of the heap contains the task with the minimum key; in the case of EDF scheduling, absolute deadlines can be used as keys for the heap invariant, and the root node then becomes the task with the earliest deadline. Although basic heap operations can be performed in time proportional to O(log n), if a heap is also used to represent the waiting queue, problems may arise when several tasks with identical release times must be inserted. As shown by Mok [15], in the case of EDF scheduling there are several conditions which cause heap management to degenerate to O(n). For example, consider the heap of waiting tasks depicted in Fig. 3. Note that insertion of an element into a heap – which already contains one or more elements with identical keys – does not merge the data into a single equivalent element. At t = 20, all 5 tasks shown in Fig. 3 will need transferring to the ready queue; when crossing the hyperperiod boundary for periodic tasks, the entire heap contents will need to be transferred to the run queue, an O(n) operation.

A solution to this problem, as suggested by Mok [15], is twofold; firstly the waiting queue may be represented as a balanced binary search tree (such as a RB, 3/4 or AVL tree [14]), where the data contained in each node represents the root of a heap. Secondly, the ready queue can be represented as a ‘Heap of Heaps’ (HoH) data structure; this is essentially a heap whose elements can be individual nodes, or heaps themselves; the root node of a sub-heap is used as its effective priority when performing basic operations on the ‘master’ heap. This overcomes the problems outlined above as follows. When a task runs to completion, its next release time is updated and this is used as the key for insertion into the balanced tree. Because the tree is balanced, it supports lookup, insertion and deletion operations in time proportional to O(log n); if a node with an identical release time exists within the tree, then a heap is created at this node and the task's relative deadline – which remains static – is used as the key to push the task down the heap. It is trivial to observe that when multiple tasks are required to be released simultaneously, then all that is required is to locate the root of the heap corresponding to the release time, and push this heap onto the HoH ready queue; again an O(log n) operation.

An alternative to the HoH data structure of [15] is to represent the ready queue (and also the elements of the waiting queue) as binomial heaps [16]. With such an approach, transferring multiple tasks from the waiting queue to the ready queue can be achieved by merging the two binomial heaps in question; since the merging of two binomial heaps is an O(log n) operation, this leads to a solution with identical worst-case complexity, and is the technique adopted in the LITMUS^RT multiprocessor kernel [16]. This particular kernel also has an option to represent the waiting queue using an alternate technique to a balanced tree, known as a timing wheel. This, along with other work in time representation for real-time systems, is the subject of the next Section.

D. Timer Management and Time Representation
It can be assumed that in all embedded and real-time systems, time can be represented by unsigned integers (see for example [17] for justification). As shown by Varghese & Lauck [1], if sufficient memory on the target hardware is available then a timing wheel may be employed to represent a waiting queue. A timing wheel is simply a sequential array of records, whose combined size corresponds to the maximum size of timer required by a system (in the case of task management, this corresponds to the largest task period divided by the required timer granularity, henceforth denoted as p_m).

Figure 3. Heap of tasks showing their next release times; at t = 20 all tasks will need to be inserted into the ready queue.

If the records in question correspond to priority queues (or references to priority queues), such a scheme can be operated as follows in a real-time system. When a task – with absolute release time r_i – is inserted into the waiting queue, it is inserted into the (r_i mod p_m)th element (queue) in the timing wheel, assuming that time is incremented mod p_m with each timer tick. Since time is represented as an integer – and using integers to index into an array is a trivial operation – insert and delete operations take place in constant time. This creates a highly effective solution that is also much simpler than implementing a balanced tree, and is shown graphically in Fig. 4, with elements that are indexed from 0. However, a drawback is that large arrays will be needed for large values of p_m; it is easily seen that such a scheme is best suited to embedded systems in which p_m is bounded to be some small power of 2, e.g. 1024.

The assumption that time is represented as an integer – and in small embedded systems, normally with a fixed number of bits (e.g. 16) – may also lead to timer rollover problems. Deadlines will naturally ‘wrap around’ due to this modular

representation of time, and since the normal laws of arithmetic no longer hold it cannot be guaranteed that d_i mod 2^b < d_j mod 2^b when d_i < d_j and time is represented by b-bit unsigned integers. In such cases, assuming that the inequality p_m < 2^b / 2 – i.e. the maximum period is less than half the life time of the underlying linear timer – holds for a given task set, then this problem may be efficiently overcome by using Carlini & Buttazzo’s ICTOH algorithm [18]. This algorithm – which has a very simple code implementation – exploits the fact that the modular distance between any two events (deadlines) x and y, encoded by b-bit unsigned integers, may be determined by performing a subtraction modulo 2^b between x and y, with the result interpreted as a signed integer. The ICTOH algorithm has been integrated into the ERIKA Enterprise kernel [23], and is easily extended such that it may be integrated with either regular or binomial heaps for comparing deadlines, to maintain the heap invariants.

Figure 4. Employing a timing wheel to represent waiting tasks.

III. PROPOSED EDF TASK MANAGEMENT TECHNIQUES
Given the discussion of the previous Section, it can be seen that the current ‘state-of-the-art’ solutions manage tasks in an EDF implementation in O(log n) time. This Section will present two solutions that can potentially improve upon this situation by performing task management in time proportional to O(log p_m), or equivalently linear in b. In some systems, e.g. small embedded systems, p_m is likely to be small – for example 1024 – and this may lead to more efficient scheduler implementations in such situations. Please note that the proposed solutions assume that only feasible task sets are to be scheduled on a single processor; transient or permanent task overloads, which may lead to timeline breaks, are not considered further in this paper. The first solution employs an extension of the timing wheel, and is described below.

A. The Indexed Deadline Wheel Approach
Suppose that p_m is small enough to warrant the use of a timing wheel in a given design; as mentioned this will normally be the case for kernels in small embedded systems. Suppose also that tasks are indexed by the scheduler such that for any two tasks t_i and t_j, i < j iff d_i <= d_j; with such an approach, a priority queue of tasks – sorted according to minimum relative deadline – may be implemented using a (hierarchical) bit-field representation, as discussed in Section 2B. The main idea of the first solution is to extend the use of a timing wheel such that it can also represent a ready queue; such a queue will be termed a deadline wheel. The basic idea behind this approach can be summarized as follows:

1. Upon power-on, all tasks are inserted into the waiting queue according to their first release time; any tasks that hash to an identical release time are inserted into a bit-field priority queue according to their index (relative deadline).
2. When a timer interrupt occurs, after saving the context of the current task, the priority queue P_t of tasks to be released (if any) at the current time t is extracted from the timing wheel.
3. The index of the task with the smallest relative deadline b is peeked from P_t, and its absolute deadline d_b is determined.
4. P_t is inserted into the deadline wheel using the key d_b mod p_m; if another queue exists at this location, a linked list of queues is formed (ties can be broken arbitrarily under EDF).
5. Starting from index i = t + 1, the deadline wheel is searched by incrementing i mod p_m to find the first non-empty slot; a reference to priority queue P_b (possibly at the head of a linked list of queues) residing in this slot is extracted; if no valid entry is found and i = t, the idle task index is selected.
6. The task with the smallest relative deadline b is peeked from P_b, and switched into context.
7. When task b runs to completion, it is deleted from P_b before having its next release time and absolute deadline updated; b is then inserted into the waiting queue (timing wheel).
8. If P_b is not empty, the index of the task with the smallest relative deadline is peeked from P_b, and its absolute deadline d_b determined; P_b is then re-inserted into the deadline wheel using the key d_b mod p_m.
9. Repeat from step 5 until the next timer interrupt (step 2).

It is clear from this description, and previous descriptions of the HoH technique discussed by Mok [15], that this solution implements the EDF algorithm; the main point of note is that any sub-set of tasks that are simultaneously released will be sequentially processed (neglecting preemptions and interference from tasks with later release times) in order of increasing relative deadlines.

Although this solution will essentially run in constant time for a given (fixed) p_m, the size of the fixed constant degenerates to p_m iterations in the worst case, which is unacceptable in most situations. The size of this fixed constant can be reduced to O(log p_m) by introducing a hierarchical bit-field index over the deadline wheel; such an index is illustrated for a wheel with 12 slots in Fig. 5. When slot i contains a valid record, index bit i is set to ‘1’, and to ‘0’ otherwise. At a given time t, finding the nearest non-zero element can be performed by finding the least set bit after first left barrel-shifting the index by t + 1, and then adding t + 1 to the returned value (if the index is non-zero). Note that in a feasible task set, no task should have a deadline at t; a valid record in slot t must have a deadline at (t + p_m) mod p_m = t.
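The index search over the deadline wheel (step 5, accelerated by the bit-field index) can be sketched as follows. This is a hedged single-level illustration and not the paper's implementation: the names `IndexedDeadlineWheel`, `insert`, `remove` and `nearest` are hypothetical, a Python integer plays the role of the p_m-bit index, and bit i is taken to be the i-th least significant bit, so that under this bit-numbering assumption the barrel rotation brings slot t + 1 down to bit 0.

```python
class IndexedDeadlineWheel:
    """Deadline wheel of pm slots with a single-level bit-field index."""

    def __init__(self, pm):
        self.pm = pm
        self.slots = [[] for _ in range(pm)]   # each slot holds a list of queued records
        self.index = 0                         # bit i set <=> slots[i] is non-empty

    def insert(self, deadline, record):
        i = deadline % self.pm                 # key is the absolute deadline mod pm
        self.slots[i].append(record)
        self.index |= 1 << i

    def remove(self, slot, record):
        self.slots[slot].remove(record)
        if not self.slots[slot]:               # slot drained: clear its index bit
            self.index &= ~(1 << slot)

    def nearest(self, t):
        """Return the first non-empty slot among t+1, t+2, ..., t+pm (mod pm), or None."""
        if self.index == 0:
            return None                        # nothing ready: caller selects the idle task
        s = (t + 1) % self.pm
        mask = (1 << self.pm) - 1
        # Barrel-rotate the index so that slot s lands on bit 0.
        rotated = ((self.index >> s) | (self.index << (self.pm - s))) & mask
        offset = (rotated & -rotated).bit_length() - 1   # position of the least set bit
        return (s + offset) % self.pm
```

Because the search starts at t + 1 and wraps, a record found in slot t itself is treated as having a deadline one full wheel revolution ahead, matching the observation above for feasible task sets.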

When the hierarchical bit-field index requires several levels, for example to cover a larger sized deadline wheel, although its implementation is slightly more complicated there are still only a small number of operations required to determine the nearest valid record. Even for large deadline wheels, this reduces the value of the fixed constant to the time taken to execute a relatively small number of integer operations, as will be shown in Section IV.

Figure 5. Indexing the deadline wheel.

At the expense of increased memory use, a simpler solution than performing barrel shifting of indexes would be as follows. If enough memory can be allocated such that the deadline wheel is made up of 2p_m records, then two (separate) indices of size p_m – a lower index LI and an upper index UI – may be placed over each half of the deadline wheel. Deadlines may then be inserted into the wheel and time incremented mod 2p_m; task activations can still be inserted into (and retrieved from) a timing wheel of size p_m. When the nearest non-zero record is sought from the deadline wheel, then only two cases need to be considered. If the current time variable t is < p_m then the lowest indexed non-zero element is first sought from the LI; if no record is found then the lowest index is then sought from the UI. If the current time variable t is ≥ p_m then the lowest indexed non-zero element is first sought from the UI; if no record is found then the lowest index is then sought from the LI. This approach is illustrated in Fig. 6; given that no task has a deadline greater than t + p_m, searching the wheel in this manner will always return the next non-zero record, and no barrel-shifting of the indices is required.

Figure 6. Using a LI and UI on a deadline wheel of size 2p_m.

Theorem 1: The use of a timing wheel, deadline wheel and bit-field priority queues solves the EDF task management problem in time proportional to log p_m per scheduling event.

Proof: It suffices to show that all constituent operations in this solution can be performed in O(log p_m) or better. Firstly, it follows from [1] that insertion, peek and removal operations on a sequential array of records such as a deadline or timing wheel can be performed in O(1). Secondly – under the assumption of implicit and constrained deadline tasks – the number of relative deadlines in the task set is upper-bounded by p_m; the operations insert, peek min and delete on a bit-field priority queue (or indexed array of FIFO queues) of size p_m can be performed in O(log p_m) [7]. Finally, searching the deadline wheel for the nearest non-empty slot using the techniques described requires at most log p_m iterations. Since all constituent operations can be performed in O(log p_m) time or better, the theorem is proved.

For cases when the use of timing and deadline wheels is impractical, the next Section describes a somewhat more generic approach employing digital search trees.

B. The Digital Search Tree Approach
A Digital Search Tree (DST) is a simple data structure that was first described by Coffman [19]. In many respects a DST resembles a balanced binary search tree, but has two major advantages; firstly, component routines for inserting, deleting and searching for keys are extremely simple – the tree does not require complicated rebalancing procedures [14]. Secondly, when keys are unsigned integers, for a key length of b bits and a radix r, these component routines have a worst-case time complexity of O(b/r) [14][19][20]. These two advantages make a (slightly modified) DST a good candidate for EDF task management. Each node in a DST has 2^r child links (a binary DST has a radix r = 1). Inserting a key into a DST involves simply traversing nodes until a NULL pointer is found. At this point, a new node is created from the supplied data and attached to this NULL pointer. When a node is deleted, the node with the required key is first located (if it exists), and swapped with any one of its descendant leaf nodes. The procedure that controls how nodes in the tree are traversed is simple; branches are taken according to the next r most significant bits of the key, from MSB to LSB. At each node, the supplied key is compared to the node key; if not equal, the appropriate child node is simply taken from the index of 2^r possible child links. Insertion of the 4-bit key ‘1010’ into a partially built DST with r = 1 is shown in Fig. 7 below.

It is trivial to observe that a DST is a good candidate for implementing a waiting queue; for example, if time is represented by 16-bit integers, with a radix r = 4 each node will have 16 possible child links, and insertion and deletion into the DST will only require – at most – 4 nodes to be traversed, and 4 keys inspected, regardless of the number of

nodes in the DST1. Since a DST does not keep the nodes in a and in the right, an MSB of ‘1’. Now suppose that the ICTOH
sorted order, it may seem a poor candidate to implement a invariant - discussed in Section 2D - holds over the task period
priority queue; however a DST certainly has structure, and relationships, i.e. pm < 2b / 2. When the key with the earliest
given the node traversal procedure the following can be deadline is sought from the ready queue, the MSB of the
observed; the node with the smallest key lies on a path that is current time variable t is used to control which branch is first
traversed by starting at the root node and taking the left-most taken from the root node. If the MSB of t is ‘0’, then we
(non-NULL) child link at each node, until a leaf is simply search for the smallest key as detailed above. If the
encountered [14][20]. Thus, the node with the smallest key MSB of t is a ‘1’, however, at the root node we first attempt to
may be found by traversing no more than O(b/r) nodes, and take the right branch – if non-NULL – and search for the
taking the smallest key of those examined. For example, in smallest key in this sub-tree as before. If the right branch is
Fig. 6 the path 0101 → 0010 → 0011 is traversed, with the NULL, then the left branch is first taken from the root and the
smallest key located being 0010. For nodes with r > 1, a bit- smallest key sought as before. It is clear that given the period
wise index may be optionally employed at each node, to restrictions described above, any tasks with deadlines that
indicate non-NULL child links and speed up this process. have rolled over will not be processed until there are no keys -
at any particular time t > 2b / 2 - with a deadline value d lying
in the interval [t, 2b). For best results, then, it is suggested to
determine the number of bits b to use when encoding time for
a particular system according to the following simple
relationship:

b = ⎡log 2 pm ⎤ + 1
(2)

Theorem 2: The use of DST’s solves the EDF task


management problem in constant time per scheduling event for
fixed b and r, leading to a O(log pm) solution.
Proof: For fixed b and r, all DST operations – including
finding the minimum element – run in constant time, i.e. O(1)
[14][20]. Since b can be selected using (2), the proof trivially
follows from the discussion above.

It can be estimated that the actual performance of a DST is largely dependent on the choice of radix r; for larger r, fewer nodes need to be traversed (and hence fewer keys compared). However, since each node requires 2^r child links, for larger r the required storage space can increase dramatically. This is illustrated in Table I below, which shows the effect of increasing radix on computation time (worst-case number of key comparisons per insertion/lookup/deletion) and required node storage space (memory locations) for a system using 16-bit timestamps (keys). Note that when r = 16, the solution essentially mimics an (inefficient) timing wheel, and child links should be replaced by a fixed array of records. It can be observed from this that the DST approach is perhaps best suited to systems in which there is a relatively large amount of system memory available, and the number of tasks is potentially very large (for example in a real-time x86-based, embedded PC system).

Table I. Effect of increasing DST radix on memory and basic operation overheads (required number of key comparisons).

Figure 7. DST node insertion.

Thus, when mission times are such that timer rollover is not an issue, the DST may be used to solve the EDF task management problem as follows. The same procedure as outlined in Section A may be employed, with both the waiting and ready queues represented by DSTs. Each node in these DSTs uses a key of either a task release time or absolute deadline respectively, and has as data a reference to a priority queue implemented as a DST. The priority queue DSTs use a task's relative deadline (encoded in b bits) as a key, and the task index as data. Alternatively, since relative deadlines are fixed, bit-field priority queues may be employed instead of DSTs. However, in the case when timer rollover is an issue that must be dealt with, perhaps the simplest solution is as follows. The root node of the ready queue DST can be made such that it cannot contain a key (or data); it has two child links, and is simply used to branch on the MSB of the inserted key. Clearly, any nodes in the left sub-tree have an MSB of '0'

1 In the special case when the number of child nodes is made equal to p_m, the DST essentially acts as a timing wheel; it can be noted that in general a DST resembles the hierarchical timing wheel described in [1], but gives improved performance in terms of lookup, insertion and deletion.
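The bit-field priority queue mentioned above as an alternative to a per-node DST can be sketched as follows. This is a hedged illustration only: it assumes at most 32 tasks, and that tasks are indexed so that a lower index means a shorter relative deadline; the type `bitq` and the function names are hypothetical. The lowest set bit is located here with a loop bounded by the word width; a constant-time de Bruijn multiplication as described in [12] could be substituted.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t bitq;  /* one bit per task: bit i set means task i is queued */

/* Mark task 'task' as present in the queue. */
static void bitq_insert(bitq *q, unsigned task)
{
    assert(task < 32u);
    *q |= (UINT32_C(1) << task);
}

/* Remove task 'task' from the queue (no effect if it is absent). */
static void bitq_remove(bitq *q, unsigned task)
{
    assert(task < 32u);
    *q &= ~(UINT32_C(1) << task);
}

/* Index of the lowest set bit (the best task under the indexing
 * assumption above), or -1 if the queue is empty.
 * The loop runs at most 31 times, independent of the task count. */
static int bitq_first(bitq q)
{
    int i = 0;
    if (q == 0u) {
        return -1;
    }
    while ((q & 1u) == 0u) {
        q >>= 1;
        i++;
    }
    return i;
}
```

Because the queue is a single machine word, it can be allocated statically and embedded directly in a DST node or wheel record.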

C. Extensions to Sporadic Tasks
Sporadic tasks were introduced by Mok [21] to model event-driven, reactive real-time systems. Arrival times of sporadic tasks, as opposed to periodic tasks, are not explicitly known beforehand; that is, successive arrivals of a task i are not necessarily separated by exactly p_i time units as with periodic tasks. Instead, all that can be guaranteed is that successive arrivals of a sporadic task will be separated by a minimum of p_i time units. Typically, when an event (interrupt) linked to a sporadic task occurs at time t, if the interrupt is enabled the interrupt handler will simply update the absolute deadline of the task to t + d_i, and insert it into the ready queue. The advantages of such a simple technique are numerous: the implementation is straightforward, each interrupt can be assigned the same (or similar) priority, and on systems which make use of a multiplexed interrupt controller (e.g. x86 and ARM-based devices) only a single, generic interrupt handler that reads back from an interrupt status register needs to be coded (and hence verified) [22].

Most often it will be desirable to enforce temporal isolation between successive occurrences of a task. This may be achieved by temporarily disabling the corresponding interrupt source upon activation. Most devices have multiple interrupt channels that can be enabled or disabled by writing an appropriate code to an Interrupt Enable Register (IER). After updating the task deadline and insertion into the ready queue, the interrupt activation time (the earliest time at which the task can next be activated, i.e. t + p_i) is also updated and an event inserted into a distinct interrupt waiting queue. This queue may be implemented by either a timing wheel or DST. The data fields of the interrupt queue are integers of the same width as the IER; when an event to re-enable the ith bit of the IER is inserted, the corresponding ith bit of the record is set to '1'. With each timer interrupt, if a valid record is retrieved from the interrupt waiting queue, its data contents are ORed with the IER and the record immediately deleted. Since the number of interrupts is fixed (and in many cases equal to the word-width of the CPU) in any system implementation, such a solution requires very little computation and has a constant worst-case performance.

IV. EXPERIMENTAL ANALYSIS AND DISCUSSION
Although the task management techniques described in the previous Section have O(log p_m) complexity, it is clear that there are several factors that may influence actual computation times and required storage space on a given platform. This Section describes a series of computational studies that were undertaken to begin to quantify this behavior in real situations.

A. Target Hardware and Experimental Procedures
A total of five experiments were carried out in order to explore and compare the performance of the proposed techniques, in terms of time taken to perform scheduling events, as the number of tasks increases. These experiments were carried out on what is currently considered to be a typical hardware platform for the implementation of deeply-embedded real-time systems, the 32-bit ARM7TDMI-S core [23]. The LPC2387 microcontroller from NXP Semiconductors was employed in this study, running with a CPU clock speed of 72 MHz. The device has 512 KB of on-chip flash, and 98 KB of on-chip RAM. The following five cases were considered in the experiments (note that a timing wheel with 1024 elements was used to represent the waiting queue for cases 1-4):

1. A single-indexed, 1024-element deadline wheel to represent the ready queue (SDW);
2. A dual-indexed, 2048-element deadline wheel to represent the ready queue (DDW);
3. A HoH to represent the ready queue, with the ICTOH algorithm employed as the heap invariant (HoH);
4. A single-indexed, 1024-element fixed-priority FIFO arrangement to represent the ready queue (FPP);
5. A basic O(n) EDF implementation, which indexes through the task array using ICTOH to determine the best task to execute with each timer tick, and with each end task (LIN).

These particular cases were chosen in order to compare the proposed techniques (cases 1 and 2) with both the current state-of-the-art (case 3), and also baseline cases including the fixed-priority case (case 4) and a basic EDF implementation (case 5). The deadline monotonic priority assignment was employed in case 4. Please note that for case 3, an implicit n-element array was employed as the 'master' heap, each element of which contains a reference to a sub-heap of tasks consisting of an indexed bit-field. In each experiment, dummy task sets were created. In order to induce the worst-case overhead behavior in these task sets, the following parameters were employed for any particular task i, with the total number of tasks being n (in time units of ms):

$$ c_i = 1.1, \quad p_i = 4n, \quad d_i = 4n - 2(n - i), \quad r_i = n - i \tag{3} $$

A timer of granularity 1 ms was employed in each case. This choice of task parameters ensured a relatively low CPU utilization (≈ 27.5%) and that at time t = n-1, task 1 is transferred to the ready queue (which already contains the remaining n-1 tasks) and becomes the task with the earliest deadline (highest priority in case 4). This choice also ensured that the tasks were indexed by the scheduler such that the index i was inversely proportional to the relative deadline d (or fixed priority in case 4). Similarly, prior to crossing the hyperperiod boundary (at t = 2n + n-1 [3][17]), task n is required to be inserted into the waiting queue, which contains all remaining n-1 tasks, each with a different activation time. The overheads in each case were measured as the number of tasks was increased, starting from n = 4, and doubled in each successive case. A maximum of 256 tasks was used, as above
this point the internal memory space of the device was exceeded; there seems to be no option for implementing task data structures with anything less than O(n) storage. Timestamps from a free-running timer with an accuracy of 0.1 µs were used to calculate the execution times of both the timer 'tick' overheads and 'end task' overheads. The GNUARM C compiler was employed (within the Keil µVision IDE) to generate code for the target; code optimization was disabled to prevent out-of-order instruction execution from impacting the results.

B. Experimental Results
The results to be presented are in units of µs, and include only the cost of updating task data structures, TCBs, and the ready and waiting queues. The cost of task preemption was fixed for each implementation, and therefore omitted. It was found that the cost of saving a task context was 0.36 µs, and restoring a task context 0.39 µs; heavy use was made of the ARM7 Fast-Interrupt-reQuest (FIQ) scratch registers and the store/load multiple register instructions for these routines. Results for each of the experiments detailed in the previous Section are shown in tabular form in Table II, and graphically in Figs. 8 and 9, which show the timer tick and end task overheads on a logarithmic scale representing n. Please note that the SDW results were omitted from these figures to improve clarity, due to their similarity to the DDW results.
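As a cross-check on the dummy task sets defined by (3), the short sketch below generates the parameters for task i of n and evaluates the two properties the text relies on: a total utilization of about 27.5%, and task 1 having the earliest first absolute deadline when it is released at t = n-1. The struct layout and function names are hypothetical, and r_i is interpreted here as the release offset of the first job, which is an assumption drawn from the surrounding description.

```c
#include <assert.h>

/* Task parameters in ms, following (3); names are illustrative only. */
typedef struct {
    double   c;  /* execution time c_i */
    unsigned p;  /* period p_i */
    unsigned d;  /* relative deadline d_i */
    unsigned r;  /* release offset of the first job r_i (assumed meaning) */
} task_params;

/* Parameters of task i (1 <= i <= n) in a set of n tasks. */
static task_params make_task(unsigned i, unsigned n)
{
    task_params t;
    assert(i >= 1u && i <= n);
    t.c = 1.1;
    t.p = 4u * n;
    t.d = 4u * n - 2u * (n - i);
    t.r = n - i;
    return t;
}

/* Total utilization: sum of c_i / p_i over all n tasks. */
static double utilization(unsigned n)
{
    double u = 0.0;
    for (unsigned i = 1u; i <= n; i++) {
        task_params t = make_task(i, n);
        u += t.c / (double)t.p;
    }
    return u;
}

/* Absolute deadline of the first job of task i: r_i + d_i. */
static unsigned first_abs_deadline(unsigned i, unsigned n)
{
    task_params t = make_task(i, n);
    return t.r + t.d;
}
```

Under these definitions the first absolute deadline of task i works out to 3n + i, so the last task to be released (task 1) is also the most urgent, which is the situation the experiments were designed to provoke.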

Table II. Effect of increasing task numbers on overheads (µs).

Figure 8. Graphical representation of increasing task numbers on scheduler ‘tick’ overheads.

Figure 9. Graphical representation of increasing task numbers on scheduler ‘end task’ overheads.

C. Discussion
It can be observed from these data and figures that when the number of tasks n is relatively small (e.g. ≤ 8), the overheads in all cases are comparable to the overheads previously reported for other basic O(n) solutions (with a small number of tasks) on similar targets [23][24]. Indeed in such a situation, the basic LIN approach as implemented in this study is comparable with (or outperforms) competing methods; since the code implementation and required data structures are also comparatively simpler, prospective system designers should be aware that this would appear to be the implementation of choice for small n. As the number of tasks increases, the logarithmic rise in overheads of the HoH approach can be observed, whilst the FPP, SDW and DDW overheads remain effectively constant. The drawback of the LIN approach also becomes apparent; as may be expected, the 'tick' overheads are not competitive with the DDW, SDW and FPP for n > 6, and with the HoH method for n > 16. Although the 'end task' overheads perform somewhat better, they are still not competitive with the alternatives for n > 24 and n > 64 respectively. Although the logarithmic rise in overheads in the HoH case is clearly acceptable for many systems, these results would indicate a small (but perhaps significant in some cases) improvement in the SDW and DDW techniques over the HoH technique in this instance. Since it is known that overheads can lead to increased task response times (e.g. see [23][24]), the adoption of the proposed techniques may prove beneficial to certain classes of real-time system. Although the DDW overheads are marginally less than the SDW (and are almost indistinguishable from the FPP case), as mentioned this is at the expense of increased memory used to implement the larger deadline wheel. It may be observed that the reported levels of overhead, in all cases, may be improved by allowing code optimizations (either at the compiler level or by hand-assembly); however, these benefits would be effectively uniform and would serve little purpose in a comparative study. Informal experimentation seems to suggest these optimizations have a greater effect on the 'end task' overheads on the current platform.

It is interesting to observe the level of overheads between the proposed techniques and the fixed-priority case. Considering Fig. 5, in the fixed-priority case each record would consist of a FIFO queue containing tasks with the same priority. To determine the best task, the records may be searched for the first non-empty record in a fixed order from record 0 (highest priority) to 11 (lowest priority). The only differences between the dynamic- and fixed-priority cases in this situation are that i) deadlines (and not fixed priorities) are used for insertion into a particular FIFO record and ii) the FIFO records are searched not in a fixed order (highest to lowest) but in an order that depends on the current time t. As demonstrated by the results obtained, the run-time differences between the two are almost, but not exactly, negligible.

A final point for discussion is the amount of memory space that is required for each implementation. It is possible to calculate prior to run-time the number of bit-field priority queues that are required. In the worst case, a safe upper bound to employ would be min(n, p_m) queues, although (as is also the case with stack sharing [8][23]) harmonically related periodic tasks may sometimes share a single queue. Memory space may be allocated statically at compile time for these
queues, and single-level pointer indirection used to build the records to be inserted into the deadline and timing wheels. These latter two points are obviously beneficial with respect to coding guidelines and standards such as MISRA, which forbids the use of dynamic memory allocation and places strict limits on the use of pointer indirection [25].

V. CONCLUSIONS AND FURTHER WORK
This paper has considered techniques for the enforcement of EDF scheduling on periodic and sporadic task sets on a single processor. Previous work in this area has been described, and several different solution techniques have been suggested. These solution techniques are primarily intended to reduce the overheads to a relatively small value given by log(p_m), and essentially independent of the number of tasks. Preliminary results have been presented that indicate the potential usefulness of the techniques; however, more exploration in this area is required. Specifically, further work will aim to assess and compare the potential memory / execution time tradeoffs further, and explore the techniques based around DSTs. Finally, perhaps the most interesting observation of this paper, which would seem to be supported by the computational study, is that when compared to the fixed-priority case, the proposed EDF scheduling techniques not only have the same analytical worst-case overheads, but the observable levels of overheads in both cases seem to be almost indistinguishable in practice.

ACKNOWLEDGMENT
The author wishes to express thanks to Prof. Aloysius K. Mok for kindly supplying a copy of reference [15] during the preparation of the paper, which was out of print in the UK at the time. The author also wishes to express thanks to Dr. Nathan W. Fisher for his guidance when preparing the revised manuscript.

REFERENCES
[1] G. Varghese and A. Lauck, "Hashed and Hierarchical Timing Wheels: Efficient Data Structures for Implementing a Timer Facility," IEEE Transactions On Networking, Vol. 5, No. 6, pp. 824-834, 1997.
[2] M. Pont, Patterns for time-triggered embedded systems, ACM Press / Addison-Wesley Education, 2001.
[3] E. Coffman, Jnr, "Introduction to Deterministic Scheduling Theory", in Computer and Job-Shop Scheduling Theory, Wiley, New York, 1976.
[4] J.R. Jackson, "Scheduling a Production Line to Minimize Maximum Tardiness", Research Report 43, Management Science Research Project, University of California, Los Angeles, USA, 1955.
[5] M.L. Dertouzos, "Control robotics: the procedural control of physical processes", Information Processing, Vol. 74, 1974.
[6] J. Liu and J. Layland, "Scheduling algorithms for multiprogramming in a hard real-time environment," Journal of the ACM, Vol. 20, No. 1, pp. 46-61, 1973.
[7] G.C. Buttazzo, "Rate Monotonic vs. EDF: Judgement Day", Real-Time Systems, No. 29, pp. 5-26, 2005.
[8] G.C. Buttazzo, Hard Real-Time Computing Systems: Predictable Scheduling Algorithms and Applications, Springer-Verlag, New York, 2005.
[9] M.R. Garey and D.S. Johnson, Computers and Intractability: A guide to the Theory of NP-Completeness, W.H. Freeman & Co Ltd, April 1979.
[10] L. George, N. Rivierre and M. Spuri, "Preemptive and Non-Preemptive Real-Time Uni-Processor Scheduling", Research Report RR-2966, INRIA, Le Chesnay Cedex, France, 1996.
[11] K. Jeffay, D. Stanat and C. Martel, "On Non-Preemptive Scheduling of Periodic and Sporadic Tasks", Proc. of the IEEE Real-Time Systems Symposium, 1991.
[12] C. Leiserson, H. Prokop and K. H. Randall, "Using de Bruijn Sequences to Index a 1 in a Computer Word," MIT Technical Report, web: http://supertech.csail.mit.edu/papers/debruijn.pdf [Accessed 08-09-2009], 1998.
[13] J. Aas, "Understanding the Linux 2.6.8.1 CPU Scheduler," Silicon Graphics Technical Report, web: http://josh.trancesoftware.com/linux [Accessed 08-09-2009], 2005.
[14] R. Sedgewick, Algorithms in C: Third Edition. Addison-Wesley, 2002.
[15] A. Mok, "Task Management Techniques for Enforcing ED Scheduling on a Periodic Task Set", Proceedings of the Fifth IEEE Workshop on Real-Time Software and Operating Systems, pp. 42-46, Washington, D.C., May 12-13, 1988.
[16] B. Brandenburg and J. Anderson, "On the Implementation of Global Real-Time Schedulers," to appear in Proceedings of the 30th IEEE Real-Time Systems Symposium (RTSS 2009), IEEE, December 2009.
[17] S. Baruah, L. Rosier and R. Howell, "Algorithms and Complexity Concerning the Preemptive Scheduling of Periodic, Real-Time Tasks on One Processor", Real-Time Systems, Vol. 2, No. 4, pp. 301-324, 1991.
[18] A. Carlini and G.C. Buttazzo, "An Efficient Time Representation for Real-Time Embedded Systems," in Proc. of the ACM Symp. on Applied Computing (SAC 2003), Florida, USA, pp. 705-712, March 2003.
[19] E. Coffman, Jnr, "File structures using hashing functions," Communications of the ACM, Vol. 13, No. 7.
[20] F. Plavec, Z.G. Vranesic and S.D. Brown, "On Digital Search Trees: A Simple Method for Constructing Balanced Binary Trees," in Proceedings of the 2nd International Conference on Software and Data Technologies (ICSOFT '07), Vol. 1, Barcelona, Spain, pp. 61-68, July 2007.
[21] A. Mok, "Fundamental Design Problems of Distributed Systems for the Hard Real-Time Environment", PhD Dissertation, MIT, USA, 1983.
[22] M. Short, "Waving goodbye to the interrupt handler: implementing the sporadic task model on modern microcontrollers," University of Leicester Technical Report ESL:09:02, April 2009.
[23] G.C. Buttazzo and P. Gai, "Efficient EDF Implementation for Small Embedded Systems," in Proc. of the 2nd Int. Workshop on Operating System Platform for Embedded Real-Time applications (OSPERT 2006), Dresden, Germany, July 2006.
[24] F. Bimbard and L. George, "EDF Feasibility Conditions with Kernel Overheads on an Event Driven OSEK System," in Proc. Third Int. Conf. on Systems (ICONS), pp. 277-284, Cancun, Mexico, 2008.
[25] MISRA Consortium, Guidelines for the use of the C language in vehicle based software, Motor Industry Software Reliability Report, Released October 2004.
