
Analysis of Federated and Global Scheduling for Parallel Real-Time Tasks

Jing Li†, Jian-Jia Chen§, Kunal Agrawal†, Chenyang Lu†, Chris Gill†, Abusayeed Saifullah†

† Washington University in St. Louis, U.S.A.
§ TU Dortmund University, Germany
[email protected], § [email protected], {kunal, lu, cdgill}@cse.wustl.edu, [email protected]

Abstract

This paper considers the scheduling of parallel real-time tasks with implicit deadlines. Each parallel task is characterized as a general directed acyclic graph (DAG). We analyze three different real-time scheduling strategies: two well known algorithms, namely global earliest-deadline-first and global rate-monotonic, and one new algorithm, namely federated scheduling. The federated scheduling algorithm proposed in this paper is a generalization of partitioned scheduling to parallel tasks. In this strategy, each high-utilization task (utilization ≥ 1) is assigned a set of dedicated cores and the remaining low-utilization tasks share the remaining cores. We prove capacity augmentation bounds for all three schedulers. In particular, we show that if, on unit-speed cores, a task set has total utilization of at most m and the critical-path length of each task is smaller than its deadline, then federated scheduling can schedule that task set on m cores of speed 2; G-EDF can schedule it with speed (3 + √5)/2 ≈ 2.618; and G-RM can schedule it with speed 2 + √3 ≈ 3.732. We also provide lower bounds on the speedup and show that the bounds are tight for federated scheduling and G-EDF when m is sufficiently large.

I. Introduction

In the last decade, multicore processors have become ubiquitous and there has been extensive work on how to exploit these parallel machines for real-time tasks. In the real-time systems community, there has been significant research on scheduling task sets with inter-task parallelism — where each task in the task set is a sequential program. In this case, increasing the number of cores allows us to increase the number of tasks in the task set. However, since each task can only use one core at a time, the computational requirement of a single task is still limited by the capacity of a single core. Recently, there has been some interest in the design and analysis of scheduling strategies for task sets with intra-task parallelism (in addition to inter-task parallelism), where individual tasks are parallel programs and can potentially utilize more than one core in parallel. These models enable tasks with higher execution demands and tighter deadlines, such as those used in autonomous vehicles [31], video surveillance, computer vision, radar tracking and real-time hybrid testing [28].

In this paper, we consider the general directed acyclic graph (DAG) model. We analyze three different scheduling strategies: a new strategy, namely federated scheduling, and two classic strategies, namely global EDF and global rate-monotonic scheduling. We prove that all three strategies provide strong performance guarantees, in the form of capacity augmentation bounds, for scheduling parallel DAG tasks with implicit deadlines.

One can generally derive two types of performance bounds for real-time schedulers. The traditional bound is called a resource augmentation bound (also called a processor speed-up factor). A scheduler S provides a resource augmentation bound of b ≥ 1 if it can successfully schedule any task set τ on m cores of speed b as long as the ideal scheduler can schedule τ on m cores of speed 1. A resource augmentation bound provides a good notion of how close a scheduler is to the optimal schedule, but it has a drawback. Note that the ideal scheduler is only a hypothetical scheduler, meaning that it always finds a feasible schedule if one exists. Unfortunately, since we often cannot tell whether the ideal scheduler can schedule a given task set on unit-speed cores, a resource augmentation bound may not provide a schedulability test.

Another bound that is commonly used for sequential tasks is a utilization bound. A scheduler S provides a utilization bound of b if it can successfully schedule any task set which has total utilization at most m/b on m cores.¹ A utilization bound provides more information than a resource augmentation bound; any scheduler that guarantees a utilization bound of b automatically guarantees a resource augmentation bound of b as well.

¹ A utilization bound is often stated in terms of 1/b; we adopt this notation in order to be consistent with the other bounds stated here.
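To make concrete how lightweight a utilization-bound test is, the comparison above can be sketched in a few lines. This is an illustrative sketch; the function name and inputs are ours, not the paper's:

```python
def passes_utilization_bound(utilizations, m, b):
    """Utilization-bound test: a scheduler with utilization bound b is
    guaranteed to schedule any task set whose total utilization is at
    most m/b on m unit-speed cores."""
    return sum(utilizations) <= m / b

# Example: on 4 cores, a scheduler with bound b = 2 admits any task set
# of total utilization up to 2.0.
print(passes_utilization_bound([0.5, 0.7, 0.6], 4, 2))  # True (1.8 <= 2.0)
print(passes_utilization_bound([1.5, 0.9], 4, 2))       # False (2.4 > 2.0)
```

As the text notes, this takes linear time in the number of tasks, which is what makes a utilization bound directly usable as a schedulability test.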
In addition, it acts as a very simple schedulability test in itself, since the total utilization of the task set can be calculated in linear time and compared to m/b. Finally, a utilization bound gives an indication of how much load a system can handle, allowing us to estimate how much over-provisioning may be necessary when designing a platform. Unfortunately, it is often impossible to prove a utilization bound for parallel systems due to Dhall's effect; often, we can construct pathological task sets with utilization arbitrarily close to 1, but which cannot be scheduled on m cores.

Li et al. [35] defined a concept of capacity augmentation bound, which is similar to the utilization bound but adds a new condition. A scheduler S provides a capacity augmentation bound of b if it can schedule any task set τ which satisfies the following two conditions: (1) the total utilization of τ is at most m/b, and (2) the worst-case critical-path length of each task Li (the execution time of the task on an infinite number of cores)² is at most a 1/b fraction of its deadline. A capacity augmentation bound is quite similar to a utilization bound: it also provides more information than a resource augmentation bound does; any scheduler that guarantees a capacity augmentation bound of b automatically guarantees a resource augmentation bound of b as well. It also acts as a very simple schedulability test. Finally, it can also provide an estimation of the load a system is expected to handle.

There has been some recent research on proving both resource augmentation bounds and capacity augmentation bounds for various scheduling strategies for parallel tasks. This work falls in two categories. In decomposition-based strategies, the parallel task is decomposed into a set of sequential tasks and they are scheduled using existing strategies for scheduling sequential tasks on multiprocessors. In general, decomposition-based strategies require explicit knowledge of the structure of the DAG off-line in order to apply decomposition. In non-decomposition-based strategies, the program can unfold dynamically since no off-line knowledge is required.

For a decomposed strategy, most prior work considers synchronous tasks (a subcategory of general DAGs) with implicit deadlines. Lakshmanan et al. [32] proved a capacity augmentation bound of 3.42 for partitioned fixed-priority scheduling of a restricted category of synchronous tasks³ under decomposed deadline monotonic scheduling. Saifullah et al. [45] provide a different decomposition strategy for general parallel synchronous tasks and prove a capacity augmentation bound of 4 when the decomposed tasks are scheduled using global EDF and 5 when scheduled using partitioned DM. Kim et al. [31] provide another decomposition strategy and prove a capacity augmentation bound of 3.73 using global deadline monotonic scheduling. Nelissen et al. [40] proved a resource augmentation bound of 2 for general synchronous tasks. More recently, Saifullah et al. [44] provide a decomposition strategy for general DAG tasks that provides a capacity augmentation bound of 4.

For non-decomposition strategies, researchers have studied primarily global earliest deadline first (G-EDF) and global rate-monotonic (G-RM). Andersson and Niz [4] show that G-EDF provides a resource augmentation bound of 2 for synchronous tasks with constrained deadlines. Both Li et al. [35] and Bonifaci et al. [15] concurrently showed that G-EDF provides a resource augmentation bound of 2 for general DAG tasks with arbitrary deadlines. In their paper, Bonifaci et al. also proved that G-RM provides a resource augmentation bound of 3 for parallel DAG tasks with arbitrary deadlines; Li et al. also provide a capacity augmentation bound of 4 for G-EDF for task sets with implicit deadlines.

In summary, the best known capacity augmentation bounds for implicit-deadline tasks are 4 for DAG tasks using G-EDF, and 3.73 for parallel synchronous tasks using decomposition combined with G-DM. The contributions of this paper are as follows:

1) We propose a novel federated scheduling strategy. Here, each high-utilization task (utilization ≥ 1) is allocated a dedicated cluster (set) of cores. A multiprocessor scheduling algorithm is used to schedule all low-utilization tasks, each of which is run sequentially, on a shared cluster composed of the remaining cores. Federated scheduling can be seen as a partitioned strategy generalized to parallel tasks.
2) We prove that the capacity augmentation bound for this federated scheduler is 2. In addition, we also show that no scheduler can provide a capacity augmentation bound better than 2 − 1/m for parallel tasks. Therefore, the bound of 2 for federated scheduling is tight when m is large enough. This is the best known capacity augmentation bound for any scheduler for parallel DAGs.
3) We improve the capacity augmentation bound of G-EDF to (3 + √5)/2 ≈ 2.618 for DAGs. When m is large, there is a matching lower bound for G-EDF [35]; hence, this result closes the gap for large m. This is the best known capacity augmentation bound for any global scheduler for parallel DAGs.
4) We show that G-RM has a capacity augmentation bound of 2 + √3 ≈ 3.732. This is the best known capacity augmentation bound for any fixed-priority scheduler for DAG tasks. Even if restricted to synchronous tasks, this is still the best bound for global fixed-priority scheduling without decomposition.

The paper is organized as follows. Section II defines the DAG model for parallel tasks and provides some definitions. Section III presents our federated scheduling algorithm and proves the augmentation bound.

² The critical-path length of a sequential task is equal to its execution time.
³ Fork-join task model, in their terminology.
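The two conditions that define a capacity augmentation bound translate directly into a linear-time check. A minimal sketch, with illustrative names of our own and tasks given as (Ci, Li, Di) triples:

```python
def satisfies_capacity_conditions(tasks, m, b):
    """Check the two conditions under which a scheduler with capacity
    augmentation bound b guarantees schedulability:
    (1) total utilization sum(C_i / D_i) is at most m / b, and
    (2) each critical-path length L_i is at most D_i / b."""
    total_util = sum(C / D for (C, L, D) in tasks)
    return total_util <= m / b and all(L <= D / b for (C, L, D) in tasks)

# A task with C=20, L=12, D=16 fails condition (2) for b = 2,
# since its critical-path length 12 exceeds 16 / 2 = 8.
print(satisfies_capacity_conditions([(20, 12, 16)], 4, 2))  # False
print(satisfies_capacity_conditions([(8, 4, 16)], 4, 2))    # True
```

Note that, unlike a resource augmentation bound, this check never has to reason about the hypothetical ideal scheduler; it only inspects the task parameters.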
Section IV proves a lower bound for any scheduler for parallel tasks. Section V presents a canonical form that gives an upper bound on the work of a DAG that must be done in a specified interval length. Section VI proves that G-EDF provides a capacity augmentation bound of 2.618. Section VII shows that G-RM provides a capacity augmentation bound of 3.732. We discuss some practical considerations of the three schedulers in Section VIII. Section IX discusses related work and Section X concludes this paper.

II. System Model

We now present the details of the DAG task model for parallel tasks and some additional definitions.

We consider a set τ of n independent sporadic real-time tasks {τ1, τ2, . . . , τn}. A task τi represents an infinite sequence of arrivals and executions of task instances (also called jobs). We consider the sporadic task model [9, 29] where, for a task τi, the minimum inter-arrival time (or period) Ti represents the time between consecutive arrivals of task instances, and the relative deadline Di represents the temporal constraint for executing the job. If a task instance of τi arrives at time t, the execution of this instance must be finished no later than the absolute deadline t + Di, and the release of the next instance of task τi must be no earlier than t plus the minimum inter-arrival time, i.e. t + Ti. In this paper, we consider implicit-deadline tasks, where each task τi's relative deadline Di is equal to its minimum inter-arrival time Ti; that is, Ti = Di. We consider the schedulability of this task set on a uniform multicore system consisting of m identical cores.

Each task τi ∈ τ is a parallel task and is characterized as a directed acyclic graph (DAG). Each node (subtask) in the DAG represents a sequence of instructions (a thread) and each edge represents a dependency between nodes. A node (subtask) is ready to be executed when all its predecessors have been executed. Throughout this paper, as it is not necessary to build the analysis on the specific structure of the DAG, only two parameters related to the execution pattern of task τi are defined:

• Total execution time (or work) Ci of task τi: this is the summation of the worst-case execution times of all the subtasks of task τi.
• Critical-path length Li of task τi: this is the length of the critical path in the given DAG, in which each node is characterized by the worst-case execution time of the corresponding subtask of task τi. The critical-path length is the worst-case execution time of the task on an infinite number of cores.

Given a DAG, obtaining the work Ci and critical-path length Li [46, pages 661-666] can both be done in linear time. For brevity, the utilization Ci/Ti = Ci/Di of task τi is denoted by ui for implicit deadlines. The total utilization of the task set is UΣ = Στi∈τ ui.

Utilization-Based Schedulability Test. In this paper, we analyze schedulers in terms of their capacity augmentation bounds. The formal definition is presented here:

Definition 1. Given a task set τ with total utilization UΣ, a scheduling algorithm S with capacity augmentation bound b can always schedule this task set on m cores of speed b, as long as τ satisfies the following conditions on unit-speed cores:

  Utilization does not exceed the total number of cores:  Στi∈τ ui ≤ m    (1)
  For each task τi ∈ τ, the critical-path length  Li ≤ Di    (2)

Since no scheduler can schedule a task set τ on m unit-speed cores unless Conditions (1) and (2) are met, a capacity augmentation bound automatically leads to a resource augmentation bound. This definition can be equivalently stated (without reference to the speedup factor) as follows: Condition (1) says that the total utilization UΣ is at most m/b, and Condition (2) says that the critical-path length of each task is at most 1/b of its relative deadline, that is, Li ≤ Di/b. Therefore, in order to check whether a task set is schedulable, we only need to know the total task set utilization and the maximum critical-path utilization. Note that a scheduler with a smaller b is better than one with a larger b; when b = 1, S is an optimal scheduler.

III. Federated Scheduling

This section presents the federated scheduling strategy for parallel tasks with implicit deadlines. We prove that it provides a capacity augmentation bound of 2 for parallel real-time tasks on an m-core machine.

A. Federated Scheduling Algorithm

Given a task set τ, the federated scheduling algorithm works as follows. First, tasks are divided into two disjoint sets: τhigh contains all high-utilization tasks — tasks with worst-case utilization at least one (ui ≥ 1) — and τlow contains all the remaining low-utilization tasks. Consider a high-utilization task τi with worst-case execution time Ci, worst-case critical-path length Li, and deadline Di (which is equal to its period Ti). We assign ni dedicated cores to τi, where

  ni = ⌈(Ci − Li) / (Di − Li)⌉    (3)

We use nhigh = Στi∈τhigh ni to denote the total number of cores assigned to the high-utilization tasks τhigh. We assign the remaining cores to all low-utilization tasks τlow, denoted as nlow = m − nhigh. The federated scheduling algorithm admits the task set τ if nlow is non-negative and nlow ≥ 2 Στi∈τlow ui.

After a valid core allocation, runtime scheduling proceeds as follows: (1) Any greedy (work-conserving) parallel scheduler can be used to schedule a high-utilization task τi on its assigned ni cores.
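The core-assignment and admission steps described above can be sketched as follows. This is a simplified illustration with names of our own; tasks are (Ci, Li, Di) triples with implicit deadlines:

```python
import math

def federated_assignment(tasks, m):
    """Sketch of the federated core assignment: high-utilization tasks
    (u_i >= 1) get ceil((C_i - L_i) / (D_i - L_i)) dedicated cores
    (Equation (3)); low-utilization tasks share the remaining cores.
    The task set is admitted if n_low >= 2 * (total low utilization)."""
    n_high = 0
    low_util = 0.0
    cores = {}
    for i, (C, L, D) in enumerate(tasks):
        u = C / D
        if u >= 1:
            cores[i] = math.ceil((C - L) / (D - L))
            n_high += cores[i]
        else:
            low_util += u
    n_low = m - n_high
    admitted = n_low >= 0 and n_low >= 2 * low_util
    return cores, n_low, admitted

# Task 0 (C=20, L=12, D=16) needs ceil(8/4) = 2 dedicated cores; on
# m = 4 cores that leaves 2 cores for low-utilization tasks of total
# utilization 0.4 + 0.5 = 0.9, and 2 >= 2 * 0.9, so the set is admitted.
print(federated_assignment([(20, 12, 16), (4, 1, 10), (5, 1, 10)], 4))
# ({0: 2}, 2, True)
```

This is also exactly the polynomial-time schedulability test discussed at the end of Section III: assign cores to each high-utilization task and check whether the leftover cores suffice for the low-utilization tasks.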
Informally, a greedy scheduler is one that never keeps a core idle if some node is ready to execute. (2) Low-utilization tasks are treated and executed as though they are sequential tasks, and any multiprocessor scheduling algorithm (such as partitioned EDF [37], or various rate-monotonic schedulers [3]) with a utilization bound of at most 1/2 can be used to schedule all the low-utilization tasks on the allocated nlow cores. The important observation is that we can safely treat low-utilization tasks as sequential tasks, since Ci ≤ Di and parallel execution is not required to meet their deadlines.⁴

⁴ Even if these tasks are expressed as parallel programs, it is easy to enforce correct sequential execution of parallel tasks — any topologically ordered execution of the nodes of the DAG is a valid sequential execution.

B. Capacity Augmentation Bound of 2 for Federated Scheduling

Theorem 1. The federated scheduling algorithm has a capacity augmentation bound of 2.

To prove Theorem 1, we consider a task set τ that satisfies Conditions (1) and (2) from Definition 1 for b = 2. Then, we (1) state the relatively obvious Lemma 1; (2) prove that a high-utilization task τi meets its deadline when assigned ni cores; and (3) show that nlow is non-negative and satisfies nlow ≥ b Στi∈τlow ui, and therefore all low-utilization tasks in τ will meet their deadlines when scheduled using any multiprocessor scheduling strategy with utilization bound no less than b (i.e., one that can afford a total task set utilization of m/b = 50% of m). These three steps complete the proof.

Lemma 1. A task set τ is classified into disjoint subsets s1, s2, ..., sk, and each subset is assigned a dedicated cluster of cores with sizes n1, n2, ..., nk respectively, such that Σi ni ≤ m. If each subset sj is schedulable on its nj cores using some scheduling algorithm Sj (possibly different for each subset), then the whole task set is guaranteed to be schedulable on m cores.

High-Utilization Tasks Are Schedulable. Assume that a machine's execution time is divided into discrete quanta called steps. During each step a core can be either idle or performing one unit of work. We say a step is complete if no core is idle during that step, and otherwise we say it is incomplete. A greedy scheduler never keeps a core idle if there is ready work available. Then, for a greedy scheduler on ni cores, we can state two straightforward lemmas [35].

Lemma 2. [Li13] Consider a greedy scheduler running on ni cores for t time steps. If the total number of incomplete steps during this period is t*, the total work F^t done during these time steps is at least F^t ≥ ni·t − (ni − 1)·t*.

Lemma 3. [Li13] If a job of task τi is executed by a greedy scheduler, then every incomplete step reduces the remaining critical-path length of the job by 1.

From Lemmas 2 and 3, we can establish Theorem 2.

Theorem 2. If an implicit-deadline deterministic parallel task τi is assigned ni = ⌈(Ci − Li)/(Di − Li)⌉ dedicated cores, then all its jobs meet their deadlines when using a greedy scheduler.

Proof: For contradiction, assume that some job of a high-utilization task τi misses its deadline when scheduled on ni cores by a greedy scheduler. Then, during the Di time steps between the release of this job and its deadline, there are fewer than Li incomplete steps; otherwise, by Lemma 3, the job would have completed. Therefore, by Lemma 2, the scheduler must have finished at least ni·Di − (ni − 1)·Li work.

  ni·Di − (ni − 1)·Li = ni·(Di − Li) + Li
                      = ⌈(Ci − Li)/(Di − Li)⌉ · (Di − Li) + Li
                      ≥ ((Ci − Li)/(Di − Li)) · (Di − Li) + Li = Ci

Since the job has work of at most Ci, it must have finished in Di steps, leading to a contradiction.

Low-Utilization Tasks are Schedulable. We first calculate a lower bound on nlow, the total number of cores assigned to low-utilization tasks, when a task set τ that satisfies Conditions (1) and (2) of Definition 1 for b = 2 is scheduled using the federated scheduling strategy.

Lemma 4. The number of cores assigned to low-utilization tasks is at least nlow ≥ 2 Στi∈τlow ui.

Proof: Here, for brevity of the proof, we denote σi = Di/Li. It is obvious that Di = σi·Li and hence Ci = Di·ui = σi·ui·Li. Therefore,

  ni = ⌈(Ci − Li)/(Di − Li)⌉ = ⌈(σi·ui·Li − Li)/(σi·Li − Li)⌉ = ⌈(σi·ui − 1)/(σi − 1)⌉

Since each task τi in task set τ satisfies Condition (2) of Definition 1 for b = 2, the critical-path length of each task is at most 1/b of its relative deadline; that is, Li ≤ Di/b, which implies σi ≥ b = 2.

By the definition of a high-utilization task τi, we have 1 ≤ ui. Together with σi ≥ 2, we know that:

  0 ≤ (ui − 1)(σi − 2)/(σi − 1)
From the definition of the ceiling function, we can derive

  ni = ⌈(σi·ui − 1)/(σi − 1)⌉
     < (σi·ui − 1)/(σi − 1) + 1
     = (σi·ui + σi − 2)/(σi − 1)
     ≤ (σi·ui + σi − 2)/(σi − 1) + (ui − 1)(σi − 2)/(σi − 1)
     = (σi·ui + σi − 2 + σi·ui − 2ui − σi + 2)/(σi − 1)
     = (2σi·ui − 2ui)/(σi − 1)
     = 2ui·(σi − 1)/(σi − 1) = 2ui = b·ui

In summary, for each high-utilization task, ni < b·ui. So, their sum satisfies nhigh = Σhigh ni < b Σhigh ui. Since the task set also satisfies Condition (1), we have

  nlow = m − nhigh > b Σall ui − b Σhigh ui = b Σlow ui

Thus, the number of remaining cores allocated to low-utilization tasks is at least nlow > 2 Σlow ui.

Corollary 1. For task sets satisfying Conditions (1) and (2), a multiprocessor scheduler with a utilization bound of at least 50% can schedule all the low-utilization tasks sequentially on the remaining nlow cores.

Proof: Low-utilization tasks are allocated nlow cores, and from Lemma 4 we know that the total utilization of the low-utilization tasks is less than nlow/b = 50% of nlow. Therefore, any multiprocessor scheduling algorithm that provides a utilization bound of 2 (i.e., one that can schedule any task set with a total worst-case utilization of no more than 50%) can schedule it.

Many multiprocessor scheduling algorithms (such as partitioned EDF or partitioned RM) provide a utilization bound of 1/2 (i.e., 50%) for sequential tasks. That is, given nlow cores, they can schedule any task set with a total worst-case utilization up to nlow/2. Using any of these algorithms for low-utilization tasks will guarantee that the federated algorithm meets all deadlines with a capacity augmentation of 2.

Therefore, since we can successfully schedule both high- and low-utilization tasks that satisfy Conditions (1) and (2), we have proven Theorem 1 (using Lemma 1).

As mentioned before, a capacity augmentation bound acts as a simple schedulability test. However, for federated scheduling, this test can be pessimistic, especially for tasks with high parallelism. Note, however, that the federated scheduling algorithm described in Section III-A can also be directly used as a (polynomial-time) schedulability test: given a task set, after assigning cores to each high-utilization task using this algorithm, if the remaining cores are sufficient for all low-utilization tasks, then the task set is schedulable and we can admit it without deadline misses. This schedulability test admits a strict superset of the task sets admitted by the capacity augmentation bound test, and in practice, it may admit task sets with utilization greater than m/2.

IV. Lower Bound on Capacity Augmentation of Any Scheduler for Parallel Tasks

On a system with m cores, consider a task set τ with a single task, τ1, which starts with sequential execution for 1 − ε time and then forks (m − 1)/ε + 1 subtasks, each with execution time ε. Here, we assume ε is an arbitrarily small positive number. Therefore, the total work of task τ1 is C1 = m and its critical-path length is L1 = 1. The deadline (and also minimum inter-arrival time) of τ1 is 1.

Theorem 3. The capacity augmentation bound of any scheduler for parallel tasks on m cores is at least 2 − 1/m, when ε → 0+.

Proof: Consider the task set defined above. The finishing time of τ1 when running at speed α is not earlier than (1 − ε)/α + (m − 1 + ε)/(mα). If α < 2 − 1/m, then as ε → 0+ this finishing time exceeds 1, and task τ1 misses its deadline. Therefore, we reach the conclusion.

Since Theorem 3 holds for any scheduler for parallel tasks, we can conclude that the lower bound on the capacity augmentation of federated scheduling is at least 2 when m is sufficiently large. Since we have shown that the upper bound on the capacity augmentation of federated scheduling is also 2, we have closed the gap between the lower and upper bounds of federated scheduling for large m. Moreover, for sufficiently large m, federated scheduling has the best capacity augmentation bound among all schedulers for parallel tasks.

V. Canonical Form of a DAG Task

In this section, we introduce the concept of a DAG's canonical form. Note that each task can have an arbitrarily complex DAG structure, which may be difficult to analyze and may not even be known before runtime. However, given the known task set parameters (work, critical-path length, utilization, critical-path utilization, etc.), we represent each task using a canonical DAG that allows us to upper bound the demand of the task in any given interval length t. These results will play an important role when we analyze the capacity augmentation bounds for G-EDF in Section VI and G-RM in Section VII. Recall that in this paper we analyze tasks with implicit deadlines, so the period equals the deadline (Ti = Di).

Recall that we classify each task τi as a low-utilization task if ui = Ci/Di < 1 (and hence Ci < Di), or a high-utilization task if τi's utilization ui ≥ 1.
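The single-task lower-bound construction of Section IV is easy to check numerically. The sketch below uses the finishing-time expression (1 − ε)/α + (m − 1 + ε)/(mα), which follows from the task's sequential prefix of work 1 − ε and its remaining m − 1 + ε units of parallel work spread over at most m cores; the function name is ours:

```python
def tau1_finishing_time(m, eps, alpha):
    """Lower bound on the finishing time of tau_1 on m cores of speed
    alpha: the sequential prefix of work (1 - eps) must run first, then
    the remaining m - (1 - eps) units of work use at most m cores."""
    return (1 - eps) / alpha + (m - 1 + eps) / (m * alpha)

m, eps = 8, 1e-9
threshold = 2 - 1 / m                                     # = 1.875 for m = 8
print(tau1_finishing_time(m, eps, threshold - 0.01) > 1)  # True: deadline missed
print(tau1_finishing_time(m, eps, threshold + 0.01) > 1)  # False: deadline met
```

Just below speed 2 − 1/m the finishing time exceeds the deadline of 1; just above it, the job completes in time, matching the bound in Theorem 3.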
For analytical purposes, instead of considering the com- task i as the maximum amount of work (computation) that
plex DAG structure of individual tasks τi , we consider a S∞,α must do on the sub-jobs of τi in any interval of length
canonical form τi∗ of task τi . The canonical form of a t. We can derive worki (t, α) as follows:
task is represented by a simpler DAG. In particular, each
worki (t, α) =
subtask (node) of task τi∗ has execution time , which is (
positive and arbitrarily small. Note that  is a hypothetical j i −kqi (Di − t, α)
C j k t ≤ Di
(4)
unit-node execution time. Therefore, it is safe to assume t t
Di Ci + worki (t − Di Di , α) t > Di .
that Di and Ci are both integers. Low and high-utilization
tasks have different canonical forms described below. Clearly, both qi (t, α) and worki (t, α) for a task depend
• The canonical form τi∗ of a low-utilization task τi is on the structure of the DAG.
simply a chain of Ci / nodes, each with execution time We similarly define qi∗ (t, α) for the canonical form τi∗ .
. Note that task τi∗ is a sequential task. As the canonical form in task τi∗ is well-defined, we can
• The canonical form τi∗ of a high-utilization task τi derive qi∗ (t, α) directly. Note that  can be arbitrarily small,
starts with a chain of Di / − 1 nodes each with execu- and, hence, its impact is ignored when calculating qi∗ (t, α).
tion time . The total work of this chain is Di − . The We can now define the canonical maximum load
last node of the chain forks all the remaining nodes. worki∗ (t, α) as the maximum workload of the canonical
Hence, all the remaining (Ci − Di + )/ nodes have task τi∗ in any interval t in schedule S∞,α . For a low-
an edge from the last node of this chain. Therefore, all utilization task τi , where Ci /Di < 1, and τi∗ is a chain, it
these nodes can execute entirely in parallel. is easy to see that the canonical workload is
Figure 1 provides an example for such a transformation
worki∗ (t, α) = (5)
for a high-utilization task. It is important to note that the 
canonical form τi∗ does not depend on the DAG structure 0

 t < Di − Ci
α
of τi at all. It depends only on the task parameters of τi . α · (tk − (Di − Cαi )) j Di − Cαi ≤ t ≤ Di
j k
  t Ci + worki∗ (t − t


Di Di Di , α) t > Di .
..
.
nodes

4 3 Similarly, for high-utilization tasks, where Ci /Di ≥ 1,


  ···   when  is arbitrarily small, we have
4+


2 1 ..
. worki∗ (t, α) = (6)

16  D
5 2 3  − 1 nodes
0

 t < Di − α i

Di
(a) original DAG (b) canonical form: heavy task j i −kDi + α · (t − (Dij− αk ))
C Di − Dαi ≤ t ≤ Di
 t Ci + worki∗ (t − t Di , α) t > Di .


Fig. 1: A high-utilization DAG task τi with Li = 12, Ci = Di Di
20, Ti = Di = 16, and ui = 1.25 and its canonical form,
where the number in each node is its execution time. Figure 2 shows the qi∗ (t, α), qi (t, α), worki∗ (t, α), and
As an additional analysis tool, we define a hypothetical scheduling strategy S∞ that must schedule a task set τ on an infinite number of cores, that is, m = ∞. With an infinite number of cores, the prioritization of the sub-jobs becomes unnecessary, and S∞ can obtain an optimal schedule by simply assigning a sub-job to a core as soon as that sub-job becomes ready for execution. Using this schedule, all the tasks finish within their critical-path length; therefore, if Li ≤ Di for all tasks τi in τ, the task set always meets the deadlines. We denote this schedule as S∞. Similarly, S∞,α is the resulting schedule when S∞ schedules tasks on cores of speed α ≥ 1. Note that S∞,α finishes a job of task τi exactly Li/α time units after it is released.

We now define some notation based on S∞,α. Let qi(t, α) be the total work finished by S∞,α between the arrival time ri of task τi and time ri + t. Therefore, in the interval from ri + t to ri + Di (an interval of length Di − t), the remaining Ci − qi(t, α) workload has to be finished. We define the maximum load, denoted by worki(t, α), accordingly in Equation (4); the corresponding quantity worki*(t, α) for the transformed task τi* is defined in Equation (5) for low-utilization tasks and in Equation (6) for high-utilization tasks. Figure 2 illustrates qi*(t, α), qi(t, α), worki*(t, α) and worki(t, α) of the high-utilization task τi in Figure 1 when Di = 16, α = 1, and α = 2. Note that worki*(t, α) ≥ worki(t, α). In fact, the following lemma proves that worki*(t, α) ≥ worki(t, α) for any t > 0 and α ≥ 1.

Lemma 5. For any t > 0 and α ≥ 1, worki*(t, α) ≥ worki(t, α).

Proof: For low-utilization tasks, the entire work Ci is sequential. When t < Ci/α, qi*(t, α) is αt, so qi(t, α) ≥ αt = qi*(t, α). When Ci/α ≤ t < Di, qi(t, α) = Ci = qi*(t, α).

Similarly, for high-utilization tasks, the first Di units of work is sequential, so when t < Di/α, qi*(t, α) = αt. In addition, S∞,α finishes τi exactly Li/α time units after it is released, while it finishes τi* at Di/α. Since the critical-path length Li ≤ Di for both τi and τi* on a unit-speed system, when t < Li/α, qi(t, α) ≥ αt = qi*(t, α). When Li/α ≤ t < Di/α, qi(t, α) = qi*(Di/α, α) > qi*(t, α). Lastly, when Di/α ≤ t < Di, qi(t, α) = Ci = qi*(t, α).
[Figure 2: two panels plotting (a) qi*(t, α) and qi(t, α), and (b) worki*(t, α) and worki(t, α), each for α = 1 and α = 2, over t from 0 to 40.]

Fig. 2: qi*(t, α), qi(t, α), worki*(t, α) and worki(t, α) for the high-utilization task τi with Di = 20 in Figure 1.
t < Di . Combining with the definition of work(t, α) If τi is a light task with∗ 1 ≤ ui < α and hence (b) is
worki (Di ,α)
(Equation (4)), we complete the proof. true, then we have Di = ui .
We classify task τi as a light or heavy task. A task is a If heavy task τi with α ≤ ui and hence (a) is true, then
light task if ui = Ci /Di < α. Otherwise, we say that τi is worki∗ (Di − Di
α , α) Ci − Di ui − 1
heavy (ui ≥ α) . The following lemmas provide an upper = =
Di − Di
Di − Dαi 1 − α1
bound on the density (the ratio of workload that has to be α
finished to the interval length) for heavy and light tasks. Therefore, Inequality (7) holds for 0 < t ≤ Di .
0
Lemma 6. For any task τi , t > 0 and 1 < α, we have
Case
j 2:k t > Di — Suppose that t is kDi + t , where k
t 0
( is Di and 0 < t ≤ Di . When ui < α, by Equation (5)
worki (t, α) worki∗ (t, α) ui (0 ≤ ui < α) and Equation (6), we have
≤ ≤ ui −1 (7)
t t 1− 1
(α ≤ ui ) worki∗ (t, α) kCi + worki∗ (t0 , α) kui Di + ui t0
α
= 0
≤ = ui
t kDi + t kDi + t0
Proof: The first inequality in Inequality (7) comes from
ui −1
Lemma 5. We now show that the second inequality also When α ≤ ui , we can derive that ui ≤ 1− α1 . By
holds for any task. Note that the right hand side is positive, Equation (6), we have
since 1 > ui > 0. There are two cases: i −1 0
worki∗ (t, α) kCi + worki∗ (t0 , α) kui Di + u1− 1 t
Case 1: 0 < t ≤ Di . = ≤ α

• If τi is a low-utilization task, where worki∗ (t, α) is t kDi + t0 kDi + t0


ui −1 ui −1 0
defined in Equation (5). For any 0 < t ≤ Di , we have k 1− 1 Di + 1− 1 t ui − 1
≤ α α

Ci Ci Ci kDi + t0 1 − α1
worki∗ (t, α) − · t ≤ α(t − Di + )− ·t
Di α Di Hence, Inequality (7) holds for any task and any t > 0.
Ci We denote τL and τH as the set of light and heavy tasks
= (t − Di )(α − )≤0
Di in a task set, respectively; kτH k as the number of heavy
where we rely on assumptions: (a) t ≤ Di ; (b) since tasks in the taskPset; and total utilization of light and heavy
Ci P
τi is a low-utilization task, D i
< 1; and (c) α > 1. tasks as UL = τL ui and UH = τH ui , respectively.

worki (t,α) Ci
Then, t ≤ Di = ui . Lemma 7. For any task set, the following inequality holds:
• If τi is a high-utilization task, where worki∗ (t, α) is  P 
α·U −UL −α·kτH k
W = ( ni=1 worki (t, α)) ≤
P
defined in Equation (6). Inequality (7) holds trivially α−1 · t (8)
when 0 < t < Di − Dαi , since the left side is 0 and  
right side is positive. For Di − Dαi ≤ t ≤ Di , we have ≤ αm−αα−1 ·t (9)

worki∗ (t, α) Ci + αt − αDi ui − α Proof: By Lemma 6, for any α > 1, it is clear that
= = α + Di ( ),
t t t W
P worki∗ (t,α) P P
τH (ui −1)
work∗ (t,α) t = τL +τH sup t>0 t ≤ τL ui + 1− 1
Therefore, i
t is maximized either (a) when t = P P P P α

Di α· τ ui − τ ui +α· τ ui −α· τ 1 α·UP −UL −α·kτH k


Di − α if ui − α ≥ 0 or (b) t = Di if ui − α < 0. = L L
α−1
H H
= α−1

7
where sup denotes the supremum of a set of numbers, and τL and τH are the sets of light tasks (ui < α) and heavy tasks (ui ≥ α), respectively.

Note that Σ_{τL} ui + Σ_{τH} ui = UL + UH = U∑ ≤ m. Since ui ≥ α for each τi ∈ τH, we have UH = Σ_{τH} ui ≥ ‖τH‖·α, and we can derive the following upper bound:

    U∑ − UL − ‖τH‖·α ≤ sup_τ (U∑ − UL − ‖τH‖·α) = U∑ − α

This is because, for any task set, there are two cases:
• If ‖τH‖ = 0 and hence U∑ = UL, then U∑ − UL − ‖τH‖·α = 0.
• If ‖τH‖ ≥ 1, then U∑ − UL = UH ≤ U∑ and ‖τH‖·α ≥ α. Therefore, U∑ − UL − ‖τH‖·α ≤ U∑ − α.

Together with the definition of U∑ and Σ_{τH} 1 = ‖τH‖,

    W ≤ ((α·U∑ − UL − α·‖τH‖)/(α − 1))·t ≤ ((α·U∑ − α)/(α − 1))·t ≤ ((αm − α)/(α − 1))·t

which proves Inequalities (8) and (9) of Lemma 7.

We use Lemma 7 in Sections VI and VII to derive bounds on G-EDF and G-RM scheduling.

VI. Capacity Augmentation of Global EDF

In this section, we use the results from Section V to prove the capacity augmentation bound of (3 + √5)/2 for G-EDF scheduling of parallel DAG tasks. In addition, we also show a matching lower bound when m ≥ 3.

A. Upper Bound on Capacity Augmentation of G-EDF

Our analysis builds on the analysis used to prove the resource augmentation bounds by Bonifaci et al. [15]. We first review the particular lemma from that paper that we will use to achieve our bound.

Lemma 8. If ∀t > 0, (αm − m + 1)·t ≥ Σ_{i=1}^{n} worki(t, α), the task set is schedulable by G-EDF on speed-α cores.

Proof: This is based on a reformulation of Lemma 3 and Definition 10 in [15], considering cores with speed α.

Theorem 4. The capacity augmentation bound for G-EDF is (3 − 2/m + √(5 − 8/m + 4/m²))/2 (≈ (3 + √5)/2, when m is large).

Proof: From Lemma 7, Inequality (9), we have ∀t > 0, Σ_{i=1}^{n} worki(t, α) ≤ ((αm − α)/(α − 1))·t. If (αm − α)/(α − 1) ≤ αm − m + 1, then by Lemma 8 the schedulability test for G-EDF holds. To calculate α, we solve the derived equivalent inequality

    mα² − (3m − 2)α + (m − 1) ≥ 0,

which solves to α = (3 − 2/m + √(5 − 8/m + 4/m²))/2.

We now state a more general corollary providing a more precise bound using more information about a task set.

Corollary 2. If a task set has total utilization U∑, total heavy-task utilization UH and number of heavy tasks ‖τH‖, then this task set will be schedulable under G-EDF on an m-core machine with speed

    α = (2 + (U∑ − ‖τH‖ − 1)/m + √(4(UH − ‖τH‖)/m + (U∑ − ‖τH‖ − 1)²/m²))/2.

Proof: The proof is the same as the proof of Theorem 4, but without using Inequality (9). Instead, we directly use Inequality (8). If (α·U∑ − UL − α·‖τH‖)/(α − 1) ≤ αm − m + 1, then by Lemma 8 the schedulability test for G-EDF holds for this task set. Solving this, we can get the required speedup α for the schedulability of the task set.

However, note that heavy tasks are defined as the set of all tasks τi with utilization ui ≥ α. Therefore, given a task set, to accurately calculate α, we start with the upper bound on α, which is α̂ = (3 − 2/m + √(5 − 8/m + 4/m²))/2; then, in each iteration i, we calculate the required speedup αi using the UH and ‖τH‖ from the (i − 1)-th iteration; we iteratively classify more tasks into the set of heavy tasks, and we stop when no more tasks can be added to this set, i.e., when ‖τH‖ does not change between iterations. Through these iterative steps, we can calculate an accurate speedup.

[Figure 3: required speedup of G-EDF as a function of U∑/m (from 0.1 to 1), for U∑/‖τH‖ = 20.94, 10.47, 5.236, and 2.618.]

Fig. 3: The required speedup of G-EDF when m is sufficiently large and UL = 0 (i.e. U∑ = UH).

[Figure 4: required speedup of G-EDF as a function of U∑/m (from 0.1 to 1), for m = 3, 6, 12, and 24.]

Fig. 4: The required speedup of G-EDF when ‖τH‖ = 1 and UL = 0 (i.e. U∑ = UH).
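The iterative refinement described after Corollary 2 (start from the Theorem 4 bound α̂, classify heavy tasks against the current speed, recompute the Corollary 2 formula, and stop when the heavy set stabilizes) is straightforward to mechanize. The following Python sketch is our own illustration, not code from the paper; the function names and the iteration cap are assumptions.

```python
import math

def theorem4_bound(m):
    """Upper bound on the G-EDF capacity augmentation bound (Theorem 4)."""
    return (3 - 2/m + math.sqrt(5 - 8/m + 4/m**2)) / 2

def corollary2_speedup(m, utilizations, alpha):
    """One evaluation of the Corollary 2 formula, classifying a task as
    heavy when its utilization is at least the current threshold alpha."""
    heavy = [u for u in utilizations if u >= alpha]
    u_sum, u_heavy, k = sum(utilizations), sum(heavy), len(heavy)
    c = (u_sum - k - 1) / m
    return (2 + c + math.sqrt(4 * (u_heavy - k) / m + c * c)) / 2

def iterative_speedup(m, utilizations, max_iters=100):
    """Start from the Theorem 4 bound and iterate until the heavy-task
    set stops changing, as described after Corollary 2."""
    alpha = theorem4_bound(m)
    for _ in range(max_iters):
        k_before = sum(1 for u in utilizations if u >= alpha)
        alpha = corollary2_speedup(m, utilizations, alpha)
        k_after = sum(1 for u in utilizations if u >= alpha)
        if k_after == k_before:
            return alpha
    return alpha
```

For the two-task example used later in Section VI-B (u1 = m − 1 and u2 = 1, so U∑ = m, UH = m − 1 and ‖τH‖ = 1), this iteration converges immediately to (3 − 2/m + √(5 − 12/m + 4/m²))/2, the value that Theorem 5 shows to be a matching lower bound.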
Figure 3 illustrates the required speedup of G-EDF provided in Corollary 2 when m is sufficiently large (i.e., m = ∞ and 1/m = 0) and UL = 0 (i.e. U∑ = UH). We vary m and U∑/‖τH‖. Note that U∑/‖τH‖ = UH/‖τH‖ is the average utilization of all heavy tasks, which should be no less than (3 + √5)/2 ≈ 2.618. It can also be seen that the bound gets closer to (3 + √5)/2 when U∑/‖τH‖ is larger, which results from ‖τH‖ = 1 and m = ∞.

Figure 4 illustrates the required speedup of G-EDF provided in Corollary 2 when ‖τH‖ ≤ 1 and UL = 0 (i.e. U∑ = UH) with varying m. Note that α > 1 is required by the proof of Corollary 2, and ‖τH‖ = 1 can only be true if U∑/m ≥ α/m (i.e., U∑/m ≥ 1/3 for m = 3).

B. Lower Bound on Capacity Augmentation of G-EDF

As mentioned above, Li et al.'s lower bound [35] demonstrates the tightness of the above bound for large m. We now provide a lower bound on the capacity augmentation bound for small m.

Consider a task set τ with two tasks, τ1 and τ2. Task τ1 starts with sequential execution for 1 − ε time and then forks (m − 2)/ε + 1 subtasks, each with execution time ε. Here, we assume ε is an arbitrarily small positive number, and hence it is safe to assume that (m − 2)/ε is a positive integer. Therefore, the total work of task τ1 is C1 = m − 1 and its critical-path length is L1 = 1. The minimum inter-arrival time of τ1 is 1.

Task τ2 is simply a sequential task with work (execution time) of 1 − 1/α and minimum inter-arrival time also 1 − 1/α, where α > 1 will be defined later. Clearly, the total utilization is m, and the critical-path length of each task is at most the relative deadline (minimum inter-arrival time).

Lemma 9. When α < (3 − 2/m − δ + √(5 − 12/m + 4/m²))/2, where δ = 2ε + g(ε) and m ≥ 3, then (1 − 2ε)/α + (m − 2)/(mα) > 1 − (1 − 1/α)/α holds.

Proof: By solving (1 − 2ε)/α + (m − 2)/(mα) = 1 − (1 − 1/α)/α, we know that the equality holds when

    α = (3 − 2/m − 2ε + √(5 − 12/m + 4/m²) − g(ε))/2

where g(ε) is a positive function of ε, which approaches 0 when ε approaches 0. Now, by setting δ to 2ε + g(ε), we reach the conclusion.

Theorem 5. The capacity augmentation bound for G-EDF is at least (3 − 2/m + √(5 − 12/m + 4/m²))/2, as ε → 0⁺.

Proof: Consider the system with the two tasks τ1 and τ2 defined at the beginning of Section VI-B. Suppose that the arrival of task τ1 is at time 0, and the arrival of task τ2 is at time 1/α + ε/α. By definition, the first jobs of τ1 and τ2 have absolute deadlines at 1 and 1 + ε/α, respectively. Hence, G-EDF will execute the sequential execution of task τ1 and the sub-jobs of task τ1 first, and then execute τ2.

The finishing time of τ1 at speed α is not earlier than (1 − ε)/α + (m − 2)/(mα). Hence, the finishing time of task τ2 is not earlier than (1 − ε)/α + (m − 2)/(mα) + (1 − 1/α)/α.

If (1 − ε)/α + (m − 2)/(mα) + (1 − 1/α)/α > 1 + ε/α, then task τ2 misses its deadline. By Lemma 9, we reach the conclusion.

[Figure 5: the upper bound (Theorem 4) and the lower bound (Theorem 5) on the capacity augmentation bound of G-EDF, plotted for m from 10 to 100; both curves lie between roughly 2.0 and 2.6 and converge as m grows.]

Fig. 5: The upper bound of G-EDF provided in Theorem 4 and the lower bound in Theorem 5 with respect to the capacity augmentation bound.

Figure 5 illustrates the upper bound of G-EDF provided in Theorem 4 and the lower bound in Theorem 5 with respect to the capacity augmentation bound. It can easily be seen that the upper and lower bounds get closer as m grows. When m is 100, the gap between the upper and the lower bounds is roughly 0.00452.

It is important to note that the more precise speedup in Corollary 2 is tight even for small m. This is because, in the above example task set, U∑ = m, the total heavy-task utilization is UH = m − 1 and the number of heavy tasks is ‖τH‖ = 1; then, according to the Corollary, the capacity augmentation bound for this task set under G-EDF is

    (2 + (U∑ − ‖τH‖ − 1)/m + √(4(UH − ‖τH‖)/m + (U∑ − ‖τH‖ − 1)²/m²))/2
        = (2 + (m − 2)/m + √(4(m − 2)/m + (m − 2)²/m²))/2
        = (3 − 2/m + √(5 − 12/m + 4/m²))/2

which is exactly the lower bound in Theorem 5 as ε → 0⁺.

VII. G-RM Scheduling

This section proves that G-RM provides a capacity augmentation bound of 2 + √3 for large m. The structure of the proof is very similar to the analysis in Section VI. Again, we use a lemma from [15], restated below.

Lemma 10. If ∀t > 0, 0.5(αm − m + 1)·t ≥ Σi worki(t, α), the task set is schedulable by G-RM on speed-α cores.
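Lemma 10 mirrors Lemma 8 with an extra factor of 0.5 on the supply side. Combined with Lemma 7's bound (9) on the total work, the smallest sufficient core speed can also be found numerically; the closed form is derived in Theorem 6. The sketch below is our own illustration (helper names are ours, not from the paper):

```python
import math

def grm_sufficient(alpha, m):
    # Lemma 7, Inequality (9), bounds the total work by
    # ((alpha*m - alpha)/(alpha - 1)) * t, so by Lemma 10 it suffices that
    # (alpha*m - alpha)/(alpha - 1) <= 0.5*(alpha*m - m + 1).
    return (alpha * m - alpha) / (alpha - 1) <= 0.5 * (alpha * m - m + 1)

def min_speed_grm(m, eps=1e-12):
    # Binary search for the smallest alpha > 1 satisfying the condition;
    # on (1, 10] the condition is false below the larger root of the
    # underlying quadratic and true above it, so the search is valid.
    lo, hi = 1.0 + 1e-9, 10.0
    while hi - lo > eps:
        mid = (lo + hi) / 2
        if grm_sufficient(mid, m):
            hi = mid
        else:
            lo = mid
    return hi
```

For m = 4 this returns exactly 3, and as m grows it approaches 2 + √3 ≈ 3.732, matching Theorem 6's closed form (4 − 3/m + √(12 − 20/m + 9/m²))/2.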
Proof: This is based on a reformulation of Lemma 6 and Definition 10 in [15]. Note that the analysis in [15] is for deadline-monotonic scheduling, which gives a sub-job of a task higher priority if its relative deadline is shorter. As we consider tasks with implicit deadlines, deadline-monotonic scheduling is the same as rate-monotonic scheduling.

Proofs like those in Section VI-A give us the bounds below for G-RM scheduling.

Theorem 6. The capacity augmentation bound for G-RM is (4 − 3/m + √(12 − 20/m + 9/m²))/2 (≈ 2 + √3, when m is large).

Proof: (Similar to the proof of Theorem 4.) First, we know from Lemma 7 that ∀t > 0, Σ_{i=1}^{n} worki(t, α) ≤ ((αm − α)/(α − 1))·t. Second, if (αm − α)/(α − 1) ≤ 0.5(αm − m + 1), we can also conclude that the schedulability test for G-RM in Lemma 10 holds. By solving the inequality above, we have α ≥ (4 − 3/m + √(12 − 20/m + 9/m²))/2, and prove Theorem 6.

The result in Theorem 6 is the best known result for the capacity augmentation bound of global fixed-priority scheduling for general DAG tasks with arbitrary structures. Interestingly, Kim et al. [31] get the same bound of 2 + √3 for global fixed-priority scheduling of parallel synchronous tasks (a subset of DAG tasks).

The strategy used in [31] is quite different. In their algorithm, the tasks undergo a stretch transformation, which generates a set of sequential subtasks (each with its own release time and deadline) for each parallel task in the original task set. These subtasks are then scheduled using a G-DM scheduling algorithm [11]. Note that even though the parallel tasks in the original task set have implicit deadlines, the transformed sequential tasks have only constrained deadlines, hence the need for deadline-monotonic scheduling instead of rate-monotonic scheduling.

Corollary 3. If a task set has total utilization U∑, total heavy-task utilization UH and number of heavy tasks ‖τH‖, then this task set will be schedulable under G-RM on an m-core machine with speed

    α = (2 + (2U∑ − 2‖τH‖ − 1)/m + √(8(UH − ‖τH‖)/m + (2U∑ − 2‖τH‖ − 1)²/m²))/2.

Proof: The proof is the same as the proof of Corollary 2, except that instead of using Lemma 8, it uses Lemma 10.

[Figure 6: required speedup of G-RM as a function of U∑/m (from 0.1 to 1), for U∑/‖τH‖ = 29.86, 14.93, 7.464, and 3.732.]

Fig. 6: The required speedup of G-RM when m is sufficiently large and UL = 0 (i.e. U∑ = UH).

Figure 6 illustrates the required speedup of G-RM provided in Corollary 3 when m is sufficiently large (i.e., m = ∞ and 1/m = 0) and UL = 0 (i.e. U∑ = UH). We vary m and U∑/‖τH‖. Again, U∑/‖τH‖ is the average utilization of all heavy tasks. It can also be seen that the bound gets closer to 2 + √3 when U∑/‖τH‖ is larger, which results from ‖τH‖ = 1 and m = ∞.

VIII. Practical Considerations

As shown in the previous sections, the capacity augmentation bounds for federated scheduling, G-EDF and G-RM are 2, 2.618 and 3.732, respectively. In this section, we consider their run-time efficiency and efficacy from a practical perspective. We consider four dimensions: static vs. dynamic priorities, global vs. partitioned scheduling, scheduling and synchronization overheads, and work-conserving vs. non-work-conserving strategies.

Practitioners have generally found it easier to implement fixed (task) priority schedulers (such as RM) than dynamic priority schedulers (such as EDF). Fixed priority schedulers have always been well-supported in almost all real-time operating systems. Recently, there have been efforts on efficient implementations of job-level dynamic priority (EDF and G-EDF) schedulers for sequential tasks [16, 34]. Federated scheduling does not require any priority assignment for high-utilization tasks (since they own their cores exclusively) and can use either fixed or dynamic priority for low-utilization tasks. Thus, it is relatively easier to implement G-RM and federated scheduling.

For sequential tasks, in general, global scheduling may incur more overhead due to thread migration and the associated cache penalty, the extent of which depends on the cache architecture and the task sets. In particular, for parallel tasks, the overheads of global scheduling could be worse. For sequential tasks, preemptions and migrations only occur when a new job with higher priority is released. In contrast, for parallel tasks, a preemption and possibly a migration could occur whenever a node in the DAG of a job with higher priority is enabled. Since nodes in a DAG often represent fine-grained units of computation, the number of nodes in the task set can be much larger than the number of tasks. Hence, we can expect a larger number of such events. Since federated scheduling is a generalization of partitioned scheduling to parallel tasks, it has advantages similar to partitioning. In fact, if we use a partitioned RM or partitioned EDF strategy for low-utilization tasks,
there are only preemptions but no migrations for low-utilization tasks. Meanwhile, federated scheduling only allocates the minimum number of dedicated cores needed to ensure the schedulability of each high-utilization task, so there are no preemptions for high-utilization tasks and the number of migrations is minimized. Hence, we expect that federated scheduling will have less overhead than global schedulers.

In addition, parallel runtime systems have additional parallel overheads, such as synchronization and scheduling overheads. These overheads (per task) are usually approximately linear in the number of cores allocated to each task. Under federated scheduling, a minimum number of cores is assigned. However, depending on the particular implementation, global scheduling may execute a task on all the cores in the system and may have higher overheads.

Finally, note that federated scheduling is not a greedy (work-conserving) strategy for the entire task set, although it uses a greedy schedule for each individual task. In many real systems, the worst-case execution times are normally over-estimated. Under federated scheduling, cores allocated to tasks with overestimated execution times may idle due to resource over-provisioning. In contrast, work-conserving strategies (such as G-EDF and G-RM) can utilize available cores dynamically through thread migration.

IX. Related Work

In this section, we review closely related work on real-time scheduling, concentrating primarily on parallel tasks.

Real-time multiprocessor scheduling considers scheduling sequential tasks on computers with multiple processors or cores and has been studied extensively (see [10, 23] for a survey). In addition, platforms such as LITMUS^RT [17, 19] have been designed to support these task sets. Here, we review a few relevant theoretical results. Researchers have proven resource augmentation bounds, utilization bounds and capacity augmentation bounds. The best known resource augmentation bound for G-EDF for sequential tasks on a multiprocessor is 2 [7], and a capacity augmentation bound of 2 − 1/m + ε holds for small ε [14]. Partitioned EDF and versions of partitioned static-priority schedulers also provide a capacity augmentation bound of 2 [3, 37]. G-RM provides a capacity augmentation bound of 3 [2] for implicit-deadline tasks.

For parallel real-time tasks, most early work considered intra-task parallelism of limited task models such as malleable tasks [22, 30, 33] and moldable tasks [39]. Kato et al. [30] studied Gang EDF scheduling of moldable parallel task systems.

Researchers have since considered more realistic task models that represent programs generated by commonly used parallel programming languages such as the Cilk family [13, 21], OpenMP [41], and Intel's Thread Building Blocks [43]. These languages and libraries support primitives such as parallel for-loops and fork/join or spawn/sync in order to expose parallelism within the programs. Using these constructs generates tasks whose structure can be represented with different types of DAGs.

Parallel synchronous tasks have been studied more than other models in the real-time community. These tasks are generated if we use only parallel-for loops to generate parallelism. Lakshmanan et al. [32] proved a (capacity) augmentation bound of 3.42 for a restricted synchronous task model, which is generated when we restrict each parallel-for loop in a task to have the same number of iterations. General synchronous tasks (with no restriction on the number of iterations in the parallel-for loops) have also been studied [4, 31, 40, 45]. (More details on these results were presented in Section I.) Chwa et al. [20] provide a response time analysis.

If we do not restrict the primitives used to parallel-for loops, we get a more general task model, most easily represented by a general directed acyclic graph. A resource augmentation bound of 2 − 1/m for G-EDF was proved for a single DAG with arbitrary deadlines [8] and for multiple DAGs [15, 35]. A capacity augmentation bound of 4 − 2/m was proved in [35] for tasks with implicit deadlines. Liu et al. [36] provide a response time analysis for G-EDF.

There has been significant work on scheduling non-real-time parallel systems [5, 6, 24–26, 42]. In this context, the goal is generally to maximize throughput. Various provably good scheduling strategies, such as list scheduling [18, 27] and work stealing [12], have been designed. In addition, many parallel languages and runtime systems have been built based on these results. While multiple tasks on a single platform have been considered in the context of fairness in resource allocation [1], none of this work considers real-time constraints.

X. Conclusions

In this paper, we consider parallel tasks in the DAG model and prove that, for parallel tasks with implicit deadlines, the capacity augmentation bounds of federated scheduling, G-EDF and G-RM are 2, 2.618 and 3.732, respectively. In addition, the bound of 2 for federated scheduling and the bound of 2.618 for G-EDF are both tight for large m, since there exist matching lower bounds. Moreover, the three bounds are the best known bounds for these schedulers for DAG tasks.

There are several directions for future work. The G-RM capacity augmentation bound is not known to be tight. The current lower bound for G-RM is 2.668, inherited from sequential sporadic real-time tasks without DAG structures [38]. Therefore, it is worth investigating a matching lower bound or lowering the upper bound. In addition, since the lower bound for any scheduler is 2 − 1/m, it would be interesting to investigate if it is possible to design schedulers that reach this bound. Finally, all the known
capacity augmentation bound results are restricted to implicit-deadline tasks; we would like to generalize them to constrained and arbitrary deadline tasks.

Acknowledgment

This research was supported in part by the priority program "Dependable Embedded Systems" (SPP 1500 - spp1500.itec.kit.edu) by DFG, as part of the Collaborative Research Center SFB876 (http://sfb876.tu-dortmund.de/), and by NSF grants CCF-1136073 (CPS) and CCF-1337218 (XPS). The authors thank the anonymous reviewers for their suggestions on improving this paper.

References

[1] K. Agrawal, C. E. Leiserson, Y. He, and W. J. Hsu. "Adaptive work-stealing with parallelism feedback". In: ACM Trans. Comput. Syst. 26 (2008), pp. 112–120.
[2] B. Andersson, S. Baruah, and J. Jonsson. "Static-priority scheduling on multiprocessors". In: RTSS. 2001.
[3] B. Andersson and J. Jonsson. "The utilization bounds of partitioned and pfair static-priority scheduling on multiprocessors are 50%". In: ECRTS. 2003.
[4] B. Andersson and D. de Niz. "Analyzing Global-EDF for Multiprocessor Scheduling of Parallel Tasks". In: Principles of Distributed Systems. 2012, pp. 16–30.
[5] N. S. Arora, R. D. Blumofe, and C. G. Plaxton. "Thread Scheduling for Multiprogrammed Multiprocessors". In: SPAA. 1998.
[6] N. Bansal, K. Dhamdhere, J. Konemann, and A. Sinha. "Non-clairvoyant Scheduling for Minimizing Mean Slowdown". In: Algorithmica 40.4 (2004), pp. 305–318.
[7] S. Baruah, V. Bonifaci, A. Marchetti-Spaccamela, and S. Stiller. "Improved Multiprocessor Global Schedulability Analysis". In: Real-Time Syst. 46.1 (2010), pp. 3–24.
[8] S. Baruah, V. Bonifaci, A. Marchetti-Spaccamela, L. Stougie, and A. Wiese. "A generalized parallel task model for recurrent real-time processes". In: RTSS. 2012.
[9] S. K. Baruah, A. K. Mok, and L. E. Rosier. "Preemptively Scheduling Hard-Real-Time Sporadic Tasks on One Processor". In: RTSS. 1990.
[10] M. Bertogna and S. Baruah. "Tests for global EDF schedulability analysis". In: Journal of System Architecture 57.5 (2011).
[11] M. Bertogna, M. Cirinei, and G. Lipari. "New Schedulability Tests for Real-time Task Sets Scheduled by Deadline Monotonic on Multiprocessors". In: Proceedings of the 9th International Conference on Principles of Distributed Systems. 2006.
[12] R. D. Blumofe and C. E. Leiserson. "Scheduling multithreaded computations by work stealing". In: Journal of the ACM 46.5 (1999), pp. 720–748.
[13] R. D. Blumofe, C. F. Joerg, B. C. Kuszmaul, C. E. Leiserson, K. H. Randall, and Y. Zhou. "Cilk: An Efficient Multithreaded Runtime System". In: PPoPP. 1995, pp. 207–216.
[14] V. Bonifaci, A. Marchetti-Spaccamela, and S. Stiller. "A constant-approximate feasibility test for multiprocessor real-time scheduling". In: Algorithmica 62.3-4 (2012), pp. 1034–1049.
[15] V. Bonifaci, A. Marchetti-Spaccamela, S. Stiller, and A. Wiese. "Feasibility Analysis in the Sporadic DAG Task Model". In: ECRTS. 2013.
[16] B. B. Brandenburg and J. H. Anderson. "On the Implementation of Global Real-Time Schedulers". In: RTSS. 2009.
[17] B. B. Brandenburg, A. D. Block, J. M. Calandrino, U. Devi, H. Leontyev, and J. H. Anderson. LITMUS^RT: A Status Report. 2007.
[18] R. P. Brent. "The Parallel Evaluation of General Arithmetic Expressions". In: Journal of the ACM (1974), pp. 201–206.
[19] J. M. Calandrino, H. Leontyev, A. Block, U. C. Devi, and J. H. Anderson. "LITMUS^RT: A Testbed for Empirically Comparing Real-Time Multiprocessor Schedulers". In: RTSS. 2006.
[20] H. S. Chwa, J. Lee, K.-M. Phan, A. Easwaran, and I. Shin. "Global EDF Schedulability Analysis for Synchronous Parallel Tasks on Multicore Platforms". In: ECRTS. 2013.
[21] CilkPlus. http://software.intel.com/en-us/articles/intel-cilk-plus.
[22] S. Collette, L. Cucu, and J. Goossens. "Integrating job parallelism in real-time scheduling theory". In: Information Processing Letters 106.5 (2008), pp. 180–187.
[23] R. I. Davis and A. Burns. "A survey of hard real-time scheduling for multiprocessor systems". In: ACM Computing Surveys 43 (2011), 35:1–44.
[24] X. Deng, N. Gu, T. Brecht, and K. Lu. "Preemptive Scheduling of Parallel Jobs on Multiprocessors". In: SODA. 1996.
[25] M. Drozdowski. "Real-time scheduling of linear speedup parallel tasks". In: Inf. Process. Lett. 57 (1996), pp. 35–40.
[26] J. Edmonds, D. D. Chinn, T. Brecht, and X. Deng. "Non-clairvoyant Multiprocessor Scheduling of Jobs with Changing Execution Characteristics". In: Journal of Scheduling 6.3 (2003), pp. 231–250.
[27] R. L. Graham. "Bounds on Multiprocessing Anomalies". In: SIAM Journal on Applied Mathematics 17.2 (1969), pp. 416–429.
[28] H.-M. Huang, T. Tidwell, C. Gill, C. Lu, X. Gao, and S. Dyke. "Cyber-physical systems for real-time hybrid structural testing: a case study". In: International Conference on Cyber-Physical Systems. 2010.
[29] A. K.-L. Mok. Fundamental design problems of distributed systems for the hard-real-time environment. Tech. rep. 1983.
[30] S. Kato and Y. Ishikawa. "Gang EDF Scheduling of Parallel Task Systems". In: RTSS. 2009.
[31] J. Kim, H. Kim, K. Lakshmanan, and R. Rajkumar. "Parallel scheduling for cyber-physical systems: analysis and case study on a self-driving car". In: ICCPS. 2013.
[32] K. Lakshmanan, S. Kato, and R. R. Rajkumar. "Scheduling Parallel Real-Time Tasks on Multi-core Processors". In: RTSS. 2010.
[33] W. Y. Lee and H. Lee. "Optimal Scheduling for Real-Time Parallel Tasks". In: IEICE Transactions on Information Systems E89-D.6 (2006), pp. 1962–1966.
[34] J. Lelli, G. Lipari, D. Faggioli, and T. Cucinotta. "An efficient and scalable implementation of global EDF in Linux". In: OSPERT. 2011.
[35] J. Li, K. Agrawal, C. Lu, and C. Gill. "Analysis of Global EDF for Parallel Tasks". In: ECRTS. 2013.
[36] C. Liu and J. Anderson. "Supporting Soft Real-Time Parallel Applications on Multicore Processors". In: RTCSA. 2012.
[37] J. M. López, J. L. Díaz, and D. F. García. "Utilization Bounds for EDF Scheduling on Real-Time Multiprocessor Systems". In: Real-Time Systems 28.1 (2004), pp. 39–68.
[38] L. Lundberg. "Analyzing Fixed-Priority Global Multiprocessor Scheduling". In: IEEE Real-Time Technology and Applications Symposium. 2002, pp. 145–153.
[39] G. Manimaran, C. S. R. Murthy, and K. Ramamritham. "A New Approach for Scheduling of Parallelizable Tasks in Real-Time Multiprocessor Systems". In: Real-Time Syst. 15 (1998), pp. 39–60.
[40] G. Nelissen, V. Berten, J. Goossens, and D. Milojevic. "Techniques optimizing the number of processors to schedule multi-threaded tasks". In: ECRTS. 2012.
[41] OpenMP Application Program Interface v3.1. http://www.openmp.org/mp-documents/OpenMP3.1.pdf. 2011.
[42] C. D. Polychronopoulos and D. J. Kuck. "Guided Self-Scheduling: A Practical Scheduling Scheme for Parallel Supercomputers". In: IEEE Transactions on Computers C-36.12 (1987).
[43] J. Reinders. Intel threading building blocks: outfitting C++ for multi-core processor parallelism. O'Reilly Media, 2010.
[44] A. Saifullah, D. Ferry, J. Li, K. Agrawal, C. Lu, and C. Gill. "Parallel real-time scheduling of DAGs". In: IEEE Transactions on Parallel and Distributed Systems (2014).
[45] A. Saifullah, J. Li, K. Agrawal, C. Lu, and C. Gill. "Multi-core real-time scheduling for generalized parallel task models". In: Real-Time Systems 49.4 (2013), pp. 404–435.
[46] R. Sedgewick and K. D. Wayne. Algorithms. 4th ed. 2011.