Simplified Design Flow: (A Picture From Ingo Sander)
Hardware architecture
So far, we have only talked about single-processor systems
Concurrency implemented by scheduling
Distributed Systems
Processors loosely connected by a low-speed network, e.g. CAN, Ethernet, Token Ring, etc.
[Figure: a complete distributed system connected by a LAN]
Task Assignment
In the design flow:
First, the application is partitioned into tasks or task graphs.
At some stage, the execution times, communication costs, data
and control dependencies of all the tasks become known.
Task Assignment
The task models used in task assignment can vary in complexity, depending on what is considered or ignored (a sketch of such a task record follows after this list):
Communication costs
Data and control dependencies
Resource requirements, e.g. WCET, memory, etc.
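As a rough illustration (not from the slides), such a task model could be captured in a record like the one below; the field names are assumptions chosen to mirror the attributes listed above.

from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical task record for task assignment; field names are assumptions.
@dataclass
class Task:
    name: str
    wcet: float                                   # worst-case execution time
    period: float                                 # period / minimum inter-arrival time
    memory: int = 0                               # resource requirement, e.g. bytes
    predecessors: List[str] = field(default_factory=list)      # data/control dependencies
    comm_cost: Dict[str, float] = field(default_factory=dict)  # cost of sending data to successors

    @property
    def utilization(self) -> float:
        return self.wcet / self.period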
Today's plan
Why multiprocessor?
OS etc.
Task Assignment
Multiprocessor scheduling
(semi-)partitioned scheduling
global scheduling
Hardware: Trends
Multicore requires parallel applications
[Figure: performance (log scale) vs. year, showing single-core performance leveling off around now]
4 cores in notebooks
12 cores in servers
The AMD 12-core Magny-Cours consumes less energy than previous generations with 6 cores
Embedded Systems
4 cores in ARM11 MPCore embedded processors
What next?
Manycores (hundreds of cores) are predicted to arrive within a few years, e.g. at Ericsson
[Figure: a multicore chip with several CPUs, each with a private L1 cache, sharing an L2 cache and limited bandwidth to off-chip memory]
Multiprocessor scheduling
"Given a set J of jobs where job ji has length li and a number of processors mi,
what is the minimum possible time required to schedule all jobs in J on m
processors such that none overlap?"
Wikipedia
That is, design a schedule such that the response time of the last task (the makespan) is minimized
(Alternatively, given M processors and N tasks,
find a mapping from tasks to processors such that
all the tasks are schedulable)
The problem is NP-complete
It is also known as the load balancing problem
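Since the problem is NP-complete, it is typically attacked with heuristics. Below is a minimal sketch (an illustration, not an algorithm from the slides) of the classic greedy rule: assign the longest remaining job to the currently least-loaded processor.

import heapq

def lpt_schedule(job_lengths, m):
    # Greedy "longest processing time first" heuristic: each job, longest
    # first, goes to the least-loaded of the m processors.
    loads = [(0.0, p) for p in range(m)]          # (current load, processor id)
    heapq.heapify(loads)
    assignment = {}
    for job, length in sorted(enumerate(job_lengths), key=lambda x: -x[1]):
        load, p = heapq.heappop(loads)
        assignment[job] = p
        heapq.heappush(loads, (load + length, p))
    makespan = max(load for load, _ in loads)
    return makespan, assignment

print(lpt_schedule([4, 3, 3, 2, 2], m=2))         # makespan 8.0 here; the optimum is 7

As the example output shows, the heuristic does not always reach the optimal makespan.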
Multiprocessor scheduling
static and dynamic task assignment
Partitioned scheduling
Static task assignment
Each task may only execute on a fixed processor
No task migration
Semi-partitioned scheduling
Static task assignment
Each instance of a task (or part of it) is assigned to a fixed processor
A task instance, or a part of it, may migrate
Global scheduling
Dynamic task assignment
Any instance of any task may execute on any processor
Task migration
Multiprocessor Scheduling
Global Scheduling
Partitioned Scheduling
Partitioned Scheduling with Task Splitting
[Figure: three variants side by side; global scheduling has one shared waiting queue of new and ready tasks feeding cpu 1-3, partitioned scheduling has a separate queue per CPU, and partitioned scheduling with task splitting additionally splits one task across two CPUs]
Underlying causes
Bin-packing/NP-hard problems
Multiple resources, e.g. caches, bandwidth
The root of all evil in global scheduling (Liu, 1969):
The simple fact that a task can use only one processor even
when several processors are free at the same time adds a
surprising amount of difficulty to the scheduling of multiple
processors.
Dhall's effect: with RM, DM and EDF, some low-utilization task sets can be unschedulable regardless of how many processors are used (see the worked example after this list).
Hard-to-find critical instant: a critical instant does not
always occur when a task arrives at the same time as
all its higher-priority tasks.
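The following numeric check illustrates the classic construction behind Dhall's effect; the concrete values of m and eps are assumptions chosen for illustration: m light tasks with short periods combined with one heavy task whose period is only slightly longer.

eps, m = 0.01, 4
light = [(2 * eps, 1.0)] * m        # m tasks: C = 2*eps, T = 1 (higher RM priority)
heavy = (1.0, 1.0 + eps)            # one task: C = 1, T = 1 + eps (lowest RM priority)

# At time 0 every task releases an instance. Under global RM the m light
# tasks occupy all m processors until t = 2*eps, so the heavy task has
# only 1 + eps - 2*eps < 1 time units left before its deadline.
slack = heavy[1] - 2 * eps - heavy[0]
total_u = sum(c / t for c, t in light) + heavy[0] / heavy[1]
print(f"slack = {slack:.3f}, total utilization = {total_u:.2f} on {m} CPUs")

The deadline is missed (negative slack) even though total utilization is barely above one processor's worth, and adding more processors does not help.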
[Figure: example timelines on processors P1 and P2, showing a critical section and the resulting blocking]
[Figure (from Uppsala, RTAS 2010): bar chart of guaranteed utilization bounds (%) for partitioned, global, and task-splitting scheduling with fixed and dynamic priorities; the bounds shown range from 38% to 69.3%, citing OPODIS08, TPDS05, ECRTS03, RTSS04 and RTCSA06]
Multiprocessor Scheduling
Global Scheduling
[Figure: global scheduling keeps all new and ready tasks in one shared waiting queue, from which they are dispatched to cpu 1, cpu 2 or cpu 3]
Global scheduling
All ready tasks are kept in a global queue
When selected for execution, a task can be dispatched to any processor, even after having been preempted (a minimal dispatcher sketch follows after this slide)
Disadvantages:
Few results from single-processor scheduling can be used
No optimal algorithms are known, except under idealized assumptions (e.g. Pfair scheduling)
Poor resource utilization for hard timing constraints
No more than 50% resource utilization can be guaranteed for hard RT tasks
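To make the dispatching rule concrete, here is a minimal sketch of one global fixed-priority dispatching step, assuming a lower number means higher priority; it is an illustration only, not an algorithm from the slides.

import heapq

def dispatch(ready_queue, processors):
    # One step of global fixed-priority dispatching: the highest-priority
    # ready task runs on an idle processor, or preempts the lowest-priority
    # running task if that task has lower priority.
    # ready_queue: heap of (priority, name); processors: cpu id -> (priority, name) or None.
    while ready_queue:
        prio, name = heapq.heappop(ready_queue)
        idle = [cpu for cpu, running in processors.items() if running is None]
        if idle:
            processors[idle[0]] = (prio, name)
            continue
        victim = max(processors, key=lambda cpu: processors[cpu][0])
        if processors[victim][0] > prio:
            heapq.heappush(ready_queue, processors[victim])   # preempted task re-enters the queue
            processors[victim] = (prio, name)
        else:
            heapq.heappush(ready_queue, (prio, name))         # nothing to preempt
            break
    return processors

cpus = {1: None, 2: (3, "B"), 3: (5, "C")}
queue = [(2, "A"), (4, "D")]
heapq.heapify(queue)
print(dispatch(queue, cpus))    # A runs on cpu 1, D preempts C on cpu 3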
Partition-Based Scheduling
Partitioned Scheduling
[Figure: partitioned scheduling with a separate local queue for each of cpu 1, cpu 2 and cpu 3]
Partitioned scheduling
Two steps:
Determine a mapping of tasks to processors
Perform run-time scheduling
Bin-packing algorithms
The problem concerns packing objects of varying sizes into boxes (bins), with the objective of minimizing the number of bins used.
Heuristic solutions: Next Fit and First Fit (a First Fit sketch follows below)
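A minimal First Fit sketch (an illustration assuming unit-capacity bins, not code from the slides): each object is placed in the first bin it fits into, and a new bin is opened only when no existing bin has room.

def first_fit(sizes, capacity=1.0):
    # First Fit heuristic: put each object into the first bin with room,
    # opening a new bin when none fits. Returns the list of bins.
    bins = []
    for size in sizes:
        for b in bins:
            if sum(b) + size <= capacity:
                b.append(size)
                break
        else:
            bins.append([size])
    return bins

print(first_fit([0.5, 0.7, 0.3, 0.4, 0.1]))   # [[0.5, 0.3, 0.1], [0.7], [0.4]]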
Rate-Monotonic-First-Fit (RMFF):
[Dhall and Liu, 1978]
First, sort the tasks in the order of increasing periods.
Task Assignment
All tasks are assigned in the First Fit manner starting from
the task with highest priority
A task can be assigned to a processor if all the tasks assigned to that processor remain RM-schedulable, i.e. the total utilization of the tasks assigned to the processor is bounded by n(2^(1/n) - 1), where n is the number of tasks assigned.
(One may also use the Precise test to get a better assignment!)
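Putting the pieces together, the following is a rough sketch of RMFF as described above (details such as tie-breaking and failure handling are assumptions); tasks are (WCET, period) pairs and the Liu and Layland bound n(2^(1/n) - 1) is used as the admission test.

def rmff(tasks, num_cpus):
    # Rate-Monotonic-First-Fit sketch: sort tasks by increasing period
    # (RM priority order) and place each one on the first processor where
    # the Liu & Layland utilization bound still holds.
    cpus = [[] for _ in range(num_cpus)]
    for wcet, period in sorted(tasks, key=lambda t: t[1]):
        for assigned in cpus:
            n = len(assigned) + 1
            utilization = sum(c / t for c, t in assigned) + wcet / period
            if utilization <= n * (2 ** (1 / n) - 1):
                assigned.append((wcet, period))
                break
        else:
            return None        # no processor can accept the task under this test
    return cpus

print(rmff([(1, 4), (2, 5), (2, 10), (4, 20)], num_cpus=2))
# [[(1, 4), (2, 5)], [(2, 10), (4, 20)]]

Replacing the utilization bound with the precise (exact) test mentioned above would typically pack more tasks per processor.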
Partitioned scheduling
Advantages:
Most techniques for single-processor scheduling
are also applicable here
Disadvantages:
Cannot exploit/share all unused processor time
May have very low guaranteed utilization, bounded by 50% (see the example below)
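The 50% bound can be seen from a standard construction (a well-known illustration; the specific numbers below are assumptions): with m processors and m + 1 tasks each of utilization just above 1/2, no two tasks fit on one processor, so one task is always left over.

m, eps = 4, 0.01
tasks = [0.5 + eps] * (m + 1)     # m + 1 tasks, each needing just over half a processor
print(sum(tasks) / m)             # ~0.64 of the platform here; approaches 0.5 as m grows
# No two tasks fit together (0.51 + 0.51 > 1), so only m of the m + 1
# tasks can be partitioned: the set is unschedulable even though it uses
# only slightly more than half of the total processing capacity.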