Scheduling Algorithms: (Or The Process of Choosing The Best Job To Run)
Scheduling Algorithms
Introduction
Default (UNIX) schedulers
Workload managers, their schedulers, and algorithms
Future directions
Introduction
Efficient scheduling across nodes is necessary to maximize application performance, regardless of how efficient your parallel algorithms are. Dynamic scheduling in a heterogeneous environment is significantly more complicated still. Programming parallel applications is difficult and is not worth the effort unless large performance gains can be realized.
Scheduling
Scheduling is a key part of workload management software, which usually performs several related functions.
The difficult part of scheduling is balancing policy enforcement against resource optimization in order to pick the best job to run. Essentially, one can think of the scheduler as performing the following loop:
1. Select the best job to run, according to policy and available resources.
2. Start the job.
3. Stop the job and/or clean up after completion.
4. Repeat.
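The select/start/clean-up loop can be sketched as follows. This is a minimal illustration, not any product's scheduler: the `Job` fields, the policy "highest priority among jobs that fit", and jobs finishing instantly are all assumptions made to keep the example short.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    nodes: int      # nodes the job requests
    priority: int   # larger value = more important (hypothetical policy)


def schedule(queue, free_nodes):
    """Return job names in the order the loop would start them.

    Jobs are assumed to finish immediately, so their nodes are returned
    to the pool before the next selection.
    """
    started = []
    pending = list(queue)
    while pending:
        # 1. Select: highest-priority job that fits in the free nodes.
        runnable = [j for j in pending if j.nodes <= free_nodes]
        if not runnable:
            break  # nothing fits; a real scheduler would wait for resources
        best = max(runnable, key=lambda j: j.priority)
        # 2. Start the job.
        pending.remove(best)
        free_nodes -= best.nodes
        started.append(best.name)
        # 3. Clean up after completion: release the job's nodes.
        free_nodes += best.nodes
        # 4. Repeat with the remaining jobs.
    return started
```

With 4 free nodes and jobs `a` (2 nodes, priority 1), `b` (4 nodes, priority 5) and `c` (8 nodes, priority 3), the loop starts `b`, then `a`, and leaves `c` waiting because it never fits.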
Scheduling can be described as either application level or resource level.
UNIX is fundamentally a time-share (TS) operating system [1978]. The TS scheduler gives each user the illusion of being the only one accessing system resources. It employs a round-robin scheduling algorithm.
Round-robin
Each process is placed in a run-queue and allocated a service quantum of time (commonly set to 10 milliseconds). Processes that demand less time run without being interrupted. Processes exceeding the service quantum are interrupted and returned to the back of the run-queue to await further processing.
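A minimal round-robin simulation for a single CPU, assuming every process is ready at time 0 and ignoring context-switch cost. The 10 ms default quantum matches the figure quoted above; process names and burst times are made up.

```python
from collections import deque


def round_robin(bursts, quantum=10):
    """Simulate round-robin scheduling on one CPU.

    bursts maps process name -> CPU time still needed (ms).
    Returns (name, completion_time) pairs in completion order.
    """
    run_queue = deque(bursts.items())
    clock = 0
    finished = []
    while run_queue:
        name, remaining = run_queue.popleft()
        if remaining <= quantum:
            # Needs no more than one quantum: runs to completion uninterrupted.
            clock += remaining
            finished.append((name, clock))
        else:
            # Exceeds the quantum: interrupted and sent to the back of the queue.
            clock += quantum
            run_queue.append((name, remaining - quantum))
    return finished
```

For example, `round_robin({"A": 25, "B": 5, "C": 15})` returns `[("B", 15), ("C", 40), ("A", 45)]`: B fits in one quantum and finishes first, while A is interrupted twice before it completes.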
Fair-share
Fair-share (FS) scheduling was implemented on UNIX systems circa 1988. In the context of computer resources, "fair" is meant to imply equity, not equality, in resource consumption.
The FS scheduler has two levels of scheduling: process and user. Process-level scheduling is the same as in standard UNIX (priority and nice values bias the scheduler as it repositions processes in the run-queue). At the user level, the scheduler adjusts a per-user usage value, which in turn biases the scheduling of that user's processes.
Process-level scheduling still occurs 100 times a second, whereas user-level scheduling adjustments (to the usage parameter) occur once every 4 seconds. Also, once a second, the process-level priority adjustments made in the previous second begin to be forgotten; this avoids starving a process.
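The following is a hedged sketch of the two-level idea, not the FS scheduler's actual code. It assumes a per-user decayed `usage` value, a per-user `shares` entitlement, a per-process `priority` penalty that grows each time the process runs, and made-up decay constants (`USAGE_DECAY`, `PRIORITY_DECAY`). The functions map onto the three time scales described above: every tick, every second, and every 4 seconds.

```python
from dataclasses import dataclass

USAGE_DECAY = 0.5      # hypothetical: weight kept by old user usage every 4 s
PRIORITY_DECAY = 0.5   # hypothetical: weight kept by process penalties every 1 s


@dataclass
class User:
    name: str
    shares: float          # hypothetical entitlement: more shares = larger fair share
    usage: float = 0.0     # decayed historical usage
    charges: float = 0.0   # ticks consumed since the last user-level update


@dataclass
class Proc:
    pid: int
    owner: User
    priority: float = 0.0  # larger value = less favored


def user_level_update(users):
    """Every 4 s: fold recent charges into each user's decayed usage."""
    for u in users:
        u.usage = u.usage * USAGE_DECAY + u.charges
        u.charges = 0.0


def forget_priorities(procs):
    """Every 1 s: earlier process-level adjustments begin to be forgotten."""
    for p in procs:
        p.priority *= PRIORITY_DECAY


def pick_next(procs):
    """Every tick: favor processes whose owner has used little of their share."""
    return min(procs, key=lambda p: p.priority + p.owner.usage / p.owner.shares)


def run_tick(procs):
    """Run the chosen process for one tick and charge its owner."""
    p = pick_next(procs)
    p.priority += 1.0        # process-level penalty for having just run
    p.owner.charges += 1.0   # user-level charge for one tick
    return p
```

Because the bias comes from the owner's usage rather than from each process alone, a user cannot gain extra CPU simply by starting more processes once their usage has been folded in.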
Commercial Products
O/S            Manager      Capping  Parameter
AIX            WLM          Yes      Targets/limits
Compaq Tru64   (not given)  No       Shares
HP-UX          PRM          Yes      Entitlements
IRIX           SHARE II     No       Shares
Solaris        SRM          No       Shares
Condor
PBS: a full-featured batch system used to manage clusters of Beowulf nodes. Based on the Network Queuing System (NQS), developed at NASA in 1986, first on Cray and later ported to other architectures. Conforms to the IEEE POSIX standard 1003.2d.
Maui
Developed to manage machines at the Maui HPCC. An external scheduler that works with native schedulers such as PBS and LoadLeveler. Included in cluster-building toolkits such as Rocks and OSCAR.
Genetic algorithms (GAs)
Condor
DAGMan: a meta-scheduler for Condor jobs. It submits jobs in the order represented by a directed acyclic graph (DAG) and processes the results.
An example input file:
    Job A A.condor
    Job B B.condor
    Job C C.condor
    Job D D.condor
    Parent A child B C
    Parent B C child D
    # Nodes B and C are run in parallel
(Figure: the resulting diamond-shaped DAG, with A at the top, B and C in the middle, and D at the bottom.)
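To make the ordering concrete, the sketch below parses the `Job` and `Parent ... child` lines from the example and groups the nodes into waves whose members can run in parallel. It illustrates dependency ordering only: it handles just these two statements, matches keywords case-insensitively so the mixed-case example parses, and does not submit anything to Condor. `parse_dag` and `waves` are hypothetical helper names.

```python
from collections import defaultdict


def parse_dag(text):
    """Parse JOB and PARENT ... CHILD lines; '#' lines are comments."""
    jobs, parents = [], defaultdict(set)
    for line in text.splitlines():
        words = line.split()
        if not words or words[0].startswith("#"):
            continue
        keyword = words[0].upper()
        if keyword == "JOB":
            jobs.append(words[1])  # words[2] is the job's submit file
        elif keyword == "PARENT":
            split = [w.upper() for w in words].index("CHILD")
            for child in words[split + 1:]:
                parents[child].update(words[1:split])
    return jobs, parents


def waves(jobs, parents):
    """Group jobs into waves; every job in a wave has all parents done."""
    done, result = set(), []
    while len(done) < len(jobs):
        ready = [j for j in jobs if j not in done and parents[j] <= done]
        if not ready:
            raise ValueError("cycle detected: input is not a DAG")
        result.append(ready)
        done.update(ready)
    return result
```

For the example file, `waves(*parse_dag(text))` gives `[["A"], ["B", "C"], ["D"]]`, matching the diamond shape.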
Condor implements priority scheduling, and priority preemption can also be used. Priority calculations are based on ratios rather than absolute values.
Starvation is prevented by a fair-share (FS) algorithm that attempts to give all users the same amount of machine allocation time over a specified (configurable) interval.
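One way to read "ratios rather than absolute values": machines are divided in inverse proportion to users' priority values, where a lower value is better, so a user at 10 gets about twice the machines of a user at 20. The function below is a hypothetical sketch of that idea, not Condor's negotiator; handing rounding leftovers to the best-priority users is an arbitrary choice.

```python
from fractions import Fraction


def allocate(machines, priorities):
    """Split machines among users in inverse proportion to priority values.

    Lower value = better priority. Shares are rounded down, and machines
    lost to rounding go to the users with the best (lowest) values first.
    """
    # Weight each user by 1/priority; exact fractions avoid rounding surprises.
    weights = {u: 1 / Fraction(p) for u, p in priorities.items()}
    total = sum(weights.values())
    shares = {u: int(machines * w / total) for u, w in weights.items()}
    leftover = machines - sum(shares.values())
    for u in sorted(priorities, key=priorities.get)[:leftover]:
        shares[u] += 1
    return shares
```

For example, `allocate(30, {"alice": 10, "bob": 20})` returns `{"alice": 20, "bob": 10}`: only the ratio 10:20 matters, not the absolute values.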
PBS
PBS includes several built-in schedulers. The default is FIFO, but not strict FIFO: ordering is based on queue priority.
Features include:
- Decaying FS algorithm
- Round-robin for jobs and queues
- Queue priority
- Job type or job ordering
- Strict FIFO
- Load balancing of jobs between hosts
- Dedicated times/nodes
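The difference between queue-priority (non-strict) FIFO and strict FIFO shows up in a single scheduling cycle. This is a hypothetical sketch: the `Job` fields and the node-count resource model are assumptions, not PBS's data structures or configuration syntax.

```python
from typing import NamedTuple


class Job(NamedTuple):
    name: str
    queue_priority: int  # higher-priority queues are served first
    submitted: int       # submission time; earlier first within a queue
    nodes: int


def fifo_pick(jobs, free_nodes, strict=False):
    """Return the names of jobs started in one scheduling cycle."""
    order = sorted(jobs, key=lambda j: (-j.queue_priority, j.submitted))
    started = []
    for job in order:
        if job.nodes <= free_nodes:
            free_nodes -= job.nodes
            started.append(job.name)
        elif strict:
            break  # strict FIFO: no job may pass a blocked job
        # non-strict: skip the blocked job and keep looking
    return started
```

With 6 free nodes, an 8-node job blocks the queue under strict FIFO, while the non-strict default lets a later 2-node job run around it.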
Maui
Three main components: backfill, priority, and fair-share (FS)
Two passes:
Pass 1: jobs that meet soft policies.
Pass 2: expands the list from pass 1 to include jobs that meet hard fairness policies.
Maui applies the backfill algorithm specified by the BACKFILLPOLICY parameter, be it firstfit, bestfit, or balfit
Backfill Algorithm
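A simplified sketch of backfill with two of the policies named above. It assumes the top-priority job is blocked and holds a reservation starting `window` time units from now, and it treats "fits" as "needs no more than the idle nodes and finishes before the reservation". Real Maui backfill weighs additional metrics, so read `firstfit` and `bestfit` here as illustrations of the idea rather than Maui's implementation; `balfit` is not modeled.

```python
from typing import NamedTuple


class Job(NamedTuple):
    name: str
    nodes: int
    walltime: int  # requested run time


def backfill(candidates, idle_nodes, window, policy="firstfit"):
    """Pick jobs to start now without delaying the reserved top job.

    candidates: lower-priority jobs, already in priority order.
    """
    # Only jobs that end before the reservation begins are eligible.
    fits = [j for j in candidates if j.walltime <= window]
    chosen = []
    if policy == "firstfit":
        # Walk in priority order, starting every job that still fits.
        for job in fits:
            if job.nodes <= idle_nodes:
                chosen.append(job.name)
                idle_nodes -= job.nodes
    elif policy == "bestfit":
        # Repeatedly start the job that uses the most of the remaining nodes.
        pool = list(fits)
        while True:
            usable = [j for j in pool if j.nodes <= idle_nodes]
            if not usable:
                break
            best = max(usable, key=lambda j: j.nodes)
            chosen.append(best.name)
            idle_nodes -= best.nodes
            pool.remove(best)
    else:
        raise ValueError(f"unknown policy: {policy}")
    return chosen
```

With six idle nodes and short jobs needing 2, 3, and 4 nodes, `firstfit` starts the 2- and 3-node jobs in priority order, while `bestfit` starts the 4-node job and then the 2-node job, leaving no node idle.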
Job priority: the default is trivial FIFO, but priority can be weighted and combined from service, requested resources, fair-share, direct priority, target, and bypass components.
Priority = serviceweight * servicefactor + resourceweight * resourcefactor + fairshareweight * fairsharefactor + directspecweight * directspecfactor + targetweight * targetfactor + bypassweight * bypassfactor
Each *weight value is a configurable parameter, and each *factor is calculated from subcomponents (e.g., user, group, priority, QoS).
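The priority formula is a weighted sum, which the sketch below computes directly. The weight values are made up; with only the service weight non-zero, and assuming the service factor reflects time spent in the queue, ranking reduces to FIFO, which matches the default described above. `WEIGHTS`, `job_priority`, and `rank` are hypothetical names.

```python
# Hypothetical weights; a site would set these in its scheduler configuration.
WEIGHTS = {"service": 1, "resource": 0, "fairshare": 0,
           "directspec": 0, "target": 0, "bypass": 0}


def job_priority(factors, weights=WEIGHTS):
    """Sum of weight * factor over all components; missing factors count as 0."""
    return sum(w * factors.get(name, 0) for name, w in weights.items())


def rank(jobs, weights=WEIGHTS):
    """Order (name, factors) pairs from highest to lowest priority."""
    ordered = sorted(jobs, key=lambda job: job_priority(job[1], weights),
                     reverse=True)
    return [name for name, _ in ordered]
```

Raising the fair-share weight can reverse the order: a long-waiting job from a heavy user can drop below a newer job from an under-served user.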
The fair-share component is composed of several parts that handle historical information, fair-share windows, usage, and impact; all are site-configurable parameters. The purpose of a fair-share algorithm is to steer the existing workload.
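A hedged sketch of how historical information, fair-share windows, and usage can combine: usage is recorded per window, older windows are discounted by a decay factor, and the result is compared with a target share. Maui exposes knobs of this kind as site parameters (for example FSDEPTH, FSINTERVAL, and FSDECAY), but the function and argument names below are hypothetical, they do not reproduce Maui's exact formula, and "impact" is not modeled.

```python
def effective_usage(window_usage, decay):
    """Decayed usage; window_usage[0] is the current window, older ones count less."""
    return sum(u * decay ** i for i, u in enumerate(window_usage))


def fairshare_factor(usage_by_user, targets, decay):
    """Target share minus actual share: positive means under-served."""
    eff = {u: effective_usage(w, decay) for u, w in usage_by_user.items()}
    total = sum(eff.values()) or 1.0  # avoid dividing by zero on an idle system
    return {u: targets[u] - eff[u] / total for u in eff}
```

A positive factor would raise a user's job priority and a negative one would lower it, which is how fair-share steers the existing workload rather than rejecting jobs outright.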
GAs
GAs mimic the evolutionary process. They were introduced by J. H. Holland in 1975. In nature, organisms are always competing for scarce resources such as food and space.
Encoding: the problem is translated into a form that can be processed by the GA.
Fitness function: each generated string is assigned a fitness value (or weight), which determines the survival rate of the string.
Operators:
Reproduction: strings are copied according to their fitness values; strings with higher fitness values are more likely to be selected.
Crossover: all strings in the population are randomly paired up, and each pair may or may not undergo crossover depending on a randomly generated outcome.
Mutation: an occasional alteration to the strings in the population, used to safeguard against premature convergence.
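The encoding, fitness function, and operators above can be combined into a small GA for a scheduling problem. The problem (assigning independent tasks to identical nodes to minimize makespan), the encoding (one node number per task), the fitness 1/(1 + makespan), and all parameter values are assumptions for illustration; selection is fitness-proportional (roulette wheel) and crossover is single-point.

```python
import random


def makespan(mapping, durations, n_nodes):
    """Finish time of the busiest node when task i runs on node mapping[i]."""
    load = [0] * n_nodes
    for task, node in enumerate(mapping):
        load[node] += durations[task]
    return max(load)


def evolve(durations, n_nodes, pop_size=30, generations=60,
           p_cross=0.8, p_mut=0.05, seed=0):
    rng = random.Random(seed)
    n = len(durations)
    # Encoding: a string of node numbers, one gene per task.
    pop = [[rng.randrange(n_nodes) for _ in range(n)] for _ in range(pop_size)]

    def fitness(ind):
        # Shorter makespan -> higher fitness.
        return 1.0 / (1 + makespan(ind, durations, n_nodes))

    best = min(pop, key=lambda ind: makespan(ind, durations, n_nodes))
    for _ in range(generations):
        # Reproduction: copy strings with probability proportional to fitness.
        weights = [fitness(ind) for ind in pop]
        pop = [list(ind) for ind in rng.choices(pop, weights=weights, k=pop_size)]
        # Crossover: pair strings at random; each pair may swap tails.
        rng.shuffle(pop)
        for a, b in zip(pop[::2], pop[1::2]):
            if n > 1 and rng.random() < p_cross:
                cut = rng.randrange(1, n)
                a[cut:], b[cut:] = b[cut:], a[cut:]
        # Mutation: occasionally reassign a task to a random node.
        for ind in pop:
            for i in range(n):
                if rng.random() < p_mut:
                    ind[i] = rng.randrange(n_nodes)
        # Bookkeeping (not a GA operator): remember the best string seen so far.
        champion = min(pop, key=lambda ind: makespan(ind, durations, n_nodes))
        if makespan(champion, durations, n_nodes) < makespan(best, durations, n_nodes):
            best = list(champion)
    return best, makespan(best, durations, n_nodes)
```

A low mutation rate keeps good partial assignments intact while still reintroducing diversity, which is the safeguard against premature convergence mentioned above.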
Future Directions
Scheduling holds the key to performance in grid environments; however, high-performance application scheduling on computational grids is an area in which much progress remains to be made. Part of the problem lies in the inherent difficulty of the scheduling problem, whose optimal solution is considered infeasible even in the simplest environments.
AppLeS
Prophet
Determines the schedule with the minimal predicted execution time.
VDCE (Virtual Distributed Computing Environment)
SEA
Projects continued
I-SOFT: a centralized scheduler maintains user queues and static capacities (FIFO).
IOS: offline genetic algorithm (GA) mappings, indexed by dynamic parameters, are used to determine the mapping for the current iteration.
SPP(X): determination of a performance model for candidate schedules with minimal execution time.
DOME: globally controlled or locally controlled load balancing.
Conclusion
Adaptive Dynamic