Abstract- To meet the objective of minimizing the job execution time, parallel computing has to deal with a number of issues that crop up while working with parallel code. These issues can result in bottlenecks and restrict a parallel program from attaining the speedup suggested by Gene Amdahl. The most problematic issue is the distribution of workload in both categories of parallel system, viz. homogeneous and heterogeneous systems. This situation demands an effective load balancing strategy to be in place in order to ensure a uniform distribution of load across the board. Scheduling 'm' jobs to 'n' resources with the objective of optimizing the QoS parameters while balancing the load has been proven to be an NP-hard problem. Therefore, a heuristic approach can be used to design an effective load balancing strategy. In this paper, a centralized dynamic load balancing strategy using adaptive thresholds has been proposed for a parallel system consisting of multiprocessors. The scheduler continuously monitors the load on the system and takes corrective measures as the load changes. The threshold values considered are adaptive in nature and are readjusted to suit the changing load on the system. Therefore, the scheduler always ensures a uniform distribution of the load on the processing elements in a dynamic load environment.

Keywords- Parallel and distributed systems, load balancing, threshold, turnaround time, central scheduling.
I. INTRODUCTION
Parallelism in computing systems can be viewed at two levels, viz. the hardware and the software level. At the hardware level it can be realized in the form of a multiplicity of processing elements and/or functional units, whereas at the software level it can be seen as the multiple modules of a job demanding execution that can be run in parallel. The efficiency of a parallel system is governed by the degree of matching between the hardware and the underlying software parallelism; the better this match, the better the efficiency [1-2, 8-10].

The primary goal of parallel systems is to minimize the job execution time and hence the turnaround time. This can be ensured by exploiting the inherent parallelism in the job by distributing the entire workload over the available computational resources, thus allowing various modules of the job to run simultaneously [11-12]. Scheduling is the method by which threads, processes or data flows are given access to system resources, e.g. processor time and communication bandwidth [1, 3-5, 9]. To effectively utilize the resources, the job should be scheduled in such a way that no resources are underutilized and that the turnaround time is minimized. An effective load scheduling strategy is very important for a system to balance its load effectively or achieve the target quality of service while addressing issues such as synchronization, communication overhead, data locality and scalability. Scheduling of jobs should be done in such a way that each computing node has its proper share of work so that eventually the job turnaround time can be minimized. Load balancing can be treated as a subset of scheduling in which such a process is adopted. Load balancing is a methodology to distribute workload across multiple computers, network links, central processing units, disk drives, or other resources, to achieve optimal resource utilization, maximize throughput, minimize response time, and avoid overload [3]. Load balancing results in an allocation of the system resources to individual jobs for certain time periods while optimizing the given objective function(s). In order to achieve the above goal, a load balancing strategy must exhibit the following features [3]:

(i) It must create little traffic overhead
(ii) The overhead of running the load balancing algorithm must be low
(iii) It must be fair, so that the most heavily loaded node is balanced first with the most lightly loaded node
(iv) Load balancing should utilize minimum CPU time

Load balancing strategies can be broadly classified into centralized / decentralized, static / dynamic, periodic / non-periodic and with threshold / without threshold [13-19]. A load balancing policy can be either one or a combination of the above. Since the load on the system is bound to change with time, an adaptive load balancing policy is the best to work with as it addresses the problem of changing load. This can be ensured by defining thresholds for the workload on the system. The system reacts by redistributing the load as soon as these threshold values are crossed, thereby ensuring a uniform distribution of the workload at all times. Further, an adaptive threshold based load balancing policy even ensures that the computational resources are utilized in the most optimum way.

II. RELATED WORK

The issue of load balancing has gained the attention of many researchers. Therefore, a number of load balancing strategies using various approaches have been reported in the literature.
A dynamic load balancing mechanism for distributed systems is proposed in [13] with an adaptive threshold, where a central node is used for maintaining load state information and the decision for balancing is taken at the local nodes. Six load balancing strategies are studied in [14] with application to four problems. These schemes include random, round robin, central load manager, threshold, central queue and local queue. In [15] various strategies for dynamic load balancing are explored, which include sender initiated diffusion, receiver initiated diffusion, the hierarchical balancing method, the gradient model and the domain exchange method. A simple load balancing strategy for task allocation in parallel machines has been proposed in [16], where load balancing is decentralized and the execution of load balancing is decided among processors using the local queue length of each individual processor. The processor with the minimum queue length is given the task of executing the load balancing. A comparison of the three approaches of guided self scheduling, irregular parallel programs and lazy task creation without taking data locality into consideration has been done in [17], employing a dynamic load balancing scheme implementing a central queue and a local queue while considering the data locality problem.
III. SYSTEM PARAMETERS AND PROPOSED STRATEGY

The proposed model presents a centralized dynamic load balancing strategy which continuously keeps track of the load on the nodes using thresholds, with the aim of minimizing the turnaround time of the jobs submitted for execution.

[Fig. 1. The proposed central scheduler: jobs to be scheduled are dispatched to the processing elements, with min / max priority queues of nodes maintained for load balancing.]

Since the model follows a centralized job scheduling approach, of the available nodes, one node is taken as the central scheduler on which the job gets submitted and is eventually used for dispatching the independent job modules to the other nodes (processing elements). Each processing element has a local queue where the allotted jobs are queued up and are taken up for execution one by one in the order of their arrival. The scheduler under consideration is shown in Fig. 1.

The model uses the centralized approach for load balancing. Thus, out of the nodes selected for job execution, one node is used as the central scheduler, which serves two objectives, viz. dispatching the jobs to the remaining nodes and making load balancing decisions depending on the system state. The remaining nodes simply act as processing elements for job execution. Each processing element has a local queue where jobs can be queued.

Various jobs can be submitted to the scheduler at the same time. Further, new jobs can be added continuously while the older ones keep finishing. Thus it is a completely dynamic job scenario with a changing load. Each job submitted for execution, in turn, can be considered to be comprised of sub-modules which can run in parallel. For a simple picture of the scheduler with one job submitted for execution, the process starts by randomly allocating these sub-modules of the job(s) to the processing elements. This random allocation results in a possible scenario in which a few of the nodes get a large number of sub-jobs to execute while some may get very few or no modules to execute, thus demanding load balancing, which becomes the additional responsibility of the central scheduler. Whenever the model experiences an uneven distribution of load, a readjustment of load is initiated to evenly distribute the load over the nodes till a balanced state is reached.

The load balancing decisions are taken using the Lower Threshold (Tlower) and Upper Threshold (Tupper) values, which are adaptive by nature. As a node is assigned some workload, the same enters its job execution queue. The global queues are maintained by the central scheduler only and are implemented as a maximum priority queue for heavily loaded nodes and a minimum priority queue for lightly loaded nodes. As the load on the system changes, these thresholds are adjusted to suit the changing load on the system, making the threshold selection adaptive, i.e. the threshold values increase with increasing load and vice versa. As the average number of jobs in the local queues of the processing elements increases, the threshold values are readjusted, and so the global queues regarding the normally loaded nodes, lightly loaded nodes and heavily loaded nodes are adjusted. The load balancing process is instantaneous. As soon as the heavily loaded nodes and lightly loaded nodes are reported, the central scheduler starts load balancing.
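To make this bookkeeping concrete, the sketch below models the processing elements and the two global queues the central scheduler is described as maintaining. It is a minimal Python sketch: the names (Node, local_queue_len, build_global_queues) are illustrative assumptions rather than identifiers from the paper, and the maximum priority queue of heavily loaded nodes is emulated with Python's heapq min-heap by negating the load.

import heapq
from dataclasses import dataclass

@dataclass
class Node:
    """A processing element with a local queue of sub-jobs (hypothetical names)."""
    node_id: int
    local_queue_len: int  # workload li = number of queued sub-jobs

def build_global_queues(nodes, t_lower, t_upper):
    """Rebuild the central scheduler's global queues from the current workloads.

    L: min priority queue of lightly loaded nodes (load below t_lower)
    H: max priority queue of heavily loaded nodes (load above t_upper)
    X: normally loaded nodes (load within [t_lower, t_upper])
    """
    L, H, X = [], [], []
    for n in nodes:
        if n.local_queue_len < t_lower:
            heapq.heappush(L, (n.local_queue_len, n.node_id))    # min-heap
        elif n.local_queue_len > t_upper:
            heapq.heappush(H, (-n.local_queue_len, n.node_id))   # max-heap via negation
        else:
            X.append(n.node_id)
    return L, H, X

The queues are rebuilt whenever the thresholds are readjusted, which keeps the classification of nodes in step with the changing load.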
Table I - Parameters Used in the Model

Parameter   Description
K           Number of nodes
J           Number of jobs
Ji          Job identifier, where 1 <= i <= J
Ni          Node identifier, where 0 <= i <= K-1
li          Workload on each node Ni
Tlower      Lower threshold
Tupper      Upper threshold
LHM         Lower half mean of li for the nodes sorted in ascending order
UHM         Upper half mean of li for the nodes sorted in ascending order
M           Mean of li for the nodes sorted in ascending order

The threshold values can be calculated using the Lower Half Mean (LHM), the Upper Half Mean (UHM) and the Mean (M) of the node workloads li, with the nodes sorted in ascending order of load; as per equations (i) - (iii), LHM, UHM and M are the means of li over the lower half of the sorted nodes, the upper half of the sorted nodes and all the processing nodes respectively. The thresholds are then obtained as

Tlower = max(max(LHM, 0.9M), 1) ------------------------ (iv)
Tupper = max(min(UHM, 1.1M), 2) ------------------------ (v)

In the proposed model, both Tupper and Tlower are adaptive in nature. LHM and UHM provide the reference points using which Tlower and Tupper are set. The scheduler works with the intention of bringing the system to a state in which both LHM and UHM (and hence Tlower and Tupper) range within ±10% of the mean M, resulting in a load balanced state. If LHM and UHM are outside this range, Tlower and Tupper are set to 90% and 110% of M respectively. Thus the scheduler continues to load balance the system to bring the average workload within ±10 percent of the mean M.

Initially the values of the thresholds Tlower and Tupper are taken as 1 and 2 respectively and are gradually adjusted using the nodes' workloads sorted in ascending order. Now, as the load on a node increases, the threshold values are readjusted and accordingly the number of nodes in L, H and X keeps on changing. Nodes belonging to L, H and X can be decided using equations (vi), (vii) and (viii) respectively.
[Algorithm box: load balancing procedure; recoverable step: "Move all nodes to minimum priority queue L".]

As the values of LHM and UHM approach M, the system approaches the balanced state with an even distribution of workload. The process of load redistribution continues for the remaining nodes in L and H, reporting lightly loaded and heavily loaded status, until either of the queues L or H becomes empty. Simultaneously, the threshold values are also adjusted with the changing queue lengths, thereby changing the values in H, L and X as well. In this way, using work offloading as the basic load redistribution strategy, no node is idle if extra load is present on any node in the system. The number of jobs that are transferred from a heavily loaded node to a lightly loaded node is governed by equation (ix).
Number of jobs to be transferred = (l(i∈H) - l(i∈L)) / 2 ------------------------ (ix)

where l(i∈H) is the workload of the heavily loaded node at the head of H and l(i∈L) is the workload of the lightly loaded node at the head of L. The algorithm for the same is presented in the box above.
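Since only one step of the algorithm box is recoverable in this version, the following is a hedged reconstruction of a single balancing round from the prose: classify nodes into L and H against the current thresholds, pair the head of H with the head of L, and move (l(i∈H) - l(i∈L)) / 2 jobs per equation (ix) until either queue empties. The names and the integer (floor) rounding are assumptions, chosen to be consistent with the worked example in Section IV.

import heapq

def balance_once(loads, t_lower, t_upper):
    """One load balancing round (reconstruction of the described procedure).

    loads: dict mapping node id -> workload li.
    Pairs the head of the max priority queue H (heavily loaded) with the head
    of the min priority queue L (lightly loaded) and transfers
    (l_heavy - l_light) // 2 jobs, per equation (ix), until L or H is empty.
    """
    L = [(l, n) for n, l in loads.items() if l < t_lower]   # min-heap of lightly loaded nodes
    H = [(-l, n) for n, l in loads.items() if l > t_upper]  # max-heap via negated load
    heapq.heapify(L)
    heapq.heapify(H)
    new_loads = dict(loads)
    while L and H:
        l_light, light = heapq.heappop(L)
        neg_heavy, heavy = heapq.heappop(H)
        moved = (-neg_heavy - l_light) // 2   # eq. (ix); floor matches the example (11.5 -> 11)
        new_loads[heavy] -= moved
        new_loads[light] += moved
    return new_loads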
IV. ILLUSTRATIVE EXAMPLE

To better understand the model, an example is illustrated in this section to present the basic working of the model in terms of the turnaround time computation. The example considers that no new job is added to the local queue of a node and no job is taken away from the queue until the load balancing is done, making it static, whereas in practice the model performs load balancing on the workload dynamically. In other words, the model considers the job service rate to be less than the job arrival rate, leading to the removal of no job from the queue until the allotment has been done.

The example considers a scenario with a total of 11 available nodes for execution. As per the scheduling strategy, N0 acts as the central node and N1 to N10 act as the processing elements for job execution. The load on each node Ni is represented by li. Initially Tlower and Tupper are assumed to be 1 and 2 respectively. A total of 134 jobs are assumed to be submitted to the system for execution, and the allotment after random distribution of the load is as shown in Table 2. Therefore, each entry in the table opposite the node identifier indicates the number of jobs assigned to that node. The allotment clearly suggests an unbalanced state of the system, thus prompting the scheduler to take corrective measures.

Table 2 - Initial Allocation of Load
N1  N2  N3  N4  N5  N6  N7  N8  N9  N10
1   2   3   4   6   9   15  22  25  47

LHM is calculated as per equation (i) and is the mean of the load on N1, N2, N3, N4 and N5:
LHM = (1+2+3+4+6)/5
    = 3.2.
Similarly, UHM is calculated as per equation (ii), which is the mean of the load on N6, N7, N8, N9 and N10:
UHM = (9+15+22+25+47)/5
    = 23.6.
The value of M is then calculated as per equation (iii) and is the mean of the load on N1 to N10:
M = (1+2+3+4+6+9+15+22+25+47)/10
  = 13.4.

It can be seen from the example till now that the system started with the threshold values Tlower and Tupper as 1 and 2 respectively. Here, the mean M of the workload is 13.4, requiring the Tlower and Tupper values to be modified to move the bias towards M, which acts as the average workload of the system. Since the difference of LHM (3.2) and UHM (23.6) from M (13.4) is very large, it indicates that there are many nodes which are underloaded and overloaded, necessitating the load balancing to continue. Accordingly, using equations (iv) - (v), the new values of Tlower and Tupper can be calculated as

Tlower = max (max (LHM, 0.9M), 1)
       = max (max (3.2, 12.06), 1)
       = 12.06.
Tupper = max (min (UHM, 1.1M), 2)
       = max (min (23.6, 14.74), 2)
       = 14.74.

The nodes that come under L and H as per equations (vi) - (vii) become

L = (N1, N2, N3, N4, N5, N6)
H = (N10, N9, N8, N7)

It can be seen that node N1 is the most lightly loaded node, with N10 being the most heavily loaded node. Thus node N1 is workload balanced with N10 as per equation (ix) by transferring some jobs from N10 to N1. Similarly, N2 is balanced with N9, N3 is balanced with N8 and N4 is balanced with N7. This results in emptying the queue H. Therefore, the scheduler stops the load balancing for the moment. The resultant load on each node after redistribution is shown in Table 3.

Table 3 - Load on Nodes after Balancing
N1  N2  N3  N4  N5  N6  N7  N8  N9  N10
24  13  12  9   6   9   10  13  14  24

Table 4 shows the nodes sorted according to the load of Table 3, again assuming no addition or removal of new jobs for the sake of simplicity.

Table 4 - Sorted Nodes According to Load of Table 3
N5  N4  N6  N7  N3  N2  N8  N9  N1  N10
6   9   9   10  12  13  13  14  24  24

In the way illustrated till now, the new values of LHM, UHM and M now become 9.2, 17.6 and 13.4 respectively. It can be seen that the difference of LHM and UHM from M has reduced considerably, indicating some load balancing, which can be observed from Table 4 as well, where the distribution of workload is more uniform as compared to the initial state.
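As a usage check, this first iteration can be reproduced with the adaptive_thresholds and balance_once sketches given earlier (assuming both are in scope); the expected values are the ones reported above and in Tables 2 - 4.

loads = {"N%d" % i: l for i, l in
         enumerate([1, 2, 3, 4, 6, 9, 15, 22, 25, 47], start=1)}

lhm, uhm, m, t_lower, t_upper = adaptive_thresholds(list(loads.values()))
# LHM = 3.2, UHM = 23.6, M = 13.4, Tlower = 12.06, Tupper = 14.74

after = balance_once(loads, t_lower, t_upper)
# after: N1=24, N2=13, N3=12, N4=9, N5=6, N6=9, N7=10, N8=13, N9=14, N10=24,
# which matches Tables 3 and 4 (heap tie-breaking in later rounds may differ
# from the pairing chosen in the paper, but the first round is identical).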
Similarly, the values of Tlower and Tupper are calculated as 12.06 and 14.74 respectively. The nodes that come under L are N5, N4, N6 and N7, and those under H are N10 and N1. The load is again balanced and the result is shown in Table 5, with Table 6 showing the same in sorted order.

Table 5 - Load Redistribution of Nodes of Table 4
N5  N4  N6  N7  N3  N2  N8  N9  N1  N10
15  16  9   10  12  13  13  14  17  15

Table 6 - Sorted Nodes According to Load of Table 5
N6  N7  N3  N2  N8  N9  N5  N10 N4  N1
9   10  12  13  13  14  15  15  16  17

As execution and load balancing continue in the same manner, the values of LHM, UHM and M eventually become 0, 0.8 and 0.4 respectively. So the values of Tlower and Tupper are 1 and 2 respectively. The nodes under L are N1, N2, N3, N4, N6 and N7, and there is no node under H. This state presents the other extreme, in which L is non-empty and H is empty, again indicating the balanced state. Therefore, the nodes carry on execution till all local queues become empty. The final allocation of the workload to the nodes after complete load balancing is shown in Table 10, presenting the nodes with the exact number of jobs allotted and hence executed. It can be seen that this value is near 13.4, which is the mean M of the workload.

Table 10 - Nodes with Number of Jobs Allotted / Executed
Allotted   1   2   3   4   6   9   15  22  25  47
Executed   13  13  13  13  13  13  13  14  14  15
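The executed counts reported in Table 10 can be checked directly against the claim that every node ends up close to the mean M; the snippet below only reuses the numbers from the table.

executed = [13, 13, 13, 13, 13, 13, 13, 14, 14, 15]   # executed jobs per node (Table 10)
assert sum(executed) == 134                           # all 134 submitted jobs accounted for
mean_load = sum(executed) / len(executed)             # 13.4, the mean workload M
spread = max(executed) - min(executed)                # 2 jobs, so each node stays near M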
V. CONCLUSION

This paper has presented a centralized dynamic load balancing strategy using thresholds. The threshold values used here are adaptive in nature, i.e. as the load on the system increases, the threshold values are readjusted to suit the changing load on the system. The model works in such a way that the thresholds tend to converge the load towards the mean of the workload. These values become approximately equal when the load becomes evenly distributed, depicting the balanced state of the system. Moreover, the load redistribution process is fair, as load is first readjusted between the most heavily loaded node and the most lightly loaded node through the use of the max priority queue and the min priority queue. The balancing process utilizes minimum CPU time, as redistribution is only carried out when lightly loaded and heavily loaded nodes are reported. In the present work, it has been assumed that if the average of the workload is distributed and hence executed by the processing elements, the best results can be realized in terms of the turnaround time. An even better solution can be obtained if the model is made more realistic by considering other issues related to load balancing, like communication cost and data locality.

REFERENCES

[19] N. Drosinos and N. Koziris, "Load Balancing Hybrid Programming Models for SMP Clusters and Fully Permutable Loops", IEEE Supercomputing 2000 Conference, 4-10 Nov. 2000, pp. 10-10.