Load Balancing

This document discusses various techniques for load balancing distributed systems. It describes static and dynamic load balancing, and covers algorithms like recursive bisection, diffusion-based approaches, and dimension exchange. The key goals of load balancing are to ensure all processors have work and to distribute load efficiently while minimizing communication overhead. Dynamic techniques periodically rebalance as loads change, while decentralized approaches avoid centralized bottlenecks.


CS 584

Load Balancing
Goal: all processors working all the time
  an efficiency of 1
Distribute the load (work) to meet that goal.
Two types of load balancing:
  static
  dynamic

Load Balancing
The load balancing problem can be reduced to the bin-packing problem, which is NP-complete.
For simple cases we can do well, but heterogeneity complicates matters:
  different types of resources
    processor
    network, etc.

Evaluation of Load Balancing
Efficiency
  Are the processors always working?
  How much processing overhead is associated with the load balancing algorithm?
Communication
  Does load balancing introduce or affect the communication pattern?
  How much communication overhead is associated with the load balancing algorithm?
  How many edges of the communication graph are cut?

Partitioning Techniques
Regular grids (-: easy :-)
  striping
  blocking
  use processing power to divide the load more fairly
Generalized graphs
  levelization
  scattered decomposition
  recursive bisection

Levelization
Begin with a boundary
  number these nodes level 1
  all nodes connected to a level 1 node are labeled level 2, etc.
Partitioning is performed by
  determining the number of nodes per processor
  counting off the nodes of a level until exhausted
  proceeding to the next level

Levelization
Want to ensure nearest-neighbor communication.
Let p be the number of processors and n the number of nodes.
Let ri be the total number of nodes in contiguous levels i and i + 1.
Let r = max{r1, r2, ..., rn}.
Nearest-neighbor communication is assured if n/p > r.
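The counting-off procedure above amounts to a breadth-first traversal from the boundary. A minimal sketch, assuming an adjacency-list graph and a roughly even n/p split (the function names and dict-based representation are illustrative, not from the slides):

```python
from collections import deque

def levelize(adj, boundary):
    """BFS from the boundary: boundary nodes are level 1, their
    unvisited neighbors level 2, and so on."""
    level = {v: 1 for v in boundary}
    queue = deque(boundary)
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in level:
                level[v] = level[u] + 1
                queue.append(v)
    return level

def partition_by_levels(adj, boundary, p):
    """Assign nodes to p processors by counting off nodes level by
    level until each processor holds about n/p of them."""
    level = levelize(adj, boundary)
    order = sorted(level, key=lambda v: level[v])   # level-major order
    n = len(order)
    chunk = (n + p - 1) // p                        # ~n/p nodes each
    return {v: i // chunk for i, v in enumerate(order)}
```

Because nodes are handed out in level order, each processor's piece spans only a few contiguous levels, which is what makes the n/p > r condition give nearest-neighbor communication.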

Scattered Decomposition
Used for highly irregular grids.
Partition the load into a large number r of rectangular clusters such that r >> p.
Each processor is given a disjoint set of r/p clusters.
Communication overhead can be a problem for highly irregular problems.

Recursive Bisection
Recursively divide the domain in two pieces at each step.
Three methods:
  Recursive Coordinate Bisection
  Recursive Graph Bisection
  Recursive Spectral Bisection

Recursive Coordinate Bisection
Divide the domain based on the physical coordinates of the nodes.
Pick a dimension and divide in half.
RCB uses no connectivity information:
  lots of edges crossing boundaries
  partitions may be disconnected
Some newer research based on graph separators overcomes some of these problems.
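The pick-a-dimension-and-halve step can be sketched as follows. This is a minimal recursive sketch, assuming points as coordinate tuples and a median split along the dimension with the largest spread (the function name and split rule are illustrative assumptions):

```python
def rcb(points, ids, depth):
    """Recursive Coordinate Bisection: at each step pick the
    dimension with the widest spread and split the nodes at its
    median. Returns a list of id-lists, one per partition."""
    if depth == 0:
        return [ids]
    dims = len(points[0])
    # choose the coordinate with the largest spread
    d = max(range(dims),
            key=lambda k: max(p[k] for p in points) - min(p[k] for p in points))
    order = sorted(range(len(ids)), key=lambda i: points[i][d])
    half = len(order) // 2
    lo, hi = order[:half], order[half:]
    return (rcb([points[i] for i in lo], [ids[i] for i in lo], depth - 1)
            + rcb([points[i] for i in hi], [ids[i] for i in hi], depth - 1))
```

Note the code never looks at edges: this is exactly why RCB can cut many edges or produce disconnected partitions.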

Inertial Bisection
Coordinate bisection is often susceptible to the orientation of the mesh.
Solution: find the principal axis of the communication graph and bisect across it.

Graph Theory Based Algorithms
Geometric algorithms generally produce low-quality partitions:
  they don't take connectivity into account
Graph theory algorithms apply what we know about generalized graphs to the partitioning problem.
Hopefully, they reduce the cut size.

Greedy Bisection
Start with a vertex of smallest degree
  (least number of edges)
Mark all its neighbors
Mark all its neighbors' neighbors, etc.
The first n/p marked vertices form one subdomain.
Apply the algorithm on the remaining vertices.
Recursive Graph Bisection
Based on graph distance rather than coordinate distance.
Determine the two furthest-separated nodes.
Organize and partition the nodes according to their distance from these extremities.
Computationally expensive
  can use approximation methods

Recursive Spectral Bisection
Uses the discrete Laplacian:
  let A be the adjacency matrix
  let D be the diagonal matrix where D[i,i] is the degree of node i
  LG = A - D

Recursive Spectral Bisection
LG is negative semidefinite.
Its largest eigenvalue is zero, and the corresponding eigenvector is all ones.
The magnitude of the second-largest eigenvalue gives a measure of the connectivity of the graph.
Its corresponding eigenvector gives a measure of the distances between nodes.

Recursive Spectral Bisection
The eigenvector corresponding to the second-largest eigenvalue is the Fiedler vector.
Calculating the Fiedler vector is computationally intensive.
RSB yields connected partitions that are very well balanced.
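The whole RSB step fits in a few lines with an eigensolver. A minimal sketch using the slides' sign convention LG = A - D (so the largest eigenvalue is 0 and the Fiedler vector belongs to the second-largest); the dense `numpy.linalg.eigh` call is an illustrative shortcut — real RSB uses iterative methods precisely because this computation is expensive:

```python
import numpy as np

def spectral_bisect(A):
    """Bisect a graph using the Fiedler vector of LG = A - D:
    nodes at or below the median Fiedler value form one part."""
    A = np.asarray(A, dtype=float)
    L = A - np.diag(A.sum(axis=1))      # LG = A - D
    vals, vecs = np.linalg.eigh(L)      # eigenvalues in ascending order
    fiedler = vecs[:, -2]               # vector of the 2nd-largest eigenvalue
    median = np.median(fiedler)
    left = [i for i, x in enumerate(fiedler) if x <= median]
    right = [i for i, x in enumerate(fiedler) if x > median]
    return left, right
```

Splitting at the median is what gives the very even balance; connectivity of the two halves comes from the structure of the Fiedler vector.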

Example
  RSB: 299 edges cut
  RCB: 529 edges cut
  RGB: 618 edges cut

Global vs. Local Partitioning
Global methods produce a good partitioning.
Local methods can then be used to improve the partitioning.

The Kernighan-Lin Algorithm
Swap pairs of nodes to decrease the cut.
Allows intermediate increases in the cut size to avoid certain local minima.
Loop:
  choose the pair of nodes with the largest benefit of swapping
  logically exchange them (not for real)
  lock those nodes
until all nodes are locked.
Find the sequence of swaps that yields the largest accumulated benefit.
Perform those swaps for real.
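One pass of the loop above can be sketched as follows. This is a simplified O(n^2)-per-step sketch, assuming an adjacency-list graph; production Kernighan-Lin uses bucket data structures to find the best pair quickly (all names here are illustrative):

```python
def kl_pass(adj, part_a, part_b):
    """One Kernighan-Lin pass: repeatedly pick the unlocked pair
    whose swap most decreases the cut, swap it logically, lock
    both nodes, then keep only the prefix of swaps with the best
    accumulated gain and perform those for real."""
    a, b = set(part_a), set(part_b)

    def gain(u, v):
        # external-minus-internal edge counts; swapping u and v
        # changes the cut by -(d(u) + d(v) - 2*[u adjacent to v])
        def d(x, own):
            return (sum(1 for y in adj[x] if y not in own)
                    - sum(1 for y in adj[x] if y in own and y != x))
        return d(u, a) + d(v, b) - 2 * (v in adj[u])

    locked, swaps, gains = set(), [], []
    for _ in range(min(len(a), len(b))):
        candidates = [(u, v) for u in a - locked for v in b - locked]
        if not candidates:
            break
        u, v = max(candidates, key=lambda p: gain(*p))
        gains.append(gain(u, v))
        swaps.append((u, v))
        a.remove(u); b.remove(v); a.add(v); b.add(u)   # logical swap
        locked |= {u, v}
    # find the prefix of swaps with the largest accumulated benefit
    best_k, best_acc, acc = 0, 0, 0
    for k, g in enumerate(gains, 1):
        acc += g
        if acc > best_acc:
            best_acc, best_k = acc, k
    a, b = set(part_a), set(part_b)
    for u, v in swaps[:best_k]:                        # perform for real
        a.remove(u); b.remove(v); a.add(v); b.add(u)
    return a, b
```

Because individual gains in the chosen prefix may be negative, the pass can climb out of some local minima, exactly as the slide describes.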


Helpful-Sets
Two steps:
  find a set of nodes in one partition and move it to the other partition to decrease the cut size
  rebalance the load
The set of nodes moved must be helpful:
  the helpfulness of a node is the decrease in cut size if the node is moved
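Helpfulness is easy to compute directly: moving a node removes its cut edges and turns its internal edges into cut edges. A minimal sketch, assuming an adjacency-list graph and a node set that lies inside `part` (names are illustrative):

```python
def helpfulness(adj, node, part):
    """Helpfulness of a node in `part` = decrease in cut size if
    it is moved to the other side: cut edges removed (neighbors
    across the cut) minus new cut edges created (neighbors that
    stay on this side)."""
    external = sum(1 for v in adj[node] if v not in part)
    internal = sum(1 for v in adj[node] if v in part)
    return external - internal

def helpfulness_of_set(adj, nodes, part):
    """A set is h-helpful if moving the whole set decreases the
    cut by h; edges inside the moved set are unchanged."""
    nodes = set(nodes)
    h = 0
    for u in nodes:
        for v in adj[u]:
            if v in nodes:
                continue                 # internal to the moved set
            h += 1 if v not in part else -1
    return h
```

A set can be more helpful than its members individually, since edges inside the set stop counting against the move; this is why the algorithm searches for helpful sets rather than single nodes.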

Helpful-Sets
All these sets are 2-helpful.

The Helpful-Sets Algorithm: Theory
If there is a bisection and its cut size is not too small, then there exists a small 4-helpful set on one side or the other.
This 4-helpful set can be moved and will reduce the cut by 4.
If the imbalance is not too large and the cut of the unbalanced partition is not too small, then it is possible to rebalance without increasing the cut size by more than 2.
Apply the theory iteratively until the "too small" condition is met.

Multi-level Hybrid Methods
For very large graphs, the time to partition can be extremely costly.
Reduce the time by coarsening the graph:
  shrink a large graph to a smaller one that has similar characteristics
Coarsen by
  heavy-edge matching
  simple partitioning heuristics
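Heavy-edge matching, the usual coarsening step, can be sketched briefly. A minimal sketch, assuming a weighted adjacency map `u -> {v: weight}` (the representation and greedy visit order are illustrative assumptions):

```python
def heavy_edge_matching(adj_w):
    """Heavy-edge matching: visit vertices in order and match each
    unmatched vertex with its unmatched neighbor along the heaviest
    edge; each matched pair collapses into one coarse vertex."""
    matched = {}
    for u in adj_w:
        if u in matched:
            continue
        candidates = [(w, v) for v, w in adj_w[u].items() if v not in matched]
        if candidates:
            _, v = max(candidates)       # heaviest incident edge
            matched[u] = v
            matched[v] = u
        else:
            matched[u] = u               # no free neighbor: maps to itself
    return matched
```

Collapsing heavy edges first hides as much edge weight as possible inside coarse vertices, so the coarse graph's cuts stay close to the fine graph's cuts.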


Comparisons
(x.xx) = run time in seconds
ML = multilevel (spectral on the coarse graph, KL on intermediate graphs)
IN = inertial; Party = 5 or 6 different methods

Edge cuts (run times in seconds):

Graph    |v|      |e|      ML           Chaco IN     Chaco IN+KL  PMetis       Party all     Party all+HS
airfoil  4253     12289    85 (0.08)    94 (0.00)    83 (0.02)    85 (0.04)    94 (0.04)     83 (0.15)
crack    10240    30380    211 (0.16)   377 (0.01)   218 (0.05)   196 (0.14)   243 (0.10)    208 (0.44)
wave     156317   1059331  9542 (3.64)  9834 (0.19)  9660 (1.61)  9801 (3.50)  10361 (2.84)  9614 (11.93)
mat      73752    1761718  9359 (1.80)
DEBR     1048576  2097149  100286 (48.99)

[The remaining rows and cells of the original table are too garbled in this copy to reconstruct.]

Dynamic Load Balancing
Load is statically partitioned initially.
Adjust the load when an imbalance is detected.
Objectives:
  rebalance the load
  keep the edge cut minimized (communication)
  avoid too much overhead

Dynamic Load Balancing
Consider adaptive algorithms: after an interval of computation, the mesh is adjusted according to an estimate of the discretization error
  coarsened in some areas
  refined in others
Mesh adjustment causes load imbalance.

Dynamic Load Balancing

After refinement, node 1 ends up with more work

Centralized DLB
Control of the load is centralized.
Two approaches:
  Master-worker (task scheduling)
    tasks are kept in a central location; workers ask for tasks
    requires lots of tasks with weak locality requirements; no major communication between workers
  Load monitor
    periodically monitor the load on the processors
    adjust the load to keep an optimal balance

Repartitioning
Consider: the dynamic situation is simply a sequence of static situations.
Solution: repartition the load after each change
  some partitioning algorithms are very quick
Issues:
  scalability problems
  how different are the current load distribution and the new one?
  data dependencies

Decentralized DLB
Generally focused on a work pool.
Two approaches:
  hierarchical
  fully distributed

Fully Distributed DLB
Lower overhead than centralized schemes.
No global information:
  load is locally optimized
  propagation is slow
  load balance may not be as good as with a centralized scheme
Three steps:
  flow calculation (how much to move)
  mesh node selection (which work to move)
  actual mesh node migration

Flow Calculation
View it as a network flow problem on the processor communication graph:
  add source and sink nodes
  connect the source to all nodes; each edge value is that node's current load
  connect the sink to all nodes; each edge value is the mean load

Flow Calculation
Many network flow algorithms exist, but they are
  more intense than necessary
  not parallel
Use simpler, more scalable algorithms:
  random matchings
    pick random neighboring processes
    exchange some load
    eventually you may get there

Diffusion
Each processor balances its load with all its neighbors.
How much work should I have?

  w_p^(t+1) = w_p^t − Σ_{q : {p,q} ∈ F} α_pq (w_p^t − w_q^t)

How much to send on an edge?

  l_pq^t = α_pq (w_p^t − w_q^t)

Repeat until the load is balanced, which takes

  O( log(1/ε) / (1 − λ_2) )

steps, where λ_2 is the second-largest eigenvalue of the diffusion matrix.
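The diffusion update above is a few lines of code. A minimal sketch, assuming uniform coefficients α_pq = α on a processor graph given as adjacency lists; α must be small enough for stability, e.g. α ≤ 1/(max degree + 1) (names and the uniform-α choice are illustrative):

```python
def diffuse(loads, adj, alpha=0.25, steps=100):
    """First-order diffusion: each step, every processor p sends
    alpha * (w_p - w_q) along each edge {p, q} (a negative amount
    means it receives). Total load is conserved."""
    w = dict(loads)
    for _ in range(steps):
        delta = {p: 0.0 for p in w}
        for p in w:
            for q in adj[p]:
                delta[p] -= alpha * (w[p] - w[q])
        for p in w:
            w[p] += delta[p]
    return w
```

Each step is purely local (a processor only needs its neighbors' loads), which is what makes diffusion attractive for fully distributed balancing.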

Diffusion
Convergence to a balanced load can be slow.
It can be improved with over-relaxation:
  monitor what is sent in each step
  determine how much to send based on the current imbalance and how much was sent in previous steps
Over-relaxed diffusion balances the load in

  O( log(1/ε) / √(1 − λ_2) )

steps.

Dimension Exchange
Rather than communicating with all neighbors each round, communicate with only one.
  Comes from the dimensions of a hypercube.
  Use edge coloring for general graphs.
Exchange load with the neighbor along a dimension:

  l = (l_i + l_j) / 2

Will converge in d steps on a d-dimensional hypercube.
Some graphs may need a different exchange factor to converge faster:

  l = l_i · α + l_j · (1 − α)
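On a hypercube the rounds are easy to simulate, since the neighbor across dimension k is the processor id with bit k flipped. A minimal sketch for the averaging case α = 1/2, assuming a power-of-two processor count (names are illustrative):

```python
def dimension_exchange(loads):
    """Dimension exchange on a d-dimensional hypercube: in round k
    every processor pairs with its neighbor across dimension k and
    both take the average l = (l_i + l_j) / 2. Balanced after d
    rounds. `loads` is indexed by processor id 0 .. 2^d - 1."""
    w = list(loads)
    d = len(w).bit_length() - 1        # assumes len(w) is a power of two
    for k in range(d):
        for i in range(len(w)):
            j = i ^ (1 << k)           # neighbor across dimension k
            if i < j:                  # handle each pair once
                w[i] = w[j] = (w[i] + w[j]) / 2
    return w
```

For the generalized factor, the pair update becomes `w[i] = w[i] * alpha + w[j] * (1 - alpha)` with the complementary value on the neighbor.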

Diffusion & Dimension Exchange
Can view
  diffusion as a Jacobi method
  dimension exchange as Gauss-Seidel
Can use multi-level variants:
  divide the processor communication graph in half
  determine the load to shift across the cut
  recursively rebalance each half

Mesh Node Selection
Must identify which mesh nodes to migrate
  minimize edge cut and overhead
Very dependent on the problem:
  shape and size of the partition may play a role in accuracy
  aspect ratio maintenance
  move items that are farthest from the center of gravity

Load Balancing Schemes
(Whom do I request work from?)
Asynchronous round robin
  each processor maintains its own target
  ask the target for work, then increment the target
Global round robin
  the target is maintained by a master node
Random polling
  randomly select a donor
  each processor has equal probability of being selected
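The donor-selection rules above can be sketched in a few lines. A minimal sketch for p processors numbered 0 .. p-1, where `me` is the requester (function names are illustrative; the global round robin variant would simply keep `target` on the master node):

```python
import random

def make_async_round_robin(p, me):
    """Asynchronous round robin: each processor keeps its own
    target pointer; ask the target for work, then increment it
    (mod p), skipping ourselves."""
    target = (me + 1) % p
    def next_donor():
        nonlocal target
        donor = target
        target = (target + 1) % p
        if target == me:
            target = (target + 1) % p
        return donor
    return next_donor

def random_polling_donor(p, me):
    """Random polling: every processor other than the requester is
    an equally likely donor."""
    donor = random.randrange(p - 1)
    return donor if donor < me else donor + 1
```

Random polling needs no shared state at all, which is why it scales well; global round robin gives the most even spread of requests but funnels every request through the master.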
