1 Module 1 Introduction To Multiprocessors September 29 2024
Measures of Performance
[Figure: speedup plotted against the number of processors p]
What Speedups Can You Get?
✓ Linear speedup
– implicitly means a 1-to-1 speedup per processor.
– (almost always) as good as you can do.
✓ Sub-linear speedup: This is the more common case, due to the overhead of
startup, synchronization, communication, etc.
[Figure: linear versus actual speedup as a function of the number of processors p]
Scalability
✓ Roughly speaking, a program is said to scale to a certain number of
processors p, if going from p-1 to p processors results in some
acceptable improvement in speedup (for instance, an increase of 0.5).
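The scalability test above can be sketched in a few lines. This is only an illustration: it assumes the usual definition of speedup (serial time divided by parallel time), and the timing table and the helper names `speedup` and `scales_to` are hypothetical, not taken from the slides.

```python
def speedup(t_serial, t_parallel):
    """Speedup, assuming the usual definition: serial time / parallel time."""
    return t_serial / t_parallel


def scales_to(times, p, min_gain=0.5):
    """Return True if going from p-1 to p processors improves the speedup
    by at least min_gain. times[k] is the run time on k processors."""
    gain = speedup(times[1], times[p]) - speedup(times[1], times[p - 1])
    return gain >= min_gain


# Hypothetical measured run times (seconds), keyed by processor count.
times = {1: 100.0, 2: 52.0, 3: 36.0, 4: 29.0, 5: 27.0}

for p in range(2, 6):
    print(p, round(speedup(times[1], times[p]), 2), scales_to(times, p))
```

With these made-up numbers the program scales to 4 processors but not to 5, because the fifth processor adds less than 0.5 to the speedup.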
Amdahl’s Law
✓ If 1/s of the program is sequential, then you can never get a
speedup better than s.
– (Normalized) sequential execution time = 1/s + (1 - 1/s) = 1
– Best parallel execution time on p processors = 1/s + (1 - 1/s)/p
– When p goes to infinity, the parallel execution time approaches 1/s
– The speedup therefore approaches 1 / (1/s) = s.
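The bound can be checked numerically. The sketch below evaluates the expression from the bullets above; the function name `amdahl_speedup` and the sample values of s and p are chosen here only for illustration.

```python
def amdahl_speedup(s, p):
    """Best-case speedup on p processors when 1/s of the program is sequential."""
    sequential = 1.0 / s
    parallel_time = sequential + (1.0 - sequential) / p
    return 1.0 / parallel_time  # normalized sequential time is 1


# Illustrative case: 1/10 of the program is sequential (s = 10).
for p in (1, 2, 10, 100, 10_000):
    print(p, round(amdahl_speedup(10, p), 3))
```

However large p becomes, the printed speedup stays below s = 10.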
2. Anti-dependence
3. Output Dependence
[Figure: taxonomy of interconnection networks. Static: 1D, 2D, hypercube (HC). Dynamic: bus-based, switch-based.]
Interconnection Networks
✓ Multiprocessor interconnection networks (INs) can be classified based on a number of
criteria:
(1) Mode of operation (synchronous versus asynchronous),
(2) Control Strategy (centralized versus decentralized),
(3) Switching Techniques (Circuit versus packet), and
(4) Topology (static versus dynamic).
✓ A single-bus system consists of N processors, each having its own cache, connected by a shared bus.
✓ The use of local caches reduces the processor-memory traffic.
✓ All processors communicate with a single shared memory.
✓ The typical size of such systems varies between 2 and 50 processors.
✓ The actual size is determined by the traffic per processor and the bus bandwidth (defined as the maximum
rate at which the bus can propagate data once transmission has started).
✓ The single-bus network complexity, measured in terms of the number of buses used, is O(1), while the time
complexity, measured in terms of the input-to-output delay, is O(N).
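As a rough illustration of how traffic per processor and bus bandwidth bound the system size, the sketch below divides one by the other. This is a first-order estimate only: it ignores arbitration, contention, and cache effects, and the function name and the numbers are hypothetical rather than taken from any real machine.

```python
import math


def max_processors(bus_bandwidth_mb_s, traffic_per_processor_mb_s):
    """First-order estimate: the bus saturates once the combined
    processor traffic reaches the bus bandwidth."""
    return math.floor(bus_bandwidth_mb_s / traffic_per_processor_mb_s)


# Hypothetical figures: a 1000 MB/sec bus, 40 MB/sec of traffic per processor.
print(max_processors(1000.0, 40.0))
```

Halving the per-processor traffic, for example through better cache hit rates, doubles the estimate, which is why local caches matter so much in this design.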
Machine Name         Max. # processors  Processor     Clock rate  Max. memory  Bandwidth
HP 9000 K640         4                  PA-8000       180 MHz     4,096 MB     960 MB/sec
IBM RS/6000 R40      8                  PowerPC 604   112 MHz     2,048 MB     1800 MB/sec
Sun Enterprise 6000  30                 UltraSPARC 1  167 MHz     30,720 MB    2600 MB/sec
✓ Multiple Bus Systems
o A multiple-bus multiprocessor system uses several parallel buses to interconnect multiple
processors and multiple memory modules.
o Among the possibilities are
▪ multiple-bus with full bus-memory connection (MBFBMC),
▪ multiple-bus with single bus-memory connection (MBSBMC),
▪ multiple-bus with partial bus-memory connection (MBPBMC), and
▪ multiple-bus with class-based memory connection (MBCBMC).
o Illustrations of the multiple-bus configurations are shown below.
$$S(p_{m-1}\, p_{m-2} \ldots p_1\, p_0) = p_{m-2}\, p_{m-3} \ldots p_1\, p_0\, p_{m-1}$$
$$E(p_{m-1}\, p_{m-2} \ldots p_1\, p_0) = p_{m-1}\, p_{m-2} \ldots p_1\, \overline{p_0}$$
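The two functions translate directly into bit operations on an m-bit address: shuffle is a one-bit left rotation, and exchange complements the least significant bit. The sketch below is one possible rendering; the function names are chosen here for illustration.

```python
def shuffle(addr, m):
    """S: rotate the m-bit address left by one position."""
    msb = (addr >> (m - 1)) & 1          # bit p_{m-1}
    mask = (1 << m) - 1                  # keep only m bits
    return ((addr << 1) & mask) | msb    # p_{m-1} moves to the right end


def exchange(addr):
    """E: complement the least significant bit p_0."""
    return addr ^ 1


# 3-bit addresses (an 8-input network).
print(format(shuffle(0b011, 3), "03b"))
print(format(exchange(0b010), "03b"))
```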
o Example
In an 8-input single-stage shuffle-exchange network, if the source is 0 (000) and the destination
is 6 (110), then the following is the required sequence of shuffle/exchange operations
and circulation of data:
E(000) = 001, S(001) = 010, E(010) = 011, S(011) = 110.
The network complexity of the single stage interconnection network is O(N) and the time complexity is O(N).
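The routing example from source 000 to destination 110 can be worked out mechanically. The sketch below uses a breadth-first search over the shuffle (S) and exchange (E) operations to find a shortest sequence; the search strategy and the function name `route` are assumptions made for illustration, not a description of how the hardware selects its path.

```python
from collections import deque


def route(src, dst, m):
    """Shortest sequence of S/E operations taking src to dst (m-bit addresses).
    Returns a list of (operation, resulting address as a bit string)."""
    mask = (1 << m) - 1

    def shuffle(a):
        return ((a << 1) & mask) | (a >> (m - 1))

    def exchange(a):
        return a ^ 1

    parent = {src: None}                 # address -> (previous address, op)
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for op, nxt in (("S", shuffle(node)), ("E", exchange(node))):
            if nxt not in parent:
                parent[nxt] = (node, op)
                queue.append(nxt)

    steps = []
    node = dst
    while parent[node] is not None:      # walk back from dst to src
        prev, op = parent[node]
        steps.append((op, format(node, f"0{m}b")))
        node = prev
    return steps[::-1]


for op, addr in route(0b000, 0b110, 3):
    print(op, addr)
```

Each step corresponds to one pass through the single stage, so the number of circulations grows with the distance between source and destination.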
[Figure: 8×8 multistage interconnection network with inputs and outputs labeled 000 to 111]
✓ The figure shows an example of an 8×8 MIN that uses the 2×2 SEs described before.
✓ This network is known in the literature as the Shuffle-exchange network (SEN).
✓ The settings of the SEs in the figure illustrate how a number of paths (but NOT all) can be established
simultaneously in the network.
✓ Example:
o The figure shows how three simultaneous paths connecting the three pairs of input/output can be
established.
o Example: The Banyan Network
[Figure: 8×8 Banyan network with inputs and outputs labeled 000 to 111 and switching elements numbered 1 to 12, arranged in three stages of four]
C0 = 0 1 2 3 4 5 6 7
C1 = 0 1 2 3 4 5 6 7
❑ Consider a number of popular static topologies: (a) linear array, (b) ring, (c) mesh, (d) tree,
(e) hypercube.
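To make the five topologies concrete, the sketch below builds each as an adjacency list and measures its diameter (the longest shortest path between any two nodes) by breadth-first search. The builders are illustrative assumptions: the mesh is two-dimensional without wraparound links and the tree is a complete binary tree.

```python
from collections import deque


def linear_array(n):
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}


def ring(n):
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


def mesh(rows, cols):
    adj = {}
    for r in range(rows):
        for c in range(cols):
            cand = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            adj[(r, c)] = [(a, b) for a, b in cand
                           if 0 <= a < rows and 0 <= b < cols]
    return adj


def binary_tree(levels):
    n = 2 ** levels - 1                  # nodes numbered 1..n, root is 1
    adj = {i: [] for i in range(1, n + 1)}
    for i in range(2, n + 1):
        adj[i].append(i // 2)            # link child to parent
        adj[i // 2].append(i)
    return adj


def hypercube(dim):
    # Neighbors differ from the node address in exactly one bit.
    return {i: [i ^ (1 << b) for b in range(dim)] for i in range(1 << dim)}


def diameter(adj):
    best = 0
    for start in adj:
        dist = {start: 0}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        best = max(best, max(dist.values()))
    return best


for name, net in [("linear array", linear_array(8)), ("ring", ring(8)),
                  ("mesh 4x4", mesh(4, 4)), ("tree, 3 levels", binary_tree(3)),
                  ("hypercube, 3-D", hypercube(3))]:
    print(name, diameter(net))
```

Comparing the diameters for a similar node count shows why the hypercube is attractive: its longest path grows with the logarithm of the number of nodes rather than linearly.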
[Figure: a control unit driving processors P1, P2, P3, ..., Pn-1, Pn, which connect through an interconnection network to memory modules M1, M2, M3, ..., Mn-1, Mn]