Lecture-27 Interconnection Networks+chapter-5 Slides-Version-2
Platforms and Interconnection Networks
Flynn’s Taxonomy
In 1966, Michael Flynn classified systems according to the
number of instruction streams and the number of data
streams.
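Flynn's classification (rows: instruction stream; columns: data stream):
                               Single data stream     Multiple data streams
Single instruction stream      SISD: uniprocessors    SIMD: processor arrays, pipelined vector processors
Multiple instruction streams   MISD                   MIMD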
Interconnection network
SIMD Machine (3)
[Figure: instruction timelines along a time axis for CPU 0 and CPU 1, with labels "Store C(1)", "call subroutine1(i)" and "Next instruction".]
Image recognition. Image partitioned into 16 sections, each being analyzed by a different CPU. (Tanenbaum, Structured Computer Organization)
Shared-Memory MIMD Machine (II)
Bus-based shared-memory architecture
Good news: cache coherence is maintained at the hardware level, for example through a snoopy protocol.
Distributed-Memory MIMD Machine (I)
• Each processor has its own private memory.
• A communication network connects the processors and their private memories
• No concept of global address space of memory across all processors
• No cache coherence concept
• Data exchange is through message passing
From https://computing.llnl.gov/tutorials/linux_clusters/
Nodes
• 1D torus (ring)
Interconnection Network (IV)
• k-dimensional mesh: interior nodes have 2k neighbors
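The neighbor count can be checked with a small sketch. The function name `neighbors` and the coordinate-tuple representation are assumptions made for illustration, not part of the slides. With `wraparound=True` the mesh becomes a torus, so the 1D case is a ring.

```python
def neighbors(node, dims, wraparound=False):
    """Return the neighbors of `node` in a k-dimensional mesh.

    node: tuple of k coordinates; dims: tuple of k sizes (illustrative layout).
    With wraparound=True the edges wrap, giving a torus (a ring when k = 1).
    """
    result = []
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            coord = node[axis] + step
            if wraparound:
                coord %= size            # wrap around the edge of the torus
            elif not 0 <= coord < size:
                continue                 # mesh boundary: no neighbor this way
            neighbor = node[:axis] + (coord,) + node[axis + 1:]
            # Skip duplicates that appear when a torus dimension has size <= 2.
            if neighbor != node and neighbor not in result:
                result.append(neighbor)
    return result


if __name__ == "__main__":
    print(len(neighbors((1, 1), (3, 3))))              # interior node of a 2D mesh
    print(len(neighbors((0, 0), (3, 3))))              # corner node of a 2D mesh
    print(neighbors((0,), (5,), wraparound=True))      # node 0 of a 5-node ring
```

An interior node of a k-dimensional mesh gets 2k neighbors, while boundary nodes get fewer unless the edges wrap around.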
Interconnect
All Carver nodes are interconnected by 4X QDR InfiniBand technology, meaning
that 32 Gb/sec of point-to-point bandwidth is available for high-performance
message passing and I/O. The interconnect consists of fiber optic cables
arranged as local fat-trees within a global 2D mesh.
Additional Reference
• Using InfiniBand for a scalable compute infrastructure. Technology brief, 4th edition
• https://computing.llnl.gov/tutorials/linux_clusters/
• http://www.nersc.gov/
• A.S. Tanenbaum, Structured Computer Organization
Chapter 5. Cloud Access and Cloud Interconnection Networks
Contents
1. Clouds and networks.
2. Packet-switched networks.
3. Internet.
4. Relations between Internet networks.
5. Transformation of the Internet.
6. Web access and TCP congestion.
7. Named data networks.
8. Interconnection networks for computer clouds.
9. Clos networks, Myrinet, InfiniBand, fat trees.
10. Storage area networks.
11. Data center networks.
12. Network management algorithms.
13. Content delivery networks.
14. Overlay and scale-free networks.
Dan C. Marinescu Cloud Computing. Second Edition - Chapter 5. 2
1. Clouds and networks
◼ Unquestionably, communication is at the heart of cloud computing.
Interconnectivity supported by a continually evolving Internet made cloud
computing feasible.
A cloud is built around a high-performance interconnect; the servers of a
cloud infrastructure communicate through high-bandwidth and low-latency
networks.
◼ Cloud workloads fall into four broad categories based on their
dominant resource needs: CPU-intensive, memory-intensive, I/O-intensive,
and storage-intensive. While the first two benefit from, but
do not require, high-performance networking, the last two do.
Networking performance directly impacts the performance of I/O- and
storage-intensive workloads.
◼ The designers of a cloud computing infrastructure are acutely aware
that the communication bandwidth goes down and the communication
latency increases the farther from the CPU data travels.
◼ Every server should be able to communicate with every other server with
similar speed and latency.
◼ Applications need not be location aware.
◼ Location transparency also reduces the complexity of system management.
◼ In a hierarchical organization true location transparency is not feasible and
cost considerations ultimately decide the actual organization and
performance of the communication fabric.
[Figure: users 1 to n attached by links to a switch; the switch provides the connection of inputs 1 to N to outputs 1 to N.]
Circuit Switch Types
◼ Space-Division switches
Provide separate physical connection between inputs and outputs
Crossbar switches
Multistage switches
◼ Time-Division switches
Time-slot interchange technique
Time-space-time switches
◼ Hybrids combine Time & Space switching
Crossbar Space Switch
⚫ N x N array of crosspoints
⚫ Connect an input to an output by closing a crosspoint
⚫ Nonblocking: any input can connect to any idle output
⚫ Complexity: $N^2$ crosspoints
[Figure: N x N crossbar grid with lines labeled 1, 2, …, N on one side and 1, 2, …, N-1, N on the other.]
Multistage Space Switch
◼ Large switch built from multiple stages of small switches
◼ The n inputs to a first-stage switch share k paths through intermediate
crossbar switches
◼ Larger k (more intermediate switches) means more paths to output
◼ In the 1950s, Clos asked, “How many intermediate switches are required to
make the switch nonblocking?”
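Total: $2(N/n)nk + k(N/n)^2$ crosspoints
[Figure: three-stage switch with N/n first-stage n x k switches, k middle-stage N/n x N/n switches, and N/n third-stage k x n switches.]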
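The crosspoint count above can be evaluated directly. The following is a minimal sketch; the function names `crossbar_crosspoints` and `clos_crosspoints` are illustrative and do not come from the slides.

```python
def crossbar_crosspoints(N):
    """A single N x N crossbar needs one crosspoint per input/output pair."""
    return N * N


def clos_crosspoints(N, n, k):
    """Crosspoints in a three-stage switch: 2(N/n)nk + k(N/n)^2.

    N: total inputs (and outputs), n: inputs per first-stage switch,
    k: number of middle-stage switches.
    """
    if N % n != 0:
        raise ValueError("n must divide N")
    groups = N // n                    # number of first- and third-stage switches
    first = groups * n * k             # N/n switches of size n x k
    middle = k * groups * groups       # k switches of size N/n x N/n
    third = groups * k * n             # N/n switches of size k x n
    return first + middle + third


if __name__ == "__main__":
    print(clos_crosspoints(16, 4, 7), crossbar_crosspoints(16))
```

For a switch as small as N = 16 the three-stage design needs more crosspoints than a plain crossbar; the savings only appear as N grows.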
Clos Non-Blocking Condition: k=2n-1
⚫ Request a connection from the last input of input switch j to the last output of output switch m
⚫ Worst Case: All other inputs have seized top n-1 middle switches AND all other
outputs have seized next n-1 middle switches
⚫ If k=2n-1, there is another path left to connect desired input to desired output
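[Figure: worst case in a three-stage Clos switch. The top n-1 middle switches are busy with the other inputs of n x k input switch j, the next n-1 middle switches are busy with the other outputs of k x n output switch m, and middle switch 2n-1 provides the free path from the desired input to the desired output. Number of internal links = 2 x number of external links.]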
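The worst-case argument can be replayed in a few lines. This is a sketch under the stated worst case only, not a full switch simulator; the helper names `find_free_middle` and `worst_case` are hypothetical.

```python
def find_free_middle(busy_from_input, busy_to_output, k):
    """Return the index of a middle switch that is idle on both sides, or None."""
    for m in range(k):
        if m not in busy_from_input and m not in busy_to_output:
            return m
    return None


def worst_case(n, k):
    """Worst case for one new connection in a Clos switch with k middle switches.

    The other n-1 inputs of the input switch occupy the first n-1 middle
    switches; the other n-1 outputs of the output switch occupy the next n-1,
    so the two busy sets do not overlap.
    """
    busy_in = set(range(min(n - 1, k)))
    busy_out = set(range(n - 1, min(2 * n - 2, k)))
    return find_free_middle(busy_in, busy_out, k)


if __name__ == "__main__":
    n = 4
    print(worst_case(n, 2 * n - 1))   # one middle switch is still free
    print(worst_case(n, 2 * n - 2))   # no free middle switch: request blocks
```

With k = 2n-1 exactly one middle switch remains idle in the worst case, and with one fewer the request is blocked.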
Minimum Complexity Clos Switch
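C(n) = number of crosspoints in Clos switch
\[ C(n) = 2Nk + k\left(\frac{N}{n}\right)^2 = 2N(2n-1) + (2n-1)\left(\frac{N}{n}\right)^2 \]
Differentiate with respect to n:
\[ 0 = \frac{dC}{dn} = 4N - \frac{2N^2}{n^2} + \frac{2N^2}{n^3} \approx 4N - \frac{2N^2}{n^2} \;\Rightarrow\; n \approx \sqrt{\frac{N}{2}} \]
The minimized number of crosspoints is then:
\[ C^* = \left(2N + \frac{N^2}{N/2}\right)\left(2\sqrt{\frac{N}{2}} - 1\right) \approx 4N\sqrt{2N} = 4\sqrt{2}\,N^{1.5} \]
This is lower than $N^2$ for large N.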
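As a numerical cross-check of the derivation, the sketch below searches the divisors of N for the group size n that minimizes C(n) with k = 2n-1. The function names are illustrative, and restricting n to divisors of N is an assumption made so that N/n is an integer.

```python
import math


def clos_nonblocking_crosspoints(N, n):
    """C(n) = 2N(2n-1) + (2n-1)(N/n)^2 for a nonblocking Clos switch."""
    k = 2 * n - 1
    return 2 * N * k + k * (N / n) ** 2


def best_group_size(N):
    """Divisor n of N with the fewest crosspoints (brute-force search)."""
    candidates = [n for n in range(1, N + 1) if N % n == 0]
    return min(candidates, key=lambda n: clos_nonblocking_crosspoints(N, n))


if __name__ == "__main__":
    N = 1152
    n = best_group_size(N)
    print(n, math.sqrt(N / 2))                    # search result vs. sqrt(N/2)
    print(clos_nonblocking_crosspoints(N, n))     # exact minimum over divisors
    print(4 * math.sqrt(2) * N ** 1.5)            # approximation 4*sqrt(2)*N^1.5
    print(N * N)                                  # single-crossbar cost N^2
```

For N = 1152 the search lands on n = 24, which equals sqrt(N/2), and the resulting count is well below the crossbar's N^2.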
Example: Clos Switch Design
◼ Clos Nonblocking Design for 1152x1152 switch
N=1152, n=8, k=16
N/n=144 8x16 switches in first stage
16 144x144 in centre stage
144 16x8 in third stage
Aggregate crossbar chip throughput: 450 Gbps
Aggregate Throughput: 3.6 Tbps!
[Figure: 1152 inputs enter 144 8x16 first-stage switches, which connect through 16 144x144 centre-stage switches to 144 16x8 third-stage switches and 1152 outputs.]
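The numbers in this example can be reproduced with a short sketch. One interpretation is assumed here and is not stated on the slide: the 450 Gbps figure is taken as the throughput of one 144-port crossbar chip, which gives 3.125 Gbps per port. The function name `clos_example` is hypothetical.

```python
def clos_example(N=1152, n=8, k=16, chip_gbps=450.0, chip_ports=144):
    """Recompute the stage sizes and throughput of the 1152 x 1152 design.

    chip_gbps and chip_ports encode the assumption that one 144-port
    crossbar chip carries 450 Gbps in aggregate.
    """
    groups = N // n                                  # first/third-stage switch count
    per_port_gbps = chip_gbps / chip_ports           # assumed per-port rate
    return {
        "first_stage_switches": groups,              # each n x k
        "middle_stage_switches": k,                  # each groups x groups
        "third_stage_switches": groups,              # each k x n
        "nonblocking": k >= 2 * n - 1,               # Clos condition k >= 2n-1
        "crosspoints": 2 * N * k + k * groups ** 2,  # 2Nk + k(N/n)^2
        "aggregate_tbps": N * per_port_gbps / 1000,  # all N ports at full rate
    }


if __name__ == "__main__":
    for name, value in clos_example().items():
        print(name, value)
```

Since k = 16 is at least 2n-1 = 15, the design satisfies the Clos nonblocking condition with one middle switch to spare.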
◼ InfiniBand supports:
Quality of service guarantees.
Failover - the capability to switch to a redundant or standby system.
◼ Data rates:
single data rate (SDR) - 2.5 Gbps in each direction per connection.
double data rate (DDR) - 5 Gbps.
quad data rate (QDR) - 10 Gbps.
fourteen data rate (FDR) - 14.0625 Gbps.
enhanced data rate (EDR) - 25.78125 Gbps.
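The rates listed above are signaling rates, so the usable data rate of a link depends on the link width and the line encoding. The sketch below assumes the usual InfiniBand encodings, 8b/10b for SDR, DDR and QDR and 64b/66b for FDR and EDR; these encodings and the table and function names are not given on the slide.

```python
# Signaling rate per connection in Gbps, as listed on the slide.
LANE_SIGNALING_GBPS = {
    "SDR": 2.5,
    "DDR": 5.0,
    "QDR": 10.0,
    "FDR": 14.0625,
    "EDR": 25.78125,
}

# Assumed line encoding as (payload bits, transmitted bits).
ENCODING = {
    "SDR": (8, 10),
    "DDR": (8, 10),
    "QDR": (8, 10),
    "FDR": (64, 66),
    "EDR": (64, 66),
}


def data_rate_gbps(generation, lanes=4):
    """Usable data rate of a link with `lanes` connections (4X by default)."""
    payload_bits, total_bits = ENCODING[generation]
    return LANE_SIGNALING_GBPS[generation] * lanes * payload_bits / total_bits


if __name__ == "__main__":
    for generation in LANE_SIGNALING_GBPS:
        print(generation, data_rate_gbps(generation))
```

Under these assumptions a 4X QDR link signals at 40 Gbps and delivers 32 Gbps of data, which matches the 32 Gb/sec point-to-point bandwidth quoted for a 4X QDR interconnect.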