
Lecture 27
Platforms and Interconnection Networks
Flynn’s Taxonomy
In 1966, Michael Flynn classified computer systems according to the number of instruction streams and the number of data streams.
The four classes, by instruction stream (single/multiple) and data stream (single/multiple):
• SISD (single instruction, single data): uniprocessors
• SIMD (single instruction, multiple data): processor arrays, pipelined vector processors
• MISD (multiple instruction, single data): systolic arrays
• MIMD (multiple instruction, multiple data): multiprocessors, multicomputers
SISD Machine
Example: single-CPU computers (serial computers)
• Single instruction: only one instruction stream is acted on by the CPU during one clock cycle
• Single data: only one data stream is used as input during one clock cycle
• Deterministic execution
SIMD Machine (I)
• A parallel computer
• It typically has a single CPU devoted exclusively to control, a large number of subordinate ALUs, each with its own memory, and a high-bandwidth internal network.
• The control CPU broadcasts an instruction to all subordinate ALUs, and each subordinate ALU either executes the instruction or is idle.
• Example: CM-1, CM-2, IBM9000
SIMD Machine (2)
[Diagram: a control CPU drives ALU 0 through ALU p, each with its own local memory (Mem 0 through Mem p); the ALU/memory pairs are connected by an interconnection network.]
SIMD Machine (3)
[Figure from Introduction to Parallel Computing.]
MIMD Machine (I)
• Most popular parallel computer architecture
• Each processor is a full-fledged CPU with both a control unit and an ALU. Thus each CPU is capable of executing its own program at its own pace.
• Execution is asynchronous. Processors can also be specifically programmed to synchronize with each other.
• Examples: networked parallel computers, symmetric multiprocessor (SMP) computers.
MIMD Machine (II)
[Diagram: two CPUs executing independent instruction streams over time. CPU 0: Load A(1); Load B(1); C(1) = A(1)*B(1); Store C(1); next instruction. CPU 1: call func; X = Y*Z; Sum = X^2; call subroutine1(i); next instruction.]

Further classification according to memory access:


• Shared-memory system
• Distributed-memory system (Message-passing)
Shared-Memory MIMD Machine (I)
• Multiple processors can operate independently but share the same memory resources (a global address space).
• A change in a memory location made by one processor is visible to all other processors.
• Two classes of shared-memory architecture, based on the network connecting the memory modules: bus-based shared-memory architecture (SGI Challenge XL); switch-based architecture (Convex SPP1200).
• Classes of shared-memory systems based on the time taken by a processor to access any memory location: uniform memory access (UMA) and non-uniform memory access (NUMA).

[Figure: image recognition. The image is partitioned into 16 sections, each analyzed by a different CPU. (Tanenbaum, Structured Computer Organization)]
Shared-Memory MIMD Machine (II)
Bus-based shared-memory architecture

In principle, only one message can be sent over the bus at a time, so performance is poor.
Shared-Memory MIMD Machine (III)
• Cache coherence
For any shared-memory architecture that allows caching of shared variables: if processor A updates a shared variable x in its cache, how do we make sure the values of all copies of x stay current?

Good news: cache coherence is achieved at the hardware level, e.g., through snooping protocols (a toy software illustration is sketched below).
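Coherence is enforced in hardware, but the idea behind a snooping protocol can be illustrated with a toy software model. The sketch below is a hypothetical, heavily simplified MSI-style simulation; the Bus and Cache classes and their method names are invented for this example and do not correspond to any real protocol implementation.

```python
# Minimal toy model of bus snooping with a simplified MSI protocol.
# Purely illustrative: real coherence is implemented in hardware.

class Cache:
    def __init__(self, cpu_id, bus):
        self.cpu_id = cpu_id
        self.bus = bus
        self.lines = {}          # address -> (state, value); state in {"M", "S", "I"}
        bus.caches.append(self)

    def read(self, addr):
        state, value = self.lines.get(addr, ("I", None))
        if state == "I":                      # miss: fetch a shared copy
            value = self.bus.bus_read(addr, requester=self)
            self.lines[addr] = ("S", value)
        return self.lines[addr][1]

    def write(self, addr, value):
        # Gain exclusive ownership: all other caches must invalidate their copies.
        self.bus.bus_invalidate(addr, requester=self)
        self.lines[addr] = ("M", value)

    def snoop_invalidate(self, addr):
        if addr in self.lines:
            self.lines[addr] = ("I", self.lines[addr][1])

    def snoop_read(self, addr):
        # If we hold the line Modified, write it back and downgrade to Shared.
        state, value = self.lines.get(addr, ("I", None))
        if state == "M":
            self.bus.memory[addr] = value
            self.lines[addr] = ("S", value)


class Bus:
    def __init__(self):
        self.caches = []
        self.memory = {}

    def bus_read(self, addr, requester):
        for c in self.caches:
            if c is not requester:
                c.snoop_read(addr)            # the owner writes back if needed
        return self.memory.get(addr, 0)

    def bus_invalidate(self, addr, requester):
        for c in self.caches:
            if c is not requester:
                c.snoop_invalidate(addr)      # all other copies become stale


bus = Bus()
a, b = Cache(0, bus), Cache(1, bus)
a.write(0x10, 42)         # CPU 0 owns x in state M
print(b.read(0x10))       # CPU 1 snoops and gets the current value: 42
```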
Distributed-Memory MIMD Machine (I)
• Each processor has its own private memory.
• A communication network connects the per-processor memories.
• No concept of a global address space across all processors.
• No cache coherence concept.
• Data exchange is through message passing.

[Figure: image recognition. The image is split among the 16 memories. (Tanenbaum, Structured Computer Organization)]
Case Study: LLNL Linux cluster architecture

From https://computing.llnl.gov/tutorials/linux_clusters/

Nodes
[Photos: front view of compute nodes from an LC Opteron cluster; a quad-core, quad-socket Opteron compute node.]
Frames/Racks
An SU consists of:
• Nodes (compute, login, management, gateway)
• First-stage switches that connect directly to each node
• Miscellaneous management hardware
• Frames sufficient to house all of the hardware
Additionally, a second-stage switch is needed for every 2 SUs in a multi-SU cluster (not shown).
Interconnect overview
Two-stage interconnect (Atlas, Juno – 8 SU)
[Figure: each processing node's adapter card links it to the interconnect through a first-stage switch (Voltaire 24-port switches); all used ports of the second-stage switch (a Voltaire 288-port switch) connect to first-stage switches.]
Interconnection Network (I)
• Dynamic network switch
Degree of the switch = number of ports on a
switch
Switch functions:
-- mapping from input to output ports
-- internal buffering (when the requested output
port is busy)
-- routing (to reduce network congestion)
-- multicasting (same output on multiple ports)
-- non-blocking: disjoint paths between each pair
of independent inputs and outputs
Network interface
• The network interface handles the connectivity between the node and the network
• It has input and output ports that pipe data from and to the network
• Functions (see the sketch below):
--- packetizing data
--- computing routing information
--- buffering incoming and outgoing data
--- error checking
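As a rough illustration of the packetizing and error-checking functions listed above, here is a minimal Python sketch. The packet format, the 64-byte payload size, and the use of CRC32 are assumptions made for the example; real network interfaces implement these functions in hardware with their own formats.

```python
import zlib
from dataclasses import dataclass

# Hypothetical packet format: destination, sequence number, payload, CRC32 checksum.
PAYLOAD_BYTES = 64   # assumed maximum payload per packet

@dataclass
class Packet:
    dest: int
    seq: int
    payload: bytes
    crc: int

def packetize(dest: int, message: bytes):
    """Split a message into fixed-size packets carrying routing info and a checksum."""
    packets = []
    for seq, start in enumerate(range(0, len(message), PAYLOAD_BYTES)):
        chunk = message[start:start + PAYLOAD_BYTES]
        packets.append(Packet(dest, seq, chunk, zlib.crc32(chunk)))
    return packets

def reassemble(packets):
    """Check each packet for errors and rebuild the message in sequence order."""
    for p in packets:
        assert zlib.crc32(p.payload) == p.crc, f"error detected in packet {p.seq}"
    return b"".join(p.payload for p in sorted(packets, key=lambda p: p.seq))

msg = b"x" * 200
pkts = packetize(dest=7, message=msg)
assert reassemble(pkts) == msg
print(len(pkts), "packets")   # 4 packets of up to 64 bytes each
```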
Interconnection Network (II)
• Static network (direct network): point-to-point communication links between computing nodes
• Dynamic network (indirect network): built using switches and communication links. Communication links are connected to one another dynamically by switches to establish paths.

[Figure: an example static network and an example dynamic network.]
Interconnection Network (III)
• Linear array: each node has two neighbors
• 1D torus (ring)
Interconnection Network (IV)
• k-dimensional mesh: interior nodes have 2k neighbors

[Figure: 2D mesh; 2D mesh with wraparound links (2D torus); 3D mesh]
Cray T3E
• Uses the Alpha 21164A processor with a 4-way superscalar architecture, 2 floating-point instructions per cycle
• CPU clock 675 MHz, with a peak rating of 1.35 Gigaflops, 512 MB local memory
• Parallel systems with 40 to 2176 processors (in modules of 8 CPUs each)
• 3D torus interconnect with a single processor per node
• Each node contains a router and has a processor interface and six full-duplex links (one for each direction of the cube)
• IBM BlueGene/L
In 1999, IBM Research announced a 5-year, $100M project, named Blue Gene, to develop a petaflop computer for research in computational biology.

• IBM BlueGene/L uses a three-dimensional (3D) torus network in which the nodes (red balls) are connected to their six nearest-neighbor nodes in a 3D mesh. In the torus configuration, the ends of the mesh loop back, thereby eliminating the problem of programming for a mesh with edges. Without these loops, the end nodes would not have six nearest neighbors.
Interconnection Network (V)
Hypercubes: the topology has two nodes along each dimension and log2 p dimensions.

• A D-dimensional cube is constructed by connecting corresponding nodes of two (D-1)-dimensional cubes
• A D-dimensional cube has P nodes in total, D = log2 P
• The distance between any two nodes is at most log2 P (see the routing sketch below)
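Nodes of a D-dimensional hypercube can be labeled with D-bit numbers so that two nodes are adjacent exactly when their labels differ in one bit; the distance between two nodes is then the Hamming distance of their labels, which is at most D = log2 P. A minimal Python sketch (the function names are illustrative):

```python
def hypercube_distance(a: int, b: int) -> int:
    """Hops between nodes a and b = Hamming distance of their labels."""
    return bin(a ^ b).count("1")

def hypercube_route(a: int, b: int):
    """One shortest path: correct the differing label bits one dimension at a time
    (dimension-order, i.e. E-cube, routing: lowest dimension first)."""
    path, node = [a], a
    diff = a ^ b
    dim = 0
    while diff:
        if diff & 1:
            node ^= (1 << dim)   # cross the link along dimension `dim`
            path.append(node)
        diff >>= 1
        dim += 1
    return path

# 4-dimensional cube (P = 16 nodes): the distance is at most log2(P) = 4.
print(hypercube_distance(0b0000, 0b1011))   # 3
print(hypercube_route(0b0000, 0b1011))      # [0, 1, 3, 11]
```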
Interconnection Network (VI)
Binary Trees

The distance between any two nodes is no more than 2 log p.

• Problem
--- Messages from one half of the tree to the other half are routed through the top-level nodes
--- Links closer to the root carry more traffic than those at lower levels.
Interconnection Network (VII)
Fat Tree Network
• Increases the number of communication links and switching nodes closer to the root.
• The fat tree is suitable for dynamic networks.

[Figure: a fat tree network of 16 processing nodes.]
Interconnection Network (VIII)

• Metrics for static networks (a numerical check is sketched below)

Diameter: the longest distance between any two nodes – an indication of the maximum delay a message will encounter in being communicated between a pair of nodes.
Connectivity: a measure of the multiplicity of paths between any two nodes.
Arc connectivity: the minimum number of arcs that must be removed from the network to break it into two disconnected networks.
Bisection width: the minimum number of communication links that must be removed to partition the network into two equal halves – it determines the minimum volume of communication allowed between any two halves of the network.
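These metrics can be checked numerically for small topologies. The Python sketch below, written for this text rather than taken from any library, builds a k x k 2D torus and computes its diameter by breadth-first search; the bisection width quoted in the comment is the known analytical value, not computed by the code.

```python
from collections import deque
from itertools import product

def torus_2d(k):
    """Adjacency list of a k x k 2D torus (mesh with wraparound links)."""
    adj = {}
    for x, y in product(range(k), repeat=2):
        adj[(x, y)] = [((x + dx) % k, (y + dy) % k)
                       for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))]
    return adj

def diameter(adj):
    """Longest shortest-path distance over all node pairs (BFS from every node)."""
    best = 0
    for src in adj:
        dist = {src: 0}
        q = deque([src])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        best = max(best, max(dist.values()))
    return best

# A 4 x 4 torus of p = 16 nodes has diameter 2 * (4 // 2) = 4;
# its bisection width is 2k = 8 links (a known result, not computed here).
print(diameter(torus_2d(4)))   # 4
```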
[Table: metrics for common static network topologies.]
Cost Model for Communicating Messages
The time to communicate a message (t_comm) between two nodes in a network is the sum of the time to prepare the message for transmission and the time taken by the message to traverse the network to its destination.
• Startup time (t_s): the time required to handle a message at the sending and receiving nodes.
• Per-hop time (t_h): the time taken by the header of a message to travel between two directly-connected nodes, also known as node latency. This time is directly related to the latency within the routing switch for determining which output buffer or channel the message should be forwarded to.
• Per-word transfer time (t_w): if the channel bandwidth is r words per second, then each word takes time t_w = 1/r to traverse the link.

t_comm for a message of m words traversing l communication links:
• Store-and-forward switching:
When a message traverses a path with multiple links, each intermediate node on the path forwards the message to the next node after it has received and stored the entire message.
    t_comm = t_s + (t_h + t_w * m) * l
• Cut-through switching:
The message is broken into fixed-size units called flow control digits, or flits. After the connection from source to destination is established, the flits are sent one after the other.
    t_comm = t_s + l * t_h + t_w * m
(A numeric comparison of the two formulas is sketched below.)
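A quick numeric comparison of the two formulas, using illustrative parameter values (the numbers chosen for t_s, t_h, and t_w below are made up for the example, not measurements):

```python
def t_store_and_forward(ts, th, tw, m, l):
    # Each of the l hops stores the whole m-word message before forwarding it.
    return ts + (th + tw * m) * l

def t_cut_through(ts, th, tw, m, l):
    # Only the header pays the per-hop cost; the m words are pipelined through.
    return ts + th * l + tw * m

# Illustrative values: 10 us startup, 1 us per hop, 0.125 us per word.
ts, th, tw = 10.0, 1.0, 0.125
m, l = 1000, 5     # a 1000-word message over 5 links
print(t_store_and_forward(ts, th, tw, m, l))  # 10 + (1 + 125)*5 = 640.0 us
print(t_cut_through(ts, th, tw, m, l))        # 10 + 5 + 125     = 140.0 us
```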
NERSC Carver
• Carver, a liquid-cooled IBM iDataPlex system, has 1202 compute nodes (9,984 processor cores). This represents a theoretical peak performance of 106.5 teraflops.
Type of Node                 | Number | Cores/Node | Mem/Node      | Mem/Core
Nehalem 2.67GHz "smallmem"   | 960    | 8          | 24GB 1333MHz  | 3 GB
Nehalem 2.67GHz "bigmem"     | 160    | 8          | 48GB 1066MHz  | 6 GB
Westmere 2.67GHz             | 80     | 12         | 48GB 1333MHz  | 4 GB
Nehalem-EX 2.00GHz           | 2      | 32         | 1TB 1066MHz   | 32 GB

Interconnect
All Carver nodes are interconnected by 4X QDR InfiniBand technology, meaning
that 32 Gb/sec of point-to-point bandwidth is available for high-performance
message passing and I/O. The interconnect consists of fiber optic cables
arranged as local fat-trees within a global 2D mesh.
Additional References
• Using InfiniBand for a scalable compute infrastructure. Technology brief, 4th edition
• https://computing.llnl.gov/tutorials/linux_clusters/
• http://www.nersc.gov/
• A.S. Tanenbaum, Structured Computer Organization
Chapter 5. Cloud Access and Cloud
Interconnection Networks
Contents
1. Clouds and networks.
2. Packet-switched networks.
3. Internet.
4. Relations between Internet networks.
5. Transformation of the Internet.
6. Web access and TCP congestion
7. Named data networks
8. Interconnection networks for computer clouds.
9. Clos networks, Myrinet, InfiniBand, fat trees.
10. Storage area networks.
11. Data center networks.
12. Network management algorithms.
13. Content delivery networks.
14. Overlay and scale-free networks.
1. Clouds and networks
◼ Unquestionably, communication is at the heart of cloud computing.
 Interconnectivity supported by a continually evolving Internet made cloud computing feasible.
 A cloud is built around a high-performance interconnect; the servers of a cloud infrastructure communicate through high-bandwidth, low-latency networks.
◼ Cloud workloads fall into four broad categories based on their dominant resource needs: CPU-intensive, memory-intensive, I/O-intensive, and storage-intensive. While the first two benefit from, but do not require, high-performance networking, the last two do. Networking performance directly impacts the performance of I/O- and storage-intensive workloads.
◼ The designers of a cloud computing infrastructure are acutely aware that communication bandwidth goes down and communication latency increases the farther data travels from the CPU.



8. Interconnection networks for computer clouds

◼ While processor and memory technology have followed Moore's Law, interconnection networks have evolved at a slower pace.
◼ From 1997 to 2010 the speed of the ubiquitous Ethernet network increased from 1 to 100 Gbps. This increase is slightly slower than Moore's Law for traffic, which predicted 1 Tbps Ethernet by 2013.
◼ Interconnection networks are a major factor in determining the overall performance and cost of the system.



Basic concepts
◼ A network consists of nodes and links or communication channels.
◼ An interconnection network can be:
 Non-blocking if it is possible to connect any permutation of sources and destinations at any time.
 Blocking if this requirement is not satisfied.
◼ Switches and communication channels are the elements of the interconnection fabric.
 Switches → receive data packets, look inside each packet to identify the destination IP address, then use routing tables to forward the packet to the next hop towards its final destination.
 An n-way switch → has n ports that can be connected to n communication links.
◼ The degree of a node is the number of links the node is connected to.
◼ Nodes → could be processors, memory units, or servers.
◼ Network interface of a node → the hardware connecting it to the network.



Interconnection networks
◼ Interconnection networks are distinguished by:
 Topology - determined by the way nodes are interconnected.
 Routing - decides how a message gets from source to destination.
 Flow control - negotiates how buffer space is allocated.
◼ The topology of an interconnection network determines:
 Network diameter - the maximum distance between any pair of nodes.
 Bisection width - the minimum number of links that must be cut to partition the network into two halves. When a network is partitioned into two networks of the same size, the bisection bandwidth measures the communication bandwidth between the two.
◼ There are two basic types of network topologies:
 Static networks, where there are direct connections between servers; for example: (a) bus; (b) hypercube; (c) 2D-mesh; (d) 2D-torus.
 Switched networks, where switches are used to interconnect the servers.



Static networks: (a) Bus; (b) Hypercube; (c) 2D-mesh; and (d) Torus.



Switched networks.
(Left) An 8 x 8 crossbar switch. 16 nodes are interconnected by 49
switches represented by the dark circles.
(Right) An 8 x 8 Omega switch. 16 nodes are interconnected by 12
switches represented by white rectangles.



Cloud interconnection networks
◼ While processor and memory technology have followed Moore's
law, the interconnection networks have evolved at a slower pace
and have become a major factor in determining the overall
performance and cost of the system.
◼ The networking infrastructure is organized hierarchically: servers are packed into racks and interconnected by a top-of-rack router; the rack routers are connected to cluster routers, which in turn are interconnected by a local communication fabric.
◼ The networking infrastructure of a cloud must satisfy several
requirements:
 Scalability.
 Low cost.
 Low-latency.
 High bandwidth.
 Provide location transparent communication between servers.



Location transparent communication

◼ Every server should be able to communicate with every other server with
similar speed and latency.
◼ Applications need not be location aware.
◼ It also reduces the complexity of the system management.
◼ In a hierarchical organization true location transparency is not feasible and
cost considerations ultimately decide the actual organization and
performance of the communication fabric.



Store-and-forward and cut-through networks

◼ Store-and-forward networks → an entire packet is buffered and its checksum is verified in each node along the path from the source to the destination.
◼ Cut-through (wormhole) networks → a packet is forwarded to its next hop as soon as the header is received and decoded. This decreases the latency, but a packet can still experience blocking if the outgoing channel expected to carry it to the next node is in use. In this case the packet has to wait until the channel becomes free.



Routers and switches
◼ The cost of routers and the number of cables interconnecting the routers are major components of the cost of an interconnection network.
◼ Better performance and lower costs can only be achieved with innovative router architecture → wire density has scaled up at a slower rate than processor speed and wire delay has remained constant.
◼ Router – a switch interconnecting several networks.
 Low-radix routers – have a small number of ports; divide the bandwidth into a smaller number of wide ports.
 High-radix routers – have a large number of ports; divide the bandwidth into a larger number of narrow ports.
◼ High-radix networks use fewer intermediate routers, giving lower latency and reduced power consumption.
◼ The pin bandwidth of the chips used for switching has increased by approximately an order of magnitude every 5 years during the past two decades.



Network characterization

◼ The diameter of a network is the maximum distance between any pair of nodes; if a network is fully connected its diameter is equal to one.
◼ When a network is partitioned into two networks of the same size, the bisection bandwidth measures the communication bandwidth between the two.
◼ The cost.
◼ The power consumption.



A side-by-side comparison of performance and cost figures of several
interconnection network topologies for 64 nodes.



Network: Links & switches

◼ A circuit consists of dedicated resources in a sequence of links & switches across the network
◼ A circuit switch connects input links to output links

[Figure: a circuit switch with N input links and N output links serving users 1 through n; network control configures the connection of inputs to outputs.]
Circuit Switch Types

◼ Space-Division switches
 Provide separate physical connection between inputs and outputs
 Crossbar switches
 Multistage switches
◼ Time-Division switches
 Time-slot interchange technique
 Time-space-time switches
◼ Hybrids combine Time & Space switching

Crossbar Space Switch

⚫ N x N array of crosspoints
⚫ Connect an input to an output by closing a crosspoint
⚫ Nonblocking: any input can connect to any idle output
⚫ Complexity: N^2 crosspoints

[Figure: an N x N crossbar with inputs 1..N on the rows and outputs 1..N on the columns.]
Multistage Space Switch
◼ Large switch built from multiple stages of small switches
◼ The n inputs to a first-stage switch share k paths through intermediate crossbar switches
◼ Larger k (more intermediate switches) means more paths to the output
◼ In the 1950s, Clos asked, "How many intermediate switches are required to make the switch nonblocking?"
◼ Crosspoint count: 2(N/n)nk + k(N/n)^2

[Figure: a three-stage switch with N inputs and N outputs; N/n first-stage switches of size n x k, k middle crossbars of size (N/n) x (N/n), and N/n third-stage switches of size k x n.]
Clos Non-Blocking Condition: k = 2n-1
⚫ Request a connection from the last input of input switch j to the last output of output switch m
⚫ Worst case: all other inputs of switch j have seized the top n-1 middle switches AND all other outputs of switch m have seized the next n-1 middle switches
⚫ If k = 2n-1, there is still another path left to connect the desired input to the desired output
(A small helper that evaluates the crosspoint count and this condition is sketched below.)

[Figure: the worst case in a three-stage Clos switch. n-1 middle switches are busy serving the other inputs of switch j, another n-1 are busy serving the other outputs of switch m, and middle switch 2n-1 still has a free path; the number of internal links is twice the number of external links.]
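A small Python helper, written for illustration, that evaluates the crosspoint count 2(N/n)nk + k(N/n)^2 from the previous slide together with the Clos condition k >= 2n - 1:

```python
def clos_crosspoints(N: int, n: int, k: int) -> int:
    """Crosspoints in a 3-stage Clos switch: N/n input switches of size n x k,
    k middle crossbars of size (N/n) x (N/n), and N/n output switches of size k x n."""
    assert N % n == 0, "N must be divisible by n"
    return 2 * (N // n) * n * k + k * (N // n) ** 2

def is_strictly_nonblocking(n: int, k: int) -> bool:
    """Clos's condition: k middle switches suffice for any request if k >= 2n - 1."""
    return k >= 2 * n - 1

# Example: N = 128 inputs, n = 8 inputs per first-stage switch.
N, n = 128, 8
k = 2 * n - 1                               # 15 middle switches
print(is_strictly_nonblocking(n, k))        # True
print(clos_crosspoints(N, n, k), N * N)     # 7680 vs 16384 for a full crossbar
```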
Minimum Complexity Clos Switch
C(n) = number of crosspoints in a Clos switch with k = 2n - 1:

    C(n) = 2Nk + k(N/n)^2 = 2N(2n - 1) + (2n - 1)(N/n)^2

Differentiate with respect to n and set the derivative to zero:

    0 = dC/dn = 4N - 2N^2/n^2 + 2N^2/n^3 ≈ 4N - 2N^2/n^2   ⟹   n ≈ sqrt(N/2)

The minimized number of crosspoints is then:

    C* = (2N + N^2/(N/2)) (2 sqrt(N/2) - 1) ≈ 4N · 2 sqrt(N/2) = 4 sqrt(2) N^1.5

This is lower than N^2 for large N. (A numeric check is sketched below.)

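A quick numeric check of this derivation, assuming N = 1152 (the example on the next slide) so that sqrt(N/2) = 24 is an exact divisor of N:

```python
import math

def clos_cost(N, n):
    # Crosspoints of a nonblocking Clos switch with k = 2n - 1 middle switches.
    return 2 * N * (2 * n - 1) + (2 * n - 1) * (N // n) ** 2

N = 1152
# Brute-force minimum over the divisors of N, compared with n = sqrt(N/2).
best_n = min((n for n in range(1, N + 1) if N % n == 0), key=lambda n: clos_cost(N, n))
print(best_n, math.isqrt(N // 2))           # 24 24: the optimum is n = sqrt(N/2)
print(clos_cost(N, best_n))                 # 216576 crosspoints
print(round(4 * math.sqrt(2) * N ** 1.5))   # 221184: the approximation drops the "-1" term
print(N * N)                                # 1327104 crosspoints for a full crossbar
```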
Example: Clos Switch Design

◼ Circa 2002, Mindspeed offered a crossbar chip with the following specs:
 144 inputs x 144 outputs, 3.125 Gbps/line
 Aggregate crossbar chip throughput: 450 Gbps
◼ Clos nonblocking design for a 1152 x 1152 switch:
 N = 1152, n = 8, k = 16
 N/n = 144 8x16 switches in the first stage
 16 144x144 crossbars in the centre stage
 144 16x8 switches in the third stage
 Aggregate throughput: 3.6 Tbps!
 Note: the 144x144 crossbar can be partitioned into multiple smaller switches
(A few lines of arithmetic reproducing these numbers are sketched below.)

[Figure: the three-stage design with 1152 inputs and 1152 outputs; 144 first-stage 8x16 switches, 16 middle 144x144 crossbars, and 144 third-stage 16x8 switches.]
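The numbers on this slide can be reproduced with a few lines of arithmetic (a sketch; the crosspoint totals in the last line are implied by the Clos formula rather than stated on the slide):

```python
N, n, k = 1152, 8, 16
line_rate_gbps = 3.125

print(N // n)                              # 144 first-stage 8x16 switches (and 144 third-stage 16x8)
print(k >= 2 * n - 1)                      # True: 16 >= 15, so the design is nonblocking
print(144 * line_rate_gbps)                # 450.0 Gbps: aggregate throughput of one 144x144 chip
print(N * line_rate_gbps / 1000, "Tbps")   # 3.6 Tbps aggregate throughput of the whole switch
crosspoints = 2 * (N // n) * n * k + k * (N // n) ** 2
print(crosspoints, "vs", N * N)            # 368640 crosspoints vs 1327104 for a single full crossbar
```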
9. Clos networks, InfiniBand, Myrinet, fat trees
◼ Clos → multistage nonblocking network with an odd number of stages.
 Consists of two butterfly networks; the last stage of the input network is fused with the first stage of the output network.
 All packets overshoot their destination and then hop back to it. Most of the time the overshoot is not necessary and only increases the latency; a packet takes twice as many hops as it really needs.
◼ Folded Clos topology → the input and the output networks share switch modules. Such networks are called fat trees.
 Myrinet, InfiniBand, and Quadrics implement a fat-tree topology.
◼ Butterfly network → the name comes from the pattern of inverted triangles created by the interconnections, which look like butterfly wings.
◼ Transfers the data using the most efficient route, but it is blocking: it cannot handle a conflict between two packets attempting to reach the same port at the same time.



[Figure: (a) A 5-stage Clos network with radix-2 routers and unidirectional channels; the network is equivalent to two back-to-back butterfly networks (an input network and an output network). (b) The corresponding folded-Clos network with bidirectional channels; the input and the output networks share switch modules.]



[Figure: (a) A 2-ary 4-fly butterfly with unidirectional links, inputs in0..in15, outputs out0..out15, and switch columns S0, S1, S2, S3. (b) The corresponding 2-ary 4-flat flattened butterfly, obtained by combining the four switches S0, S1, S2, and S3 in the first row of the traditional butterfly into a single switch S0', and by adding additional connections between switches.]



Fat trees
◼ Optimal interconnects for large-scale clusters and for WSCs (warehouse-scale computers).
◼ Servers are placed at the leaves.
◼ Switches populate the root and the internal nodes of the tree.
◼ Additional links increase the bandwidth near the root of the tree.
◼ A fat-tree network can be built with cheap commodity parts, as all switching elements of a fat tree are identical.



A 192-node fat-tree interconnection network with two 96-way and twelve 24-way switches in a computer cloud. The two 96-way switches at the root are connected via 48 links. Each 24-way switch has 6 x 8 uplink connections to the root and 6 x 16 down connections to 16 servers.



A 192-node fat-tree interconnect with two 96-way and sixteen 24-way switches. Each 24-way switch has 2 x 6 uplink connections to the root and 12 down connections to 12 servers. (A port-count check of both configurations is sketched below.)
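Both 192-node configurations can be sanity-checked by counting ports. The sketch below assumes the per-switch figures are 8 uplinks and 16 down links in the first configuration and 12 uplinks and 12 down links in the second, with uplinks split evenly between the two root switches; those readings of the captions are assumptions.

```python
# Configuration 1: two 96-port root switches and twelve 24-port edge switches.
edge, up, down = 12, 8, 16
assert up + down == 24                    # every port of a 24-way edge switch is used
assert edge * down == 192                 # 12 x 16 = 192 servers
assert edge * up // 2 + 48 == 96          # per root: 48 edge uplinks + 48 root-to-root links

# Configuration 2: two 96-port root switches and sixteen 24-port edge switches.
edge, up, down = 16, 2 * 6, 12
assert up + down == 24
assert edge * down == 192                 # 16 x 12 = 192 servers
assert edge * up // 2 == 96               # each root switch is fully populated by edge uplinks
print("port counts are consistent for both fat-tree configurations")
```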



InfiniBand
◼ Interconnection network used by supercomputers and computer clouds.
 Has a switched fabric topology designed to be scalable.
 Supports several signaling rates.
 The energy consumption depends on the throughput.
 Links can be bonded together for additional throughput.
◼ Offers point-to-point bidirectional serial links intended for the connection of processors with high-speed peripherals, such as disks, as well as multicast operations. Advantages:
 high throughput, low latency.
 support for quality-of-service guarantees and failover - the capability to switch to a redundant or standby system.



InfiniBand (Cont’d)

◼ InfiniBand supports:
 Quality of service guarantees.
 Failover - the capability to switch to a redundant or standby system.
◼ The data rates (per lane; see the sketch below for effective 4X link bandwidths):
 single data rate (SDR) – 2.5 Gbps in each direction per connection.
 double data rate (DDR) – 5 Gbps.
 quad data rate (QDR) – 10 Gbps.
 fourteen data rate (FDR) – 14.0625 Gbps.
 enhanced data rate (EDR) – 25.78125 Gbps.
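Links typically aggregate 4 or 12 lanes; SDR through QDR use 8b/10b line encoding while FDR and EDR use 64b/66b, so the usable bandwidth is somewhat lower than the raw signaling rate. The short sketch below derives effective 4X bandwidths under those assumptions; 4X QDR works out to 32 Gb/s, the figure quoted earlier for the Carver interconnect.

```python
# Per-lane signaling rate (Gbps) and line-encoding efficiency per InfiniBand generation.
rates = {
    "SDR": (2.5,      8 / 10),   # 8b/10b encoding
    "DDR": (5.0,      8 / 10),
    "QDR": (10.0,     8 / 10),
    "FDR": (14.0625, 64 / 66),   # 64b/66b encoding
    "EDR": (25.78125, 64 / 66),
}

lanes = 4   # a common "4X" link width
for name, (gbps, efficiency) in rates.items():
    print(f"4X {name}: {lanes * gbps * efficiency:.1f} Gb/s usable")
# 4X QDR -> 32.0 Gb/s, matching the Carver interconnect description.
```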



[Figure: the evolution of the speed of several high-speed interconnects, and the data rates supported by InfiniBand: SDR, DDR, QDR, FDR, and EDR.]



Architecture of a Computer Cluster
[Figure. Copyright © 2012, Elsevier Inc. All rights reserved.]

InfiniBand System Fabric
[Figure showing the Target Channel Adapter (TCA). Copyright © 2012, Elsevier Inc. All rights reserved.]
Example: InfiniBand (1)
◼ Provides applications with an easy-to-use messaging service.
◼ Gives every application direct access to the messaging service, without needing to rely on the operating system to transfer messages.
◼ Provides a messaging service by creating a channel connecting an application to any other application or service with which the application needs to communicate.
◼ These channels are created between virtual address spaces.
Example: InfiniBand (2)

• InfiniBand creates a channel directly connecting an application in its virtual address space to an application in another virtual address space.
• The two applications can be in disjoint physical address spaces – hosted by different servers.
InfiniBand Architecture
◼ HCA – Host Channel Adapter. An HCA is the point at
which an InfiniBand end node, such as a server or
storage device, connects to the InfiniBand network.
◼ TCA – Target Channel Adapter. This is a specialized
version of a channel adapter intended for use in an
embedded environment such as a storage appliance.
◼ Switches – An InfiniBand Architecture switch is
conceptually similar to any other standard networking
switch, but molded to meet InfiniBand’s performance and
cost targets.
◼ Routers – Although not currently in wide deployment, an
InfiniBand router is intended to be used to segment a
very large network into smaller subnets connected
together by an InfiniBand router.
Example: InfiniBand System Fabric
[Figure. Copyright © 2012, Elsevier Inc. All rights reserved.]
Myrinet
◼ Myrinet → interconnect for massively parallel systems developed at Caltech. Features:
 Robustness ensured by communication channels with flow control, packet framing, and error control.
 Self-initializing, low-latency, cut-through switches.
 Host interfaces that can map the network, select routes, and translate from network addresses to routes, as well as handle packet traffic.
 Streamlined host software that allows direct communication between user processes and the network.
 The network is scalable; its aggregate capacity grows with the number of nodes.
◼ Supports high data rates.
◼ A Myrinet link consists of a full-duplex pair of 640 Mbps channels, and the network has a regular topology with elementary routing circuits in each node. Simple algorithmic routing avoids deadlocks and allows multiple packets to flow concurrently through the network.

