
Switches, Routers and Networks

Muriel Medard EECS MIT


• Routing and switching:
Switch fabrics:
Basics of switching
Blocking
Interconnection examples
Complexity
Recursive constructions

Interconnection routing
Buffering: input and output
Local area networks (LANs)
Metropolitan area networks (MANs)
Wide area networks (WANs)
Trends
Introduction

• Data networks generally evolve fairly independently for different applications and are then patched together – telephony, a variety of computer applications, wireless applications
• IP is a large portion of the traffic, but it is carried by a variety of
protocols throughout the network
• Voice is still the application that has determined many of the
implementation issues, but its share is decreasing and voice is increasingly
carried over IP (voice over IP)
• Voice-oriented networks are not very flexible, but are very robust
• IP very successful because it is very flexible, but increasingly there is a
drive towards enhancing the reliability of services
• How do all of these network types and requirements fit together?
• The two main purposes of a network are:
Transmission across some distance: this involves amplification or regeneration (generally code-assisted)
The establishment of variable flows: switching and routing
Switching and Routing

• Switching is generally the establishment of connections on a circuit basis
• Routing is generally the forwarding of traffic on a datagram basis
• Routing requires switching but not vice-versa – routing uses connections which are permanently or temporarily set up in order to forward datagrams (those datagrams may be in circuit form, for instance VPs and VCs)
1. A packet switch consists of a routing engine (table look-up), a switch scheduler, and a switch fabric.
2. The routing engine looks up the packet address in a routing table and determines to which output port to send the packet.
1. The packet is tagged with the port number
2. The switch uses the tag to send the packet to the proper output port
3. Line cards buffer the packets
4. Line cards send packets to the proper output
5. Bus bandwidth must be N times the line-card speed (N ports)
In general a switch fabric replaces the bus
Switch fabrics are created from certain building blocks of smaller switches arranged in stages
The simplest switch is a 2x2 switch, which can be either in the through or crossed position
A switch maps inputs to sets of outputs; connections are either point-to-point or multicast
• Basic switch building blocks are:
the distributor
the concentrator
the 2x2 2-state point-to-point switch (switching cell)
• An interconnection network is a set of nodes with a set of interconnection lines such that:
every node is an object with an array of inputs and an array of outputs
an interconnection line leads from an output of one node to an input of another node
every I/O of a node is incident with at most one interconnection line
an I/O is called external if it is not incident with any interconnection line

• A route from an external input to an external output is a chain of distinct I/Os (a0, b0, a1, b1, …, ak, bk) where a0 and bk are external, and bj−1 is interconnected to aj
• An interconnection network is a switching network when:
every node qualifies to be a switch through proper specification of connection states
the network is routable (there exists a route from every external input to every external output)
an ordering is specified on external inputs and on external outputs

Unique routing interconnection networks: all routes from an external input to an external output are parallel, that is, (a0, b0, a1, b1, …, ak, bk) and (a0, b'0, a'1, b'1, …, a'k, bk) are such that aj and a'j reside on the same node and bj and b'j reside on the same node
Otherwise: alternate routing
Blocking

• An mxn unique routing network is called nonblocking if for any integer k ≤ min(m,n), any k external inputs, any k external outputs, and any pairing between these external I/Os, there exist k disjoint routes for the matched pairs
• For a routable network, the same property defines a rearrangeably nonblocking, or rearrangeable, network
• An interconnection network is strictly non-blocking if requests for routes are always granted under the rule of arbitrary route selection, and wide-sense non-blocking if there exists an algorithm for route selection that grants all requests
• The relation between these properties is given by the following theorem: a switching network composed of non-blocking switches is rearrangeable iff it constructs a non-blocking switch
• A common means of building interconnection networks is to use a multi-
stage architecture:
every interconnection line is between two stages
every external input is on a first-stage node
every external output is on a final-stage node
nodes within each stage are linearly ordered
• Notice the order of inputs into a stage is a shuffle of the outputs from
the previous stage: (0,4,1,5,2,6,3,7)
• Easily extended to more stages
• Any output can be reached from any input by proper switch settings
• Not all routes can be done simultaneously
• Exactly one route between each OD pair
• Built using the basic 2x2 switch module
• Recursive construction

Construct an N by N switch using two N/2 by N/2 switches and a new stage of N/2 basic (2x2) modules
The N by N switch has log2(N) stages, each with N/2 basic (2x2) modules
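As a quick check of this construction, here is a minimal Python sketch (illustrative only; the function name is ours) that counts the 2x2 cells produced by the recursion and confirms the (N/2)·log2(N) total:

import math

def banyan_cells(n_ports: int) -> int:
    # An N x N fabric: two N/2 x N/2 sub-switches plus one new stage of N/2 modules
    if n_ports == 2:
        return 1                  # a single 2x2 module
    return 2 * banyan_cells(n_ports // 2) + n_ports // 2

for n in (2, 4, 8, 16, 64):
    assert banyan_cells(n) == (n // 2) * int(math.log2(n))
    print(n, banyan_cells(n))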
Complexity issues

• There are many different parameters that are used to consider the
complexity of an interconnection network
• Line complexity: number of interconnection lines
• Node (cell) complexity: number of small nodes (mxn with m ≤ 2 and n ≤ 2)
• Depth: maximum number of nodes on a route (assuming an acyclic interconnection network)
• Entropy of a switch: log of the number of connection states
• What relations exist between complexity and the capabilities of a
switch?
Complexity

• The depth of an mxn routable interconnection network is at least max(log2(m), log2(n))
• Proof: for a depth d, at most 2^d external outputs are reachable from any external input. Since we have routability, n ≤ 2^d and m ≤ 2^d.
• When a switching network is composed of 2-state switches, the component complexity of the network is at least the entropy of the switch
• Proof: for E the number of switches, there are 2^E ways to form a combination of one connection state in every node. Each combination corresponds to at most one connection state of the overall switch.
Complexity

When an nxn rearrangeable network is composed of small nodes, its component complexity is at least log2(n!)
Proof: if we take every small node to be replaced by a 2-state point-to-point switch, then we have a non-blocking switch. Thus, there is a different connection state for every one of the n! one-to-one mappings between the n inputs and the n outputs. We now use the relation for networks composed of 2-state switches.
Note: using Stirling's formula, n! ≈ √(2πn)·(n/e)^n, we can obtain a simple approximate bound for component complexity:
log2(n!) = n·log2(n) − 1.44·n + log2(√(2πn)),
so component complexity is bounded from below by n·log2(n) − 1.44·n + Θ(log n)
• Relation between line and component complexity: component complexity + mn = line complexity + m + n
Complexity

• If an mxn nonblocking network is composed of n12 1x2 nodes, n21 2x1 nodes, and n22 2x2 cells, plus possibly crosspoints (edges), then n12 + n21 + 4·n22 ≥ 2mn − m − n
• Corollary: an nxn non-blocking network composed of small nodes has component complexity at least 0.5(n² − n)
• Note: directed acyclic graphs can be seen as a special case of a
network - a crosspoint network.
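To get a feel for how far apart these two lower bounds are, here is a small sketch (our own illustration, not from the notes) comparing log2(n!) for rearrangeable networks with n(n−1)/2 for non-blocking ones:

from math import lgamma, log

def log2_factorial(n: int) -> float:
    return lgamma(n + 1) / log(2)    # log2(n!) via the log-gamma function

for n in (16, 64, 256, 1024):
    print(n, round(log2_factorial(n)), n * (n - 1) // 2)
# Rearrangeable fabrics can scale like n log n, while non-blocking ones need ~n^2 components.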
• We have basic complexity properties, but how do we build networks?
Recursive 2-stage construction

A 2-stage interconnection with parameters m and n is composed of n mxm input nodes and m nxn output nodes interconnected by a coordinate interchange (static)
Constructions using trees: e.g., a 16x16 switch built from 4x4 nodes (each 4x4 built from 2x2 cells), or a 60x60 switch built from 6x6 and 10x10 nodes (the 6x6 from 2x2 and 3x3 cells, the 10x10 from 2x2 and 5x5 cells)
Divide and conquer
• Basic blocks need not be 2x2, trees need not be balanced
• A three-stage approach in which we use as the middle stage two networks of size 2^(n-1) x 2^(n-1) to build a network of size 2^n x 2^n, with an input stage and an output stage of 2^(n-1) cells each
• We denote by [nxm, rxp, mxq] the 3-stage network with r nxm input nodes, m rxp middle nodes, and p mxq output nodes such that:
output y of input node x is linked to input x of middle node y
output u of middle node y is linked to input y of output node u
• Rearrangeability theorem: the 3-stage network is rearrangeable iff m ≥ min{max{n, q}, nr, pq}
• It is strictly non-blocking iff m ≥ min{n + q − 1, nr, pq}
• Use a tag: an n-bit sequence with one bit per stage of the network, e.g., tag = b3b2b1
• The module at stage i looks at bit i of the tag (bi), and sends the packet up if bi = 0 and down if bi = 1
• In an omega network, for a destination port with binary address abc the tag is cba
Example: output 100 => tag = 001
Notice that regardless of the input port, tag 001 will get you to output 100
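The following is a minimal sketch of this self-routing idea (assuming a standard perfect-shuffle/omega wiring between stages; exact wiring conventions vary between texts, but the one-tag-bit-per-stage rule is the one described above):

def route_omega(src: int, dst: int, n_stages: int = 3) -> int:
    """Self-route one packet: shuffle between stages, then let each 2x2 switch
    send the packet up (bit 0) or down (bit 1) according to its tag bit."""
    size = 1 << n_stages
    pos = src
    for stage in range(n_stages):
        # perfect shuffle: rotate the port address left by one bit
        pos = ((pos << 1) | (pos >> (n_stages - 1))) & (size - 1)
        # tag bit for this stage: destination bits consumed most-significant first
        bit = (dst >> (n_stages - 1 - stage)) & 1
        pos = (pos & ~1) | bit
    return pos

# Regardless of the input port, the tag for output 100 (binary, = 4) reaches port 4.
assert all(route_omega(s, 4) == 4 for s in range(8))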

• What happens when packets cannot be forwarded to the right output for the given setting of the switching fabric?
• If two packets want to use the same port, one of them is dropped
• Suppose the switch has m stages
• Packet transmit time = 1 slot (between stages)
• New packets arrive at the inputs every slot
• Saturation analysis (for maximum throughput): uniform destinations, and the destination distribution is independent from packet to packet
P(m+1) = 1 − P(no packet on a stage m+1 link) = 1 − P(neither input to the stage m+1 switch chooses this output)
Each input has a packet with probability P(m) and that packet will choose the link with probability 1/2. Hence,
P(m+1) = 1 − (1 − P(m)/2)²
• We can now solve for P(m) recursively
• For an m stage network, throughput (per output link) is P(m), which is
the probability that there is a packet at the output
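A minimal sketch of this recursion (our own helper, with the saturation assumption P(0) = 1):

def banyan_throughput(stages: int) -> float:
    p = 1.0                                   # saturated inputs: a packet on every stage-0 link
    for _ in range(stages):
        p = 1.0 - (1.0 - p / 2.0) ** 2        # P(m+1) = 1 - (1 - P(m)/2)^2
    return p                                  # throughput per output link

for m in (1, 2, 4, 8):
    print(m, round(banyan_throughput(m), 3))  # throughput drops as the fabric gets deeper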
• Modular architecture
• The switch fabric consists of many 2x2 modules
• Switch buffers: none, at the input, or at the output of each module
• Solution to contention: buffering (distributed buffers)
• Buffers increase delay: there is a tradeoff between delay and throughput

• Advantages: modular, scalable, the bus (links) only needs to be as fast as the line cards
• Disadvantages
– Delays for going through the stages (cut-through possible when buffers are empty)
– Decreased throughput due to internal blocking
• Alternatives: buffers that are external to the switch fabric
1. Output buffers
2. Input buffers
As soon as a packet arrives, it is transferred to the appropriate output buffer
Assume a slotted system (cell switch)
During each slot the switch fabric transfers one packet from each input (if available) to the appropriate output
Must be able to transfer N packets per slot
Bus speed must be N times the line rate
No queueing at the inputs

• At most one packet is buffered at each input, for one slot
• If external arrivals to each input are Poisson with average rate λ, each output queue behaves as an M/D/1 queue with deterministic service time X = 1 (and second moment X² = 1), the packet duration equaling one slot
• The average number of packets at each output is given by the M/G/1 formula:
N_Q = (2λ − λ²) / (2(1 − λ))
Note that the only delay is due to the queueing at the outputs and none is due to the switch fabric
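A quick numeric check of this occupancy formula (λ in packets per slot, deterministic one-slot service; the helper name is ours):

def output_queue_occupancy(lam: float) -> float:
    assert 0.0 <= lam < 1.0, "the queue is unstable at or above one packet per slot"
    return (2 * lam - lam ** 2) / (2 * (1 - lam))

for lam in (0.5, 0.8, 0.9, 0.95):
    print(lam, round(output_queue_occupancy(lam), 2))   # occupancy blows up as lam -> 1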
Advantages: no delay or blocking inside the switch
Disadvantages:
– Bus speed must be N times the line speed
• Imposes a practical limit on the size and capacity of the switch
• Shared output buffers: output buffers are implemented in shared memory using a linked list
Requires less memory (due to statistical multiplexing)
Memory must be fast
• During each slot, the scheduler establishes the crossbar connections to transfer packets from the inputs to the outputs
– Maximum of one packet from each input
Throughput analysis of input queued switches
Head of line (HOL) blocking – when the packets at the head of two or more input queues are destined to the same output, only one can be transferred and the others are blocked
HOL blocking limits throughput because some inputs (and consequently outputs) are kept idle during a slot even when they have other packets to send in their queues
Consider an NxN switch and again assume that inputs are saturated (always have a packet to send)
Uniform traffic => each packet is destined to each output with equal probability (1/N)
Now, consider only those packets at the head of their queues (there are N of them!)
Let Q_m^i be the number of HOL packets destined to output i at the end of the m-th slot:
Q_m^i = max(0, Q_{m−1}^i + A_m^i − 1)
• where A_m^i = number of new HOL messages addressed to output i that arrive at the HOL during slot m. Now,
P(A_m^i = l) = C(C_{m−1}, l) · (1/N)^l · (1 − 1/N)^(C_{m−1} − l)
• where C_{m−1} = number of HOL messages that departed during slot m−1 = number of new HOL arrivals
As N approaches infinity, A_m^i becomes Poisson of rate C/N, where C is the average number of departures per slot
In steady state, Q^i behaves as an M/D/1 queue of rate λ and, as before,
Q̄_i = λ² / (2(1 − λ))
• Notice however that the total number of HOL packets is N; on average λN of them depart in each slot, so N(1 − λ) remain blocked. Hence
L_Q = Σ_{i=1..N} Q̄_i = N · λ² / (2(1 − λ)) = N(1 − λ)
• We can now solve the resulting quadratic equation, λ² = 2(1 − λ)², to obtain:
λ = utilization = 2 − √2 ≈ 0.586
• The maximum throughput of an input queued switch is thus limited by HOL blocking to about 58% (for large N)
– Assuming uniform traffic and FCFS service
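A small saturation simulation (our own sketch: uniform i.i.d. destinations, random choice among contending HOL packets) reproduces this limit for moderately large N:

import random
from collections import defaultdict

def hol_throughput(n_ports: int = 64, n_slots: int = 5000, seed: int = 1) -> float:
    rng = random.Random(seed)
    hol = [rng.randrange(n_ports) for _ in range(n_ports)]   # HOL destination per saturated input
    served = 0
    for _ in range(n_slots):
        contenders = defaultdict(list)
        for inp, dst in enumerate(hol):
            contenders[dst].append(inp)
        for dst, inputs in contenders.items():
            winner = rng.choice(inputs)               # one packet per output per slot
            hol[winner] = rng.randrange(n_ports)      # the winner's queue exposes a fresh HOL packet
            served += 1
    return served / (n_ports * n_slots)

print(round(hol_throughput(), 3))   # approaches 2 - sqrt(2) ~ 0.586 as N grows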

• Advantages of input queues:
Simple
Bus rate = line rate
• Disadvantages: throughput limitation
• If inputs are allowed to transfer packets that are not at the head of their queues, throughput can be substantially improved (not FCFS)
Overcoming HOL blocking

Example: how does the scheduler decide which input to transfer to which output?
Backlog matrix (inputs x outputs):
          out 1  out 2  out 3
  in 1      3      3      0
  in 2      2      0      0
  in 3      0      0      2
• Each entry (i, j) of the backlog matrix represents the number of packets in input i's queue that are destined to output j
• During each slot the scheduler can transfer at most one packet from each input to each output
The scheduler must choose at most one packet from each row and each column of the backlog matrix
This can be done by solving a bi-partite graph matching problem
The bi-partite graph consists of N nodes representing the inputs and N nodes representing the outputs
Bi-partite graph representation
There is an edge in the graph from an input to an output if there is a packet in the backlog matrix from that input to that output
For the previous backlog matrix, the bi-partite graph has an edge wherever the corresponding backlog entry is positive
A matching is a set of edges such that no two edges share a node: a matching in the bi-partite graph is equivalent to a set of packets such that no two packets share a row or column in the backlog matrix
A maximum matching is a matching with the maximum possible number of edges: a maximum matching is equivalent to the largest set of packets that can be transferred simultaneously
Algorithms for finding a maximum matching exist
The best known algorithms take O(N^2.5) operations
– Too long for large N
• Alternatives: sub-optimal solutions
Maximal matching: a matching that cannot be made any larger for a given backlog matrix
For the previous example: (1-1, 3-3) is maximal; (2-1, 1-2, 3-3) is maximum


• Fact: The number of edges in a maximal matching ≥ 1/2 the number of
edges in a maximum matching
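A minimal greedy sketch (illustrative only) that produces a maximal matching for the example backlog matrix above:

def maximal_matching(backlog):
    """Greedily pick (input, output) pairs with backlog, never reusing a row or a column."""
    used_in, used_out, match = set(), set(), []
    for i, row in enumerate(backlog):
        for j, count in enumerate(row):
            if count > 0 and i not in used_in and j not in used_out:
                match.append((i, j))
                used_in.add(i)
                used_out.add(j)
                break
    return match

backlog = [[3, 3, 0],
           [2, 0, 0],
           [0, 0, 2]]
print(maximal_matching(backlog))   # [(0, 0), (2, 2)], i.e. (1-1, 3-3): maximal but not maximum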
– Must look beyond one slot at a time in making scheduling decisions
Definition: a weighted bi-partite graph is a bi-partite graph with costs associated with the edges
Definition: a maximum weighted matching is a matching with the maximum total edge weight
Theorem: a scheduler that chooses during each time slot the maximum weighted matching, where the weight of link (i,j) is equal to the length of queue (i,j), achieves full utilization (100% throughput)
– Proof: see "Achieving 100% throughput in an input queued switch" by N. McKeown et al., IEEE Transactions on Communications, Aug. 1999.
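A sketch of the scheduling rule in the theorem, assuming SciPy is available (its assignment solver computes a maximum weight matching of the queue-length matrix):

import numpy as np
from scipy.optimize import linear_sum_assignment

queue_lengths = np.array([[3, 3, 0],
                          [2, 0, 0],
                          [0, 0, 2]])
rows, cols = linear_sum_assignment(queue_lengths, maximize=True)
schedule = [(i, j) for i, j in zip(rows, cols) if queue_lengths[i, j] > 0]
print(schedule)   # [(0, 1), (1, 0), (2, 2)], i.e. (1-2, 2-1, 3-3): the longest queues get served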
General relation with bipartite matching
• An (infinite-buffer) input-buffered switch is stable iff we can decompose the traffic as a convex linear combination of 0-1 sub-stochastic matrices (the Birkhoff-von Neumann principle)
• This links packets and flows to circuits
• Corollary: if we know the traffic matrix well, then we can provide stable service through a TDM schedule built from such a decomposition (a sketch follows below)
• Delay effects?
• Robustness to poor knowledge of the traffic?
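A minimal sketch of such a decomposition (assuming a doubly stochastic rate matrix and SciPy's assignment solver; this is an illustration of the principle, not a production scheduler):

import numpy as np
from scipy.optimize import linear_sum_assignment

def birkhoff_schedule(rates, tol=1e-9):
    """Peel off weighted permutations: each permutation is a crossbar setting,
    and its weight is the fraction of slots it should be used in a TDM schedule."""
    residual = np.array(rates, dtype=float)
    schedule = []
    while residual.max() > tol:
        support = (residual > tol).astype(float)
        rows, cols = linear_sum_assignment(support, maximize=True)
        if support[rows, cols].sum() < len(residual):
            break                                    # no perfect matching left on the support
        weight = residual[rows, cols].min()
        perm = np.zeros_like(residual)
        perm[rows, cols] = 1.0
        schedule.append((weight, cols.tolist()))     # cols[i] = output served by input i
        residual -= weight * perm
    return schedule

rates = np.array([[0.5, 0.5, 0.0],
                  [0.5, 0.0, 0.5],
                  [0.0, 0.5, 0.5]])
print(birkhoff_schedule(rates))    # two permutations, each used half of the slots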
• The driver behind LANs can be roughly thought of as increasing the reach and sharing of a bus
• Traditional Ethernet: CSMA/CD, shared medium
• Other approach: token ring, for instance the Fiber Distributed Data Interface (FDDI) shared ring
• Switched networks: lines are not shared but go through a router/switch
• IEEE/ANSI 802 standards also cover:
Distributed queue
Demand priority
• PLS: physical signaling, Manchester encoding
• AUI (attachment unit interface): manages data in (DI), data out (DO) and control in (CI)
• Ethernet is not completely 802.3, but a close approximation (there are some differences in the packet)
• Medium attachment unit (MAU): transmits and receives data, loops data back from DO to DI to indicate a valid Tx and Rx path, detects collisions, sends the signal quality error signal, performs the jabber function, checks link integrity
• Ethernet node: the MAC enforces CSMA/CD and performs:
– Transmit and receive message data encapsulation:
• Framing
• Addressing
• Error detection
– Media access management:
• Medium allocation (collision avoidance)
• Contention resolution (collision handling)
Increasing Ethernet bandwidth – the first step
The first Ethernet went up to 10 Mb/s: 10BASE-T, over phone-grade twisted pair, with a repeater in the middle of a star configuration acting as a virtual shared medium (traditional 10BASE5 and Cheapernet 10BASE2, on thick and thin coax respectively, were also laid out)
1990: 10BASE-T over fiber was developed, extending the distance between MAUs to 2 km instead of 500 m in coax
The Etherswitch was marketed by Kalpana to boost LAN performance rather than as a bridge to interconnect different LANs, and in 1993 full-duplex interconnect was also introduced by Kalpana
Still, each port could only deliver 10 Mbps; the option for a higher (100 Mbps) connection was FDDI, which was expensive
Standardization was done by the Fast Ethernet Alliance, while the IEEE struggled between 802.3 and a demand-priority camp, which created the 802.12 group; later 802.3u standardized 100BASE-T
Main differences between 10BASE-T and 100BASE-T:
Shorter distances – 100 m for Cat 5 and Cat 3, 130 m for fiber (160 m if an all-fiber network)
No more mixing segments (coax with multiple devices attached); all cabling is point-to-point between terminal equipment or repeaters
Kept the MAC but changed elements below it to adapt to 100 Mbps – replaced the AUI with the media-independent sublayer (MII), added a reconciliation sublayer (going from bit-serial to nibble-serial), went from Manchester encoding to NRZ
• 10 GigE is emerging as a new standard: http://www.10gea.org/Techwhitepapers.htm
10 Gigabit Ethernet
• 10 GigE is emerging as a new standard
• The standard is being developed with SONET interoperability in mind, with a view towards expansion in the MAN and WAN end-to-end Ethernet arena
• In particular, the load will be matched to OC-192 loads
• Task force 802.3ae is in charge of developing the 10 GE standard
Evolution to switched LANs

• VLANs were introduced to allow for smaller broadcast groups:
The standardization efforts have not yet yielded interoperable VLANs; they are still proprietary solutions
VLANs require a frame extension (802.3ac) to convey VLAN information via tagging (802.1Q) (2 tags of 16 bits each), approved in 1998
• Layer 3 switches implement some routing in hardware:
Routers were generally used for interconnecting LANs and for remote WAN
connections
Switches traditionally had little intelligence but were very fast
Layer 3 switches still perform layer 2 switching but also some routing functionality
in ASICs
They also implement VLANs
Generally support only IP
• The Gigabit Ethernet Alliance (May 1996) started the push for Gigabit
Ethernet, mostly standardized as 802.3z in 1998
• Main characteristics:
The MAC itself was modified so that there is a 200 m network span with a single repeater
The MII was changed to GMII, Tx and Rx data paths widened to 8 bits
Adoption of 8bit/10bit fibre channel encoding
Carrier extension: extending or padding from 64-byte minimum to 512-byte
minimum to maintain compatibility
Frame bursting to enhance efficiency:
Worst-case efficiency for 100 Mb/s CSMA/CD is
minimum packet length / (minimum packet length + inter-frame gap + preamble length) = 512 / (512 + 96 + 64) = 76%
For 1000 Mb/s with CSMA/CD it is
minimum packet length / (slot time + inter-frame gap + preamble length) = 512 / (4096 + 96 + 64) = 12%
• If we allow n frames to be transmitted in a burst after the first frame, then the worst-case efficiency is
512(n + 1) / (4096 + 96 + 64 + n(512 + 96 + 64))
• The efficiency gain beyond a 65,536-bit burst is minimal; efficiency is about 72% at that value
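These efficiency figures are easy to check numerically (all quantities in bits: 512-bit minimum frame, 96-bit inter-frame gap, 64-bit preamble, 4096-bit slot time):

def burst_efficiency(n_extra_frames: int) -> float:
    useful = 512 * (n_extra_frames + 1)
    on_wire = (4096 + 96 + 64) + n_extra_frames * (512 + 96 + 64)
    return useful / on_wire

print(round(burst_efficiency(0), 2))    # ~0.12: a single carrier-extended frame
print(round(burst_efficiency(91), 2))   # ~0.72: a burst of roughly 65,536 bits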
• Storage is traditionally attached through the small computer system interface (SCSI), which transfers data in blocks
• SCSI drawbacks:
Two or more I/O controllers cannot easily share SCSI devices on the same I/O bus, so a single server controls connections between users and their data
Addresses on an I/O bus: 8 or 16 addresses, depending on the implementation
Distance: 25 m
• Beyond the single I/O bus, the requirement for more storage has driven extending the SCSI interface to many devices and eventually replacing a single storage device with a full network, the storage area network (SAN)
• Based on the Fibre Channel (FC) protocol:
Gigabit per second bandwidth (1063 Mbps) and theoretically up to 4
Gbps
Allows SCSI in serial form rather than the parallel form usually found
in SCSI (also supports HIPPI and IPI I/O protocols)
Distance of up to 10 km
24-bit address identifier – up to 16 million ports
FC

• Upper level protocols include application, device drivers, operating systems


• Common services are striping, hunt groups, multicast
• Framing: frames of up to 2112 bytes; sequences (one or more frames); exchanges (uni- or bidirectional sets of non-concurrent sequences); packets (one or more exchanges)
• Arbitrated loop topology:
up to 126 devices in a serial loop configuration
Each port discovers when it has been attached
No collisions
Fair access: every port wanting to initiate traffic gets to do so before another port gets a second shot
Different types of FC SAN architectures
• Fabric topology: FC switch – this is not a shared bus!
• Example: Commerzbank Brocade set-up


Directly attached storage: attached by SCSI directly, possibly shared among servers
Network attached storage is in front of the server, directly attached to the network, rather than behind the server as a SAN is:
Protocol is generally NFS, vs. FC for a SAN
Network is Ethernet, vs. FC for a SAN
Source and target are client/server or server/server, vs. server/device for a SAN
Transfers files, vs. device blocks for a SAN
Connection is direct on the network, vs. an I/O bus or channel on the server for a SAN
Has an embedded file system
• MANs are used for high availability in the enterprise, to interconnect LANs, or simply as the last leg of a WAN
• Certain protocols are particularly oriented towards MANs, such as DQDB (dual bus, either folded or not folded):
Exhibited certain issues with utilization fairness
Not very flexible in its layout architecture
Resilient Packet Ring
• Rings for packet access in the MAN
• Resilient Packet Ring Alliance (RPR) and IEEE working group 802.17 (started December 2000)
• Oriented towards IP
• Recovery is done using the traditional self-healing ring approach
• Maintains the same architecture as SONET rings and FDDI, but changes the MAC
WANs are predominantly implemented over optical networks
The underlying protocol is SONET (synchronous optical network), or SDH (synchronous digital hierarchy) in Europe and Japan
Synchronous, so framing is in terms of timing
The lowest-speed SONET signal is STS-1, at 51.84 Mbps
STS frames may be concatenated with a single header, which contains pointers to the different headers of the STS frames
SONET provides very tight requirements on reliability
Typical implementations are UPSR or BLSR
Recovery must occur within 50 ms; detection of a problem occurs within 2.4 microseconds
WANs are increasingly dense and require extensive network management
Provisioning across WANs in a short time is growing in importance as the reselling market becomes more fluid
WANs are increasingly called upon to perform functions heretofore reserved for LANs or MANs, so there is increasing convergence
Speed per wavelength is now OC-48 (2.5 Gbps) or OC-192 (10 Gbps), possibly going towards 40 Gbps
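As a sanity check of these line rates (STS-N/OC-N is N times the 51.84 Mb/s STS-1 rate):

STS1_MBPS = 51.84
for n in (1, 3, 12, 48, 192):
    print(f"OC-{n}: {STS1_MBPS * n:.2f} Mb/s")
# OC-48 -> 2488.32 Mb/s (~2.5 Gb/s), OC-192 -> 9953.28 Mb/s (~10 Gb/s)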
Access to the Optical Infrastructure

Two trends in optical access:


IP, GE being pushed closer to the core
streaming media pushing core-type traffic closer to the edge

• How should access be architected:


role of network management
types of nodes
