Chapter 9 (Module H.CSN1): Switching Technologies in Core Networks – From Circuit Switching to MPLS and SDN
We will now turn to the topic of core networks and in particular, the various switching
technologies in the core networks. To this end, two basic approaches to transfer data via a
core network, namely circuit-switching and packet-switching, will be studied. For each
switching technique, its basic characteristics, operations, advantages, and drawbacks are
elaborated. In addition, the basic switching architectures that can be used for implementing
the switching fabric in a switching device are described and discussed, including two broad
categories: single stage switching fabric and multi-stage switching fabric.
Next, we will shift our focus to one of the most widely used switching technologies in core networks today, namely MultiProtocol Label Switching. Starting from the concepts of packet-switching, MultiProtocol Label Switching resolves many drawbacks of packet-switching with traditional routing protocols. By replacing the complex longest-prefix match of long IP addresses with the exact match of short labels, MultiProtocol Label Switching improves switching performance and supports many additional applications such as Virtual Private Networks and Traffic Engineering. To this end, this chapter will give an overview of packet switching in MultiProtocol Label Switching and its label distribution mechanism. Then, this chapter also summarizes the principles, signalling mechanisms and the switching of labeled packets in Virtual Private Network and Traffic Engineering applications.

Last but not least, we will look at the emerging Software Defined Networking technology, which promises great potential for modern networking. First, a discussion will be laid out to demonstrate the drawbacks of existing traditional networking with distributed intelligence and to motivate the development of Software Defined Networking. Then, the main characteristics, architecture design, basic operations and the OpenFlow standard will be described, discussed and illustrated through some networking scenarios to highlight Software Defined Networking's benefits and potential. Finally, the compatibility and interoperability of Software Defined Networking with the latest technologies such as network slicing and Network Function Virtualization are elaborated, bringing more flexibility and innovative applications to network operators.

9.1 Circuit‐Switching

Figure 1: Circuit-switching.
Different from LANs, the core network consists mainly of networking devices that do
not generate nor consume data, but rather forward data from one link to another from the
source to the destination. Since their main function is to switch data as fast as possible, they
make up a communication network that is normally referred to as a switched network. In
a switched network, the two most fundamental methods for forwarding data are circuit‐
switching and packet‐switching.
Circuit switching is the first method that was used for communicating over long distances and will be our first focus in this chapter. The first crucial characteristic of circuit-switching is that the two communicating hosts set up a dedicated channel through the various nodes in the switched network before the actual data are sent, as illustrated in Figure 1. This communication channel is maintained during the communication session and is terminated by one of the two involved parties at the end of the session. In early versions of circuit-switching, each channel was actually a physical link, or circuit; hence the name of the
scheme. Due to the setup of a dedicated channel, communication resources are reserved and committed only to this channel, even during idle periods of the information transfer. As such, the communication between the two hosts functions as if the two hosts were physically directly connected to each other; hence, the communication performance is guaranteed. Figure 2 illustrates the lifetime of a circuit-switching connection.
Circuit-switching was initially used for the telephone network and is well-suited to this application because voice traffic is normally constant in rate and the network delay is minimal once the circuit is set up. However, circuit-switching is inefficient, and hence a costly method of communication, for other types of traffic such as digital data transfer. Once a circuit is set up, 100% of the circuit's capacity is dedicated to the communication session even when no data is transmitted. In addition, as described in the previous chapters, Internet traffic is bursty in nature, where demands come in bursts of very large chunks of data. To this end, the rigid constant capacity offered by circuit-switching is ill-suited.

Figure 2: Lifetime of a circuit-switching connection.

9.2 Packet‐Switching
Packet switching divides the data to be transmitted into small chunks, namely packets. A small header is added to each packet to specify its desired destination. These packets are sent over the network individually and are forwarded on at each intermediate node, or router, along the path to their destinations. In packet switching, the transmission links are shared among multiple connections. If capacity on a link is available, it can be used for packet forwarding, instead of being blocked due to other reserved but idle connections. As such, packet-switching is more efficient than circuit-switching.
In addition, each router in a packet switching network is equipped with a buffer as described in Chapter 5. These built-in buffers in the network allow temporary bursts of data that exceed the link capacity to be accepted into the network. This capability is well-suited for transporting Internet traffic with its bursty pattern. However, this benefit comes at the cost of lengthening the delay of those packets that transit the link. This phenomenon leads to an important characteristic of packet switching: the variable and unpredictable delay through the network. This delay depends on the current traffic volume in the network and hence can change drastically from one moment to another. Moreover, due to the finite amount of available buffer in each router in the network, if the data burst lengthens, these buffers will eventually overflow, leading to packet loss.

Figure 3: Store-and-forward in packet switching.


Last but not least, it is important to remark that packet forwarding in a packet-switching network normally follows a store‐and‐forward manner. In this mechanism, a router along the path has to receive the entire packet before it can forward that packet onto the next link, as illustrated in Figure 3. This behaviour introduces further delays that depend on the number of intermediate routers on the path. This way of transmitting information across the network is distinctively different from the transparent data transfer in circuit-switching depicted in Figure 2. To this end, in packet switching, it may not be beneficial to transmit the entire data in a single packet; by dividing the data into multiple smaller chunks, the transmission delay may be reduced. In particular, let the number of links between the source and the destination be N, assuming those links are identical with a transmission rate of R bps and a propagation delay of d. Furthermore, let the length of the data be L bits; the header is h bits long and fixed for all packets. The data is partitioned into n equal-size packets (the size of each packet is L/n + h bits). Excluding the queuing delay and the processing time at each router, the total end-to-end delay, D, for transmitting the entire data volume is expressed in equation (9.1).
$$D = N\left(\frac{L/n + h}{R} + d\right) + \frac{n-1}{R}\left(\frac{L}{n} + h\right) \tag{9.1}$$

Figure 4: Relationship between packet size and end-to-end delay.
In equation (9.1), the first two terms specify the end-to-end delay for transmitting the first packet through the network. As a packet is transmitted onto the next link along the path, the input interface of the router can start receiving the next packet. The last term in this equation specifies the transmission time of the remaining n − 1 packets on the last link of the path.
Figure 4 illustrates the relationship between the packet size and the end-to-end delay. This figure verifies our previous intuition that keeping the packet size very large is not a good idea: as we reduce the packet size, the end-to-end delay decreases quickly. However, if we keep on reducing the packet size, the end-to-end delay starts increasing again. This increase is due to the fact that in packet switching, each packet has to be accompanied by a header to specify its destination. This header is normally of a fixed size, but as we reduce the packet size, the inefficiency due to this header overhead kicks in, increasing the end-to-end delay. As such, an optimal packet size can be identified. Nevertheless, this optimal packet size depends on other factors such as the header size and the transmission rate. In Figure 4, it can be seen that as the header size increases, the optimal point shifts in favour of larger packet sizes to reduce the inefficiency. On the other hand, when the transmission rate increases, the optimal point tends to shift towards smaller packet sizes so as to decrease the waiting time for an entire packet before forwarding it onto the next link.
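To make this trade-off concrete, the following Python sketch evaluates equation (9.1) numerically and sweeps n to locate the optimal packet count. The network figures below (5 links, 1 Mbps, 1 ms propagation per link, 1 Mbit of data, 400-bit headers) are illustrative assumptions, not values taken from the text.

```python
# Numerical sweep of equation (9.1) under assumed network parameters.

def end_to_end_delay(L, n, h, N, R, d):
    """Delay for L bits split into n packets of L/n + h bits each."""
    per_packet = (L / n + h) / R          # transmission time of one packet
    first = N * (per_packet + d)          # first packet crosses all N links
    rest = (n - 1) * per_packet           # pipelined packets on the last link
    return first + rest

N, R, d = 5, 1e6, 1e-3    # 5 links, 1 Mbps, 1 ms propagation each (assumed)
L, h = 1e6, 400           # 1 Mbit of data, 400-bit header (assumed)

best_n = min(range(1, 2001), key=lambda n: end_to_end_delay(L, n, h, N, R, d))
print(best_n, end_to_end_delay(L, best_n, h, N, R, d))   # optimum near n = 100
```

With these assumed figures the minimum lands around n = 100 packets; making the packets larger or much smaller increases the delay, exactly as Figure 4 suggests.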
There are two approaches in realizing packet-switching, namely datagram switching
and virtual circuit switching as described below:

Figure 5: Datagram switching.


 Datagram switching: in this approach, no connection is established before data transmission. The routers in the network treat each packet they receive independently, based only on the information embedded in the header of each packet. This approach to packet switching is very simple to implement and is very efficient, as the network accepts all the load that is fed into it. However, as there is no pre-setup connection for the packets of the same flow to follow, they may take different paths through the network and arrive at the destination out of order, as illustrated in Figure 5. It is the responsibility of the destination node to reorder the received packets. In addition, since no connection is set up before data transmission, the sending node does not have any knowledge about the available capacity of the network at the time of transmission and just blindly dumps packets into the network. If packet transmission happens while congestion occurs in the network, these newly added packets may be dropped due to queue overflow. Besides, even if the communication session starts when the network is idle, during its lifetime newly arrived packets from other sessions will compete with the current packets in the network and may interfere with them, causing packet drops. As such, this approach to packet switching does not guarantee any type of quality of service other than a best-effort packet forwarding service.

Figure 6: Virtual circuit switching.


 Virtual circuit switching: this approach tries to merge the benefits of circuit-switching on top of a packet-switching network. It mimics the circuit-switching mechanism in the sense that a connection path must be established through the network before any data transfer takes place. Once the communication path is established, all packets within the communication session simply follow this path from the source to the destination. However, different from circuit-switching, the available resources along the path are not dedicated to only this virtual circuit but are rather shared among many other data flows, as illustrated in Figure 6.
In comparison to datagram switching, virtual circuit switching has many advantages. First, because all packets of the same flow are transmitted via the same path, there is no issue with packet re-ordering. Second, because the transmission path is pre-established, the packet switching performance can be improved. In particular, an identifier can be assigned to each virtual circuit that passes through a router, at each input of the router. Since these identifiers need only be uniquely identifiable at each input interface, the same identifiers can be reused on other input interfaces as well as on other routers in the network. As a result, the number of bits required for these identifiers can be significantly smaller than in an addressing scheme that needs to uniquely identify all hosts on the network. Each packet in a virtual circuit switching scheme then needs only to carry the identifier of the virtual circuit it belongs to, which potentially leads to a reduction in packet header length. For the switching decision, each router needs only to maintain a switching table that maps the incoming identifier on each input interface to the appropriate output interface, following the connection setup, as sketched below. As such, hardware-based lookup can be implemented to improve the packet processing time at the routers in comparison with the more complex software-based routing table lookups. Last but not least, with connection setup, not only is the travelling path defined, but communication resources can also be reserved on the routers along the traversed path so that quality of service can be enforced.
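A minimal sketch of such a per-router switching table follows; the interface numbers and VC identifiers are made up for illustration.

```python
# Virtual circuit switching table: the key is (input interface, incoming
# VC id), installed at connection setup time. Identifiers only need to be
# unique per input interface, so small values can be reused freely.
vc_table = {
    (0, 5): (2, 9),   # VC 5 on interface 0 leaves on interface 2 as VC 9
    (1, 5): (3, 5),   # identifier 5 is safely reused on input interface 1
}

def switch(in_if, vc_id, packet):
    out_if, out_vc = vc_table[(in_if, vc_id)]   # simple exact-match lookup
    return out_if, out_vc, packet

print(switch(0, 5, 'payload'))   # -> (2, 9, 'payload')
```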
However, virtual circuit switching is not without its drawbacks. First, for connections to be set up, each router in the network has to maintain the connection details of all virtual circuits that pass through it. This leads to a scalability issue when the number of virtual circuits becomes large. Moreover, when an incident happens on one of the routers or links along the path of a virtual circuit, the virtual circuit must be re-established, which interrupts the ongoing data transfer.

9.3 Switching structures

Figure 7: Block diagram of a switching device.


Regardless of whether circuit switching or packet switching is employed, a switching device comprises four main components as illustrated in Figure 7. The multiple input and output interfaces connect to links to receive and send traffic onto physical links. The switching fabric operates under the control of the controller to transfer incoming traffic from an input interface to an output interface in the shortest duration of time. Within the scope of this sub-section, only the switching fabric is elaborated, as the other components were already presented in Chapter 5.

When the number of input and output interfaces grows, more traffic must pass through the switching fabric, making it the bottleneck in the router structure. To this end, an important attribute of a switching fabric is its ability to connect all inputs to all outputs in any combination. A switching fabric that can connect any of its inputs to any available output, regardless of the existing connections through the switch, is called non‐blocking. On the other hand, blocking arises if the switch cannot connect an input port to an available output port due to the lack of connection capacity within the switching fabric. The design of the switching network is the factor that governs whether the switch is blocking or not. As a result, many switching structures have been proposed for the implementation of the switching fabric. In this sub-section, the designs of switching fabrics are divided into two main categories: single-stage and multi-stage switching fabrics.
9.3.1 Single stage Switching Fabric
Shared memory switching fabric

Figure 8: Shared memory switch fabric.


A shared memory switching fabric uses a memory space shared among the input and the output interfaces. To transfer user traffic to the desired output interface, the input interface writes the input data into a location of the shared memory. The output interface can then read the data from this location, as illustrated in Figure 8. It can be seen that this method is straightforward to implement; however, its performance is limited. In a shared memory switch fabric, because the shared memory can be accessed only once at a time, parallelism is not possible, i.e., it is infeasible to switch multiple input traffic flows at the same time. In addition, for each switching operation, two memory operations are required: one write and one read. As a result, the maximum switching rate of this type of switching fabric is only half of the memory access speed.
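As a back-of-the-envelope illustration of this limit, assuming a 10 ns memory access time (a figure chosen purely for illustration):

```python
# Shared-memory fabric limit: each switched unit needs one write plus
# one read, so the fabric runs at half the raw memory access rate.
access_time = 10e-9                 # seconds per memory access (assumed)
memory_rate = 1 / access_time       # 1e8 accesses per second
switch_rate = memory_rate / 2       # 5e7 switched units per second
print(f'{switch_rate:.0e} units/s')
```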
Shared bus switching fabric
In a shared bus switching fabric, instead of using a shared memory, the switching fabric employs a shared bus that connects the input and the output interfaces. In this design, the shared bus is used to send input traffic to its desired output interface directly, as illustrated in Figure 9. Since the switching bus is shared among the output interfaces, once an input interface places input traffic on this shared bus, it will be received by all output interfaces. To determine which output interface the traffic is intended for, a small header can be added to the traffic at the input interface, specifying the destination interface, before transmitting onto the bus. Based on this header, the corresponding output interface is identified. The output interface must remove this added header before any further processing.

Figure 9: Shared bus switch fabric.


In this approach, since the bus is shared, only one input traffic flow can be transmitted on the bus at a time; parallelism is again not possible, and the throughput of the switching device is limited by the speed of the bus. As a result, this structure is not scalable, as the bus capacity available to each connected interface decreases with the number of connected interfaces.
Time‐slot interchange (TSI) switching fabric
In this switching structure, multiple input signals are multiplexed in time onto a high-
speed transmission line. To support communication between a pair of input-output
interfaces, a time‐slot interchange (TSI) switching mechanism is used.
Figure 10 illustrates the block diagram of a time-division switch with a TSI switching mechanism. At the input, the different input traffic flows are time-multiplexed together, where each flow is associated with a fixed time-slot; for example, input a to time-slot 1, input b to time-slot 2, and so on. At the heart of a time-division switch is a TSI switching mechanism where each time-slot is written sequentially into memory slots in a RAM. Then, to enable switching between inputs and outputs, the TSI mechanism manipulates the readout order from the RAM. For instance, to enable communications between input a and output b, the TSI mechanism interchanges the readout order of time-slots 1 and 2 in the TSI output multiplexed data stream. As a result, data from input a are now placed in time-slot 2, which is pre-assigned to output b at the demultiplexer, effectively switching traffic from input a to output b. Similarly, it can be verified that the interchanging of time-slots in Figure 10 allows full-duplex communications between a and b as well as between c and d.
The TSI mechanism is simple to implement and requires no moving parts. However, due to the interchanging of memory slots, there is a delay of up to one full time frame. In addition, due to the delay in memory access, there is a limit to the scalability of this mechanism. As described above, for each time-slot, two memory accesses are required, one for writing and one for reading. Assuming that the memory access duration for both actions is the same, τ, and the frame rate of each input traffic flow is R, the maximum number of channels that can be switched through TSI without causing significant delay is expressed in equation (9.2).

$$\text{Number of channels} \le \frac{1}{2R\tau} \tag{9.2}$$
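As a quick numerical illustration of equation (9.2), the following sketch assumes a 50 ns memory access time and the classic 8 kHz telephony frame rate; both figures are assumptions chosen for illustration only.

```python
# Numerical illustration of equation (9.2) under assumed parameters.
tau = 50e-9                        # seconds per memory access (assumed)
R = 8000                           # frames per second (assumed)
max_channels = 1 / (2 * R * tau)   # two accesses (write + read) per slot
print(int(max_channels))           # -> 1250 channels
```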

Figure 10: Time-slot Interchange switching structure.


Crossbar switching fabric
Different from the previous switching fabric architectures, a crossbar switching fabric allows parallelism, i.e., it can support multiple packets transmitted across the switching fabric at the same time by dedicating a separate physical path to each of the connections between input and output interfaces. As the connections are separated in space by the different physical paths, this switching structure is also referred to as space division switching.
Figure 11 illustrates a crossbar switch, which includes N input and N output interfaces, each connected to a separate vertical and horizontal line, respectively. To set up connections between input and output ports, crosspoints are placed at each intersection of the horizontal and the vertical lines. When an input interface i needs to be connected to an output interface j, the crosspoint at the intersection of the vertical line i and the horizontal line j is activated, connecting the two lines and hence allowing signals to travel from the appropriate input to the output port.

Figure 11: Crossbar switching fabric.
It can easily be verified that the crossbar switch is non-blocking, as the switching network can support a connection from any input to any free output port via the appropriate crosspoint. However, this non-blocking characteristic does not come cheap. For a crossbar switch with N input and N output ports, N² crosspoints are required; hence, the complexity of a crossbar switch scales with the square of its port count (O(N²)). In addition, among these N² crosspoints, at most N can be used at a time, when all N input and output ports are connected together simultaneously, making the crosspoint utilization very poor.
9.3.2 Multi-stage Switching Fabric
Clos‐based switching fabric
To reduce the complexity of crossbar switches, a one-stage crossbar switch can be broken down into multiple stages, where each stage consists of a number of much smaller crossbar switches connecting to those in other stages. Figure 12 illustrates a Clos network equivalent of an N × N crossbar switch with three stages: an input stage consisting of N/n crossbar switches of size n × m, a middle stage with m crossbar switches of size (N/n) × (N/n), and an output stage with N/n crossbar switches of size m × n. The output stage is a mirror of the input stage. In this setup, each of the outputs of an input-stage switch is connected to a different middle-stage switch. The same principle applies for the output stage, where each of the inputs of an output-stage switch is connected to a different middle-stage switch. In this way, when a middle-stage switch or some of its ports are no longer available, the data routing between input and output switches is still feasible via alternate paths.

It is noted that a multiple-stage switch such as that depicted in Figure 12 is not always non-blocking. In particular, a Clos-based switch is non-blocking only when m ≥ 2n − 1. Let m = 2n − 1; the total number of crosspoints needed for a multiple-stage switch as in Figure 12 is expressed in equation (9.3).
$$\text{Number of crosspoints} = 2\,\frac{N}{n}\,n(2n-1) + (2n-1)\left(\frac{N}{n}\right)^2 = (2n-1)\left(\frac{N^2}{n^2} + 2N\right) \tag{9.3}$$
For instance, when N = 1000, the number of crosspoints in a crossbar switch is N² = 10⁶. When a three-stage Clos network is used with n = 10, the number of crosspoints calculated from equation (9.3) is 228 × 10³, reducing the required crosspoints by a factor of more than 4. It is noted that this reduction in complexity does not come for free: as the number of switches in the multiple-stage network increases, controlling and coordinating such a network becomes harder.
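The following short sketch checks equation (9.3) against the single-stage crossbar, using the example figures from the text (N = 1000 ports, n = 10 inputs per first-stage switch):

```python
# Check of equation (9.3) against the single-stage crossbar count.

def clos_crosspoints(N, n):
    m = 2 * n - 1                          # non-blocking choice m = 2n - 1
    return int(m * (2 * N + (N / n) ** 2))

print(clos_crosspoints(1000, 10))   # -> 228000 crosspoints
print(1000 ** 2)                    # -> 1000000 for the crossbar
```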

Figure 12: A 3-stage switch based on Clos network.


Hybrid Space and Time‐Division Switching
It can be seen that space-division switching, such as the crossbar design, is instantaneous but complex to implement, while time-division switching, such as the TSI switching fabric, is simple to implement but incurs a delay due to the interchanging of time slots. To get the best of both worlds, a hybrid space and time division switching technique was developed. This switching mechanism is inspired by the multiple-stage switch, such as the Clos network in Figure 12, where the input and output stages are replaced by time-division switches while the middle stage is realized by only one space-division crossbar switch. The result is a hybrid switch design as illustrated in Figure 13.
In Figure 13, the input stage consists of N/n time-division switches, which transform their n inputs into a time frame of m time-slots, where the time-slots can be interchanged following the user's demand. The time frames from all the input TSI switches are connected to the input ports of the middle-stage crossbar switch. Within this middle stage, at each time-slot, the crossbar switch switches the input data to the appropriate output. As such, this technique is also referred to as time-multiplexed switching. At the output stage, the time frame is de-multiplexed to the appropriate output ports.

Figure 13: Hybrid Space and Time-Division Switch.


It can be seen that this hybrid design combines the best of both worlds. In terms of complexity, since only one crossbar switch of size (N/n) × (N/n) is utilized, the number of crosspoints required is (N/n)². In terms of delay, since smaller TSI switches are used, the associated delay can be significantly reduced.

Banyan‐based Self‐routing Fabric

Figure 14: A Banyan switch.

Figure 15: An 8 × 8 Banyan switching fabric.


Another important switching fabric structure that we will discuss in this sub-section is the self‐routing switching fabric. Different from a crossbar switch, a self-routing scheme does not require external control of the configuration of the interconnections. The path that a packet takes through the switching fabric is based on additional information that is appended to the traffic at the input port in the form of a routing header. After traversing the switching fabric, this header is removed at the output interface. A self-routing switching fabric can be built from an interconnection of simple 2 × 2 switches in a multi-stage manner. The simplest implementation of this structure is a Banyan network.
A basic element in a Banyan network is a Banyan switch, which consists of two inputs and two outputs. A Banyan switch moves a packet to its upper output if the appropriate bit in the appended header is 0 and to its lower output if the bit is 1, as illustrated in Figure 14. A Banyan switch also allows parallelism when the two input packets are destined for different outputs. When both inputs want to be transferred to the same output, a collision occurs.
In order to realize a Banyan switching fabric with N inputs and N outputs, log₂ N stages are required, where each stage is composed of N/2 Banyan switches. To reduce the chance of collisions, the interconnection between stages in a Banyan fabric is arranged in a special pattern. This pattern allows packets to be transferred to all N outputs simultaneously without any collision, given that the packets are sorted in ascending order at the inputs. An 8 × 8 Banyan network structure is illustrated in Figure 15. This Banyan network consists of log₂ 8 = 3 stages, where each stage comprises 8/2 = 4 Banyan switches. For each input traffic flow, its desired output is expressed as a binary number and appended to the traffic as a routing header. At stage i, the i-th bit of the routing header is processed to determine which output of the switch to route the traffic to. For example, input data destined for output 4 (100 in binary), upon arriving at input 5, will be routed to the lower output of the input switch in the first stage and the upper outputs of the connected switches in the second and the last stages, effectively moving the data to the correct output.
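The bit-driven routing rule is easy to express in code. The following Python sketch reproduces the stage-by-stage decisions for the worked example above (output 4, header 100); the inter-stage wiring is abstracted away, so this illustrates the rule only, not a full fabric simulation.

```python
# Self-routing rule of a Banyan fabric: at stage i, bit i of the routing
# header picks the upper (0) or lower (1) output of a 2x2 switch.

def banyan_route(dest_output, n_outputs=8):
    stages = n_outputs.bit_length() - 1          # log2(N) stages
    header = format(dest_output, f'0{stages}b')  # routing header, MSB first
    return ['lower' if bit == '1' else 'upper' for bit in header]

print(banyan_route(4))   # output 4 = 100 -> ['lower', 'upper', 'upper']
```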
In this structure, the inputs of the fabric are interconnected to the inputs of the first-stage switches so that consecutive inputs (differing from each other in only 1 bit) are distributed evenly among the 2 × 2 switches in the first stage to avoid collision. The connections from the first stage to the second stage redirect the input traffic to the correct half corresponding to the outputs, where connections are also distributed to prevent collision. Finally, the connections from the second stage to the last stage redirect the input traffic to the correct output switch. It can be verified that this structure allows all 8 input packets to be transferred to their desired outputs simultaneously without collision, given that no output is destined by more than one input and all the input packets are arranged in ascending order upon arrival at the Banyan network.
The complexity of this Banyan-based self-routing switching fabric is O(N log₂ N), which is significantly lower than that of a crossbar structure. Hence, the self-routing structure is highly scalable and is widely used in high-performance switching devices.

9.4 MultiProtocol Label Switching


MultiProtocol Label Switching (MPLS) [1] is a packet switching technique that is commonly used today in core networks, especially in ISPs and big enterprises. In MPLS, packets are switched based on short labels that predetermine the paths that the packets should take through the network. Different from traditional network-layer routing, where network addresses must be read and looked up to determine the next hop, MPLS compresses all the routing requirements and decisions inside a short label. As such, MPLS can be used to transport multiple network protocols and allows the coexistence and transport of multiple network protocols inside the same network. This feature makes up the "MultiProtocol" part of its name.

Figure 16: Longest prefix match in traditional IP forwarding.


MPLS was first motivated by the need to enhance the switching capability in packet switching networks. Around 1990, as traffic grew, there was an increasing demand for bandwidth. While link bandwidth could be augmented by using better transmission techniques and transmission media such as optical fiber, the switching capability of routers lagged behind and became a bottleneck of the network. For instance, in a traditional IP network, at each router, the output interface for each packet must be determined by an IP lookup that is based on "longest prefix match", as described in Chapter 5. As illustrated in Figure 16, when there are multiple entries of related prefixes in the router's routing table, it is not straightforward to determine which is the longest prefix match for the destination address. For a long time, this lookup had to be done in software, which introduces a noticeable delay and hence significantly slows down packet forwarding at routers. This processing time can be prolonged further when additional care in handling the packets is required, such as accommodating different dropping priorities. In addition, these expensive lookups must be done for each packet, even for packets that travel on the same path, such as packets destined for different nodes in the same destination subnet. Moreover, these steps must be repeated at each router along the path, which imposes considerable in-network processing. As a result, this inefficiency poses a severe scalability issue when routers need to switch an enormous number of packets between high-speed links.
MPLS answers this problem by introducing a short, fixed-length label that is attached to the front of each packet. This label is assigned to a group of packets that share the same path through the network, which is predetermined before packet forwarding. At each router along the path, the output interface can be determined by an exact match lookup, as illustrated in Figure 17, instead of choosing between many related matches. Because packets in MPLS are switched based on exact match, the lookup procedure can be implemented easily in hardware via indexing techniques. The ability to use hardware lookup speeds up the packet processing time significantly and hence greatly improves the router's performance. In addition, since the forwarding path is predetermined based on all the requirements, communication resources can be pre-allocated before the packets are ingested into the network. Hence, intermediate routers need only forward the packets based on labels, without any deep packet inspection or classification later on at each hop.

Figure 17: Exact match with label switching in MPLS.


It is noted that this first motivation for MPLS is no longer true for modern routers. With the use of FPGAs and ASICs, hardware routing table lookup can now be integrated at each input interface, allowing wire-speed packet switching. However, the MPLS approach of exact matching of labels is still cheaper to implement, to some extent.
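The contrast between the two lookups can be sketched in a few lines of Python; the prefix and label tables below are made-up illustrations, not real router state.

```python
# Longest prefix match vs. exact label match, on made-up tables.
import ipaddress

prefix_table = {                 # traditional IP forwarding table
    '10.0.0.0/8':  'if0',
    '10.1.0.0/16': 'if1',
    '10.1.2.0/24': 'if2',
}

def longest_prefix_match(dst):
    dst = ipaddress.ip_address(dst)
    matches = [(net, out) for net, out in prefix_table.items()
               if dst in ipaddress.ip_network(net)]
    # every related prefix must be inspected; the most specific one wins
    return max(matches, key=lambda m: ipaddress.ip_network(m[0]).prefixlen)[1]

label_table = {20: ('if1', 21)}  # MPLS: one exact-match, indexable lookup

print(longest_prefix_match('10.1.2.3'))   # -> 'if2'
print(label_table[20])                    # -> ('if1', 21)
```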
Despite the irrelevance of the performance improvement in the original proposal, MPLS is still widely used in core networks today due to its many benefits. First, the ability to abstract network-layer information inside a single label allows the separation of coexisting traffic on a common network. For instance, in the core network of an ISP, many traffic flows from multiple customers may coexist. By abstracting each traffic flow behind labels, the flows can be transported on the same network even if they share the same private IP address space. From the perspective of the ISP, this is an economical implementation, since MPLS can use the same network to transport traffic from multiple customers simultaneously instead of leasing a line for each connection. From the perspective of the customers, these connections through the ISP network are transparent and act as if they were dedicated lines rather than passing through a series of links and routers. This application is normally referred to as MPLS Virtual Private Network (MPLS VPN).
Second, MPLS can be used to override the routing decisions of traditional routing protocols. In traditional routing protocols, only the best path according to some metric is used for routing packets through the network. This strategy of always using the best route may not be ideal in many cases. For instance, multiple non-overlapping paths may exist between the source and the destination while only the best path is used. This could lead to traffic overload on the best path while the other paths are under-utilized. Such a traffic imbalance is not desirable because the best path may not have enough communication resources, such as bandwidth, to accommodate all the traffic flows. By distributing the load over multiple paths, not necessarily the best one, more flows can be supported on the common core network and the available resources can be utilized in a more efficient and economical way. On the other hand, it is never good to put all the eggs in a single basket: if one link or one device on the best path fails, all the traffic flows through this path will be affected. This application is often referred to as Traffic Engineering.
9.4.1 MPLS Label Structure and Placement

Figure 18: MPLS label structure.


So far, we have mentioned labels many times without explicitly explaining what they are. In this sub-section, we will investigate the structure of an MPLS label and its position within the exchanged PDU.
Figure 18 illustrates the structure of an MPLS label [2], which is 4 bytes long and includes four fields as follows (a bit-layout sketch follows the list):
 Label (20 bits): this field contains the value of the label, which is used for the switching decision at the next hop on the path to the destination. The first 16 values, from 0 to 15, are reserved for special uses.

 Traffic Class (TC – 3 bits): this field contains information regarding the QoS level of the enclosed packet. This information can be used to determine whether the packet should be prioritized or dropped upon arriving at a congested link.
 Bottom of stack (S – 1 bit): in MPLS, one or multiple labels can be used for one packet. When multiple MPLS labels are used, they are stacked together, making a label stack. In this case, the last label in the stack has this bit set to 1 to signify the end of the label stack. If the label stack consists of just one label, that label is also the bottom of the stack and hence has its S bit set to 1.
 Time to Live (TTL – 8 bits): this field has the same function as the TTL field found in the IP header, i.e. to prevent the labeled packet from looping forever in the network. It is decreased at each hop along the path, and if it reaches 0, the packet is dropped. This field can be configured to propagate the IP TTL, in which case the length of the internal MPLS path affects the packet lifetime. If TTL propagation is not configured, the propagation of packets inside the MPLS network is transparent, as if the two connecting ends communicated directly via a direct line.
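The following Python sketch packs and unpacks this 32-bit layout; the field values are arbitrary examples.

```python
# 32-bit MPLS label entry: 20-bit Label, 3-bit TC, 1-bit S, 8-bit TTL.
import struct

def pack_label(label, tc, s, ttl):
    assert 0 <= label < 2**20 and 0 <= tc < 8 and s in (0, 1) and 0 <= ttl < 256
    return struct.pack('!I', (label << 12) | (tc << 9) | (s << 8) | ttl)

def unpack_label(raw):
    value, = struct.unpack('!I', raw)
    return value >> 12, (value >> 9) & 0x7, (value >> 8) & 0x1, value & 0xFF

entry = pack_label(label=20, tc=0, s=1, ttl=64)
print(unpack_label(entry))   # -> (20, 0, 1, 64)
```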
Label stacking

Figure 19: MPLS label stack.


In many cases, a single label is not adequate for the intended application, and multiple MPLS labels must be used together, forming a stack of labels as illustrated in Figure 19. One example of a stack of depth 2 arises when multiple traffic flows share the same path through a common core network. In this case, each traffic flow is already assigned a label before entering the core network to specify its destination. However, within the core network, since the flows share the same path, they can all be merged into the same flow and treated the same way on the core network to simplify traffic management.
As such, at the ingress point of the core network, another label is added in front of the existing label, forming a label stack as illustrated in Figure 20. When a label stack is formed, the S bit in all labels except the last one is always set to 0; the S bit of the innermost label is set to 1. Furthermore, when there are multiple labels in a label stack, the switching decision always depends on the outermost one. For instance, in Figure 20, when being switched in the core network, only the outermost label (Label 2) is used. At the egress point of the core network, this outermost label is removed, making it possible to differentiate the individual flows again based on the inner label of the stack (Label 1). In this way, a label stack of depth greater than 2 is also possible. In practice, label stacks are used very often to combine multiple applications of MPLS, for instance to implement MPLS VPN together with traffic engineering at the same time.
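Continuing the bit-layout sketch above, a two-label stack as in Figure 20 can be built the same way; note that only the innermost entry carries S = 1 (the label values here are made up).

```python
# A two-label stack as in Figure 20, using the same 32-bit entry layout.
def entry(label, tc, s, ttl):
    return ((label << 12) | (tc << 9) | (s << 8) | ttl).to_bytes(4, 'big')

inner = entry(label=40, tc=0, s=1, ttl=64)   # bottom of stack, S = 1
outer = entry(label=99, tc=0, s=0, ttl=64)   # outermost label, S = 0
stack = outer + inner   # core switching reads only the leading entry
```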

Figure 20: Example of label stacking.


MPLS Label stack placement
Now that the MPLS label and its label stack structure are understood, the next question is: where is this label stack inserted inside a transferred PDU? As described above, the label stack is newly introduced and is totally alien to the existing headers that we have already studied.

Figure 21: MPLS label stack placement.


Figure 21 illustrates the placement of the MPLS label stack in an IEEE 802 frame. It can be seen that the MPLS label stack is inserted between the frame header and the IP header. To make it possible for routers to differentiate between an MPLS frame and a normal IP frame, some amendments to the frame header must be made. For instance, in the case of Ethernet, a dedicated Type value of 0x8847 specifies that the part that comes after the frame header is an MPLS label stack.
As the MPLS label stack is inserted between the frame header and the network header, MPLS is normally referred to as a layer-2.5 protocol. This half-layer mapping may be confusing, as all the protocols we have investigated so far neatly reside in one or more layers. However, MPLS is a special case: it is independent of both layer 2 and layer 3. On one hand, MPLS does not depend on a particular data link technology, and it governs the switching of packets across more than one link; hence, it is not a layer 2 protocol. On the other hand, MPLS provides an abstraction of network-layer information, which makes it capable of transporting multiple types of network protocol traffic; hence, it is not a layer 3 protocol either. In fact, it is one of the rare cases showing that the layered structure is not the Swiss Army knife that fits everything neatly into its stack.
9.4.2 Packet switching in an MPLS network

Figure 22: Packet switching in an MPLS network.


Before we dive deeper into the world of MPLS, let's take a step back and review some basic concepts and terminology. An MPLS domain comprises a contiguous network of MPLS-enabled routers, namely Label Switch Routers (LSR), which are capable of switching labeled packets. The LSRs that operate at the edge of an MPLS domain must be able to switch both labeled and unlabeled packets and are referred to as Label Edge Routers (LER). In an MPLS domain, packets that are to be forwarded and treated in the same manner are grouped into a class, named a Forwarding Equivalence Class (FEC). A FEC can be as general as all packets destined for a single destination subnet or a single LER. A FEC can also be as specific as all packets that flow between two particular hosts and require a certain amount of communication resources. In MPLS, each FEC is specified with a pre-determined sequence of LSRs that its packets follow through the network from one LER to another. This sequence of LSRs is called a Label Switched Path (LSP) and is unidirectional.
For each FEC, labels are assigned for each LSR along the LSP. When forwarding packets along an LSP, LSRs and LERs can perform three operations: push, swap and pop. In particular, upon receiving an IP packet at the ingress point of the network, the LER encapsulates the received packet by adding a label according to the packet's FEC. This action of appending a label to the incoming packet is referred to as a push operation. The packet is then forwarded into the MPLS network according to the LSP. Inside the MPLS domain, each LSR along the LSP simply switches the packet along the pre-determined path. Each LSR also substitutes the incoming label with the pre-allocated label on the next link towards the destination. This action of changing labels is referred to as a swap operation. Eventually, the packet arrives at the egress point of the MPLS network. Here, the egress LER removes the label from the packet and forwards it out of the MPLS domain. This action of removing the label from the packet is referred to as a pop operation.
Figure 22 helps to summarize all the above concepts. In this figure, a simplified MPLS domain consisting of 3 routers is illustrated. In this domain, an LSP is set up between LER1 and LER2 via the path LER1-LSR-LER2. This LSP is set up for the FEC comprising all packets destined for the subnet 10.0.0.0/8. Upon receiving an ingress packet destined for this subnet, LER1 pushes a label with value 20 onto the packet and forwards it into the MPLS network. The LSR switches incoming packets based only on their labels. When the LSR receives the packets with label 20, it looks into its label forwarding table and learns that these packets should be switched to interface 1 with label 21. The LSR then swaps label 20 with label 21 and forwards the packets to LER2. LER2 pops label 21 off the received packets and then sends the packets out of the MPLS domain.
From the above example, it can be seen that for all LSRs, the expensive longest prefix match is replaced by a much simpler exact match on the incoming label. In addition, since the LSRs forward packets based only on labels, they need little knowledge of the network-layer details. As such, MPLS forwarding is protocol independent, which justifies the "MultiProtocol" part of its name. Besides, it is noted that the packets that enter or leave the MPLS domain may be already-labeled packets; in this case, the pushed label constitutes a label stack as illustrated in Figure 19. Last but not least, in some implementations, an improvement named penultimate hop popping (PHP) is used. In this mechanism, the label is popped off at the hop before the egress LER. The rationale behind this mechanism is that the pop operation is quite expensive, and an LER may be the egress point of many LSPs, which may consume a lot of its resources just for this operation. As a result, PHP helps to lift the heavy load on the egress LER by distributing this operation among the last-hop neighbouring routers on the LSPs.
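The following toy model retraces the Figure 22 walk-through, using the label values from the figure; the helper names and packet layout are illustrative only, not a router implementation.

```python
# Push / swap / pop along the LSP LER1 -> LSR -> LER2 of Figure 22.

def ler1_ingress(packet):
    """Push: LER1 labels packets of the FEC 10.0.0.0/8 with label 20."""
    return {'label': 20, 'payload': packet}

LSR_TABLE = {20: ('interface 1', 21)}   # in-label -> (out-interface, out-label)

def lsr_forward(labeled):
    """Swap: exact-match lookup on the incoming label."""
    out_if, out_label = LSR_TABLE[labeled['label']]
    labeled['label'] = out_label
    return out_if, labeled

def ler2_egress(labeled):
    """Pop: strip the label and send the packet out of the MPLS domain."""
    return labeled['payload']

pkt = {'dst': '10.1.2.3'}
out_if, labeled = lsr_forward(ler1_ingress(pkt))
assert out_if == 'interface 1' and ler2_egress(labeled) is pkt
```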
9.4.3 MPLS Label Distribution
So far, we have simply skipped over the issue of label allocation and just explained how labeled packets are forwarded in an MPLS network. Curious readers may already be asking how the LSR in Figure 22 comes to know which label it should use for a particular LSP. To answer this question, we will study label distribution in MPLS.
In essence, label distribution in MPLS refers to the process of binding labels to each LSR's interfaces along an LSP. Recall that each LSP is associated with a FEC, which may be defined with different constraints such as QoS levels and bandwidth requirements. As such, a label distribution scheme must be able to capture these constraints and distribute labels on links that satisfy them. To this end, the MPLS architecture does not constrain itself to a single label distribution protocol but allows the adoption of multiple label distribution protocols such as the Label Distribution Protocol (LDP) [3] and the MPLS Resource Reservation Protocol (RSVP) [4]. In this sub-section, we focus on the simplest case, where a FEC is a destination prefix and the label distribution protocol is LDP.
LDP depends on the routing information obtained by an underlying IGP for label allocation. In particular, LDP defines two modes of label distribution, namely unsolicited downstream and downstream on demand. In downstream on demand label distribution, an LSR can explicitly request a label for a FEC from its next-hop neighbour on the path to the destination FEC, as determined by some routing protocol. An example of this downstream on demand approach will be illustrated in the MPLS-TE section. On the other hand, in unsolicited downstream mode, LSRs in an MPLS network advertise their FEC-to-label mappings to all of their neighbours without waiting for an explicit request from the neighbouring LSRs.
Figure 23 a) illustrates an example of unsolicited downstream binding, following the example in Figure 22. When MPLS and LDP are configured, LER2 allocates a label to each prefix in its IGP routing table, such as the prefix 10.0.0.0/8, and advertises this label binding to all of its directly connected LDP-enabled neighbours. Upon receiving this label binding, the LSR is aware that it can forward traffic to the prefix 10.0.0.0/8 using labeled packets with the label value 21. The LSR then advertises this binding to all of its LDP-enabled neighbours with another label of its own choosing, such as 20. When receiving this advertisement, LER1 knows that it can forward packets destined for the prefix 10.0.0.0/8 to the LSR using labeled packets with a label value of 20. At this point, the LSP between LER1 and LER2 is successfully set up, and packets can be switched using the assigned label values as illustrated in Figure 23 b). It can be seen that this LSP is unidirectional, i.e., it shows how labeled packets destined for 10.0.0.0/8 can be switched from LER1 to LER2, but no information is given for the reverse direction. For bidirectional communication, another LSP must be established by LDP binding in the reverse direction in a similar way. In this mode of label distribution, the bindings are advertised voluntarily from the downstream router (LER2) towards the upstream routers; hence the name.
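A rough sketch of the forwarding state that these unsolicited advertisements leave behind, using the label values of Figure 23 (the data-structure layout is an assumption made for illustration):

```python
# Forwarding state resulting from unsolicited downstream bindings.
fec = '10.0.0.0/8'

# Each router floods (FEC, its own label); upstream routers record the
# advertised label as the out-label for that FEC.
lfib = {
    'LER2': {21: 'pop'},                    # LER2 allocated 21 and flooded it
    'LSR':  {20: ('to-LER2', 21)},          # picked 20, learned 21 from LER2
    'LER1': {fec: ('to-LSR', 'push 20')},   # ingress learned 20 from the LSR
}
print(lfib['LSR'][20])   # -> ('to-LER2', 21)
```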

Figure 23: LDP unsolicited downstream bindings.


It is noted that this prefix-label binding has only local significance on each LDP-enabled router and can be reused at other LSRs without causing any confusion. For example, the LDP binding from the LSR to LER1 in Figure 23 could reuse label 21 without causing any confusion in packet forwarding. This label reusability in MPLS allows the realization of a compact label space of only 20 bits, instead of a lengthy globally unique addressing scheme such as IP addresses.
9.4.4 MPLS VPN

Figure 24: MPLS VPNs.


One of the most widely used applications of MPLS is MPLS VPN, which is used to transport multiple types of traffic over the same backbone network. MPLS VPN is commonly used when a customer wants to connect its many sites together and leases connectivity service from a service provider, as illustrated in Figure 24. To answer this need, the service provider could lease a dedicated physical connection to the customer. However, this solution is expensive for the provider to implement and is not scalable with an increasing number of customers and their many sites.
A more scalable and economical solution is for the provider to build a common core network and allow traffic from multiple customer sites to share this same core network. However, in doing so, the provider must ensure that there is enough isolation between these traffic flows through its core network so that a change in the configuration of one of its customers does not affect the others. For instance, the IP addressing plan of one customer should not affect that of another, such as when they both use the same private IP addressing plan. In other words, the provider must support transparent traffic transfer within each offered connection, as if the two sites were connected via a dedicated line, regardless of what happens on the other connections and of the hop length of the actual physical path of the connection through the core network. At the end of the day, it is no fun to have traffic on one connection end up arriving at another connection.
Luckily, the MultiProtocol aspect of MPLS is a good fit for this purpose. By abstracting the transferred traffic behind a label, MPLS can support isolation between multiple types of traffic on a common network. In addition, each LSP previously mentioned can be considered a separate tunnel that ensures traffic delivery between two arbitrary LERs in the MPLS core network. The combination of these two characteristics allows the virtualization of the available resources in the existing network to support the implementation of VPNs over MPLS. To this end, there are two types of VPN provided by MPLS: layer 2 MPLS VPN and layer 3 MPLS VPN.
Layer 2 MPLS VPN

Figure 25: Layer 2 MPLS VPN data forwarding.


In layer 2 MPLS VPNs, the service provider provides a transparent tunnel from one customer device to another. From the perspective of the customer, communication over this type of VPN is similar to what happens on a dedicated line directly connecting the two ends. As such, this type of VPN is also referred to as pseudo wire emulation [5]. This is the most straightforward MPLS VPN mechanism to understand and to implement, as no routing information has to be exchanged between the provider and the customer devices.
The basic principle of layer 2 MPLS VPN is illustrated in Figure 25. To provide a transparent data transfer service, the entire layer 2 frame is encapsulated inside an MPLS packet before being sent onto the shared core network (that is why this type of MPLS VPN is called layer 2). Then, an LSP tunnel is constructed from the ingress LER to the egress LER to transport these labeled packets through the core network. To this end, a label stack of two labels is required to set up a layer 2 MPLS VPN tunnel. The first label is used to forward the labeled packets within the core network along the pre-determined LSP, from the ingress to the egress point. This label is referred to as the tunnel label. In addition, at the egress point, the egress LER can be the termination point of many layer 2 VPNs. As such, when the tunnel label is stripped off at the end of the tunnel as shown in Figure 25, it would be impossible for the egress LER to know which customer the data should be forwarded to. As a result, a second label is needed to determine which interface the frame should be forwarded to. This bottom label is referred to as the Virtual Circuit (VC) label. As it is associated with a particular customer circuit, this label can also be thought of as a multiplexing/de-multiplexing label at the two ends of the provider network. The last piece of the puzzle is to make the ingress LER aware of the VC label expected at the egress LER so that the outgoing frame is sent to the correct customer device. To achieve this, an LDP neighbouring relationship must be explicitly declared between the LERs at the two ends. As such, an additional LDP session is established between the LERs at the two ends of the tunnel. Through this session, the VC label is distributed and synchronized.
Putting everything together, let's consider the traffic forwarding between customer devices through a layer 2 MPLS VPN. At the ingress point, the customer router treats the layer 2 MPLS VPN connection as if it were a dedicated link and sends the data frames as-is to the provider's ingress LER. The ingress LER pushes two labels on top of the received frame. The bottom (VC) label identifies the output interface at the egress LER, as pre-agreed between the two LERs. The top (tunnel) label is used for forwarding the packet through the MPLS core via a pre-determined LSP. At each LSR along the LSP, the top label is swapped as described in the previous section. Eventually, the packet reaches the egress LER at the end of the LSP and the top label is popped off. The egress LER then uses the VC label to determine the correct customer device to forward this incoming traffic to. The egress LER then pops the VC label and sends the encapsulated frame to the appropriate port.
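A toy view of the egress-side processing, with made-up VC label values (the tunnel label is assumed to have been popped already, e.g. by PHP):

```python
# Egress LER of a layer 2 MPLS VPN: the VC label alone de-multiplexes
# frames to the correct customer-facing port.
VC_TABLE = {17: 'port-to-customer-A', 18: 'port-to-customer-B'}

def egress_forward(vc_label, frame):
    port = VC_TABLE[vc_label]   # exact match on the VC label
    return port, frame          # frame forwarded untouched at layer 2

print(egress_forward(17, b'...ethernet frame...'))
```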
It can be seen that all the provider's routers except the LERs at the two ends forward received packets based on the tunnel label; the layer 2 details are not touched. As such, a layer 2 MPLS VPN can accommodate an arbitrary layer 2 technology, such as Ethernet, Frame Relay, or ATM, on the tunnel, even where the LSRs may not support such a technology natively. In addition, this type of VPN is inherently simple to configure at both the customer sites and the provider network, essentially just a plug-and-play network deployment. There are no interactions between the customer's devices and those of the provider. Consequently, the two networks are isolated from each other. The service provider's network is transparent to the customer, while the provider routers have no visibility into the customer data.
Layer 3 MPLS VPN
In layer 2 MPLS VPN, customers are provided with a simple transparent data forwarding
service; no deeper packet manipulation is possible. Besides, since the customer's sites are
connected together as if they were on the same LAN, traffic management between sites is
very hard to achieve. To support a more scalable solution with finer control over
traffic forwarding, layer 3 MPLS VPN [6] can be implemented.
Different from layer 2 MPLS VPN, in layer 3 MPLS VPN, labels are used to encapsulate
layer 3 IP packets. As a result, the LSRs of the service provider must be aware of the IP
addressing scheme of the customer network and take an active role in maintaining the routing
information between the customer's sites for packet forwarding. However, as with layer
2 MPLS VPN, a service provider may support multiple customers at the same time, which
may cause routing interference between its customers, for example when customers use
overlapping private address ranges. The trick to resolve this problem is
that each LER maintains a separate routing table for each VPN it carries.
Accordingly, each layer 3 MPLS VPN is associated with an independent routing instance,



namely a Virtual Routing/Forwarding (VRF) instance, and each LER maintains a separate VRF table
for forwarding decisions within this VPN.

Figure 26: Route exchange in layer 3 MPLS VPNs.

Figure 27: Layer 3 MPLS VPN data forwarding.


As illustrated in Figure 26, each interface that is connected to a customer device is
associated with the corresponding VRF. To facilitate IP routing of customer packets at the provider
LERs, routing information must be shared between the customer router and its neighbouring
LER via either an IGP or an eBGP session. The next step is to make this routing information
available at the other LERs on the same VPN. This task is done by running Multi-Protocol
iBGP (MP-iBGP) between these LERs. MP-iBGP allows the distribution of IP prefixes between
peers even when they are not unique; hence, it supports routing isolation between
different VRFs and between customers. The use of iBGP to share routing information
between the LERs also minimizes the routing information needed on the LSRs inside the
provider network, which only need to know how to forward traffic between the LERs, not
the routing information of the customer's networks. An additional subtle step is needed
to bind these two together: on the LERs, the routing information from the customer



must be redistributed into MP-iBGP under the corresponding VRF to enable end-to-end delivery of
routing information.
For packet forwarding, an LSP is constructed between the LERs sharing the same VPN.
Similar to the layer 2 VPN, a label stack of two labels is also required for a layer 3 MPLS VPN.
The first label is used to forward the labeled packets between the LERs on the pre-
determined LSP. This label is called the tunnel label. As with the layer 2 VPN, the tunnel label is
popped at the egress LER, so extra information must be available for the LER to
distinguish the particular VPN that the packet belongs to. For this purpose, an additional
label, namely the VPN label, is used. As shown in Figure 27, at the ingress LER, two labels are
pushed onto the received IP packet: the bottom label is the VPN label that specifies the VRF
the packet belongs to, and the top label is the tunnel label for packet switching along the LSP.
At each LSR along the LSP, the top label is swapped according to the pre-defined label
distribution scheme. At the end of the tunnel, the egress LER pops the
tunnel label and uses the VPN label to determine the customer VPN to forward
the packet to. It then pops the VPN label and forwards the IP packet to the appropriate
customer device.
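The following minimal Python sketch illustrates why per-VRF tables resolve overlapping customer addressing: the same destination prefix maps to different VPN labels depending on the VRF bound to the incoming interface. The interface names, VRF names, prefixes and labels are all hypothetical.

# A minimal sketch of layer 3 MPLS VPN forwarding with per-VRF tables.
import ipaddress

# Each customer-facing interface is bound to a VRF at the ingress LER.
iface_to_vrf = {"ge-0/0/1": "CUST_A", "ge-0/0/2": "CUST_B"}

# Per-VRF routing tables: identical prefixes can coexist in different VRFs.
vrf_tables = {
    "CUST_A": {"10.0.0.0/24": {"vpn_label": 42, "tunnel_label": 100}},
    "CUST_B": {"10.0.0.0/24": {"vpn_label": 77, "tunnel_label": 100}},
}

def ingress_forward(in_iface, dst_ip):
    """Look up dst_ip in the VRF bound to the incoming interface and
    return the two-label stack to push (top label first)."""
    vrf = iface_to_vrf[in_iface]
    for prefix, entry in vrf_tables[vrf].items():
        if ipaddress.ip_address(dst_ip) in ipaddress.ip_network(prefix):
            return [entry["tunnel_label"], entry["vpn_label"]]
    raise LookupError(f"no route for {dst_ip} in VRF {vrf}")

# The same destination maps to different VPN labels in different VRFs:
print(ingress_forward("ge-0/0/1", "10.0.0.5"))  # [100, 42]
print(ingress_forward("ge-0/0/2", "10.0.0.5"))  # [100, 77]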
Since the provider forwards incoming customer packets based on their network layer
information, layer 3 MPLS VPN is a more scalable solution in the long term than the simple
layer 2 pseudowire approach. Many more intelligent services can be supported with this
approach. For instance, the LER can inspect and differentiate the various types of ingested
traffic and treat each type differently while forwarding it across the transit core network.
The customer can deploy both best-effort and real-time traffic over the same VPN and
depend on the provider to enforce the required QoS for each traffic type on the VPN. Recall
that layer 2 VPN treats incoming traffic as a black box, so such intelligent behaviour is not
possible there. In terms of traffic control, layer 3 VPN is also more efficient than the layer 2
approach, as it evades various layer 2 issues such as broadcast storms. In addition, the layer
3 approach can support complex interconnections between many customer sites rather
than the flat topology of its layer 2 counterpart.
However, layer 3 MPLS VPN is not without its disadvantages. First, there is a need for
interaction between the customer and provider devices. This requires extra configuration,
which complicates the network deployment and necessitates a decent amount of networking
expertise for a correct implementation. Second, these extra interactions also impose
constraints on the hardware and software of the customer devices, which potentially increases
the cost. Last but not least, the biggest drawback of layer 3 MPLS VPN is perhaps its
obligation to expose the internal routing information of the customer network to the
provider to make routing feasible. The customer also depends on the provider to distribute
the routing information between its sites. In many cases, for example for large enterprises,
this exposure of routing information and this dependency on routing distribution are out of
the question.
9.4.5 MPLS Traffic Engineering
To understand the need for Traffic Engineering (TE), it is best to take a step back and
look at the drawbacks of existing routing protocols. In traditional routing protocols, only one
best path is determined between the source and the destination. This best path is then used
to forward the traffic towards the destination network regardless of the other links in the



network. This best path selection considers each traffic flow in isolation and, despite its
appealing name, it may not be the best solution for routing many flows
network-wide.

Figure 28: Traffic unbalance with traditional routing.

Figure 29: Traffic steering with traffic engineering.


Figure 28 illustrates a scenario where this best path selection approach leads to
inefficiency. In this scenario, two traffic flows, each of 30Mbps, from two pairs of routers
(LER1-LER2 and LER3-LER4) need to transit through a core network made of 5 routers,
LSR1 to LSR5. Assume that the bandwidth of every link in this topology is 50Mbps. It is
easy to infer from this figure that there are two paths in the core network that can
accommodate the two traffic flows: LSR1-LSR2-LSR3 and LSR1-LSR4-LSR5-LSR3. Between
these two routes, LSR1-LSR2-LSR3 is the shorter route and hence the best route. Using a
traditional routing approach, the traffic from both flows will be routed along this best
route. This leads to two issues. First, when always using the best route, the link utilization in
the network is not efficient. In this case, all traffic between LSR1 and LSR3 would be routed
on the best route, leaving the longer route LSR1-LSR4-LSR5-LSR3 idle. Worse, when all



traffic flows are concentrated on the best path, its offered bandwidth may not be enough to
support all flows, which degrades the provided QoS. In this case, the combined required
bandwidth of both flows is 60Mbps, which overwhelms the offered bandwidth of 50Mbps on the
LSR1-LSR2-LSR3 route, meaning that only one flow can be supported at its requested
QoS level while unused communication resources remain available on the other path.
Aiming to resolve these issues, TE techniques were developed to override the best path
selection of traditional routing with the ability to steer traffic around the network,
maximizing network utilization and ensuring communication resource availability for end-
to-end traffic flows. As illustrated in Figure 29, if we could somehow steer the LER3-LER4
traffic flow through the second core route LSR1-LSR4-LSR5-LSR3, both traffic flows could be
accommodated and transmitted through the core network with sufficient QoS levels. In this
case, the network is clearly utilized more efficiently, as there is no underutilized
or overutilized link.
In MPLS, the ability to switch traffic based on labels makes it a promising candidate
for TE. In essence, the network administrator just needs to establish LSPs
according to the input flow requirements; the input traffic is then labeled and
forwarded accordingly. The most straightforward approach to MPLS
TE is therefore to configure these LSPs manually. In this method, the network administrator explicitly
defines the path for each LSP. For instance, in Figure 29, the network administrator can
manually assign an LSP for the LER1-LER2 flow as LER1-LSR1-LSR2-LSR3-LER2. This approach
is easy to understand but is clearly not scalable. As the number of traffic flows increases and
the network topology gets more complex, a more automated approach is desired. To answer
this need, MPLS RSVP-TE [4] (or MPLS-TE for short) was proposed as a solution for
automatic bandwidth reservation and label path establishment across an arbitrary network.
To accomplish the above requirements, MPLS-TE must take a few steps as described
below:
 Configuration of available and required communication resources: to lay a
foundation for path determination and bandwidth reservation, each link in the network must
be configured with its available communication resources, and each LSP subjected to TE
must be configured with its communication resource requirements. The most commonly
used communication resource for MPLS-TE is bandwidth.
 Determination of the shortest path that meets the required communication
resources: a resource-aware routing protocol must be implemented among the routers in the
network, taking into account the available communication resources in the network and
the requirements of the new flow, to determine a path that the traffic flow can take. To this
end, Constraint-based Shortest Path First (CSPF) was proposed to extend existing
shortest-path-first protocols such as OSPF with awareness of communication resource
constraints [7]: it essentially prunes the paths which cannot offer enough communication
resources for the given traffic flow (a minimal sketch of this step follows this list). The
resultant path is not necessarily the best path according to the traditional IGP but rather the
shortest path with enough communication resources to support the traffic flow. Note that
the calculation in this step must take into account the resource consumption of the existing
traffic flows within the network.
 Label allocation and resource reservation: if such a path is available, labels must be
assigned along this path so that labeled packets follow the calculated route. In addition,



communication resources must also be reserved on each traversed link to
accommodate this traffic flow. The reserved resources must then be subtracted from the
available resources on each traversed link to accurately reflect what remains for the
next flow that uses MPLS-TE. To this end, MPLS-TE uses RSVP to signal resource reservation.
As illustrated in Figure 30, RSVP signalling starts at the ingress LER, the most upstream
router of the MPLS-TE LSP. This router sends an RSVP PATH message to its downstream
neighbour along the previously determined shortest path. This message includes a request
for a label, the explicit route resulting from CSPF, and the requested resources. Each router
forwards the PATH message to the next router on the path. Upon receiving the PATH
message, the router at the end of the route constructs a RESV message that includes its
assigned label for the LSP. This message is forwarded in the reverse direction of the PATH
message; each router along the way reserves the resources required by the traffic flow and
assigns a label for this LSP. This process continues until the ingress of the LSP receives
the RESV message. At this point, the resources for the traffic flow are successfully reserved
and all labels along the LSP are allocated. Traffic is ready
to be transported through this LSP.
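The following minimal Python sketch illustrates the pruning idea behind CSPF under the bandwidth figures of the scenario in Figures 28 and 29 (with 30Mbps already consumed on the LSR1-LSR2-LSR3 links); the link costs and bandwidth bookkeeping are simplified assumptions.

# A minimal CSPF sketch: prune links without enough available bandwidth,
# then run ordinary Dijkstra on what remains. Numbers are illustrative.
import heapq

# (u, v): (cost, available_bandwidth_mbps); links assumed bidirectional.
# LSR1-LSR2 and LSR2-LSR3 already carry a 30Mbps flow, leaving 20Mbps.
links = {
    ("LSR1", "LSR2"): (1, 20), ("LSR2", "LSR3"): (1, 20),
    ("LSR1", "LSR4"): (1, 50), ("LSR4", "LSR5"): (1, 50),
    ("LSR5", "LSR3"): (1, 50),
}

def cspf(src, dst, required_bw):
    # Step 1: prune links that cannot satisfy the bandwidth constraint.
    graph = {}
    for (u, v), (cost, bw) in links.items():
        if bw >= required_bw:
            graph.setdefault(u, []).append((v, cost))
            graph.setdefault(v, []).append((u, cost))
    # Step 2: shortest path (Dijkstra) on the pruned graph.
    heap, seen = [(0, src, [src])], set()
    while heap:
        dist, node, path = heapq.heappop(heap)
        if node == dst:
            return dist, path
        if node in seen:
            continue
        seen.add(node)
        for nbr, cost in graph.get(node, []):
            heapq.heappush(heap, (dist + cost, nbr, path + [nbr]))
    return None  # no path satisfies the constraint

# With 30Mbps required, the shorter LSR1-LSR2-LSR3 route is pruned:
print(cspf("LSR1", "LSR3", 30))  # (3, ['LSR1', 'LSR4', 'LSR5', 'LSR3'])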

Figure 30: RSVP signalling for MPLS-TE.


It is noted that labels are allocated by the downstream routers in response to requests
from upstream carried in the PATH messages; hence, this type of label allocation belongs to
the downstream-on-demand label distribution method. Furthermore, it is important to see
that, as with any LSP, this label allocation and resource reservation is unidirectional. To
enable two-way resource allocation, a similar procedure has to be performed on the reverse
path. Since communication resources may



be used differently in each direction of a link, the reverse path may traverse a totally
different series of routers than the forward path.
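The following toy Python sketch mimics the PATH/RESV exchange described above: the PATH request travels toward the egress, and the returning RESV reserves bandwidth and allocates a label hop by hop. Real RSVP-TE carries much more state (sessions, refresh timers, error handling); the router names, label values and bookkeeping here are illustrative only.

# A toy sketch of RSVP-TE signalling along a CSPF-computed route.

route = ["LER1", "LSR1", "LSR4", "LSR5", "LSR3", "LER2"]
available_bw = {hop: 50 for hop in zip(route, route[1:])}  # per link, Mbps

def signal_lsp(route, required_bw):
    """PATH travels from the ingress toward the egress; RESV returns in
    the reverse direction, reserving bandwidth and allocating labels."""
    # PATH phase: carry the explicit route and the bandwidth request.
    print(f"PATH {route} requesting {required_bw}Mbps")
    # RESV phase: walk back from the egress; each downstream hop picks
    # the label its upstream neighbour must use toward it.
    label, lfib = 100, {}
    for u, v in reversed(list(zip(route, route[1:]))):
        available_bw[(u, v)] -= required_bw    # reserve on the link
        lfib[u] = label                        # label u pushes/swaps toward v
        print(f"RESV at {u}: reserve {required_bw}Mbps on {u}-{v}, label {label}")
        label += 1
    return lfib

lfib = signal_lsp(route, 30)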
A final remark on the applications of MPLS is that they can be used simultaneously
rather than only individually. For example, MPLS VPN can be integrated with
MPLS-TE so that the service provider can offer a transparent data transport service for
the customer with a leased bandwidth. At the packet transport level, an MPLS-TE label is
stacked with the MPLS VPN labels to support both services. Once again, it can be seen that
the label stacking mechanism gives MPLS a scalable, modular way to support future
applications with ease.

9.5 Software Defined Networking


In the previous section, we looked at many interesting applications based on
MPLS that are widely used today. However, all these applications are still based on the idea
of distributed intelligence, where network devices, despite interacting with each other,
independently act on the incoming traffic. This idea of distributed intelligence is
fundamental to traditional networking and can be traced back to the beginning of ARPANET,
when one of the important design criteria was survivability, especially in times of war. In
other words, it is essential that the network can still operate even if a significant part of the
network and its devices is no longer available. This design principle has been kept ever
since.
As traffic demand increases and more and more network-based
applications emerge, more and more intelligence needs to be integrated into these network
devices. For example, a handful of protocols is needed to support MPLS-TE, as
illustrated in the previous section. These demands put unprecedented and increasing stress
on the capability of network devices, pushing the current networking philosophy to a
breaking point where future services and applications cannot be integrated easily or may
not even be sustained. In particular, while not an exhaustive list, some of the main
challenges with the distributed intelligence design are enumerated as follows.
First, with the increasing number of network devices in the core network, the network
becomes extremely difficult to monitor and manage. For instance, the number of network
devices in an average service provider can easily fall in the range of tens of thousands. With
such a huge number of devices, ensuring the consistency of policies and configurations is an
overwhelming task. For each new policy, the networking team has to log in to each individual
device and change its configuration, usually using manual methods such as the command line
interface. This mechanism makes the deployment of new services a lengthy and, more
importantly, error-prone process.
Second, when each device learns about the network and reacts to changes
individually, the network's response time to changes is slow. Take the example of the best path
finding process: when a change is detected, it must be propagated to all devices in the
network. Then, based on the new information, each device goes through a path finding
procedure such as Dijkstra's algorithm to determine the best path. After the best paths to all
destinations are determined, they are installed into the router's forwarding table so that
packet forwarding can be done accordingly. Under this process, when the number of
network devices increases, the time needed to propagate the new information also


lengthens, making the network converge slowly. In addition, the capabilities of the network
devices can differ greatly from one to another, further delaying network convergence.
This issue gets worse in the context of data centers, where the addition/removal/scaling of virtual
machines can take place in a matter of minutes while the network provisioning may take
up to days or even weeks to be ready.
Third, with the current design of distributed intelligence, while it is possible to get a
feasible solution to a network application, it is hard to achieve optimal performance. Take
the example of a traffic engineering application: without a global view of the network status,
local decisions may allocate too much traffic on one path, making inefficient use of network
resources. This behaviour may take place especially when changes occur in the network, such
as a link failure or the addition/removal/modification of incoming traffic. In
addition, when the network intelligence depends on the individual devices, differences in
capabilities or implementations among the devices in the network may have a huge negative
impact on the overall network performance. For example, if there are two versions of a
protocol available, the lack of support for the latest version on one device (because of a lack of
RAM, CPU, etc.) may force the entire network domain to revert to the older version, or some
adaptation schemes or configuration tricks need to be applied. In either case, the resulting
performance is not expected to be optimal.
Last but not least, when the intelligence is built into each network device, these devices
need to be constantly upgraded as new network services and protocols are introduced, and
this task unavoidably becomes more and more complicated. This increased
complexity makes the network devices more and more expensive. This complexity and cost
pose a huge obstacle to networking innovation, where a single unsupported device can halt
the rollout of an entire service. Moreover, the increasing device complexity makes the
upgrade path more challenging and disruptive, as there is no trivial way to upgrade the entire
network quickly to support a new service with traditional networking.
With all these enormous challenges, it is perhaps time to steer away from
incremental additions, take a step back and look for an innovative approach on which we can
build our future networks.
9.5.1 Software Defined Networking - characteristics
Acknowledging all of the above obstacles, a research group at Stanford University
studied an innovative approach to building a networking environment in which experiments,
innovations and testing of new protocols could be executed in a non-disruptive way. The
results formed the foundation of Software Defined Networking (SDN) and led to the birth of the
OpenFlow protocol. This effort was so appreciated, not only in the research community but
also by industry, that in 2011 the Open Networking Foundation (ONF) [8] was founded to
standardize and accelerate the development and adoption of SDN.
Intuitively, it can be seen that regardless of the various network applications
implemented in a networking device, a received data PDU is processed based on a lookup
of its header fields against the device's forwarding table. As such, the key idea
behind SDN is to separate the device functions into two planes: the forwarding plane and the
control plane. The forwarding plane is just in charge of moving data PDUs between the
device's ports as fast as possible based on some forwarding table, while the control plane is
in charge of the network intelligence that consumes the current network status and



applications' requirements to build up the forwarding tables for the forwarding plane, as
shown in Figure 31. The forwarding plane relies on high-performance switching hardware,
while the control plane relies on computation resources, algorithms and protocols.

Figure 31: The control plane and data plane within a network device.
This separation untangles the required functionalities and allows innovations to be
carried out independently in each plane without affecting the operations of the other.
More importantly, the plane separation and the difference in the resources required by the
two planes make it possible to move the control plane out of the device to an external
node, given that there is an adequate communication channel between the control and
forwarding planes. This channel is required so that the forwarding table in the
forwarding plane can be updated in a timely manner with respect to changes in the network.
Because the control plane is moved out of the device, it becomes possible to gather
the control plane functionalities of all devices in one place and process all the
network intelligence in a centralized manner. This approach of centralized intelligence and
dumb network devices forms the foundation of the SDN concept. In particular, SDN
differs from the traditional approach of distributed networking in three basic
characteristics:
 Control and data plane separation. The first key difference between SDN and
traditional networking is the separation of the control and data planes. To this end, all the
networking logic and intelligence is performed by the control plane. The control plane
takes the network status and performs optimization, with respect to the constraints posed by the
traffic requirements, on a high-performance computing platform. The outputs of this
operation are the flow tables (the SDN equivalent of the forwarding table in traditional
networking) for the data plane. The data plane runs on high-speed switching hardware
to move the received traffic between its ports based on the flow tables obtained from the
control plane. Based on the flow tables, the incoming traffic can be forwarded to a port,
replicated to many ports, or dropped by the network device. In addition, header
modifications can be executed on the data plane to modify existing header fields or even
change the header structure of the incoming PDU. Examples of PDU header



manipulations include decrementing the TTL, updating the header checksum, rewriting IP
addresses (in NAT), pushing/popping VLAN tags and MPLS headers, etc.
 Centralized network intelligence. The second distinct characteristic of SDN is the
centralization of control plane functions in an external network entity. In other words, the
control plane is removed from the network devices and collectively placed in a centralized
node. This centralization is not simply a decoupling of the brain from the muscle but opens
opportunities to realize a more efficient and flexible network.
In particular, the centralization of control plane functions makes it possible for the
"network brain" to have a network-wide view of the entire topology and traffic flows so that
traffic routing can be decided in a globally optimized and systematic way rather than based
on the fractional views of individual devices as in traditional networking. This centralized
network intelligence also makes it easy to implement and maintain network policies
consistently and in a timely manner. In addition, the centralization of network intelligence
allows the creation of an abstraction layer over the vendor-specific implementations in the
devices, making network management, configuration and monitoring easier tasks. For
example, with this centralization of control plane functionalities, upgrading and
adding/removing functions can be done easily, quickly and consistently across the entire
network within minutes and without disrupting the current traffic. Imagine that you want to
change the routing metric for the entire network: with this centralized approach, it is
possible to implement the change within minutes, while there is no trivial way to do this
quickly with traditional networking. An important note at this point is that the
centralized intelligence of SDN does not necessarily reside on a single physical server in the
network; rather, it is a logically centralized node. This logically centralized node can be a
collection of servers working either as a single virtual node or as multiple nodes at different
places in the network working in harmony.
 Programmable networking. Another unique feature of SDN is the possibility of
making the network programmable. The abstraction of the hardware-specific functions from
the network features opens the opportunity to implement automation of functions in the
network. Functions such as adding VLANs or creating a VPN between endpoints can be
executed with a click of a button rather than by logging in to and configuring each individual device
in the network. This feature not only reduces the network provisioning time and the
chance of configuration errors but also makes the network more dynamic and
flexible. The network bottleneck is removed and network provisioning can now
keep up with the dynamicity of data centers. The programmability of the network also makes
it possible to implement new network services and applications entirely in software
and roll them out at an ever-increasing pace. It is also possible for network
administrators to flexibly customize existing services to fit their particular circumstances.
This ability to implement and deploy network services and applications easily and quickly
lays the foundation for a tide of innovation in the networking area. In fact,
the "Software Defined" part of SDN comes from this programmability feature.
9.5.2 Software Defined Networking – architecture and basic operations
As shown in Figure 32, the architecture of SDN includes three main components: the
SDN network devices (or SDN switches), the controller and the applications.



Figure 32: SDN architecture.
 SDN network devices. These are basically stripped-down versions of existing
networking devices, retaining only the data plane for switching purposes, some flow tables
for forwarding decisions, and a programmable interface for interacting with the controller. As
the main purpose of SDN devices is data plane switching, these devices are also referred to
as SDN switches. The flow tables consist of the decision rules needed for processing and
forwarding the incoming traffic. These rules are normally represented in the form of match-
action pairs. The incoming traffic is compared against the match conditions of the rules in
order of the rules' priorities. A match condition can be expressed with wildcards
for more generic matching. For example, a match condition could match all
traffic destined to a specific subnet regardless of protocol, or all DNS traffic
regardless of IP addresses. If a match is found, the corresponding action is executed. If
no match is found, depending on the SDN implementation, the incoming traffic will normally be
forwarded to the controller for further analysis. Whenever a match is found and the
appropriate action is executed, the SDN switch also keeps track of the statistics for that rule.
Another important component of the SDN switches is the programmable interface that
allows interaction with the controller. Through this interface, the SDN switches



report network changes and statistics to the controller and receive the flow tables. This
interface is also used to send traffic that does not match any rule in the flow tables to the
controller for further analysis.
 Controller. The controller can be thought of as a centralized management service and
database that oversees the entire network; it maintains a consistent, global view of the network
topology and manages the devices, traffic flows and statistics in the network. The
controller comprises three main components, namely the southbound interface, the
northbound interface, and the management service and database modules. The southbound
interface provides the means for the controller to communicate with the SDN devices.
Through this interface, the controller receives information about changes in the network
state, such as a link failure, the addition of a link or a node, or the receipt of unknown traffic into
the network. This interface is also used by the controller to update the flow tables in the SDN
switches and to collect statistics regarding the traffic flows in the network. Currently, the only
open-source, non-proprietary southbound protocol available is OpenFlow; we will discuss
OpenFlow further in the next sub-section. The northbound interface of a controller
provides the communication means for different network services and applications to
interact with the network through the controller. Unlike the standardized southbound
interface, there is no standard method for implementing the northbound interface today.
The most common approach is the use of Representational State Transfer (REST) APIs
(a hypothetical sketch of such an interaction follows this list). At the core of an SDN
controller are the various management service and database modules. These modules ensure a
consistent global view of the network by tracking changes in the topology and the availability
of the SDN switches. Some noteworthy modules in a controller include device registration,
management and tracking, the topology database, flow management and statistics, and
copies of all of the flow tables in the network.

Figure 33: Application – Controller interactions


 Applications. So far, the actual intelligence of the network has not been mentioned. This
intelligence is embedded in the various applications. In this text, the network applications
are separated from the controller for ease of understanding. In practice, these



applications can be either built into the controller or reside entirely outside of it.
Normally, a controller is implemented using a hybrid approach, with
some basic built-in applications and exposed APIs into which external applications can hook.
This modular approach allows future network services and applications to be integrated
into the network easily and quickly without disrupting existing traffic. It also
permits network administrators to selectively install only the applications they need while
leaving out unnecessary ones. Some common applications include routing, traffic
engineering, VPN, security and load balancing. An application interacts with the controller
through the controller's northbound interface. Through this interface, the controller updates
the applications with events in the network and receives from them, through well-defined
methods, instructions on how to react to these changes. These interactions are
illustrated in Figure 33.
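As a purely hypothetical illustration of a REST-style northbound interaction, the sketch below shows an application asking a controller to install a match-action rule. The controller address, endpoint path and JSON schema are invented for illustration; every real controller defines its own northbound API.

# A hypothetical sketch of an application using a controller's northbound
# REST API. URL, endpoint and JSON schema are invented for illustration.
import json
import urllib.request

CONTROLLER = "http://controller.example:8181"   # assumed address

def push_flow(switch_id, match, action, priority=100):
    """Ask the controller to install one match-action rule on a switch."""
    flow = {"switch": switch_id, "priority": priority,
            "match": match, "action": action}
    req = urllib.request.Request(
        f"{CONTROLLER}/flows", data=json.dumps(flow).encode(),
        headers={"Content-Type": "application/json"}, method="POST")
    with urllib.request.urlopen(req) as resp:
        return resp.status

# e.g. forward all TCP traffic for 136.208.101.0/24 out of port 1:
# push_flow("sw1", {"ip_dst": "136.208.101.0/24", "ip_proto": 6},
#           {"output": 1})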

Figure 34: Basic operations in SDN.


With these components in place, the basic operations of SDN can be illustrated as in Figure
34 and summarized as follows:
 The SDN switches register with the controller and report the physical network
conditions to it.
 The controller reports any topology changes to the appropriate applications.
 The applications process the input events received from the controller, perform the necessary
optimizations with respect to the current network conditions and possibly administrative
controls, then send instructions back to the controller. The most common instruction is
information on how to modify the flow tables on the SDN switches.
 The controller then updates the SDN switches with the appropriate flow table entries.



 SDN switches examine the headers of incoming traffic against their configured
flow tables and take the appropriate action if a match is found. If no match is found, the SDN
switch forwards the input traffic to the controller for advice on what should be done with
this unknown traffic.
To get a clearer idea of how the components in the SDN architecture interact
with each other to achieve the goal of a particular application, let's have a look at two
examples of SDN operation: routing (best path determination) and URL filtering.
SDN operation example – Routing (Best path determination)
Best path determination, or routing, is perhaps the most common application in
SDN as well as in traditional networking. In traditional networking, the routers in the
network exchange routing information with each other and then determine the best path
based on some algorithm such as Dijkstra's algorithm, as we studied in Chapter 5.
Let's see how such an application can be materialized with SDN.
In the SDN routing application, the SDN switches send updates about topology changes to
the controller. These changes are recorded in local databases inside the controller and then
forwarded to the routing application. The routing application consults the controller for the
network status and processes this information to determine the best paths based on
metrics and policies defined by the network administrator. The routing application then
advises the controller of the best path information for flow table updates. The controller
then pushes these updated flow tables to the SDN switches for traffic forwarding. These
basic operations are illustrated in Figure 35.

Figure 35: Routing application realization in SDN.


In the SDN realization, it is noted that routing updates are sent only to the controller
rather than to the other network devices. This approach significantly reduces the update delay
and overhead in comparison to traditional networking, such as link-state protocols where



an update needs to be propagated to the entire network. In addition, since the best path
determination is done in a centralized manner with a global view of the network, policy
consistency can easily be guaranteed network-wide. Besides, the flow tables of all network
devices are updated coherently, ensuring network-wide convergence. In traditional
networking, depending on the update propagation time and the capabilities of the network
devices, each device may finish its best path determination at a different time, making the
network behaviour unpredictable during this time difference.
Another benefit of SDN in this application is the centralization of configuration
parameters. As shown in Figure 35, examples of these administrative parameters are the routing
metric and the policies. With SDN, changing these parameters can be done
quickly and accurately with a click of a button in the application. In contrast, changing
these parameters in traditional networking requires logging in to each device and manually
making the necessary changes, a time-consuming and error-prone process.
SDN operation example – URL filtering
The second example illustrates a security application that filters traffic based on its URL.
This is a common security measure to blacklist traffic to harmful or inappropriate sites.
This feature is normally not available in traditional networking devices but can be easily
materialized with SDN. This traffic filtering can even be applied at access points,
located near the users, to cut off offending traffic at its source rather than waiting for it to
arrive at a firewall later in the network.

Figure 36: URL filtering application realization in SDN.


The basic operations of this application are illustrated in Figure 36. The SDN switches
are configured with a flow table that forwards all DNS requests to the controller. The controller
then sends these requests to the URL filtering application. The application consults a
blacklist database to determine whether the requested URL should be accepted and sends



the appropriate instructions to the controller for flow table updates. The controller then
informs the SDN switches to either allow or deny these requests.
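A minimal sketch of the application logic is shown below, assuming the controller hands the application the parsed DNS query name and a buffer identifier from each Packet-in event. The controller helper methods, blacklist contents and identifiers are hypothetical.

# A minimal sketch of the URL-filtering application logic.

# Stub standing in for the controller's API (method names are hypothetical).
class ControllerStub:
    def packet_out(self, switch_id, buffer_id, port):
        print(f"packet-out on {switch_id}: release buffer {buffer_id} via {port}")
    def drop_packet(self, switch_id, buffer_id):
        print(f"drop buffered packet {buffer_id} on {switch_id}")

BLACKLIST = {"malware.example.com", "phishing.example.net"}

def on_dns_packet_in(switch_id, buffer_id, query_name, controller):
    """Handle a DNS request punted to the controller: release or drop it."""
    if query_name in BLACKLIST:
        controller.drop_packet(switch_id, buffer_id)            # deny
        return "deny"
    controller.packet_out(switch_id, buffer_id, port="NORMAL")  # allow
    return "allow"

print(on_dns_packet_in("ap-1", 7, "malware.example.com", ControllerStub()))
# drop buffered packet 7 on ap-1
# deny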
9.5.3 OpenFlow
OpenFlow communication channel
With the separation and centralization of the control plane away from the data plane, the
communication between these two planes is crucial to consistent operation throughout
the network and the efficient usage of network resources. To this end, the OpenFlow protocol
was proposed by the ONF for this purpose, and to date it is the only standardized southbound
protocol in SDN. The latest OpenFlow version at the time of writing is 1.5.1. Within the scope
of this subsection, we won't go into the raw details of OpenFlow, but rather present an
overview and the basic operations of the protocol. Interested readers can refer to [9] for the
complete OpenFlow specifications. Note that in this sub-section the OpenFlow
terminology is adopted, where "packet" refers to a PDU in general and, depending on
the context, can be understood as a Datalink, Network or Transport layer PDU.
Because the communication between the controller and its SDN switches is so crucial,
this communication channel is implemented over TCP with the aid of Transport
Layer Security (TLS) for cryptographic protection. Over this secure communication channel,
three message types are defined in OpenFlow: Controller-to-Switch, Async and Symmetric.
Each message type is then further divided into multiple sub-types.
 Controller-to-Switch. As its name indicates, this type of message is sent from the
controller to its SDN switches. Some main message sub-types in this category include:
o Features: this sub-type is used by the controller to request the
capabilities of the connected SDN switch and is normally used at the initialization of the
channel. Since different SDN switches may support different OpenFlow versions, and even
within the same version each SDN switch may have a different set of supported features, this
message is crucial for the controller to ensure seamless operation of the network.
o Modify-State: this sub-type is issued by the controller to manage the
flow tables, i.e. add/remove/modify flow entries and flow tables, at its SDN switches.
o Read-State: this sub-type is used by the controller to collect
configurations and statistics from the SDN switches.
o Packet-out: this sub-type is used by the controller to instruct the
receiving SDN switch to forward a packet through a specific port. This message should
include a reference to, or the full content of, the packet under consideration.
 Async. This type of message is sent from an SDN switch to its controller to report
events in the network. Some main sub-types in this category include:
o Packet-in: this sub-type is used to send a data packet to the controller.
It can be the result of a configured action in the flow table or of a table-miss, where
the packet does not match any condition in the flow tables of the SDN switch. A Packet-in
message should include the full packet under consideration, or the packet headers plus a
reference identifier if the SDN switch supports packet buffering.
o Port-Status: this sub-type is used to inform the controller of a change on
a port, for example, when a port goes down.



o Flow-Remove: this sub-type is used to inform the controller when a flow
entry is removed from a flow table. For example, this message can be triggered by an explicit
Modify-State request from the controller or when the idle timeout of a flow expires.
 Symmetric. This type of message can be sent in either direction for purposes such as
connection setup, verifying the liveness of an SDN peer, measuring network-related
parameters, or reporting a problem.
OpenFlow flow tables
OpenFlow controls traffic by configuring flow tables at the SDN switches. A flow table
consists of one or more flow entries, each of which includes a set of matching conditions based
on the incoming packet headers, a set of actions that is executed when a match is
encountered, and a set of statistics counters, as described below:
 Matching conditions: these conditions cover the incoming port and the headers of
the packet. The OpenFlow 1.0 specification supports 12 header fields that can be used to
match a flow, including the input port and the major fields of the VLAN tag and the Ethernet,
IP and TCP/UDP headers. This set of conditions has been extended considerably in recent
versions of OpenFlow and can be used to match virtually any combination of packet header
fields, including the MPLS label, IPv6, ICMP headers, etc. The matching conditions in OpenFlow
can be combined with wildcard rules for more flexible matching. For example, a wildcard match
of 136.208.101.* can be used to match all IP addresses within the prefix 136.208.101.0/24.

Figure 37: OpenFlow pipeline with multiple flow tables.


 Action set: each matching condition in a flow table is associated with a set of actions
to be executed on that traffic flow. Possible actions include forwarding the packet
to one or more ports or to the controller, dropping the packet, modifying the packet headers
(including adding/removing headers), or sending the packet to another flow table for further
processing. As a result, the flow tables in an SDN switch can be chained together in a pipeline
to allow more complex and flexible packet processing, as illustrated in Figure 37.
For example, upon receiving a multicast or broadcast packet, the SDN switch should forward
the packet to a set of output ports, but each forwarded packet can be encapsulated in a
different way, such as a bare IP packet, a VLAN-tagged frame or an MPLS packet. It is noted that the



packet can also be forwarded/dropped based on the current flow statistics, enabling
mechanisms such as traffic shaping on the SDN switches.

Figure 38: Example of flow table entries.


Figure 38 illustrates an example of flow table entries in which all TCP traffic sent to
136.208.101.0/24 is forwarded to port 1 while UDP traffic to the same set of
destinations is forwarded to port 2. In this way, the two types of traffic can be
differentiated and treated differently, for example provisioned with different bandwidth and
delay guarantees. This example shows the capability of OpenFlow to fine-tune the traffic
steering inside the network. Such a configuration is not easily achieved with traditional
networking, where traffic forwarding is based primarily on just the destination IP address.
 Statistics: each flow table entry is also associated with counters to maintain its
statistics. OpenFlow defines a wide range of statistics that can be maintained for each flow,
from the number of packets/bytes received to the time since the last received packet.
It is noted that each flow entry can be programmed in a proactive way rather than
as a merely static configuration from the controller. For example, a controller can set up the
validity of a flow entry with an idle timeout: if there is no traffic matching that flow entry for
longer than the idle timeout, the entry is automatically deleted from the flow table.
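Putting the pieces together, the following minimal Python model sketches an OpenFlow-style flow table with priority-ordered match-action entries, prefix wildcards, per-entry counters and an idle timeout, using the entries of Figure 38. The field names and values are illustrative simplifications of the real OpenFlow match structure.

# A minimal model of an OpenFlow-style flow table (illustrative fields).
import ipaddress
import time

def field_matches(want, got):
    """Exact match, except IP prefixes, which match any covered address."""
    if isinstance(want, str) and "/" in want:
        return ipaddress.ip_address(got) in ipaddress.ip_network(want)
    return want == got

class FlowEntry:
    def __init__(self, match, action, priority=0, idle_timeout=None):
        self.match, self.action, self.priority = match, action, priority
        self.idle_timeout, self.packets = idle_timeout, 0
        self.last_hit = time.time()
    def matches(self, pkt):
        # A field absent from self.match acts as a wildcard.
        return all(k in pkt and field_matches(v, pkt[k])
                   for k, v in self.match.items())

class FlowTable:
    def __init__(self):
        self.entries = []
    def add(self, entry):
        self.entries.append(entry)
        self.entries.sort(key=lambda e: -e.priority)  # highest priority first
    def process(self, pkt):
        now = time.time()
        # Expire idle entries, then match in priority order.
        self.entries = [e for e in self.entries if e.idle_timeout is None
                        or now - e.last_hit < e.idle_timeout]
        for e in self.entries:
            if e.matches(pkt):
                e.packets += 1       # update per-entry statistics
                e.last_hit = now
                return e.action
        return "send-to-controller"  # table miss

table = FlowTable()
table.add(FlowEntry({"proto": "TCP", "ip_dst": "136.208.101.0/24"},
                    "output:1", priority=10, idle_timeout=60))
table.add(FlowEntry({"proto": "UDP", "ip_dst": "136.208.101.0/24"},
                    "output:2", priority=10))
print(table.process({"proto": "TCP", "ip_dst": "136.208.101.7"}))  # output:1
print(table.process({"proto": "ICMP", "ip_dst": "8.8.8.8"}))  # send-to-controller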
9.5.4 SDN, network slicing and Network Function Virtualization
So far, we have learnt about the basics, operations and benefits of SDN. In the context
of modern networking, SDN is quite often accompanied by technologies such as network
slicing and Network Function Virtualization. As such, it is important that we understand the
differences between SDN and these technologies and how SDN can work side-by-side with
them to bring about new benefits to network operators.
SDN and network slicing
Network slicing is a technique that allows multiple virtualized and independent
networks to reside and operate on the same physical network infrastructure [10], [11].
It is an economical solution that allows many services to co-exist and share
the same network resources without affecting each other. Network slicing is critical in
modern networks such as cellular networks, where multiple traffic types are served with
different QoS levels. Another major application domain of network slicing is the datacenter
environment, where services can be dynamically changed within minutes while the



predefined Service Level Agreement (SLA) must be met at all times. An important factor in
network slicing is the independence between the coexisting services and traffic. Each
network slice serves a service that operates as if over its own dedicated network,
unaware of the physical network or the coexisting services and traffic.

Figure 39: Physical and logical views of network tunneling.


In traditional networking, network slicing is mainly materialized by tunnelling
techniques. In these techniques, traffic is encapsulated inside an external header at the
tunnel entry endpoint and is transferred to the tunnel exit endpoint across a transit
network. At the tunnel exit endpoint, the received traffic is decapsulated, and the receiver
node receives the data as if it had been transmitted directly from the transmitter. In other words,
the transit network is transparent to both the senders and receivers, and they see each other
as if communicating on the same LAN segment. The concept of network tunneling and its
physical and logical views are illustrated in Figure 39. Examples of network tunneling
techniques that we have already covered are VLANs and MPLS VPNs. Besides these, many other
techniques are used for this purpose, including Virtual Extensible Local Area Network (VXLAN)
[12], Network Virtualization using GRE (NVGRE) [13] and Stateless Transport Tunneling
(STT) [14].
Although network virtualization and network slicing are possible this way, they are neither
flexible, easy nor fast to implement with traditional networking. In particular, while tunneling
techniques can be transparent to network users, it is hard to guarantee QoS levels for all
the shared traffic. Even when everything is properly configured, it is hard to use the existing
network resources optimally with distributed intelligence. In addition, these
techniques require manual configuration of the tunnel endpoints, which is error-prone,
slow and hard to manage. For example, in datacenters, a virtual machine can be spun
up/down or scaled in a matter of minutes while network configuration and provisioning may
take days or even weeks, greatly affecting the service deployment time.



Figure 40: Network slicing with SDN.
With SDN, network slicing can be executed much better than in traditional networking, with
three main advantages: traffic classification and steering, network-wide optimization, and ease
of expansion and automation. Regarding traffic classification and steering, traffic
entering the network can be classified and steered through the manipulation of flow tables on
the SDN switches. In particular, as seen in the previous sub-section, SDN and
OpenFlow support a very fine-grained level of control over every traffic flow entering the
network, using almost any header field of the incoming packet. These traffic flows can
then be labeled by pushing/popping or modifying headers such as MPLS labels or QoS
fields, so that they can be differentiated in the network core even when mixed with other
flows. This level of control permits SDN to easily separate traffic flows into slices and steer
the traffic of each slice in a flexible and independent manner. These manipulations go far beyond
what can be supported by traditional networking. Regarding the network-wide
optimization aspect, with centralized intelligence and a global, network-wide view, it
is easy for SDN to execute the appropriate planning to support the QoS levels of the network
slices in an optimal way. Since the SDN controller can be installed on high-performance
servers, not only can existing resource allocation schemes be executed much faster, but it is
also quite feasible to deploy highly complex algorithms with little effort. Last but not least, since the



network intelligence is decoupled from the individual devices, it is possible to support
automation at the network-wide level. To this end, network provisioning is no longer
the bottleneck, as it can easily be coupled with the spin-up/spin-down and scaling operations
of virtual machines. As a result of combining SDN and network slicing, the
network infrastructure becomes an agile entity that can adapt and respond quickly to
changes and make efficient use of its resources with minimal human interaction.
Figure 40 illustrates network slicing with SDN, where the physical network
infrastructure is divided into two network slices. Each network slice differs not only in its
virtual connected topology but also in its traffic requirements. For example, slice 1 is used
to serve autonomous vehicles while slice 2 is used to serve telemetry IoT applications.
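As a small illustration of edge classification into slices, the sketch below matches incoming flows against slice-selection rules and tags them with a slice label, loosely modelling an MPLS-label push; the rules, field names and label values are invented for illustration.

# A minimal sketch of slice classification at the network edge.

slice_rules = [
    # (match condition, slice name, slice label) -- all hypothetical
    ({"udp_dst": 4789, "dscp": 46}, "slice-1-vehicles", 1001),
    ({"tcp_dst": 8883},             "slice-2-iot",      1002),
]

def classify(pkt):
    """Return (slice, labelled packet); unmatched traffic is best-effort."""
    for match, slice_name, label in slice_rules:
        if all(pkt.get(k) == v for k, v in match.items()):
            return slice_name, {"label": label, "payload": pkt}
    return "best-effort", {"label": None, "payload": pkt}

print(classify({"tcp_dst": 8883, "src": "sensor-17"}))
# ('slice-2-iot', {'label': 1002, 'payload': {...}})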
SDN and NFV

Figure 41: NFVs and SDN.


Network Function Virtualization (NFV) is a technology closely related to SDN.
While SDN focuses on the decoupling of the control and data planes and on centralized
intelligence, NFV emphasizes the ability to package network functions and deploy them
on standard, general-purpose servers. Examples of network functions that can be deployed
with NFV are firewalls, load balancers, Intrusion Detection and Prevention systems, etc.



Before NFV, these functions were normally available only on specialized hardware devices,
making it very costly to integrate such services into the network. A common solution to
this problem is to deploy these devices at only strategic locations in the network. However,
this approach may not be optimal, as such deployments have to be traded off
against resource usage efficiency. For example, a content-filtering firewall may be deployed
near the gateway to the external network to ensure the filtering of all network traffic in one
place. However, this placement allows offending traffic to travel all the way through the
entire network to the gateway just to be dropped there, greatly hindering the network
performance. A more optimized approach is to deploy this filter function as close to the users as
possible to eliminate this traffic before it even enters the network. However, this
deployment requires a plethora of firewalls, unacceptably increasing the
implementation cost.
Moreover, these network functions are normally chained together; for example, a traffic
flow may undergo antivirus scanning, content filtering, Intrusion Detection and Prevention,
and NAT services. With traditional networking, by placing these network functions
at strategic places in the network, all types of traffic are subjected to the same service
chain despite their different needs. This leads to an inefficient use of these network
functions, unnecessarily increases the cost of network deployment, and prolongs the delay
that transit traffic has to suffer.
With NFV, these network functions can be installed on general purpose computing
platforms and easily deployed anywhere in the network. These functions are basically
installed as software that can be flexibly spun up/down and scaled with respect to the
available computing resources. In this way, NFV gives network administrators the
flexibility to optimize the placement of these functions where they are needed most.
For example, antivirus scanning and content filtering functions can be placed close to the
users to preemptively cut off offending traffic right at its input.
The flexibility of NFV can be further extended when complemented by SDN. With the
presence of SDN, traffic flows can be directed to the right network function with a click of a
button, according to the policies implemented by the administrators. Moreover, the service chain
can be optimized for each traffic flow so that network functions do not need to be over-
provisioned while the processing delay is reduced to a minimum. In addition, the
software-controllable nature of the SDN and NFV duo gives the network the agility it
needs to accelerate the time-to-market of new services. For instance, an Internet service
provider can generate revenue with innovative services by providing its customers with a
variable number of on-demand, scalable, value-added network functions. These services
can be materialized with just a barebone generic switch serving as the border device at the
customer site, while numerous NFVs are allocated on demand in the provider's core network
with proper traffic steering over an SDN infrastructure, as illustrated in Figure 41. This model
not only reduces the cost of the border device, which is normally expensive due to its
required complexity in a traditional networking model, but also speeds up the service roll-out,
since almost no configuration nor staff is needed at the customer site, as all the intelligence is
centralized in the provider network. To this end, the network device at the customer site is
called a virtual Customer Premises Equipment: only its body (the barebone switch) is
present at the customer site, while its soul (the NFV intelligence) resides in the provider
network. In addition, with the software-based design of both NFV and SDN, provisioning


automation can be implemented to the degree that customers can freely customize their
services according to their needs and be charged based on their actual usage rather than
the fixed plans that are available today.
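The per-flow service chaining idea can be sketched as follows; the traffic classes, chain contents and the dispatch helper are hypothetical, and a real deployment would realize each step with SDN flow rules steering packets to the servers hosting the VNF instances.

# A minimal sketch of per-flow service chaining with SDN + NFV: each
# traffic class is steered through only the virtual network functions
# it actually needs. Chain contents are illustrative.

service_chains = {
    "guest-web":   ["content_filter", "nat"],
    "office-mail": ["antivirus", "ids", "nat"],
    "voip":        ["nat"],          # latency-sensitive: shortest chain
}

def apply_vnf(name, packet):
    # Placeholder: a real deployment would forward the packet, via flow
    # rules, to the server hosting this VNF instance.
    return {**packet, "visited": packet.get("visited", []) + [name]}

def steer(flow_class, packet):
    """Apply, in order, only the VNFs required for this traffic class."""
    for vnf in service_chains.get(flow_class, []):
        packet = apply_vnf(vnf, packet)
    return packet

print(steer("voip", {"dst": "10.1.1.1"}))
# {'dst': '10.1.1.1', 'visited': ['nat']}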

9.6 Conclusion
In this chapter, we studied the major switching technologies that support the
forwarding of information in the core network. The first technique that we investigated was
circuit switching, which is suitable for transmitting both analog and digital information. In
particular, we looked at the basic characteristics of this switching technique, including circuit
establishment and dedicated channel usage.
The second major switching technique that was studied was packet switching. In this
scheme, information is divided into small packets before being transferred onto the network.
First, a discussion was given on how packet switching can enhance the data transfer
efficiency in the core network, along with the basic features of this switching technique. Then,
a summary of its two variations, namely datagram switching and virtual circuit switching, was
presented, along with the advantages and drawbacks of each method.
Then, the basic structures for implementing the switching fabric for both techniques were
surveyed and discussed, including the concepts of blocking and non-blocking and the
elaboration of two broad categories of switching fabric: single stage and multi-stage
switching fabrics.
Next, an overview of MPLS, which is widely used in many core networks today, was
presented. First, a discussion of the drawbacks of traditional routing approaches regarding
packet forwarding was given to motivate the development of MPLS, with switching decisions
based merely on short labels. Second, the MPLS label and its placement were described.
Then, based on these foundations, the packet switching procedure in an MPLS network was
elaborated, followed by the label distribution mechanism supported by LDP. Last but not least,
two important and most commonly used applications of MPLS, namely MPLS VPN and
MPLS-TE, were studied. For each application, its basic principles, MPLS signalling and the
switching of labeled packets were summarized.
Finally, the chapter was closed with an introduction to SDN. To this end, the motivation
for SDN was presented by exploring the issues of traditional networking with respect to the
requirements of modern technologies. Then, the key attributes, architecture and
fundamental operations of SDN were described, discussed and compared with traditional
networking through two network applications: routing and URL filtering. Next, the
southbound OpenFlow protocol was studied to illustrate how the controller communicates
with the SDN switches to configure their flow tables. Last but not least, with the introduction
of new technologies such as network slicing and NFV, it is necessary to learn how SDN can
be integrated and operated in harmony with them. The discussion showed that not only can
SDN work well with these technologies, but it also complements their functionalities to
support an agile and reactive network.



Review questions
1. What are the two fundamental methods for data forwarding? Briefly explain them.
2. List the advantages and disadvantages of time-division switching and space-division
switching. Which switching mechanism provides a trade-off between these two
switching approaches?
3. What is the difference between circuit switching and virtual circuit switching
mechanisms?
4. Briefly describe MultiProtocol Label Switching (MPLS). What is the size of the MPLS
label structure? List all fields of the MPLS label along with their sizes.
5. Why is it necessary to have a two-label stack in Layer 2 MPLS VPN?
6. In Layer 3 MPLS VPN, why must different VRF tables be maintained for each VPN at the
LERs? Illustrate one scenario where Layer 3 MPLS VPN is not possible without separate
VRF tables for each VPN.
7. For traffic engineering (TE), which steps are required to be taken by MPLS-TE
(MultiProtocol Label Switching Traffic Engineering)?
8. What are the types of virtual private networks provided by MultiProtocol Label
Switching (MPLS)? Briefly explain each of them.
9. List two advantages of traffic engineering in comparison with best path forwarding.
10. What is the store-and-forward mechanism in packet switching? How does it affect the
delay in the network? How can we further reduce the delay?
Problems
1. Briefly explain circuit switching. Consider a crossbar switch structure with N
input/output ports.
a. For N=4, illustrate the connection between the 2nd input and the 3rd output by
drawing the crossbar switch.
b. For N=4, find the number of crosspoints and the maximum number of
simultaneous connections.
c. Briefly discuss the efficiency of the crossbar switch for large N.
2. Draw the 3-stage switching structure based on a Clos network for data forwarding.
When the input stage consists of 50 crossbar switches for a total input size of 500,
calculate the number of crosspoints for the Clos network and compare the result with
the case of a single crossbar switch (a crosspoint-counting sketch is given after this
problem set).
3. Prove that a Clos network-based switch is non-blocking if 𝑚 ≥ 2𝑛 − 1.
4. For supporting communication between a pair of end devices in time-division
switching, a time-slot interchange (TSI) switching mechanism is used. Five data
channels are time-multiplexed, and TSI switching is applied to the time-multiplexed
signal. The TSI interchanges the time-slots as follows: 1 → 2, 2 → 5, 3 → 3, 4 → 1,
5 → 4. For this transmission, draw the block diagram of the time-division
multiplexer/demultiplexer including the TSI switching block.
5. There are two routers between the source and destination devices, and the source
wants to send three packets to the destination. For the data transmission in packet-
switching, the store-and-forward mechanism is utilized. Considering the propagation
delay on each link (i.e., between devices and routers), illustrate the timing diagram for
the transmission of the three packets.
6. Calculate the end-to-end transmission delay for the following networks (a delay-
calculation sketch is given after this problem set):
a. There are five links between source and destination, and the transmission rate
is 10 Mbps. The propagation delay for each link is 100 ms. The total data size is
1 MB, and the data is divided into 100 equal-size packets. The header size is
omitted.
b. Redo question (a) for the case where each packet is accompanied by a header of
size 10 KB.
c. What is the optimal packet size to achieve the minimum end-to-end
transmission delay? What is the ratio of the payload to the total packet size
when the header size is 10 KB?
7. In a packet-switching network, there are five links between source and destination,
where each link has a propagation delay of 200 μs. The total data size is 4 MB, the data
is divided into 100 equal-size packets, and the header size is neglected. For a maximum
end-to-end delay of 100 ms, what is the minimum acceptable data rate?
8. For packet switching in an MPLS network, describe label switching routers, label edge
routers and label switched paths. Also, specify the three label operations handled by
them.
9. a. Consider an MPLS LSP as in the image below going through LER1→LSR3→LSR4→LER2.
This LSP is defined for all packets belonging to the subnet 192.168.100.0/24. Assuming
that the labeled packets belonging to this LSP are illustrated as shown in the figure,
along with the interface number of each router, find the label forwarding table for this
LSP on LER1, LSR3, LSR4 and LER2.



b. Further consider a second MPLS LSP as in the image below going through
LER1→LSR2→LER2. This LSP is defined for all packets belonging to the subnet 10.0.0.0/8.
Assuming that the labeled packets belonging to this LSP are illustrated as shown in the
figure, along with the interface number of each router, and given the label overlapping
at LER1 and LER2, can the two LSPs coexist with this label assignment? If no, explain
why. If yes, show the label forwarding tables for both LSPs on LER1, LSR2, LSR3, LSR4
and LER2.
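
For readers who want to verify their crosspoint arithmetic in Problems 2 and 3, the following
is a quick sanity-check sketch rather than a model solution. It assumes a symmetric 3-stage
Clos network (r = N/n input switches of size n × m, m middle switches of size r × r, and
r output switches of size m × n) and, since Problem 2 does not fix m, it assumes the
non-blocking choice m = 2n − 1 from Problem 3.

    # Sanity check for Problems 2-3: crosspoint count of a symmetric 3-stage
    # Clos network versus a single N x N crossbar.

    def clos_crosspoints(N, n, m):
        """Total crosspoints of a symmetric 3-stage Clos network."""
        r = N // n                                # number of first-stage (and third-stage) switches
        return r * n * m + m * r * r + r * m * n  # stage 1 + stage 2 + stage 3

    N, r = 500, 50          # Problem 2: 500 inputs, 50 input-stage switches
    n = N // r              # 10 inputs per first-stage switch
    m = 2 * n - 1           # assumed non-blocking condition (Problem 3): m >= 2n - 1

    print(clos_crosspoints(N, n, m))  # 66500 crosspoints for the Clos network
    print(N * N)                      # 250000 crosspoints for a single crossbar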
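
Similarly, the helper below can be used to check the store-and-forward delay arithmetic in
Problems 5 to 7. It is a sketch under simplifying assumptions: ideal pipelining of equal-size
packets, no queueing or processing delay, and 1 MB taken as 10^6 bytes, so each of the
100 packets in Problem 6a carries 80,000 bits.

    # Sanity check for Problems 5-7: end-to-end delay with store-and-forward.
    # The first packet crosses every link; the remaining packets follow in a
    # pipeline, one transmission time apart.

    def store_and_forward_delay(links, rate_bps, packet_bits, n_packets, prop_s):
        """End-to-end delay in seconds over `links` hops."""
        t_x = packet_bits / rate_bps                          # per-link transmission time
        return (links + n_packets - 1) * t_x + links * prop_s

    # Problem 6a: 5 links, 10 Mbps, 100 packets of 10 KB (80,000 bits), 100 ms/link.
    delay = store_and_forward_delay(links=5, rate_bps=10e6, packet_bits=80_000,
                                    n_packets=100, prop_s=0.1)
    print(f"{delay * 1000:.0f} ms")                           # 1332 ms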



