Towards Cloud-assisted Software-defined Networking

Frank Dürr
Institute of Parallel and Distributed Systems (IPVS)
Universität Stuttgart
Universitätsstraße 38, 70569 Stuttgart, Germany
[email protected]

August 2012
Abstract
1 Introduction
In this paper, we introduce a novel networking paradigm called cloud-assisted
software-defined networking (Ca-SDN) that combines the strengths of both
cloud computing and state-of-the-art software-defined networking technologies
for the implementation of flexible and highly optimized communication sys-
tems.
The basic idea is to pull out computationally complex and memory-intensive
network management operations such as optimized route calculation, the man-
agement of network state information, multicast group management, account-
ing, etc. from the network and to implement them in one or—for instance, to
increase reliability—multiple datacenters (“in the cloud”). The network itself is
only responsible for forwarding packets.
To enable this separation of management and forwarding functionality, we
build on the current trend of software-defined networking (SDN) [14, 17], in par-
ticular, the OpenFlow standard [16]. SDN allows for the separation of the con-
trol plane and data (forwarding) plane of switches and routers (since we assume
multilayer switches, which de facto also implement routing functionality, we
only speak of “switches” in the following). The forwarding tables (also called
flow tables) of these switches are configured by an external controller process.
In our Ca-SDN system, this controller is hosted in a datacenter to utilize the
cloud resources for its scalable implementation and to enable the implementa-
tion of optimized routing algorithms. Packets without a matching flow table
entry are forwarded to the controller, which calculates suitable routes on demand
and configures the forwarding tables of the switches on the calculated path accordingly.
With Ca-SDN, the necessary network state information for route calculation is
also gathered by the controller by querying the switches and stored in the dat-
acenter to build a global view on the network. After setting a flow table entry,
switches perform forwarding of matching packets without contacting the con-
troller. Overall, this results in a network that is centrally managed and controlled
from the datacenter.
Compared to current distributed “network-centric” approaches where relatively
complex protocols and functionalities like route calculation are executed
by the routers themselves, this clear separation of management functionality
(implemented in software and executed by the controller in the cloud) and for-
warding (performed by switches in hardware) has clear advantages stemming
either from SDN or the utilization of cloud resources:
• Flexibility. SDN offers a high degree of flexibility since the addition or
modification of functionality only requires a modification of the software
implementation of the controller. For instance, novel routing algorithms
can be installed without modifying switches just by adding or changing
routing algorithms executed by the controller. The fact that the controller
is typically implemented using modern standard languages such as Java
or C++ together with the possibility to use the corresponding powerful
development environments further facilitates rapid changes.
• Scalability. Central control naturally raises concerns about the scalability
of the controller. However, the large computational resources of a data-
center promise a high degree of scalability by scaling up to multiple cores
and scaling out to many machines. Moreover, we will show later that cen-
tral control in fact often reduces network traffic compared to distributed
protocols, since these often flood information instead of forwarding it only
to a single entity.
Due to these advantages, we envision that Ca-SDN will have great influence
on future network infrastructures and protocols, including the whole range of
networks from local datacenter networks to wide-area networks. The necessary
components and basic standards are already available today. Several providers
offer powerful cloud computing infrastructures, and also network providers can
[Figure: A datacenter hosting the controller, the routing service, and the NetIB, FlowIB, and GrpIB information bases, connected through a control network to the software-defined network of multilayer switches and hosts.]
Figure 1: System Architecture.
set up their own datacenters (“private cloud”). Moreover, the OpenFlow stan-
dard enables the implementation of SDN. Companies such as IBM, NEC, and
HP already offer the first OpenFlow switches. The example of Google, which has
recently adopted OpenFlow throughout its networks, shows that SDN is ready
to be used in production systems [9].
However, to enable Ca-SDN some challenges and research questions have to
be addressed. Therefore, besides introducing the concept of Ca-SDN, the main
contribution of this paper is the identification of the associated challenges and re-
search questions of Ca-SDN. In order to firm up this discussion and bring it to
a technical level, we introduce a concrete communication system that benefits
from Ca-SDN, namely, a cloud-assisted IP multicast service.
The rest of this paper is structured as follows. We start with a description
of an architecture for Ca-SDN systems in Section 2 using the concrete exam-
ple of a cloud-assisted IP multicast system. Then, we describe the routing pro-
cess, group management, and path calculation and optimization of the cloud-
assisted IP multicast system in Section 3. Finally, we summarize the identified
research questions in Section 4, and conclude the paper with a summary and
outlook on future work in Section 5.
2 System Architecture
In this section, we present a system architecture for Ca-SDN systems. Most of
the described components and functions are generic. However, we will also
show some specific components and give specific descriptions for our Ca-SDN
example, cloud-assisted IP multicast.
The components of our system are depicted in Figure 1. Hosts are the
senders and receivers of communication flows. As one goal, we want to keep
the current protocol stack implementation of hosts unmodified since otherwise
problems with a large number of legacy systems would arise. In our IP multicast
example this means that we only rely on standard protocols like UDP/IP or the
Internet Group Management Protocol (IGMP [4]).
Switches are responsible for message forwarding. We assume that these are
multilayer switches that can interpret layer 2 to layer 4 header information to
identify communication flows. According to the OpenFlow standard, flows can
be identified by a 10-tuple including, for instance, source/destination MAC ad-
dresses, IP addresses, port numbers, or VLAN IDs. For IP multicast, we can de-
fine source-specific distribution trees using destination multicast IP address and
sender IP address, or shared trees using only the destination multicast IP ad-
dress.
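
To illustrate how these match fields identify multicast flows, the following minimal sketch (in Python, with illustrative field names rather than the exact OpenFlow match structure) shows the difference between a source-specific and a shared-tree flow definition.

```python
# Sketch: identifying multicast flows by match fields (field names illustrative,
# not the exact OpenFlow wire format).

def source_specific_match(group_ip, sender_ip):
    """Source-specific tree (S, G): one flow entry per sender and group."""
    return {"eth_type": 0x0800, "ipv4_dst": group_ip, "ipv4_src": sender_ip}

def shared_tree_match(group_ip):
    """Shared tree (*, G): a single flow entry per group, regardless of sender."""
    return {"eth_type": 0x0800, "ipv4_dst": group_ip}

# Example: 239.1.2.3 is an IP multicast group address.
print(source_specific_match("239.1.2.3", "10.0.0.7"))
print(shared_tree_match("239.1.2.3"))
```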
Furthermore, we assume that switches implement the OpenFlow protocol.
In particular, the flow tables can be modified by an external controller process,
which in case of Ca-SDN runs on a host in a datacenter as already mentioned.
For our multicast example, we assume a single controller that is responsible for
all switches of the network. The controller is connected to the switches through
a control network using TCP and optionally TLS. This can either be a separate
network, or control messages can be forwarded “in-band” over the data network.
The configuration of flow tables can be done either proactively or reactively. For
instance, in the case of multicast, routes (distribution trees) and corresponding
flow table entries can be installed proactively when hosts join a multicast group.
Or they can be installed reactively “on the fly”, when a switch receives a multicast
packet without a matching flow table entry. In the latter case, when the first packet
of a flow without matching flow table entry arrives at a switch, the switch sends
it to the controller. The controller calculates a suitable route for this flow, and
updates the flow tables of the switches along the calculated path—or multiple
paths in case of multicast distribution trees—accordingly. Subsequent packets
of the flow are forwarded at line rate without the need for contacting the con-
troller again.
We only consider the network of one operator. This could be a datacenter
network—in this case, the cloud network is managed by a controller residing in
the same cloud—or up to a complete autonomous system. Questions of network
virtualization to offer the “network as a service” to customers or to create
a virtual network spanning multiple providers are beyond the scope of this
paper, and we refer to the literature for these questions [13, 21, 12]. Ca-SDN as
presented in this paper is rather focused on how a network operator can manage
and optimize its network. Although we only consider interior gateway routing,
we want to note that logically centralized network management has also proven
to be beneficial for BGP using central Routing Control Platforms (RCP). For in-
stance, an RCP based on OpenFlow was proposed in [20] that also utilizes central
data stores to manage network information similar to the central management
of network information in Ca-SDN.
Besides the controller, several other services are hosted by the datacenter.
The network information base (NetIB) stores information about the configura-
tion of the network. In particular, it contains information about the physical
network topology. To determine the topology automatically, the Link Layer Dis-
covery Protocol (LLDP [10]) can be used. Besides connectivity information, the
NetIB also manages additional link information such as link capacities, the load
of links, and latencies. This advanced information is a prerequisite for calcu-
lating optimal routes or routes fulfilling given QoS guarantees. With OpenFlow,
switches maintain several counters, for instance, for the number of received and
sent bytes per port, which are queried by the controller to determine the dy-
namic link state. The result is a global view on the network topology and link
state.
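
As a minimal sketch of how the NetIB could derive dynamic link state, the following Python fragment computes link load from the difference of two successive port byte counter readings; class and method names are illustrative, not a concrete controller API.

```python
# Sketch: deriving dynamic link load for the NetIB from periodically polled
# port byte counters (names illustrative, not a concrete controller API).
import time

class NetIB:
    def __init__(self):
        self.links = {}   # (switch, port) -> {"capacity_bps": ..., "load_bps": ...}
        self._last = {}   # (switch, port) -> (timestamp, tx_bytes)

    def add_link(self, switch, port, capacity_bps):
        self.links[(switch, port)] = {"capacity_bps": capacity_bps, "load_bps": 0.0}

    def update_port_counters(self, switch, port, tx_bytes):
        """Called with each port-statistics reply; derives load from the byte delta."""
        now = time.time()
        key = (switch, port)
        if key in self._last:
            t0, b0 = self._last[key]
            if now > t0:
                self.links[key]["load_bps"] = 8 * (tx_bytes - b0) / (now - t0)
        self._last[key] = (now, tx_bytes)

    def utilization(self, switch, port):
        link = self.links[(switch, port)]
        return link["load_bps"] / link["capacity_bps"]
```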
The flow information base (FlowIB) contains information about each flow.
On the one hand, it stores the path of each flow using a global view on the flow
table entries of each switch (global forwarding information base). On the other
hand, it manages traffic statistics for each flow, in particular, flow data rates.
Again, this dynamic information can be collected by querying the counters of
switches.
The group information base (GrpIB) stores multicast group membership in-
formation. In more detail, it stores which switches have group members
directly connected to which of their ports. Theoretically, the GrpIB could also
manage group memberships of individual hosts. However, as shown below it is
sufficient for our approach to know at which port of a switch group members
are connected. For this purpose, we can use the standard IGMP protocol, thus,
no changes to hosts are required. Moreover, we will show below that switches
do not have to implement IGMP since group management is moved to the con-
troller in the datacenter.
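
A minimal sketch of such a GrpIB follows; it assumes that IGMP membership reports are forwarded to the controller and reduced to a mapping from group addresses to (switch, port) pairs, with all names being illustrative.

```python
# Sketch: group information base (GrpIB) maintained from IGMP membership
# reports that switches forward to the controller (parsing omitted; names
# are illustrative).
from collections import defaultdict

class GrpIB:
    def __init__(self):
        # group IP -> set of (switch, port) with directly connected members
        self.members = defaultdict(set)

    def join(self, group_ip, switch, port):
        """IGMP membership report received on (switch, port)."""
        self.members[group_ip].add((switch, port))

    def leave(self, group_ip, switch, port):
        """IGMP leave message (or membership timeout) on (switch, port)."""
        self.members[group_ip].discard((switch, port))

    def terminal_ports(self, group_ip):
        """Lookup used by the routing service: all egress (switch, port) pairs."""
        return self.members.get(group_ip, set())

grpib = GrpIB()
grpib.join("239.1.2.3", switch=5, port=2)
grpib.join("239.1.2.3", switch=9, port=1)
print(grpib.terminal_ports("239.1.2.3"))
```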
In order to reduce the latencies when updating or querying these informa-
tion bases, they are kept in main memory, possibly replicated on different ma-
chines running processes of the routing service (see below) to increase scalabil-
ity and for fast error recovery in case of host failures.
The routing service is responsible for calculating routes of flows, i.e., distri-
bution trees in the case of multicast. Routes can be calculated according to dif-
ferent optimization goals and QoS constraints (cf. Section 3.4). The routing ser-
vice has access to all the information bases mentioned above. Moreover, several
routing processes can run in parallel on one machine (with several processing
cores) or separate machines in order to reduce the latency of route calculation
and to scale with the number of flows needing route information.
Besides these core services, there can be further auxiliary services (not
shown in the figure). For instance, an accounting service could track the con-
sumed network resources of individual flows based on flow counters. Flows
exceeding a certain quota could be interrupted by simply removing their flow
table entries at the source switch. Or, according to the common cloud pricing
model “pay as you go”, the sender could be billed for the network traffic it
actually induces, as already motivated. Although we do not go into further detail on how
such auxiliary services could be implemented, the central availability of the nec-
essary information in a single datacenter (possibly also storing the master data
of the provider such as customer data) facilitates the implementation of such
services.
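
As a sketch of one such auxiliary service, the following fragment checks per-flow byte counters against a quota and triggers the removal of the flow entry at the source switch; the counter source and the removal callback are placeholders, not a concrete controller API.

```python
# Sketch: quota enforcement as an auxiliary accounting service (the counter
# source and the flow-removal callback are placeholders, not a concrete API).

def enforce_quotas(flow_counters, quotas_bytes, remove_flow_at_source):
    """flow_counters: {flow_id: transferred bytes}, e.g. taken from the FlowIB;
    quotas_bytes: {flow_id: allowed bytes};
    remove_flow_at_source: callback deleting the flow entry at the ingress switch."""
    for flow_id, transferred in flow_counters.items():
        if transferred > quotas_bytes.get(flow_id, float("inf")):
            remove_flow_at_source(flow_id)      # flow is cut off at its source switch

# Example with stubbed data:
enforce_quotas({"flow-42": 10**9}, {"flow-42": 5 * 10**8},
               lambda flow_id: print(f"removing {flow_id} at source switch"))
```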
3.2 Reactive Routing
Next, we describe the cloud-assisted routing process. Following our principle
not to modify the hosts, the first step in sending a multicast message is similar
to traditional IP multicast. The sending host maps the IP multicast address to
a MAC layer multicast address and sends the multicast packets using MAC layer
multicast to the local switch.
Let us assume that a reactive routing strategy is used for installing flow ta-
ble entries, and this is the first packet to this multicast group arriving at the
switch. Since the switch does not have a flow table entry for this destination
IP multicast address, it forwards the packet to the controller. The controller is
in charge of identifying group members, calculating a distribution tree, and in-
stalling the distribution tree by configuring the switches’ flow tables along the
calculated tree. Obviously, these steps are time-critical since, as long as the dis-
tribution tree is not installed, the switch cannot forward packets and relays them
to the controller, which might result in an overloaded controller or control net-
work if routing takes too long.
The identification of group members is done using the global GrpIB. A
lookup with the multicast address of the packet yields all switches having di-
rectly connected group members. These are the terminal nodes of the distri-
bution tree. This lookup can be implemented efficiently in practically constant
time using a trie with sub-microsecond lookup latency. Moreover, a parallel im-
plementation using multiple machines for different requests is straightforward
since lookups of different requests are independent.
Knowing the terminal nodes, the distribution tree can be calculated using
topology and flow information from the NetIB and FlowIB, respectively. Since
this is one of the most interesting steps w.r.t. efficiency and possibilities for op-
timization, we dedicate Section 3.4 to this topic. The controller updates the flow
tables of switches along the distribution tree accordingly. Moreover, for switches
with directly connected group members, actions to perform a MAC-layer multi-
cast for the respective ports have to be installed on the switches to re-write the
destination MAC address to the MAC-multicast address corresponding to the
IP group address (if it is guaranteed that the destination MAC address does not
change during forwarding, this step can be left out since the sender already used
the corresponding MAC-layer destination address). Then, hosts will receive the
multicast packets as usual per MAC-layer multicast.
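
The following sketch summarizes this reactive routing process at the controller: look up the terminal switches in the GrpIB (reusing the GrpIB sketch from Section 2), compute a distribution tree, and derive the outputs to be installed per switch. For brevity, the tree is a simple shortest-path tree computed with BFS rather than one of the optimized algorithms of Section 3.4, and switch programming is represented by the returned data structure instead of actual OpenFlow messages.

```python
# Sketch: reactive multicast route setup at the controller (names illustrative;
# a BFS shortest-path tree stands in for the optimized algorithms of Sec. 3.4).
from collections import deque

def shortest_path(adj, src, dst):
    """BFS shortest path in the switch topology; adj: {switch: [neighbor, ...]}."""
    prev, queue, seen = {}, deque([src]), {src}
    while queue:
        u = queue.popleft()
        if u == dst:                       # reconstruct the path back to src
            path = [dst]
            while path[-1] != src:
                path.append(prev[path[-1]])
            return path[::-1]
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                prev[v] = u
                queue.append(v)
    return None

def handle_packet_in(adj, grpib, ingress_switch, group_ip):
    """Returns {switch: set of outputs} describing the distribution tree to install."""
    terminals = grpib.terminal_ports(group_ip)       # {(switch, member_port), ...}
    tree_out = {}
    for switch, member_port in terminals:
        path = shortest_path(adj, ingress_switch, switch)
        if path is None:
            continue                                  # group member not reachable
        for u, v in zip(path, path[1:]):              # forward along the tree branches
            tree_out.setdefault(u, set()).add(("next_hop", v))
        # the last switch forwards on the member port, rewriting the destination
        # MAC address to the group's MAC-layer multicast address if necessary
        tree_out.setdefault(switch, set()).add(("port", member_port))
    return tree_out
```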
This approach has several advantages. Compared to protocols based on flooding
and pruning like DVMRP [22] or PIM-DM [1], no multicast packets have to be flooded throughout
the network. Compared to link-state protocols like MOSPF, the distribution tree
is calculated only once rather than (redundantly) at every multicast router along
the distribution tree. Also no rendezvous or core routers have to be managed in
contrast to CBT [3] or PIM-SM [6] (the controller knows the complete distribu-
tion tree anyway). In general, the centrally calculated tree can be optimized in
various ways based on global network and flow state, and sophisticated algo-
rithms as shown in Section 3.4. It is even possible to reconfigure the multicast
tree completely “on the fly”, changing, for instance, from a source-based tree to
a shared tree, just by centrally updating the flow tables. Finally, the OpenFlow
switches do not need to implement any multicast routing protocol at all.
is minimal. Such constrained optimization problems are particularly useful for
ensuring QoS parameters.
Routing can even be changed on the fly by switching from one distribution
tree to another during message forwarding. This is essential to adapt to dynamic
network state such as the changing load of links to satisfy the optimality objec-
tives or QoS constraints. However, instead of just re-calculating the tree with the
same routing algorithm, SDN has the flexibility to completely change the strat-
egy of tree calculation and the associated routing algorithm dynamically. For
instance, when the number of flow table entries grows too large in the case of
source-based distribution trees, the controller could fall back to shared trees with
only one entry per group.
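
A simplified sketch of such a strategy switch on the flow-table side is shown below; it only illustrates replacing source-specific entries with a single shared-tree entry (threshold and field names are illustrative), while recomputing and installing the shared tree itself would proceed as described above.

```python
# Sketch: falling back from source-specific (S, G) entries to a single shared
# (*, G) entry when the flow table grows too large (threshold and field names
# are illustrative; reinstalling the shared tree itself is not shown).

def maybe_fall_back_to_shared_tree(flow_entries, group_ip, threshold=1000):
    """flow_entries: list of match dicts currently installed for one switch."""
    source_specific = [m for m in flow_entries
                       if m.get("ipv4_dst") == group_ip and "ipv4_src" in m]
    if len(source_specific) <= threshold:
        return flow_entries                            # nothing to do
    remaining = [m for m in flow_entries if m not in source_specific]
    remaining.append({"eth_type": 0x0800, "ipv4_dst": group_ip})  # shared (*, G) entry
    return remaining
```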
Moreover, the resources of the cloud can be used to calculate better routes
than would be possible with comparably weak routers. For instance, as-
sume that the minimization of network load is the optimization goal. The opti-
mal tree w.r.t. network load is a minimum Steiner tree. However, for larger net-
works calculating the exact solution is infeasible in general since the Minimum
Steiner Tree Problem is an NP-complete problem. Due to the complexity of this
problem, common multicast routers fall back to simpler algorithms like calcu-
lating a single-source shortest path tree using Dijkstra’s algorithm and pruning
the branches that do not lead to terminal nodes. Given the processing resources
of a datacenter, more sophisticated heuristics for this problem could be applied [18].
This requires algorithms tailored to the large number of processing cores and
machines available in the datacenter. For instance, the enumeration algorithm
described in [18] calculates a number of minimum spanning trees (MSTs) for dif-
ferent subnetworks of the complete network. Calculating these MSTs in parallel
on different cores or machines is straightforward since these calculations are in-
dependent.
Moreover, it might be desirable to strictly limit the runtime of such calcu-
lations. In particular, if a reactive routing strategy is applied, the calculation
should be in the sub-millisecond range. The mentioned enumeration algo-
rithm is a good example for an algorithm where this could be achieved. Instead
of calculating the complete set of MSTs, one could set a deadline for the calcu-
lation of MSTs. If this deadline has passed, the final result (Steiner tree) is cal-
culated based on the MSTs calculated so far. Given a more relaxed deadline or a
larger number of compute resources, this heuristic yields results closer to the
optimum. Therefore, we can trade off optimality for runtime. In particular, for
high-volume data flows and/or proactive routing strategies, a better result at a
longer runtime pays off.
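
One possible shape of such a deadline-bounded, anytime computation is sketched below: candidate trees are computed in parallel and, when the deadline expires, the cheapest candidate finished so far is returned. For brevity, each candidate is a merged shortest-path tree rooted at a different terminal (reusing shortest_path() from the sketch in Section 3.2) rather than one of the MSTs of the enumeration algorithm in [18]; threads stand in for the processes or machines a datacenter deployment would use, and the deadline value is illustrative.

```python
# Sketch: anytime, deadline-bounded selection of a multicast tree. Candidate
# generation is a simplified stand-in for the MST enumeration of [18]; threads
# are used for brevity, real deployments would use processes or machines.
from concurrent.futures import ThreadPoolExecutor, wait

def candidate_tree(adj, terminals, root):
    """One candidate: the union of shortest paths from `root` to all other
    terminals (reuses shortest_path() from the sketch in Section 3.2)."""
    edges = set()
    for t in terminals:
        if t != root:
            path = shortest_path(adj, root, t)
            if path:
                edges.update(zip(path, path[1:]))
    return edges

def best_tree_before_deadline(adj, terminals, deadline_s=0.001):
    pool = ThreadPoolExecutor()
    futures = [pool.submit(candidate_tree, adj, terminals, r) for r in terminals]
    done, not_done = wait(futures, timeout=deadline_s)       # stop waiting at deadline
    pool.shutdown(wait=False, cancel_futures=True)           # drop unfinished work
    candidates = [f.result() for f in done]
    # cheapest candidate under unit link costs; NetIB link metrics could be used instead
    return min(candidates, key=len) if candidates else None
```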
Beyond optimizing single flows individually, we can also consider global op-
timization problems involving all flows together. As a realistic example, consider
the following constrained optimization problem: Maximize the overall network
throughput under max-min fair bandwidth allocation for each flow. This prob-
lem has been studied in [5] for ATM networks and unicast flows (similar to TCP
flows, which also achieve fair bandwidth allocation); however, similar problems
could also be defined for multicast communication. Although TCP is known
to achieve fair bandwidth allocation on links traversed by the flow, usually the
route of flows is assumed to be given. In [5], the authors show that the overall
throughput can be increased significantly by calculating optimized routes. This
problem is also NP-hard [11] and could benefit from advanced opti-
mization algorithms and the compute resources of the cloud. In [2], the authors
demonstrated that central global route optimization using advanced optimiza-
tion algorithms (in this example, a simulated annealing algorithm) can be used
to optimize even larger networks (in this example, a large datacenter network)
online.
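
To illustrate what such an objective involves, the following sketch computes a max-min fair bandwidth allocation for flows with fixed routes using progressive filling; it covers only the allocation step, not the joint route optimization studied in [5], and all names and values are illustrative.

```python
# Sketch: max-min fair bandwidth allocation via progressive filling for flows
# with fixed routes (allocation step only, not the joint routing of [5]).

def max_min_fair(link_capacity, flow_routes):
    """link_capacity: {link: capacity}; flow_routes: {flow: [link, ...]}."""
    remaining = dict(link_capacity)
    unfrozen = set(flow_routes)
    rate = {f: 0.0 for f in flow_routes}
    while unfrozen:
        # smallest equal increment any used link can still grant its unfrozen flows
        share = min(remaining[l] / sum(1 for f in unfrozen if l in flow_routes[f])
                    for l in remaining
                    if any(l in flow_routes[f] for f in unfrozen))
        for f in unfrozen:
            rate[f] += share
        for l in remaining:
            remaining[l] -= share * sum(1 for f in unfrozen if l in flow_routes[f])
        # freeze flows that traverse a saturated link
        saturated = {l for l, c in remaining.items() if c <= 1e-9}
        unfrozen -= {f for f in unfrozen if saturated & set(flow_routes[f])}
    return rate

# Example: link "b" is the bottleneck shared by f2 and f3; f1 gets the rest of "a".
print(max_min_fair({"a": 10, "b": 4},
                   {"f1": ["a"], "f2": ["a", "b"], "f3": ["b"]}))
```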
• Only strive for eventual consistency, which is often the consistency seman-
tic guaranteed by distributed routing protocols. Obviously, if no further
changes are induced, eventually the network again reaches a consistent
state where the last (and correct) distribution tree is installed in the net-
work. Since in general packet forwarding is not required to be loop-free,
loss-free, etc., during transitions, one could argue that the resulting prob-
lems have to be handled anyway by the transport protocols or on the appli-
cation layer, for instance, by detecting duplicates, retransmitting packets,
etc. In other words, ensuring correct behavior during transitions is only
an optimization w.r.t. eventual consistency.
in [19]. Such mechanisms are facilitated by the global view and central
control as available in Ca-SDN.
[Figure: Example datacenter network topology with an Internet uplink, 144 aggregation switches, 5184 ToR switches, and attached hosts; link capacities of 77x10 Gbps, 2x10 Gbps, and 1 Gbps.]
layer 2 header, 20 bytes IP header, 20 bytes TCP header) per packet. The ap-
plication layer protocol part of an OpenFlow port statistics request consists of
S_req = 32 bytes (8 bytes general header, 16 bytes status request header, 8 bytes
port request header). The application layer part of an OpenFlow port statistics
response for one switch with p = 144 ports consists of S_res = 15000 bytes
(8 bytes general header, 16 bytes status response header, 104p bytes response
body). Note that this response contains more information than only the bytes
sent and received since currently the OpenFlow standard returns all traffic
statistics per port including number of received/sent packets, packet drops, etc.
Thus, the overhead could be further reduced by implementing more selective
query mechanisms in the OpenFlow standard. Overall, this leads to the following
total data rate r = 1.8 Mbps at the central controller assuming a maximum
transmission unit of M = 1500 bytes and A = 144 Aggr switches:

r = 8A \left( H_{2-4} + S_{req} + S_{res} + \frac{S_{res}}{M - H_{3-4}} H_{2-4} \right) q \quad \text{[bps]}
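
The following sketch evaluates this formula numerically. The Ethernet header size and the query rate q are assumptions chosen here for illustration (they are not visible in this excerpt); with one query per switch every 10 seconds, the result is approximately the 1.8 Mbps stated above.

```python
# Sketch: evaluating the control-traffic formula above (the 18-byte Ethernet
# header and the query rate q are assumptions for illustration).
A = 144                      # number of aggregation switches queried
p = 144                      # ports per aggregation switch
S_req = 32                   # port-statistics request, application layer [bytes]
S_res = 8 + 16 + 104 * p     # port-statistics response: 15000 bytes
M = 1500                     # maximum transmission unit [bytes]
H3_4 = 20 + 20               # IP + TCP headers [bytes]
H2_4 = 18 + H3_4             # layer 2 + IP + TCP headers [bytes]
q = 0.1                      # assumed query rate per switch [1/s] (one query per 10 s)

bytes_per_query = H2_4 + S_req + S_res + (S_res / (M - H3_4)) * H2_4
r = 8 * A * bytes_per_query * q          # total data rate at the controller [bps]
print(f"r = {r / 1e6:.2f} Mbps")         # ~1.8 Mbps
```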
103,680 hosts (assuming several virtual machines connected to a virtual soft-
ware switch on each physical host, this number increases even further). Therefore,
how to optimize the collection of global network state remains another research
question.
4 Research Questions
In this section, we summarize the research questions identified in the previous
sections. Some of these questions are relevant in general for SDN, and some in
particular for Ca-SDN:
• Minimizing flow table size. The number of flow table entries is limited
by the TCAM size of the switches (typically, up to 150,000 entries). There-
fore, the number of flow table entries needs to be minimized at least for
larger networks such as the large datacenter network used in the example
above. Considering the multicast example, shared trees consume fewer en-
tries than source-based trees. Since for some applications source-based
trees might be better suited due to the optimization criteria (for instance,
minimum latency using a single-source shortest path tree), one has to de-
cide carefully when to use which kind of distribution tree to find a trade-
off between satisfying application requirements and the minimization of
TCAM space. With respect to the routing strategy, the central question is,
which flows to install proactively (at the risk of wasting flow table en-
tries on inactive flows), and for which a reactive strategy (using flow table
entries only on-demand) is sufficient? This also affects the next question,
the low latency setup of flows.
• Low latency flow setup. In particular for the reactive routing strategy,
achieving a low latency for setting up flow table entries is essential. There-
fore, how to utilize the resources of the datacenter for fast routing through
parallel algorithms becomes an interesting question. If very low latency is
required, a proactive flow setup strategy might be the better choice. How-
ever, this requires a priori knowledge of the communication relations. We
have seen that for IP multicast such a primitive to signal the intent to send
to a certain group is missing.
global information must not outweigh the benefits of global optimization.
Therefore, the first question is, what is the best trade-off between the gran-
ularity of collected state and the resulting benefit? This question has to
be answered individually, depending on the optimization objective and
the application requirements. We have also mentioned the possibility to
increase the efficiency of collection by using more controllers co-located
with the switches and/or eventing mechanisms for traffic statistics.
gies. By outsourcing complex management functionality to powerful datacen-
ters and using OpenFlow switches for fast flow-based forwarding, we tried to
achieve a clear design that benefits from the strengths of both technologies. The
benefits of Ca-SDN include high flexibility; fast low-latency forwarding; high
scalability; optimization of the network w.r.t. various objectives; ease of ad-
ministration; and cost reduction. Moreover, we presented the design of a cloud-
assisted IP multicast service to showcase the benefits, challenges, and research
questions of Ca-SDN.
As part of ongoing work, we are implementing the introduced cloud-assisted
multicast system and different route optimization algorithms to explore the
practical limits and gains of Ca-SDN.
Moreover, we are going to investigate how different communication systems
such as advanced group communication systems (semantic multicast, geocast,
etc.), or event and publish/subscribe middleware can be implemented using
SDN in general, and Ca-SDN in particular. We argue that these systems, which
are typically implemented in overlay networks today, can increase their perfor-
mance significantly by using Ca-SDN. A specific question for these systems is
how to map their application-specific addresses to flows to utilize the efficiency
of switches for forwarding.
References
[1] A. Adams, J. Nicholas, and W. Siadak. Protocol independent multicast –
dense mode (PIM-DM). IETF, RFC 3973, Jan. 2005.
[3] T. Ballardie, P. Francis, and J. Crowcroft. Core based trees (CBT). SIGCOMM
Comput. Commun. Rev., 23(4):85–95, Oct. 1993.
network. In Proceedings of the ACM SIGCOMM 2009 Conference on Appli-
cations, Technologies, Architectures, and Protocols for Computer Communi-
cation, pages 51–62, Aug. 2009.
[10] IEEE Computer Society. 802.1AB – Station and Media Access Control Con-
nectivity Discovery, 2005.
[11] J. Kleinberg, Y. Rabani, and E. Tardos. Fairness in routing and load balanc-
ing. Journal of Computer and System Sciences, 63(1):2–20, 2001.
[15] J. Moy. Multicast extensions to OSPF. IETF, RFC 1584, Mar. 1994.
[18] H. J. Prömel and A. Steger. The Steiner Tree Problem. Vieweg, 2002.
Workshop on Hot Topics in Software Defined Networking (HotSDN ’12), Aug.
2012.