
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TCC.2021.3089225, IEEE Transactions on Cloud Computing

IEEE TRANSACTIONS ON CLOUD COMPUTING 1

Accurate and Efficient Monitoring for Virtualized SDN in Clouds

Gyeongsik Yang, Member, IEEE, Yeonho Yoo, Minkoo Kang, Heesang Jin, and Chuck Yoo, Member, IEEE

Abstract—This paper presents V-Sight, a network monitoring framework for programmable virtual networks in clouds. Network
virtualization based on software-defined networking (SDN-NV) in clouds makes it possible to realize programmable virtual networks;
consequently, this technology offers many benefits to cloud services for tenants. However, to the best of our knowledge, network
monitoring, which is a prerequisite for managing and optimizing virtual networks, has not been investigated in the context of SDN-NV
systems. As the first framework for network monitoring in SDN-NV, we identify three challenges: non-isolated and inaccurate statistics,
high monitoring delay, and excessive control channel consumption for gathering statistics. To address these challenges, V-Sight
introduces three key mechanisms: 1) statistics virtualization for isolated statistics, 2) transmission disaggregation for reduced
transmission delay, and 3) pCollector aggregation for efficient control channel consumption. The evaluation results reveal that V-Sight
successfully provides accurate and isolated statistics while reducing the monitoring delay and control channel consumption by orders of
magnitude. We also show that V-Sight can achieve a data plane throughput close to that of non-virtualized SDN.

Index Terms—Distributed systems, Network management, Network monitoring

1 INTRODUCTION

NETWORK virtualization (NV) is a vital technology in datacenters [2]. NV creates virtual networks (VNs) for tenants based on a single physical network infrastructure and isolates network traffic between the VNs. Because tenants require isolated network connections between their computing nodes, such as virtual machines (VMs) and containers, NV is widely deployed in cloud datacenters [3], [4]. To implement NV, overlay networking of TCP/IP network stacks is commonly used. Overlay networking distinguishes the packets of multiple tenants with a tenant identifier (TID) attached as an additional encapsulation header.

However, overlay networking has a critical shortcoming: it does not allow tenants to configure or program their VNs [5], because the underlying network resources (e.g., switches, ports, and links) are solely determined by datacenter operators. Therefore, tenants cannot install their desired network policies (e.g., flow entries for packet forwarding or redirection to a proxy) in an arbitrary VN switch. In addition, tenants cannot create a VN topology between their VMs or containers as required. This limitation translates into a severe problem because many applications demand in-network optimizations (e.g., OpenFlow [6] and P4 [7]) or their own network architectures (e.g., information-centric networking [8]), which require programmable networks. Their objective is to enhance the service quality of the applications. However, due to the restricted programmability of overlay networking, such optimizations are hindered in clouds [5]. Consequently, programmable VNs have been identified as a critical missing component for NV [9], [10].

Fortunately, software-defined networking (SDN) provides a new path for NV [5]. SDN is a network system structure that splits the network control and packet forwarding functionalities. SDN centralizes the network control functions into software (the SDN controller). Because multiple tenants are present in clouds, each tenant can have its own SDN controller (tenant controller)¹, which leads to SDN-based NV (SDN-NV). One of the SDN-NV architectures utilizes the network hypervisor [5], [11], [12], [13], which sits between the physical network and the tenant controllers.

Network hypervisors support VN abstractions, such as virtual switches, links, ports [11], and addresses [13]. With these abstractions, SDN-NV can provide programmability to tenants [13]. In other words, each tenant can have a virtualized SDN so that it can create its own VN topology and program its VN using SDN controllers (e.g., POX [14], ONOS [15], or OpenDayLight [16]). There have also been advances in network hypervisor technology that enhance scalability [17], [18] and flexibility [19].

Nevertheless, to the best of our knowledge, no study has discussed network monitoring for SDN-NV (details in §2.4). Network monitoring is a vital prerequisite for VN management in providing statistics. For example, gathering the processed volume of traffic for each flow entry or port is a basis of link utilization for network management, such as QoS routing, network planning, and anomaly detection

• Gyeongsik Yang, Yeonho Yoo, Minkoo Kang, and Chuck Yoo are with the Department of Computer Science and Engineering, Korea University, Seoul, Republic of Korea, 02841. E-mail: [email protected], [email protected], [email protected], [email protected], [email protected]. (Corresponding author: Chuck Yoo.)
• Heesang Jin is with the Blockchain Research Section, Electronics and Telecommunications Research Institute (ETRI), Daejeon, Republic of Korea, 34129. This work was performed when Heesang Jin was a graduate student at Korea University. E-mail: [email protected].
• A preliminary version of this paper appeared in the proceedings of IEEE INFOCOM 2020 - IEEE Conference on Computer Communications [1].
Manuscript received XX; revised XX.

1. Typically, SDN controllers (e.g., POX, ONOS, OpenDayLight) are used as tenant controllers. Thus, throughout this paper, we use the terms "SDN controller" and "tenant controller" interchangeably. We use the term "SDN controller" in the context of non-virtualized SDN, and the term "tenant controller" for SDN-based NV.

2168-7161 (c) 2021 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
Authorized licensed use limited to: California State University Fresno. Downloaded on July 01,2021 at 18:12:59 UTC from IEEE Xplore. Restrictions apply.

[20], [21], [22], [23], [24]. Specifically, a research paper from Microsoft has reported that despite the significant difficulties associated with VN monitoring, monitoring of VNs is necessary because it is key to handling faults in network infrastructures and providing performant services to tenants by detecting overheads for each tenant [25]. Despite its conspicuous importance, network monitoring has received relatively little attention in SDN-NV studies.

Fig. 1. SDN-based network virtualization. (Figure: tenant controllers manage VN 1, comprising switches S1, S2, S3, S5, and S6, and VN 2, comprising S1 and S6; the network hypervisor maps both VNs onto the physical network of switches S1 through S6.)

To address such problems, this paper presents V-Sight, a comprehensive network monitoring framework for SDN-NV. As the first framework for network monitoring, V-Sight faces three main challenges: 1) inaccurate statistics, 2) high monitoring delay, and 3) excessive control channel traffic consumption. First, because tenant controllers attempt to optimize and manage their VNs based on statistics (e.g., the volume of traffic processed by a flow entry for routing), accurate statistics should be provided. In SDN-NV, the statistics collected in the physical network are the aggregate of the multiple VNs running on the network, but there is no mechanism that isolates the statistics for each VN.

Second, the SDN-NV system inevitably increases the delay (the so-called transmission delay) between the statistics request from a tenant controller and the reply from the switches. When the statistics request message arrives at the network hypervisor from the controller, the network hypervisor must send the corresponding statistics request messages to the physical network (switches) and wait to receive the results, thus increasing the transmission delay. For example, if a tenant controller sends a request for "all flow entries of a virtual switch," the transmission delay can be high because the individual flow entries' statistics are collected sequentially. Our experiment shows that the transmission delay increases by up to 333 times compared with that of a non-virtualized SDN (§2.3.2). The increased delay causes the collected statistics to be out-of-date; thus, a careful design that reduces such delay is required.

Third, the network hypervisor consumes control channel traffic excessively compared with a non-virtualized SDN. In our experiment, control channel consumption increases by up to three times (§2.3.3) when the tenant controller asks for the statistics of all the flow entries per switch. This high consumption arises because the network hypervisor has to send multiple messages to the switches. Considering that such messages go through the control channel, other traffic is affected [26]. For instance, our experiment finds that the flow entry installation time increases by 4.3 times due to the control channel consumption in retrieving the statistics.

V-Sight addresses the above challenges through three key mechanisms: 1) statistics virtualization to isolate statistics per VN, 2) transmission disaggregation to reduce transmission delay, and 3) pCollector aggregation to reduce control channel consumption. Statistics virtualization (§3.2) isolates the virtual network statistics (vStatistics) per VN from the physical network statistics (pStatistics). Transmission disaggregation (§3.3) uses caching of frequently used pStatistics. The caching is performed by a pCollector that retrieves the pStatistics routinely and stores the data in the network hypervisor, which removes the delays for pStatistics transmission. Further, we design pCollector aggregation (§3.4) to reduce the control channel consumption of the pCollectors. Instead of collecting the pStatistics from individual pCollectors, pCollector aggregation attempts to merge the pCollectors so that multiple pStatistics are retrieved with a single request message, which reduces the number of messages and thus the control channel consumption.

In short, this paper accomplishes the following:
• Identification and formulation of three key challenges for network monitoring in SDN-NV systems: statistics isolation, monitoring delay, and control channel consumption for network hypervisors.
• Introduction of the new concepts of statistics virtualization, transmission disaggregation, and pCollector aggregation.
• Full system implementation of the framework as open-source software.
• Comprehensive experiments that show 1) improvement in vStatistics accuracy by three orders of magnitude, 2) up to 454 times reduction in transmission delay, 3) up to 1.9 times improvement in control channel consumption, and 4) 5.5 times improvement in TCP throughput variance in a practical usage scenario.

The remainder of this paper is organized as follows. §2 describes the background and challenges of network monitoring in SDN-NV, and §2.4 elaborates on related work. §3 provides the fundamental concepts and the complete design of V-Sight, and §4 presents the evaluation results. §5 discusses future research directions. Finally, §6 concludes this paper.

2 BACKGROUND AND MOTIVATION

Here, we explain the background of this study: SDN-NV and network monitoring. Then, we identify the challenges for a network monitoring framework in SDN-NV systems. In addition, we comprehensively explain the related work and the differences of this study.

2.1 SDN-based Network Virtualization

SDN-NV comprises three layers (Fig. 1): tenant controllers, the network hypervisor, and the physical network (PN). A tenant refers to a user or a group of users who share the authority for using the given resources provided by a cloud. We denote the physical network connecting the servers of a datacenter as the PN. Based on the PN, each tenant is provided with a VN, a virtual network over the PN.

A tenant controller can create its VN topology with VN resources, such as virtual switches, links, and ports, when


the tenant controller sends a request to the network hypervisor. When the network hypervisor receives the request, it substantiates the VN resources with mappings to the PN resources. For instance, a virtual switch operates based on the mapping of one physical switch or a set of physical switches and links. The virtual port (vp) of each virtual switch is also mapped to a physical port (pp). In addition, a virtual link can be created by connecting two vps.

Fig. 2. Steps of network monitoring. (a) SDN: ① collection at the switch, ② transmission of the statistics request and reply between the switch and the SDN controller, and ③ analysis at the controller. (b) SDN-NV: the same steps with the network hypervisor between the PN switches and the tenant controller; ④ marks the request handling in the network hypervisor that existing systems lack.

After the VN topology is created, the network hypervisor emulates the requested VN resources as if they were standard SDN switches. The tenant controller then manages the created VN resources without recognizing whether its resources are virtualized or not. The tenant controller connects to the virtual switches through south-bound interfaces (e.g., OpenFlow) and installs flow entries that match packets so that it can process (e.g., forward) the matched packets. These operations are achieved by control messages from the tenant controller, and the messages pass through the control channel.

VN resources and flow entries are mapped to the corresponding resources in the PN, which implies that a PN resource can be mapped to one or more VN resources. Thus, the flow entries from multiple tenant controllers can be mapped to a smaller number of physical flow entries [17], [18]. Throughout this paper, the term V represents a virtualization function, and V' represents a de-virtualization function; these functions map PN resources to VN resources, or vice versa. For example, when a physical flow entry (pf) is given, V(pf) provides the virtual flow entries (vfs) mapped to the pf. Similarly, given a virtual switch S, V'(S) generates the list of physical switches and links mapped to S.

2.2 Network Monitoring

2.2.1 Network monitoring in SDN
Network monitoring in SDN involves three steps (Fig. 2a): ① collection, ② transmission, and ③ analysis [27]. The statistics are recorded at the switches, which measure the number of packets processed per flow entry or port (① collection). We denote the statistics of a network resource by the notation S. For example, S(pf_i) and S(pp_j) represent the statistics of pf_i and pp_j. An SDN controller then gathers the statistics from a switch (② transmission). With the collected information, the SDN controller analyzes, manages, and optimizes the networks (③ analysis).

We additionally explain the sizes of the statistics request and reply messages. Both messages consist of a packet header, usually an Ethernet, IP, TCP, or OpenFlow header, and a payload. The header sizes of requests and replies are similar (l(H)). A request's payload is the network resource to be monitored; for example, in the case of a flow entry, the IP addresses or the actions that the entry performs are included. The reply's payload includes the same network resource and its statistics. The size of the network resource i included in both request and reply payloads is l(I(i)), and the size of the actual statistics included in the reply payload is l(S(i)).

For example, in SDN, an ONOS controller [15] sends statistics request messages toward the flow entries and ports of a switch every 5 s as its default setting. The statistics transmission process is finished when the SDN controller receives the corresponding reply messages from the switches, and the transmission time can be longer or shorter than 5 s. Note that the statistics request sending interval can be changed by a network operator.

2.2.2 Network monitoring in SDN-NV
Fig. 2b shows network monitoring in SDN-NV. In SDN-NV, the switches in the PN collect S(pf) and S(pp) (① in Fig. 2b), similar to the collection step of SDN (① in Fig. 2a). The tenant controllers perform the transmission process (② in Fig. 2b). However, the difference is that, as the controllers face the virtual switches (which are emulated by the network hypervisor), the statistics request is delivered to the network hypervisor instead of the PN switches (ⓐ in Fig. 2b). The network hypervisor then appropriately handles the request and generates a reply for it. For example, the network hypervisor should collect the statistics corresponding to the request from the PN switches (ⓑ in Fig. 2b, which may involve multiple transmissions according to the VN and PN mappings) and generate a reply message based on the collected statistics. However, to the best of our knowledge, existing network hypervisors lack an appropriate scheme for handling such requests from tenant controllers (④ in Fig. 2b, details in §2.4.1), resulting in critical challenges (discussed in §2.3).

For further discussion, we formalize several notations for network monitoring in SDN-NV. First, the transmission time for the statistics request and reply messages between the tenant controller and the network hypervisor is denoted as d_v. This time is identical to the round-trip time (RTT) between the tenant controller and the network hypervisor. The transmission time for the statistics request and reply messages between the network hypervisor and the PN switches is denoted as d_p. In addition, the processing time in the network hypervisor for calculating vStatistics and generating the reply message for tenant controllers is denoted as d_NH. Table 1 summarizes the terminology introduced so far.

2.3 Challenges of Network Monitoring for SDN-NV
Here, we discuss the three network monitoring challenges in SDN-NV systems in detail, which motivate the development of V-Sight. These challenges represent the


problems that V-Sight should overcome as the first network monitoring framework for SDN-NV systems.

TABLE 1
Terminologies and their descriptions.

  pStatistics       Statistics of a physical network resource
  vStatistics       Statistics of a virtual network resource
  Tenant            A user or group of users sharing the resources provided by a cloud
  PN                Datacenter network connecting physical servers for VMs and containers
  VN                An isolated logical network given to a tenant
  pf, pp            Physical flow entry and physical port
  V(i)              VN resources mapped to PN resource i
  V'(j)             PN resources mapped to VN resource j
  S(i)              Statistics of the network resource i
  l(H)              Header length of a statistics request/reply message
  l(I(i)), l(S(i))  Lengths of the information and the statistics of i in the payload
  d_v               Transmission time of a vStatistics request or reply message
  d_p               Transmission time of a pStatistics request or reply message
  d_NH              Processing time of the network hypervisor for vStatistics

Fig. 3. Non-isolated statistics example. (Figure: three tenant controllers with VN 1, VN 2, and VN 3, each containing one virtual port (vPort1, vPort2, and vPort3); the network hypervisor maps all three virtual ports to the same physical port, pPort1.)

Fig. 4. Statistics transmission delay comparison (ms). (Figure: transmission delay of Native vs. NH+SM for 2, 4, 8, 16, and 32 connections; Native remains nearly constant while NH+SM grows to roughly 1,800 ms.)

2.3.1 Non-isolated and inaccurate statistics
In SDN-NV, the PN resources (e.g., switches and ports) are shared among multiple VNs. Thus, the statistics collected from PN resources are not isolated between the VNs. The pStatistics collected in PN switches can be expressed as follows: for a PN resource i (e.g., pf or pp), S(i) = Σ_{j∈V(i)} S(j). Fig. 3 shows an example of three VNs, each comprising one vp. In this scenario, all vps are mapped to the same pp (pPort1). Suppose that the tenant 1 controller retrieves the statistics of vPort1. Because pPort1 is unaware of the presence of multiple VNs, the S(vPort1) collected in the PN is actually the sum S(vPort1) + S(vPort2) + S(vPort3). This indicates that pStatistics do not separate the statistics per VN. Thus, the tenant 1 controller ends up with aggregated statistics, which are inaccurate for VN 1.

Statistics are used for various network management operations of tenant controllers, such as cost-based central routing, traffic engineering, and QoS. However, with non-isolated statistics, tenant controllers cannot accomplish their desired management operations. Thus, V-Sight should be capable of isolating statistics in the sense that the statistics provided to each tenant should only contain information regarding that particular VN, not the aggregated statistics.

2.3.2 High transmission delay
Network monitoring is performed repeatedly to track the changing statistics. A reply to a statistics request should arrive as quickly as possible because any transmission delay between the request and reply messages distances the value of the statistics from the request time.

We conduct an experiment to determine the increase in transmission delay. Existing network hypervisors do not support network monitoring; thus, we implement a simple monitoring function on Libera [5], an open-source network hypervisor. The implementation receives the statistics requests from tenant controllers and then gathers the corresponding statistics from the PN based on the mappings between the VNs and the PN. The monitoring function replies to the tenant controllers after all pStatistics from the physical switches arrive. We call evaluations performed using this implementation NH+SM. The experiment is conducted in a 4-ary fat-tree topology with 2, 4, 8, 16, and 32 TCP connections and one VN. The tenant controller issues statistics requests at 5 s intervals for every switch in its network, requesting the statistics of all flow entries of each switch. As described in Fig. 4, the non-virtualized SDN (Native) case exhibits almost constant statistics transmission delays, at 4.6 ms on average, regardless of the number of network connections. In contrast, NH+SM exhibits delays of 187 to 1,836 ms, which are 38 to 333 times higher than those of Native.

We formulate the transmission delays of NH+SM and Native to determine the reason for the increased delay, using the notations introduced in Table 1. For a request message from a tenant controller that retrieves the statistics of all flow entries of a virtual switch, we refer to the number of flow entries of the switch as n. Then, the transmission delay of NH+SM is formulated as d_v + n·d_p + d_NH, which is the sum of the following: 1) one vStatistics transmission (d_v), 2) n pStatistics transmissions (n·d_p), and 3) one instance of processing in the network hypervisor for the vStatistics calculation and reply message creation (d_NH). On the other hand, the total transmission delay in the Native case is d_c because a single statistics transmission between the PN and the SDN controller can retrieve all existing flow entries of a switch. Note that NH+SM cannot retrieve the n pfs in a single transmission because NH+SM collects only the pfs mapped to the tenant, while the PN switches contain the pfs of other tenants at the same time. Consequently, the transmission delay of NH+SM includes the additional time n·d_p + d_NH, which increases the transmission delay by up to 1.84 s (Fig. 4).

2.3.3 Excessive control channel consumption
Statistics transmission passes through the control channel. In SDN-NV, two types of control channels exist: between the network hypervisor and tenant controllers, and between the

25000
Native NH+SM P denoted by Fpf , the size of the reply message becomes
is
i∈Fvf l(H) + l(I(i)) + l(S(i)). Thus, control channel con-

Statistics messages
(bytes per second)
20000
sumption, which is the sum of request and reply messages,
15000
is quite higher in NH+SM than in Native.
10000

5000
2.4 Related Work
0
6 12 18 24 30 In this section, we first explain the existing studies on
Number of connections network hypervisors and their consideration in network
monitoring. Then, we review the existing studies on net-
Fig. 5. Control channel consumption comparison (bytes per second). work monitoring in non-virtualized SDN and summarize
the differences of V-Sight compared with them.
network hypervisor and physical switches. In this study, we
focus on the latter because the traffic between the network 2.4.1 Related studies on NH and monitoring
hypervisor and physical switches is increased by network
virtualization. A network hypervisor emulates ordinary Table 2 presents the descriptions and objectives of existing
switches with virtual switches so that SDN controllers can network hypervisors. FlowVisor [11] introduced the first
be used as tenant controllers without any further modifica- idea of NV in SDN, and FlowN [12] defined abstractions
tion [5]. Thus, the traffic between the network hypervisor for virtual networks, such as virtual addresses, based on
and tenant controllers is similar to the control traffic in containers. OpenVirteX [13] defined address virtualization
non-virtualized SDN. Therefore, this study presents and schemes based on mapping between virtual and physical
aims to reduce the control traffic increased by network addresses, which can provide full address field accesses
virtualization, which is the traffic between the hypervisor to tenants. AutoSlice [28] and AutoVFlow [29] proposed
and physical switches. a distributed network hypervisor to improve the platform
The control channel is utilized by tenant controllers for scalability. Also, CoVisor [31] designed a policy composition
control operations, such as switch connection handshaking, framework for a network to be managed using heteroge-
flow entry installation and modification, the topology dis- neous SDN controllers. Libera [5] defined a cloud-service
covery process, and ARP processing. Thus, when the control model based on the SDN-NV system.
channel consumption for statistics increases from 5.11 to 22 In addition, FlowVirt [17] proposed flow entry virtu-
KB/s, we find that the flow entry installation suffers a four alization, which maps multiple physical flow entries to
times higher delay (from 86 to 368 ms). Moreover, because virtual flow entries, thereby reducing the amount of switch
operations such as flow entry installation can reduce the memory. LiteVisor [18] proposed a new packet forwarding
throughput of network connections, the control channel scheme named LITE, which separates the address, location
consumption for network monitoring should be reduced. identifier, and tenant identifier to effectively manage and
To be precise, we evaluate the control channel con- update information in a datacenter. TeaVisor [32] proposed
sumption for the network monitoring of NH+SM. We set a linear topology with five switches and three VNs. Each VN consists of two hosts at the edge of the topology, with 6, 12, 18, 24, and 30 network connections in the PN. We conduct experiments with the same monitoring function as in §2.3.2. Fig. 5 shows the control channel consumption for the flow statistics transmission. The results for NH+SM are 1.5 to 2.3 times higher than those of Native.

In NH+SM, a network hypervisor collects the statistics for a request as follows. It first checks the existing vfs in the requested switch. Let us denote the set of vfs in the requested virtual switch as Fvf. For each element j in Fvf, the network hypervisor finds the pfs mapped to j and sends the statistics request messages one by one. A reply message arrives at the network hypervisor for each request message. The control channel consumption from network monitoring, which is the total of the request and reply messages, is thus formulated as the sum of the total request message size, Σ_{j∈Fvf} Σ_{k∈V′(j)} (l(H) + l(I(k))), and the total reply message size, Σ_{j∈Fvf} Σ_{k∈V′(j)} (l(H) + l(I(k)) + l(S(k))).

On the other hand, in Native, SDN controllers can collect all pfs existing in a switch through a single request and reply by assigning "all entry" as the payload of the request message. We denote the payload size of the "all flow entry" message as l(∗vf); then, the size of the request message becomes l(H) + l(∗vf). When a set of pfs in the physical switch

path virtualization, which ensures provision of the requested bandwidth of each tenant by leveraging multipath routing, bandwidth reservation, and bandwidth limiting.

The above studies improve different aspects of SDN-NV systems to make the system feasible and reliable in cloud computing. Including the content of the survey papers on NV [33], [34], [35], however, we find that studies on network hypervisors do not cover network monitoring for tenants (i.e., non-isolated statistics, transmission delay, and control channel consumption). In addition, Microsoft has mentioned that the monitoring of virtualized networks in cloud systems has not been investigated [25]. Thus, to the best of our knowledge, no previous study has focused on network monitoring for tenants, and this fact motivates us to develop V-Sight.

2.4.2 Related studies on monitoring in non-virtualized SDN
The SDN controllers used as tenant controllers provide APIs or sub-modules for network monitoring (e.g., fwd in ONOS or OpenFlow Plugin in OpenDaylight). Such tools are used for creating statistics request messages and processing the reply messages received according to the request. To work with such tools and physical networks, we aim to design V-Sight to generate proper statistics reply messages containing the isolated statistics, with reasonable transmission delay and control channel consumption.
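The NH+SM versus Native accounting formulated in §2.3.3 above can be sketched numerically. In the following, the message sizes and the vf-to-pf mapping are made-up illustrative values (the paper gives no concrete sizes), and the per-entry terms l(I(k)) and l(S(k)) are simplified to constants:

```python
# Illustrative accounting of control channel consumption (sizes are made up;
# l(I(k)) and l(S(k)) are simplified to the constants l_I and l_S).

def nhsm_bytes(vf_to_pfs, l_H, l_I, l_S):
    """NH+SM: one request/reply pair per physical flow entry (pf) mapped to
    each virtual flow entry (vf) of the requested virtual switch."""
    total = 0
    for pfs in vf_to_pfs.values():          # j in Fvf
        for _pf in pfs:                     # k in V'(j)
            total += (l_H + l_I)            # request: header + pf identifier
            total += (l_H + l_I + l_S)      # reply: header + identifier + stats
    return total

def native_bytes(num_pfs, l_H, l_I, l_S, l_all):
    """Native: a single "all entry" request and a single reply that carries
    every pf's identifier and statistics."""
    return (l_H + l_all) + (l_H + num_pfs * (l_I + l_S))

# Hypothetical sizes in bytes: header, identifier, statistics, "all" payload.
l_H, l_I, l_S, l_all = 8, 32, 40, 4
vf_to_pfs = {"vf1": ["pf1"], "vf2": ["pf2", "pf3"]}   # three pf transmissions

print(nhsm_bytes(vf_to_pfs, l_H, l_I, l_S))   # 3 * (40 + 80) = 360
print(native_bytes(3, l_H, l_I, l_S, l_all))  # 12 + (8 + 3 * 72) = 236
```

Even in this toy setting, the per-pf request/reply pairs of NH+SM pay the header and identifier costs repeatedly, which is the overhead the formulas above capture.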

2168-7161 (c) 2021 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
Authorized licensed use limited to: California State University Fresno. Downloaded on July 01,2021 at 18:12:59 UTC from IEEE Xplore. Restrictions apply.

TABLE 2
Related studies analysis—network hypervisor. The three right-hand columns indicate whether each study addresses the monitoring challenges in SDN-NV.

| Study | Description | Statistics isolation | High transmission delay | Excessive control channel consumption |
| FlowVisor [11] | Divide network resources such as address and topology and allocate them to tenants | Not solved | Not solved | Not solved |
| FlowN [12] | Design an address virtualization scheme (based on FlowN) and container-based tenant controller architecture | Not solved | Not solved | Not solved |
| OpenVirteX [13] | Design address virtualization that provides access to entire address fields | Not solved | Not solved | Not solved |
| AutoSlice [28] | Design network hypervisor as a distributed system (multiple proxies) | Not solved | Not solved | Not solved |
| AutoVFlow [29] | In addition to AutoSlice, AutoVFlow enables the entire address space for tenants | Not solved | Not solved | Not solved |
| HyperFlex [30] | Disaggregate the functions of an internal network hypervisor to flexibly locate them | Not solved | Not solved | Not solved |
| CoVisor [31] | Design composition policies for flow entries coming from various SDN controllers | Not solved | Not solved | Not solved |
| Libera [5] | Define the architecture, APIs, and essential operations of SDN-NV for cloud datacenters, summarized as Libera | Not solved | Not solved | Not solved |
| FlowVirt [17] | Aggregate multiple flow entries of tenants into a smaller number of physical ones | Not solved | Not solved | Not solved |
| LiteVisor [18] | Suggest a routing scheme for separating the location, identifier, and tenant distinguisher | Not solved | Not solved | Not solved |
| TeaVisor [32] | Design path virtualization, a customized multipath routing, bandwidth reservation, and bandwidth limiting, for SDN-NV | Not solved | Not solved | Not solved |
| V-Sight | Provide isolated statistics and reduce statistics transmission delay and control channel consumption | Solved (§3.2) | Solved (§3.3) | Solved (§3.4) |

TABLE 3
Related studies analysis—network monitoring in non-virtualized SDN. The "Designs" column combines the original sub-columns (new switch architecture, new API, sampling, adaptive interval, others).

| Study | Designs | Implementation | Network topology/trace |
| OpenSketch [36] | New switch architecture; new API | Switch: NetFPGA-based HW; Controller: C++ SW | CAIDA packet trace or single switch |
| SDN-Mon [37] | New switch architecture; new API | Switch: Lagopus-based SW; Controller: module on Ryu | Single switch |
| OpenSample [21] | Sampling (packet) | Module on Floodlight | 4-ary fat-tree and four switches |
| OpenTM [22] | Sampling (flow) | Module on NOX | Custom linear topology (ten switches) |
| FlowCover [23] | Sampling (switch) | Simulator | Erdős–Rényi graph, Waxman graph |
| OpenNetMon [38] | Sampling (flow) | Module on POX | Linear topology (four switches) |
| cFlow [20] | Sampling (flow) | Simulator | GEANT trace |
| Tahaei et al. [39] | Sampling (switch); adaptive interval (link utilization) | Module on Floodlight | Fat-tree topology |
| PayLess [24] | Adaptive interval (link utilization) | Module on Floodlight | Tree topology |
| IPro [40] | Adaptive interval (reinforcement learning) | SW based on Ryu API | Custom tree topology (11 switches) |
| MicroTE [41] | Others: split controller and monitoring framework | Kernel module on Linux | Tree topology |
| FlowSense [42] | Others: embed statistics in other messages | NA; simulator | Linear topology (two switches); EDU1 trace |
| V-Sight | Not relevant (can work together) | Internal scheme in Libera | Linear (five switches) and 4-ary fat-tree topology |

Various studies have been proposed to reduce monitoring overheads in non-virtualized SDN. Table 3 summarizes these studies by comparing their key designs and evaluation methodology. The objectives of these studies are mostly to reduce the monitoring overheads between SDN controllers and switches while maintaining a degree of statistics accuracy. We explain these studies comprehensively here.

First, OpenSketch [36] and SDN-Mon [37] introduced new monitoring architectures to reduce monitoring overheads on both the switch and controller sides; thus, they require architecture modification on switches and API modification on SDN controllers. For example, OpenSketch designs a hash-based architecture that collects statistics based on the hash result of each flow. The memory for collecting statistics in switches is reduced, and the traffic or delay in statistics transmission decreases accordingly. SDN-Mon introduced a switch architecture that separates the flow table for packet routing from statistics collection so that the collection in switches can be performed at a different granularity from the flow entries. Because statistics collection can be performed in a more coarse-grained manner than the flow entries, the monitoring overhead can be reduced.

Second, OpenSample [21], OpenTM [22], FlowCover [23], OpenNetMon [38], and cFlow [20] perform sampling on monitoring, which means that parts of the statistics are selectively gathered to reduce overhead. OpenSample performs sampling on packets on the network and calculates the flow and port statistics based on the sampled packets. OpenTM monitors only the statistics of some flows for calculating link utilization. FlowCover selects network switches using greedy algorithms and heuristics in response to flow changes. OpenNetMon monitors only edge switches for per-flow statistics. In addition, cFlow calculates link utilization using machine learning, and the required flows that have a high impact on the prediction accuracy of utilization are prioritized and monitored.

Third, Tahaei et al. [39], PayLess [24], and IPro [40] regulate the monitoring interval. Tahaei et al. introduced a monitoring scheme that frequently measures the statistics that contribute more to link utilization than others. They also designed their scheme as sampling because it only monitors the statistics of top-of-rack or edge switches. PayLess regulates the monitoring interval of each flow according to its contribution to link utilization, similar to the scheme of Tahaei et al. IPro regulates the monitoring interval using reinforcement learning.

In addition, MicroTE [41] implemented monitoring and traffic engineering functions on a machine separate from the


SDN controllers; thus, the bottleneck is removed from the controller itself. FlowSense [42] introduced a technique for removing additional statistics request messages from controllers by embedding the statistics values in other OpenFlow messages, such as PacketIn (for flow entry generation) or FlowRemoved (a notification that a flow entry has been deleted).

The above-mentioned studies change the schemes of statistics collection or the manner of transmission (e.g., the number or intervals of statistics request messages). On the other hand, the goal of V-Sight is to provide isolated statistics and to reduce the transmission delay and control channel consumption of the network hypervisor. Therefore, V-Sight and the studies above are orthogonal and can work together.

One thing to note is that the evaluation methodology in Table 3 can be categorized as follows. First, previous studies [36], [37] proposed a new architecture for monitoring and evaluated the architecture through a hardware prototype. Using the hardware prototype, evaluations are conducted on relatively small testbeds, usually a single switch, or through simulations using traces. Second, other studies [20], [23], [42] implemented their solutions as a type of simulator and tested them based on network traces. In addition, several studies [21], [22], [24], [38], [39], [40] implemented a component (module) atop an existing SDN controller (e.g., Floodlight, NOX, POX, or Ryu) and evaluated the component using actual switches (e.g., Open vSwitch).

For evaluations with switches, previous studies used linear and fat-tree topologies that do not contain routing loops. We believe that this topology selection is made because the existing SDN controllers are not capable of properly handling routing loops in a network topology [43]. Thus, we expect to conduct such experiments when SDN controllers become capable of handling routing loops. In the meantime, we fully implement V-Sight in a network hypervisor and evaluate V-Sight with linear and fat-tree topologies (§4).

3 V-SIGHT DESIGN
In this section, we first introduce the overall architecture of the V-Sight framework and its operations. We then present the three mechanisms of V-Sight: 1) statistics virtualization for isolated statistics, 2) transmission disaggregation for improved transmission delay, and 3) pCollector aggregation for reduced control channel consumption.

3.1 V-Sight Framework Architecture
Fig. 6 shows the architecture of the V-Sight framework. The processing sequence of V-Sight is as follows. When a statistics request (e.g., vf or vp) from a tenant controller is sent, the statistics virtualization component (§3.2) of V-Sight receives the message and calculates the requested vStatistics based on the pStatistics. For the calculation, V-Sight references the virtualization map that maintains mappings between the VN and PN resources.

[Fig. 6. V-Sight architecture. Tenant controllers 1-3 issue vStatistics requests to the network hypervisor. Inside it, V-Sight comprises statistics virtualization (virtualization map, flow entry, port), transmission disaggregation (request interval estimation, pStatistics cache), and pCollector aggregation (pCollector filter, pCollector tuner); pCollectors retrieve pStatistics from the physical network.]

The pStatistics required for the vStatistics calculation are obtained from the "pStatistics cache." Transmission disaggregation (§3.3) maintains the pStatistics cache, and the cache is filled by the pCollector. Transmission disaggregation enables a pCollector to run before the vStatistics request. A key point of transmission disaggregation is to prepare the pStatistics needed for the vStatistics by disaggregating the time at which the vStatistics request comes in from the time at which the pStatistics are ready. In other words, transmission disaggregation ensures that the pStatistics are in the pStatistics cache before the vStatistics request arrives. To achieve this, transmission disaggregation performs the "request interval estimation."

pCollector aggregation (§3.4) consists of two tasks: the "pCollector filter" decides the execution period of each pCollector and checks whether pCollectors can be merged into one pCollector for a specific physical switch; and the "pCollector tuner" decides the starting delay of a pCollector for improved accuracy.

3.2 Statistics Virtualization
Statistics virtualization aims to provide per-VN vStatistics from non-isolated pStatistics. We develop calculation algorithms for vfs (flow entries) and vps (ports), which are the most fine-grained resources of network monitoring in SDN networks [6]. Statistics for other resources (e.g., flow table, switch, or entire network) can be derived from the per-VN statistics.

3.2.1 Per-VN flow entry statistics
For statistics isolation, V-Sight checks the mapping between vf and pf from the virtualization map (in the statistics virtualization component of Fig. 6), which maintains the relationships between the vfs from tenant controllers and the pfs existing in the PN. The mapping of a vf is used in two ways. First, if the pf is not shared with other VNs (|V(pf)| = 1), the statistics of the pf become the statistics of the vf. Second, the pf is shared between VNs² (|V(pf)| > 1) [17], [18], [44]. In this case, because the pf aggregates all the statistics of

2. Multiple VNs can share flow entries when the NH merges flow entries in order to reduce the physical memory consumed by the flow entries [17], [18], [44]. The conditions for flow entry merging are: 1) the flow entries are for packet forwarding, 2) the input port and output port of the flow entries are identical or their masked IP addresses are identical, and 3) the VN permits the sharing of flow entries.


Algorithm 1: Per-tenant flow entry statistics.
    Input: vf: virtual flow entry for which the VN controller requires statistics
    Output: S(vf): statistics of the vf
    pf = V′(vf)
    if |V(pf)| == 1 then
        S(vf) = S(pf)
    else if |V(pf)| > 1 then
        Epf = find edge pf of vf
        S(vf) = S(Epf)
    Return S(vf)

vfs mapped to the pf, V-Sight should not return the pf statistics directly to the tenant controller. Instead, V-Sight isolates the pf statistics with the following observation: even though multiple vfs are mapped to one pf, the vfs for edge switches (the first and last switches on the forwarding path) are installed individually per VN. This is because the packets are handled separately per VN in the edge switches to ensure isolation in NV [18], [44]. In other words, the pf at the edge is allocated per VN so that the packets at the edge are delivered to the host (or VM). Thus, V-Sight returns the pStatistics of the edge-switch pf as the requested vStatistics. Alg. 1 summarizes how to obtain the per-VN flow entry statistics. Because the vf statistics contain the packet number (count) and bytes (quantity), the algorithm calculates the count and quantity individually.

3.2.2 Per-VN port statistics
vp statistics include the count and amount of received (RX) and transmitted (TX) packets. Similar to the flow entry, a pp can be shared by one or more VNs. If only one VN utilizes the physical port, the statistics of the pp become the vp statistics. Meanwhile, if the pp is mapped to multiple vps, it receives and transmits the traffic of multiple VNs. In this case, V-Sight uses the vf statistics obtained in Alg. 1 because the vfs process the packets going to and from the vp of a switch. For RX packets, V-Sight accumulates the vStatistics of the vfs that have vp as their input port. To calculate the TX packet statistics, V-Sight sums the vf statistics that send packets out to the vp. This calculation is summarized in Alg. 2.

Algorithm 2: Per-tenant port statistics.
    Input: vp: virtual port for which the VN controller requires statistics
           vs: virtual switch to which the vp belongs
           vf, vf_in, vf_out: virtual flow entry, input port of the vf, output port of the vf
    Output: S(vp): statistics of the vp
    pp = V′(vp)
    if |V(pp)| == 1 then
        S(vp) = S(pp)
    else
        for each vfi belonging to vs do
            if vfi_in == vp then
                S(vp).RX += S(vfi)
            else if vfi_out == vp then
                S(vp).TX += S(vfi)
    Return S(vp)

[Fig. 7. Transmission delay comparison. (a) Without transmission disaggregation: upon a vStatistics request, the network hypervisor issues pStatistics requests 1 to n to the PN and waits for every reply before returning the vStatistics reply. (b) With transmission disaggregation: pStatistics 1 to n are requested and cached beforehand, so the vStatistics request is answered from the cached results.]

3.3 Transmission Disaggregation
As formulated in §2.3.2, the increased delay from SDN-NV is denoted by n·dp + dNH. The time dNH is spent performing statistics virtualization (§3.2); thus, minimizing the transmission delay aims at reducing n·dp, which is the time for pStatistics transmissions (Fig. 7b). To reduce n·dp, transmission disaggregation introduces the pStatistics cache and the request interval estimation, which reduce the transmission delay from that of Fig. 7a to that of Fig. 7b.

3.3.1 pStatistics cache
The pStatistics cache tracks the time at which pStatistics are stored and whether they have already been used per VN. When the pStatistics cache contains pStatistics that are not out-of-date (old), the pStatistics can be directly returned without retrieving pStatistics from any physical switch of the network hypervisor (hit).

The cached pStatistics are considered old when 1) the time since the pStatistics were retrieved is longer than the monitoring interval or 2) they have already been used for the requested VN. The reasons are as follows. First, when the tenant controller performs periodic network monitoring, at least the statistics measured within the request interval should be returned because, if the statistics were collected before the interval started, the value is old with respect to the current request and thus not accurate for it. Second, if the stored pStatistics have been used for the requested VN, the tenant controller would already have collected the data at that time; thus, we consider the data old. If the pStatistics do not exist in the pStatistics cache (miss) or they are old, the cache retrieves the pStatistics from the physical switches. Fig. 7b shows the working of transmission disaggregation.
RX packets, V-Sight accumulates the vStatistics of the vf s 7b shows the working of transmission disaggregation.

2168-7161 (c) 2021 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
Authorized licensed use limited to: California State University Fresno. Downloaded on July 01,2021 at 18:12:59 UTC from IEEE Xplore. Restrictions apply.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TCC.2021.3089225, IEEE
Transactions on Cloud Computing
IEEE TRANSACTIONS ON CLOUD COMPUTING 9

When the number of pStatistics required for the vStatistics is n, and k of the pStatistics are "hit" (i.e., n − k accesses to the pStatistics cache are "miss" or "old"), n − k physical transmissions are conducted for the vStatistics. Subsequently, the entire transmission delay can be reduced to (1 + n − k)·dp + dNH. Therefore, increasing the number k is important for improving the transmission delay.

In addition, when pStatistics are updated, the pStatistics cache verifies whether the previously stored value has been used. If the pStatistics stored in the pStatistics cache are not used for a certain time (e.g., 10 times), they are removed from the pStatistics cache. This policy prevents useless transmission disaggregation. Even if pStatistics are released, they can be re-cached when vStatistics that require those pStatistics for statistics virtualization are requested.

3.3.2 Request interval estimation
The pStatistics cache is filled by pCollectors. A pCollector exists per pf, so a pCollector executes to retrieve the pStatistics of the pf of a physical switch. In particular, we use the term "interval" for the time between two consecutive requests from a tenant controller for a pf and "period" for the time difference between two consecutive executions of a pCollector. For each pCollector, the period of execution should be determined. If the period of the pCollector is much shorter than the request interval, the pCollector will end up executing multiple times before a "hit," which wastes CPU and control channel resources. Conversely, if the pCollector is executed less often than the vStatistics requests, the transmission delay cannot be reduced because the pStatistics are "old." Therefore, determining the execution period is very important, and this is the reason the request interval estimation is used.

The request interval estimation calculates the mean (µ) and variance (σ) per pf that characterize the VN controller's request intervals. For pfi, the request of VN j is denoted as pfi,j, and its distribution is (µi,j, σi,j). The pStatistics cache contains a pf identifier (pfi) and a VN identifier (j). The k-th interval for pfi,j is denoted as pfi,j^k.

Fig. 8 shows the flowchart of the entire request interval estimation. This process is executed every time the pf identifier (pfi) and VN identifier (j) are received per vStatistics request. First, the request interval estimation records the interval between consecutive requests (① in Fig. 8). The request interval estimation calculates (µi,j, σi,j) (③) once a certain number of intervals is accumulated, which is denoted as the "interval window (w)³." When the (w + 1)-th request arrives, the distribution (µi,j, σi,j) of pfi,j is calculated based on pfi,j^1 to pfi,j^w. Next, among the distributions of the multiple VNs, V-Sight chooses the interval distribution that has the minimum µ value (⑤); in other words, (µi, σi) = (µi,l, σi,l), where l = arg min_j µi,j. Requests that have a higher µ than the selected one will "hit" because the pCollector for pfi based on (µi, σi) stores the statistics of pfi for those requests in a timely manner. The selected distribution is passed to pCollector aggregation (⑥, §3.4) as a triple (pfi, µi, σi). Note that a pCollector is created per pf after the interval window (w intervals) is accumulated. Before the interval window, the pStatistics cache generates a "miss" for the required pStatistics of pfi, which makes V-Sight collect the pStatistics from the PN for each request.

Clearly, the request interval of each tenant controller can change. The request interval estimation flushes the w past intervals (pfi,j^1 to pfi,j^w) after sending a new interval distribution (⑦) and accumulates the intervals from 1 to w again. Therefore, for the w recorded intervals (②), (µi,j, σi,j) is updated (③). If the pCollector for pfi has already been created (④), the request interval estimation checks how much the newly updated µi,j has changed from the previous value (⑧). If the change is large (e.g., 25%), this function selects a new distribution for pfi (⑤) and delivers a new triple (pfi, µi, σi) to pCollector aggregation (⑥).

3.4 pCollector Aggregation
The objective of pCollector aggregation is to execute and merge pCollectors. Given a triple (pfi, µi, σi) from transmission disaggregation, a pCollector for pfi is created. The pCollector periodically retrieves the pfi statistics from a switch. However, if the number of pCollectors increases, the pCollectors can consume too much of the control channel (as discussed in §2.3.3).

There are two types of pCollectors, as shown in Fig. 9. At the top of Fig. 9, three pCollectors retrieve statistics from their own pfs. The bottom of Fig. 9 shows one pCollector that collects multiple pf statistics of a switch simultaneously. The latter pCollector consumes less of the control channel than the former because the required message sizes for statistics transmission are smaller. Specifically, for the statistics requests, the former pCollector should contain the specific information of an individual pf, i; thus, the request message size is formulated as l(H) + l(I(i)). If n of the former pCollectors are running, the entire request message size becomes n × l(H) + Σ_k l(I(i_k)). In the reply messages, the statistics for the requested pfs are added, so the total size is n × l(H) + Σ_k (l(I(i_k)) + l(S(i_k))).

In contrast, the request message of the latter pCollector includes "all flow entry" instead of the individual information of the pfs. The number of request messages then becomes one, and its size is l(H) + l(I(∗pf)). The payload of the reply message does not change compared with the former pCollector, as it contains the statistics of each pf, but the replies are created as a single message corresponding to the single request message, resulting in a single packet header. Thus, the size of the reply is l(H) + Σ_k (l(I(i_k)) + l(S(i_k)))⁴. The latter pCollector therefore reduces the control traffic by (2n − 2) × l(H) + Σ_k l(I(i_k)) − l(I(∗pf)).

We call the pCollector for a single pf (the former) a "tiny pCollector" and the other pCollector an "aggregated pCollector." An aggregated pCollector is created when multiple tiny pCollectors follow a similar period for pfs in a switch. pCollector aggregation is achieved using two tasks:

3. Explicitly, we find that the value 30 is sufficient to obtain a stable and reliable interval distribution with general SDN controllers.
4. When the size of the reply exceeds the maximum transmission unit size, the reply packet fragments. In this case, the number of l(H) headers consumed can increase but is still less than with the former pCollector, because the payloads are piggybacked as much as possible.


[Fig. 8. Flowchart of request interval estimation. On each vStatistics request (VN identifier and pStatistics identifier from the pStatistics cache): ① store the interval; ② check whether the number of stored intervals has reached the interval window; if so, ③ calculate the interval distribution; ④ if no pCollector has been created yet, ⑤ select one distribution and ⑥ send it to pCollector aggregation; if a pCollector exists, ⑧ check whether the distribution has changed (and, if so, reselect and resend); finally, ⑦ flush the stored intervals.]
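The request interval estimation flowchart of Fig. 8 can be sketched minimally as follows. The interval window and the 25% change check come from the text of §3.3.2; the class, method names, and buffering scheme are invented for illustration:

```python
from statistics import mean, stdev

# Sketch of request interval estimation: per (pf, VN), accumulate w intervals,
# compute (mu, sigma), select the minimum-mu distribution among the pf's VNs,
# and re-emit a triple only when mu drifts beyond the change threshold.

class IntervalEstimator:
    def __init__(self, window=30, change_threshold=0.25):
        self.w = window                 # interval window w (30 in the text)
        self.threshold = change_threshold
        self.intervals = {}             # (pf, vn) -> buffered intervals
        self.dist = {}                  # (pf, vn) -> (mu, sigma)
        self.selected = {}              # pf -> (mu, sigma) driving its pCollector

    def record(self, pf, vn, interval):
        """Called per vStatistics request; returns a (pf, mu, sigma) triple
        when a (new) distribution should be sent to pCollector aggregation."""
        buf = self.intervals.setdefault((pf, vn), [])
        buf.append(interval)                       # (1) store the interval
        if len(buf) < self.w:                      # (2) window not full yet
            return None
        mu, sigma = mean(buf), stdev(buf)          # (3) update distribution
        buf.clear()                                # (7) flush stored intervals
        old = self.dist.get((pf, vn))
        self.dist[(pf, vn)] = (mu, sigma)
        if pf in self.selected and old is not None:          # (4) created?
            if abs(mu - old[0]) / old[0] <= self.threshold:  # (8) changed?
                return None
        # (5) choose the minimum-mu distribution among this pf's VNs
        self.selected[pf] = min(v for k, v in self.dist.items() if k[0] == pf)
        return (pf, *self.selected[pf])            # (6) hand to aggregation

est = IntervalEstimator(window=3)        # tiny window just for the example
est.record("pf1", "vn1", 5.0)
est.record("pf1", "vn1", 5.2)
triple = est.record("pf1", "vn1", 5.1)   # window full -> ("pf1", ~5.1, ~0.1)
```

Choosing the minimum-µ distribution mirrors the argument in the text: any VN requesting less frequently than the selected distribution will find fresh pStatistics already cached.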

pCollector for pf1 pCollector for pf2 pCollector for pf3


l(H) l(I(pf1) l(H) l(I(pf2) l(H) l(I(pf3)
period range using µi and σi . Next, for every possible
Request
Request Header pf1 Request Header pf2 Request Header pf3 period value within the range, the pCollector filter counts
l(H) l(I(pf1) l(S(pf1) l(H) l(I(pf2) l(S(pf2) l(H) l(I(pf3) l(S(pf3)
Reply pf1 S(pf1) pf2 S(pf2) pf3 S(pf3)
the number of tiny pCollectors with the period for the
Reply Header Reply Header Reply Header
value ( 2 ). The period value with the largest number of
pCollector for a physical switch of pf1, pf2 and pf3
tiny pCollectors is selected ( 3 ). Once a period is selected,
l(H) l(I(*pf ) the pCollector filter calculates the ratio of the number of
Request Header * (all pfs)
l(H) l(I(pf1) l(S(pf1) l(I(pf2) l(S(pf2) l(I(pf3) l(S(pf3)
tiny pCollectors that follow a similar period to the number
Reply Header pf1 S(pf1) pf2 S(pf2) pf3 S(pf3) of existing pf s in the switch ( 4 ). If the ratio is low, an
aggregated pCollector consumes more control traffic than
Fig. 9. Control channel consumption for two kinds of pCollectors.
the tiny pCollectors. Subsequently, only when the ratio is
high ( 5 ), for instance, 70%5 , the pCollector tuner merges
pCollector tuner tiny pCollectors into an aggregated pCollector.
vStatistics request
⑦ Execute the pCollector routinely
Statistics virtualization
⑥-3. Set starting delay 3.4.2 pCollector tuner
pStatistics request for aggregated pCollector
⑥-1. Set starting The role of the pCollector tuner is to provide an additional
Transmission disaggregation delay for tiny
pCollector
⑥-2. Check existing delay to the first execution of each pCollector in order
vStatistics requests to improve the accuracy of vStatistics. In Fig. 11a, a time
Request interval distribution
difference, which is shown as an arrow with ”t.d.,” exists
① Find pCollector period range No Yes between the time the vStatistics requests arrive and the
time the pStatistics are gathered through the pCollector.
⑤ Higher than
② Count tiny pCollectors per threshold? This time difference depends on the time at which the
period value in the range pCollector first runs. If the pCollector is executed slightly
before the vStatistics request, the time difference becomes
③ Select one pCollector period ④ Calculate ratio for the period
Fig. 10. Flowchart of pCollector aggregation. This routine is executed according to statistics virtualization and transmission disaggregation.

1) a pCollector filter determines the execution period of the tiny pCollectors and aggregated pCollectors, and 2) a pCollector tuner improves the accuracy of vStatistics. Fig. 10 illustrates the operation of the two tasks, which are discussed in the following subsections.

3.4.1 pCollector filter

From (pfi, µi, σi), the pCollector filter decides on a period of the pCollector for pfi. For a tiny pCollector, this is simple. However, for an aggregated pCollector, even if tenant controllers issue statistics requests with similar intervals, each µi of pfi can be slightly different (e.g., 4.7, 4.9, and 5.1 s) because the distribution is estimated from w samples. Thus, it is challenging to decide the period of an aggregated pCollector.

To address this problem, the pCollector filter starts with tiny pCollectors that have a similar period. From the cumulative probability distribution function derived from µi and σi, the pCollector filter finds a period range that satisfies a specific hit rate, such as 90% to 95% (1 in Fig. 10). The requests that have longer intervals than the pCollector's period will "hit"; so, this task can stochastically derive the period range.

small, as shown in Fig. 11b, which implies that the cached pStatistics are up to date. As the time difference becomes larger, it decreases the accuracy of vStatistics. Therefore, V-Sight introduces a "starting delay" that delays the first execution of the pCollectors.

For tiny pCollectors, the starting delay should be set so that the tiny pCollector executes immediately before the vStatistics requests (coming after the interval window). In addition, the starting delay should not be so large that the pCollector executes after the vStatistics request, as shown in Fig. 11a. Empirically, we set the starting delay to 95% of the pCollector period (6-1 in Fig. 10).

Meanwhile, this method of setting the starting delay for tiny pCollectors leads to poor accuracy for aggregated pCollectors. This is because the multiple requests managed by an aggregated pCollector occur at different times within the pCollector period. Fig. 11c shows an example with two vStatistics requests from different tenants (tenant2 followed by tenant1). If the starting delay is set to 95% of the aggregated pCollector's period, the execution time of the aggregated pCollector is after tenant2's request and before tenant1's. As shown in Fig. 11c, tenant2 suffers a long delay because the aggregated pCollector executes immediately after tenant2's request. Therefore, the pCollector tuner sets the starting delay for the aggregated pCollector as follows. First, the pCollector

5. In our evaluation, we explicitly find that sufficient improvement is obtained using 70%.
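As a concrete illustration of the hit-rate-based period choice in §3.4.1, the following sketch models request intervals as a normal distribution N(µ, σ²) and shrinks the candidate period until the desired fraction of requests "hit." All function names are ours, not V-Sight's actual implementation.

```python
import statistics
from math import erf, sqrt

def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """P(X <= x) for X ~ N(mu, sigma^2)."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

def pick_period(interval_samples, hit_rate: float = 0.90) -> float:
    """Choose a pCollector period such that at least `hit_rate` of the
    observed request intervals are longer than the period ("hits")."""
    mu = statistics.mean(interval_samples)
    sigma = statistics.stdev(interval_samples)
    period = mu
    # A request "hits" when its interval exceeds the period, so
    # P(hit) = 1 - CDF(period); shrink the period until that holds.
    while 1.0 - normal_cdf(period, mu, sigma) < hit_rate:
        period -= 0.01 * mu
    return period
```

With the example intervals from the text (4.7, 4.9, and 5.1 s), this yields a period slightly below the mean, so that roughly 90% of requests arrive after a fresh collection.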


Fig. 11. Starting delay for tiny pCollectors and aggregated pCollectors. (a) Low vStatistics accuracy (tiny pCollector). (b) Enhanced vStatistics accuracy with starting delay for tiny pCollectors. (c) Low vStatistics accuracy (aggregated pCollector). (d) Enhanced vStatistics accuracy with starting delay for aggregated pCollectors. Legend: Q = vStatistics request from tenants; t = pStatistics reply by tiny pCollector; a = pStatistics reply by aggregated pCollector; t.d = time difference between vStatistics request and pStatistics reply.
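The timing idea of Fig. 11 can be sketched as follows. This is a simplified model under our own naming, not the scheduling code of the network hypervisor itself.

```python
def tiny_starting_delay(period: float, fraction: float = 0.95) -> float:
    """Tiny pCollector (Fig. 11b): start at 95% of the period so each
    collection lands immediately before the next vStatistics request."""
    return fraction * period

def aggregated_starting_delay(period: float, request_times: list[float]) -> float:
    """Aggregated pCollector (Fig. 11d): start immediately before the
    earliest merged vStatistics request (taken modulo the period), which
    minimizes the summed request-to-collection time differences."""
    return min(t % period for t in request_times)
```

For example, with two tenants whose requests fall at offsets 2.0 s and 4.0 s within a 5 s period, the aggregated pCollector starts near 2.0 s rather than at the tiny-pCollector default of 4.75 s.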

tuner checks the request interval estimation (§3.3.2), which stores the vStatistics request times for each VN (6-2 in Fig. 10). Then, the starting delay is set so that the pCollector executes immediately before the first vStatistics request among the VN requests that the aggregated pCollector merges, which is tenant2's request in Fig. 11d (6-3). In this way, the sum of the time differences from the time the aggregated pCollector executes to the time each vStatistics request arrives is minimized. Finally, the pCollector tuner executes the pCollector periodically with the starting delay (7).

4 EVALUATION

In this section, we present the evaluation results of V-Sight. V-Sight is implemented on the Libera network hypervisor with OpenFlow version 1.3 (1.8K LoCs) [5], [13]. The source code of V-Sight is available in the GitHub repository.6 We measure micro-benchmarks, system overheads, and macro-benchmarks, which are explained in detail below. Each experiment is repeated to obtain reliable results.

4.1 Test Setup

4.1.1 Settings

We use three physical servers. Mininet [45], the network hypervisor, and one or more ONOS instances as tenant controllers run on separate physical servers. Table 4 summarizes the hardware and software specifications used for the evaluations. Mininet emulates the PN based on Open vSwitch. We emulate two types of topologies (Fig. 12): 1) a linear topology consisting of five switches and 2) a 4-ary fat-tree topology to evaluate the effects on datacenters. For the linear topology, we create three tenants that clone the PN topology as their VN topologies. For each tenant, the number of TCP connections varies (i.e., 2, 4, 6, 8, 10); thus, 6, 12, 18, 24, or 30 connections exist in the PN. For the fat-tree topology, we change the number of connections to 2, 4, 8, 16, and 32 with a single tenant, resulting in the same number of TCP connections in the PN. The TCP connections are generated through iperf3 [46]. Each VN is managed by an ONOS controller. The ONOS monitors all the flow entries and ports of each switch at 5 s intervals. The ONOS controllers run as containers, and no ONOS container suffers performance or resource bottlenecks.

Fig. 12. Experiment topologies. (a) linear (five switches). (b) fat-tree (4-ary).

TABLE 4
Hardware and software specifications.

Server (hardware) specifications:
- CPU: Intel Xeon E5-2650 (2.30 GHz)
- Memory: 64 GB
- NIC: Intel 82599ES 10GbE NIC

Software specifications:
- PN: Mininet 2.3.0d6 with Open vSwitch v2.9.5; OS: Ubuntu 18.04
- Network hypervisor: Libera v0.1; OS: Ubuntu 14.04
- Tenant controller: ONOS v2.0.0; OS: Ubuntu 16.04.6

4.1.2 Metrics

We evaluate V-Sight based on the following micro-benchmarking metrics:
• Statistics virtualization accuracy (§4.2): the root mean squared error (RMSE)—caused by the statistics virtualization algorithms—between the statistics calculated by the network hypervisor and the actual values.
• Transmission delay (§4.3): the average interval between the vStatistics request and reply messages from/to tenant controllers.
• Control channel consumption (§4.4): the average bytes per second of control channel traffic used to obtain pf statistics between the network hypervisor and the physical switches.

Second, we present three metrics for the system overheads of V-Sight, as below.
• Time skew of pStatistics cache (§4.5): the interval between the vStatistics request time and the pStatistics collection time of the pCollectors—reported as the average value with a 95% confidence interval. This time skew shows the effect of transmission disaggregation on the accuracy of the pStatistics required for the vStatistics calculation.
• CPU and memory usage (§4.6): the average CPU cycle and memory consumption of the V-Sight framework during the experiment.

Finally, for macro-benchmarks (§4.7) in practice, we show the effects of V-Sight on tenants by measuring the TCP

6. https://github.com/gsyang33/V-Sight. V-Sight is easily tested through the tutorial from Libera (https://github.com/os-libera/Libera).
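The per-VN request-interval bookkeeping (§3.3.2) that the pCollector tuner consults can be sketched as below. The class and its interface are hypothetical; only the idea—keep the last w request times and derive (µ, σ) of the gaps—comes from the text.

```python
import statistics
from collections import deque

class IntervalEstimator:
    """Keeps the last w vStatistics request times for one VN and
    estimates the mean and standard deviation of the request intervals."""
    def __init__(self, w: int = 10):
        self.times = deque(maxlen=w)

    def record(self, t: float) -> None:
        """Store the arrival time of a vStatistics request."""
        self.times.append(t)

    def estimate(self):
        """Return (mu, sigma) of the inter-request gaps."""
        ts = list(self.times)
        gaps = [b - a for a, b in zip(ts, ts[1:])]
        return statistics.mean(gaps), statistics.stdev(gaps)
```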


TABLE 5
RMSEs of statistics provided by SDN-NV and V-Sight compared with real values. Quantity is the volume of data processed by each flow entry or port (in MB), and count is the number of packets processed.

(a) Flow entry statistics.
            NH+SM                  V-Sight
            Quantity   Count       Quantity   Count
Tenant 1    114.78     10000.83    0.14       10.87
Tenant 2    121.80     10633.79    0.02       8.52
Tenant 3    112.62     9875.42     0.55       13.60

(b) Port statistics.
            NH+SM                                       V-Sight
            RX                   TX                     RX                  TX
            Quantity   Count     Quantity   Count       Quantity   Count   Quantity   Count
Tenant 1    65.83      5871.13   65.77      5871.00     0.15       16.85   0.15       16.85
Tenant 2    53.33      4583.02   53.24      4582.68     0.42       37.98   0.41       38.02
Tenant 3    40.92      4469.03   40.38      4468.36     0.88       61.81   0.89       61.94

Fig. 13. vStatistics replied by the network hypervisor (NH+SM and V-Sight) in comparison with actual values. (a) flow entry statistics, quantity. (b) flow entry statistics, count. (c) port statistics, RX quantity. (d) port statistics, TX quantity.

throughput, CPU cycle, and control channel consumption of the tenant controller.

4.1.3 Comparisons

The metrics explained in §4.1.2 are measured for the following comparison cases:
• Native: non-virtualized SDN in which the physical switches are directly connected to ONOS without a network hypervisor.
• Network hypervisor (NH): Libera without V-Sight.7 It frequently returns no values to the statistics requests from tenant controllers.
• NH with simple monitoring function (NH+SM): Libera with the simple monitoring used in §2.3.2.
• V-Sight: the full implementation of V-Sight.

4.2 Statistics Virtualization Accuracy

Statistics virtualization accuracy is measured on the linear topology with three tenants because the accuracy results in the linear and fat-tree topologies are similar. Each tenant generates a single TCP flow at a different sending rate. We measure the vStatistics collected from NH+SM and V-Sight and present the RMSEs between the statistics of the two cases and the actual values. The vStatistics from V-Sight can contain errors when the calculation algorithms use the pfs in edge switches and the statistics of the pf or pp contain the statistics of multiple tenants (§3.2). In other words, when the calculation uses statistics from other switches, it can result in errors. In addition, the pStatistics used for the calculation are retrieved in advance, not at the requested time, which can also lead to errors (§3.3). The actual values used for the error calculation are measured at the hosts that send and receive the traffic within each tenant's VN.

Statistics virtualization provides isolated statistics from the pStatistics. When a pf or pp is not shared between multiple tenants and is mapped to only a single tenant, its statistics are those of the single tenant's vf or vp. Therefore, for this evaluation, we set pf and pp to be mapped to multiple tenants based on the concepts of previous studies [11], [12], [17], [18].

Table 5 shows the RMSEs of the vStatistics provided by NH+SM and V-Sight per tenant. For both flow entry and port statistics, two types of vStatistics, namely, quantity and count (number of packets), are provided. In terms of the flow entries (Table 5a), the RMSEs for quantity in V-Sight are less than 1, whereas those of NH+SM are more than 110. For the count, the average RMSE of V-Sight is 10.99, whereas that of NH+SM is 10170.01—an improvement of three orders of magnitude.

In terms of ports (Table 5b), vStatistics are categorized into RX and TX for incoming and outgoing traffic, respectively. NH+SM exhibits an average RMSE of 53.25 for the quantity, whereas V-Sight exhibits an average of 0.48. Similarly, for the count, NH+SM and V-Sight exhibit average values of 4974.20 and 38.91, respectively. In summary, V-Sight exhibits much better accuracy (by two orders of magnitude) than NH+SM because NH+SM does not have any mechanism to provide isolated statistics. This result implies that the statistics virtualization algorithms of V-Sight successfully provide isolated statistics from pStatistics.

In addition, Fig. 13 shows the vStatistics over time, which the tenant controller receives in response to its requests from NH+SM and V-Sight. The monitoring results of tenant 2's vf and vp installed in the center switch of the linear topology are shown as representative results. Each point represents the quantity or count processed since the previous statistics reply. The hatched area in the graphs depicts the actual values.

Figs. 13a and 13b show the vf statistics of quantity and count, respectively, and Figs. 13c and 13d show the

7. To the best of our knowledge, existing network hypervisors (including open-source network hypervisors) do not have a complete network monitoring framework, so we choose Libera, which is up to date and open source, for comparison with V-Sight.
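The accuracy metrics used in this section reduce to a few lines. The following is a generic sketch (not the framework's own code) of the RMSE reported in Table 5 and the MAE percentages quoted for Fig. 13:

```python
from math import sqrt

def rmse(estimates, actuals) -> float:
    """Root mean squared error between reported vStatistics and ground truth."""
    pairs = list(zip(estimates, actuals))
    return sqrt(sum((e - a) ** 2 for e, a in pairs) / len(pairs))

def mae_percent(estimates, actuals) -> float:
    """Mean absolute error relative to the actual values, in percent."""
    pairs = list(zip(estimates, actuals))
    return 100.0 * sum(abs(e - a) / a for e, a in pairs) / len(pairs)
```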


vp statistics of RX quantity and TX quantity, respectively. We omit the graphs of the RX and TX counts for the vp statistics because they look similar to the quantity results. In short, the vStatistics provided by NH+SM are aggregated values over tenants that differ significantly from the actual values. For example, the mean absolute errors (MAEs) of the flow entry count statistics (Fig. 13b) of NH+SM and V-Sight are 612% and 2%, respectively, and the other results show similar tendencies. By using V-Sight, the tenant controllers can receive statistics very close to the actual values, thus enhancing the accuracy of the network management and optimization tasks of the tenant controllers.

Fig. 14. Average statistics transmission delay (ms). (a) linear (flow entry). (b) linear (port). (c) fat-tree (flow entry). (d) fat-tree (port).

Fig. 15. Control channel traffic consumption (bytes per second). (a) linear. (b) fat-tree.

4.3 Transmission Delay

Transmission delay is measured on both the linear and fat-tree topologies. The results are analyzed according to two criteria: 1) V-Sight performance improvement—a comparison between NH+SM and V-Sight, and 2) SDN-NV overheads—a comparison between Native and V-Sight. To measure the transmission delay, we modify the ONOS controller to report the sending time of the statistics request message and the receiving time of the statistics reply. Except for this timestamping, ONOS is not modified in any other way.

Because the PN, network hypervisor, and tenant controllers execute on separate physical servers, we measure the RTT between the servers. The average and 95% tail RTTs are 0.15 ms and 0.18 ms, respectively, for all pairs of servers. Further, we confirm that no bottleneck occurs on the network connections between the servers.

4.3.1 V-Sight performance improvement

In the linear topology, V-Sight consumes 9.35 and 4.68 ms, on average, for flow entry and port statistics transmission, respectively (Figs. 14a and 14b). The delays in V-Sight improve by 46 times (flow entry statistics, 6 connections) to 454 times (port statistics, 30 connections) compared with those in NH+SM. For the fat-tree topology, V-Sight exhibits 9.75 and 7.29 ms of transmission delay for flow entry and port statistics, respectively (Figs. 14c and 14d). The delay in V-Sight improves by 14 times (port statistics, 2 connections) to 269 times (port statistics, 32 connections).

In detail, the transmission delay in NH+SM increases in proportion to the number of TCP connections (even beyond 2 s) because the number of pStatistics required for the vStatistics increases as the number of connections increases. Subsequently, the vStatistics are returned to the tenant controller only after the corresponding pStatistics are collected. Conversely, V-Sight disaggregates the pStatistics transmission routines from the statistics virtualization, thereby reducing this delay.

4.3.2 SDN-NV overheads

We compare the transmission delay between V-Sight and Native to examine the SDN-NV overheads. In the linear topology, Native exhibits 2.8 and 1.5 ms of flow entry and port statistics transmission delay, respectively (Figs. 14a and 14b). The delays in V-Sight are 3.4 times higher, on average, than those of Native. Further, for the fat-tree topology, Native exhibits 4.6 and 2.37 ms delays for flow entry and port statistics transmission (Figs. 14c and 14d), respectively. The results of V-Sight are 1.09 times (flow statistics, 2 connections) to 6.69 times (port statistics, 2 connections) higher than those of Native.

Although the delays of V-Sight are higher than those of Native, note that all the values are lower than 20 ms. In comparison, the default monitoring intervals of ONOS, Floodlight, and OpenDaylight are 5, 10, and 15 s, respectively. Therefore, we believe that the transmission delay of V-Sight, which is 19.36 ms at maximum, is acceptable.

4.4 Control Channel Consumption

Similar to the transmission delay, the control channel consumption for statistics is evaluated under the linear and fat-tree topologies. Two criteria are used: V-Sight performance improvement and SDN-NV overheads, as in §4.3.

4.4.1 V-Sight performance improvement

Fig. 15 shows the control channel consumption for statistics transmission in both topologies. The consumption increases in proportion to the number of connections because the number of pfs to be monitored increases accordingly. In the linear topology (Fig. 15a), V-Sight improves the control channel consumption by approximately 1.9 times on average. In the fat-tree topology (Fig. 15b), the average consumption of V-Sight is 1.44 times less than that of NH+SM. This improvement is due to the aggregated pCollector, which merges individual statistics messages (§3.4).
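The controller-side timestamping used to measure transmission delay (§4.3) can be pictured as follows. This is an illustrative sketch under our own naming, not the actual ONOS modification.

```python
import time

class DelayProbe:
    """Records request/reply times for statistics messages, keyed by the
    message transaction id, and reports the average delay in ms."""
    def __init__(self):
        self.sent = {}      # xid -> send time
        self.delays = []    # measured delays in ms

    def on_request(self, xid: int) -> None:
        """Called when a statistics request is sent."""
        self.sent[xid] = time.monotonic()

    def on_reply(self, xid: int) -> None:
        """Called when the matching statistics reply arrives."""
        t0 = self.sent.pop(xid)
        self.delays.append((time.monotonic() - t0) * 1000.0)

    def average_ms(self) -> float:
        return sum(self.delays) / len(self.delays)
```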


Fig. 16. Time skews of pStatistics cache (ms). (a) linear (flow entry). (b) linear (port). (c) fat-tree (flow entry). (d) fat-tree (port).

Fig. 17. CPU cycle usage (%). (a) linear. (b) fat-tree.

4.4.2 SDN-NV overheads

Comparing V-Sight with Native, V-Sight consumes 107% and 93% of Native's control channel traffic in the linear and fat-tree topologies, respectively, which implies that the consumption of V-Sight is comparable to that of Native. Furthermore, in the fat-tree topology with few network connections, V-Sight is even better than Native. This is because V-Sight monitors only the switches that hold the pStatistics required for vStatistics. The fat-tree topology has 20 switches (Fig. 12b), and multiple paths are available between every host pair. In this topology, when the number of connections is small, not all switches are used for packet forwarding and, consequently, not for vStatistics.

In Native, however, the tenant controller monitors all the switches in the PN; therefore, request and reply messages are generated for all switches regularly. In V-Sight, transmission disaggregation restricts the creation of pCollectors to the required pfs. Thus, pCollectors are created only for the required pfs, and statistics request/reply messages are not created for switches that are not used.

4.5 Time Skew of pStatistics Cache

The experiment results of the previous sections (§4.3 and §4.4) show that V-Sight successfully improves the transmission delay and control channel consumption. These improvements come with a time skew, which is the interval between the time a vStatistics request arrives and the time the pStatistics required for that request were stored in the cache. When this interval becomes longer, the pStatistics required for the vStatistics are collected from the PN far in advance; thus, the calculated vStatistics could be out of date.

Fig. 16 shows the time skews for each number of network connections in the linear and fat-tree topologies, plotted with averages and 95% confidence intervals. The results indicate that the average time skews in all cases are equal to or less than 2500 ms in both topologies. This implies that V-Sight replies to the tenant controller within 2.5 s, which is half of the request interval of the tenant controllers. Consequently, the tenant controller, at least, does not receive statistics from the previous statistics request; therefore, the time skew does not jeopardize the accuracy of the vStatistics itself.

4.6 CPU and Memory Usage

V-Sight inevitably incurs additional computational resource consumption, especially CPU and memory resources in the network hypervisor (e.g., the pStatistics cache and pCollectors). The CPU and memory usage are measured with settings similar to those of the transmission delay and control channel consumption evaluations. We compare the results of NH and V-Sight.

4.6.1 CPU usage

Fig. 17 shows the average CPU cycle usage of NH and V-Sight during the evaluation for each network topology and number of connections. Regardless of the topology (Figs. 17a and 17b), the CPU cycles are proportional to the number of connections for both NH and V-Sight. Comparing the two, V-Sight would be expected to use more CPU cycles than NH because V-Sight runs additional threads, such as pCollectors. Surprisingly, however, V-Sight consumes, on average, 0.6% and 0.9% fewer CPU cycles than NH in the linear and fat-tree topologies, respectively.

The reason for this CPU usage result is that V-Sight prevents unnecessary operations of the tenant controllers. Specifically, we find that the tenant controller in the experiments periodically collects flow entry and port statistics to confirm whether the network policies, such as the installed flow entries or the configured settings on the network devices, are consistent between the controller and the network.

However, NH, without a network monitoring scheme, does not reply to such statistics requests (i.e., it mostly answers that there is nothing in the switches). Thus, the tenant controller considers this situation inconsistent. The controller then removes or re-installs the flow entries, which leads to the repeated installation of flow entries that already exist in the PN. Thus, NH repeatedly processes the re-installation messages from the tenant controllers. On the other hand, V-Sight eliminates these repetitions because it provides accurate and timely statistics to the tenant controllers. For this reason, even though V-Sight adds additional design and implementation overheads to the NH, the CPU cycle usage is improved.
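The time-skew metric of §4.5 and its 95% confidence interval can be computed as below (a normal-approximation sketch with names of our own choosing):

```python
import statistics

def time_skew_ms(request_times, cache_times):
    """Average time skew in ms between each vStatistics request and the
    moment its pStatistics were cached, with a normal-approximation
    95% confidence interval."""
    skews = [(req - col) * 1000.0 for req, col in zip(request_times, cache_times)]
    mean = statistics.mean(skews)
    half = 1.96 * statistics.stdev(skews) / (len(skews) ** 0.5)
    return mean, (mean - half, mean + half)
```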


TABLE 6
Total control channel consumption of the VN controller (KB).

                       Native    NH        V-Sight
Flow entry addition    4.10      5510.54   9.43
Flow entry removal     0         23.16     0

Fig. 18. Memory consumption (MB). (a) linear. (b) fat-tree.

Fig. 19. Effects of V-Sight on tenants. (a) TCP throughput (Gbps). (b) CPU cycle of tenant controller (%). (c) Control channel consumption for flow entry addition (KB/s). (d) Control channel consumption for flow entry removal (KB/s).

4.6.2 Memory usage

Fig. 18 depicts the memory consumption of V-Sight and NH. In the results, the two cases exhibit memory consumption between 130 and 150 MB for both topologies. On average, V-Sight consumes 0.99 and 6.06 MB more memory than NH in the linear and fat-tree topologies, respectively. This consumption comes from the additional structures in V-Sight, such as the pStatistics cache.

However, at some points (e.g., 18 and 30 connections in the linear topology), V-Sight consumes less memory than NH. This result comes from the same reasons why the CPU consumption is improved (§4.6.1). NH and V-Sight store the messages from the tenant controllers to distinguish the flow entries and policies of the tenants for VN isolation. Therefore, when the messages for setting up flow entries and policies are generated repeatedly, NH sometimes consumes more memory than V-Sight.

4.7 Effects of V-Sight on Tenants

We investigate the effects of V-Sight on tenants in practice. Because network monitoring is used for routing (flow entry installation), we measure the TCP throughput. In addition, the CPU cycles of the tenant controllers and the control channel consumption (between the tenant controllers and the network hypervisor) are measured. The Native, NH, and V-Sight cases are compared in the linear topology with one tenant having a single TCP connection.

Fig. 19a shows the TCP throughput in the data plane. We omit the first 5 s, which is typically the period for congestion window convergence. First, the results indicate that the throughput of NH fluctuates more than that of Native. The 90% tail throughput values of Native, NH, and V-Sight are 25.9, 18.7, and 25.6 Gbps, respectively. Even after approximately 250 s, the throughput of NH decreases to zero, while Native and V-Sight show relatively constant throughput values throughout the experiment. In addition, the ranges between the lowest and highest TCP throughput of V-Sight and Native are 3.5 and 3.2 Gbps, respectively, while that of NH is 17.6 Gbps, which implies that 1) the performance of V-Sight is similar to that of Native and 2) the improvement in variance over NH is about 5.5 times.

The reason for this TCP throughput improvement in V-Sight is the same as that discussed in §4.6.1. Because NH does not provide correct statistics to the tenant controller, the flow entries used for transmitting packets are re-installed. When they are removed or re-installed in the middle of packet processing, the TCP throughput shows high variation and even drops to zero in NH. On the other hand, V-Sight provides timely and correct statistics similar to Native, and thus, its TCP throughput also becomes similar to that of Native.

Next, Fig. 19b shows the cumulative distribution function of the CPU cycles of the tenant controller. The results show that the Native and V-Sight cases exhibit similar distributions. The average CPU cycle consumptions of Native, NH, and V-Sight are 23.94%, 47.25%, and 22.09%, respectively. V-Sight improves the CPU utilization of the tenant controller by about 2.14 times over that of NH by removing the flow entry inconsistency situations.

Lastly, Table 6 summarizes the total control channel consumption, categorized into flow entry addition and flow entry removal. Figs. 19c and 19d show the control channel consumption over time. For flow entry addition, NH consumes significantly higher volumes of control channel traffic, i.e., 1344.03 and 584.36 times over Native and V-Sight, respectively (Table 6), because of the repetitive flow entry installation, which can be confirmed from NH's repetitive spike patterns in Fig. 19c. On the other hand, Native and V-Sight consume control channel traffic only at the beginning. In terms of flow entry removal (Table 6 and Fig. 19d), Native and V-Sight do not consume any traffic, while NH consumes 23.16 KB.

In comparison to Native, V-Sight consumes 2.3 times more bandwidth for flow entry installation because V-Sight is a network hypervisor and it inherently has longer control


channel messages [13], [17]; note, however, that the network hypervisor provides address virtualization and topology virtualization for isolation between tenants, whereas Native does not.

5 DISCUSSION

5.1 Consideration of In-band Network Telemetry

Unlike existing monitoring approaches, in-band network telemetry (INT) provides custom packet-level network monitoring by allowing network states to be collected and reported according to user-defined operations in network switches [47], [48]. For example, a network user who wants to know certain states inside a network (e.g., the queue length of each switch) sends an instruction to INT-capable devices specifying the types of states to be collected. The devices then embed the requested states in packets, and the hosts receive the in-network states [49].

However, INT-based monitoring requires INT-capable hardware devices, such as a field-programmable gate array (FPGA) or a P4 programmable switch, whereas V-Sight is a software component that operates on the SDN-compatible switches that are common today. In addition, INT by itself cannot deliver per-tenant isolated statistics. As for network hypervisors (NHs), existing NHs do not support P4 or FPGA devices to enable INT. However, several approaches for enabling P4 and other programmable devices under NV exist [50], [51], [52], [53]. Thus, if INT becomes available under NV, it could be incorporated into the V-Sight framework as a means of collecting custom network statistics, and the V-Sight designs for isolated statistics and resource-efficient monitoring could then work for INT as well.

5.2 Isolated Statistics Using P4

With P4, network operators can implement custom operations on hardware switches. This opens the possibility of isolating statistics on the switch side rather than in the network hypervisor. To enable this, switches must be able to distinguish which tenant each packet belongs to in order to count packets per tenant. A challenge is that the method of classifying tenants at the packet level differs for each proposed network hypervisor (e.g., VLAN [12], address rewriting [13], MPLS [18], and TID embedding [44]). We leave this topic as future work, which can provide an accurate and flexible monitoring scheme in SDN-NV.

5.3 Machine Learning for Monitoring on SDN-NV

Recently, various studies have attempted to use machine learning to predict network traffic in the data plane. For example, several studies [54], [55] used neural networks to predict existing network traffic or the load on paths for better network management, such as routing. Currently, V-Sight calculates vStatistics from non-isolated pStatistics; if tenant statistics could be predicted, the accuracy of vStatistics could be further enhanced. This remains an open question for future research.

5.4 Integration with Cloud Orchestration Platforms

An interesting direction for future research is integrating V-Sight with cloud orchestration platforms so that it can be deployed in practice. Several cloud orchestration platforms (e.g., OpenStack and Kubernetes) provide plug-in solutions for SDN systems [56] that manage the underlying network infrastructure and control the network connections between the containers of cloud infrastructures. Since V-Sight runs atop SDN-based solutions, it can be readily integrated with such solutions.

5.5 Consolidated Monitoring Framework for In-network Computing

Another promising research direction is building a consolidated, unified monitoring framework for in-network computing. In upcoming beyond-5G and 6G systems, network resources are expected to perform both network functions (e.g., packet forwarding) and computing functions offloaded from hosts [57]. For example, several studies have proposed offloading deep learning training into the network to accelerate training [58], [59]. In such a system, the monitoring framework should consider both the network side and the computational resources of the network devices (such as CPUs, memories, and ASICs). Such consolidated monitoring can lead to highly optimized decisions about network devices (e.g., routing that considers both network and computational resource capacities).

6 CONCLUSION

We present V-Sight, the first comprehensive network monitoring framework for SDN-NV. V-Sight makes it possible to isolate statistics between VNs, reduce statistics transmission delays, and scale control channel consumption. To this end, V-Sight introduces statistics virtualization, transmission disaggregation, and pCollector aggregation. We implement V-Sight and evaluate its key performance characteristics in terms of statistics virtualization accuracy, transmission delay, and control channel consumption. Furthermore, we present the time skew of the pStatistics cache, CPU and memory usage, and the effects of V-Sight on tenants, which shows that V-Sight attains a level of performance comparable to network monitoring in a non-virtualized SDN.

In future research, we plan to investigate the reliability and performance of VNs through traffic engineering, building on the isolated and timely statistics from V-Sight.

ACKNOWLEDGMENTS

We thank Gi Jun Moon, listed in [1], who made substantial contributions to earlier work. This research was supported in part by the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT (MSIT) (NRF-2019H1D8A2105513) and by the Basic Science Research Program through the NRF funded by the Ministry of Education (NRF-2021R1A6A1A13044830). This work was also partially supported by an Institute of Information & communications Technology Planning & Evaluation grant funded by the Korea government (MSIT) (2015-0-00280, (SW Starlab) Next generation cloud infra-software toward the guarantee of performance and security SLA), and a Korea University Grant.
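To make the switch-side isolation idea of Section 5.2 concrete, the sketch below shows, in plain Python rather than P4, how per-tenant statistics can be derived once each flow entry carries a tenant classifier. The function name, field names, and the VLAN-based classification are illustrative assumptions (VLAN is only one of the schemes listed above), not V-Sight's implementation:

```python
# Illustrative sketch (not V-Sight's implementation): deriving per-tenant
# statistics from raw per-flow switch counters, assuming each flow entry
# carries a VLAN ID that identifies its tenant.
from collections import defaultdict

def virtualize_statistics(p_statistics, vlan_to_tenant):
    """Aggregate per-flow packet/byte counters into per-tenant totals."""
    v_statistics = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for entry in p_statistics:
        tenant = vlan_to_tenant.get(entry["vlan"])
        if tenant is None:
            continue  # flow belongs to no hosted tenant; skip it
        v_statistics[tenant]["packets"] += entry["packets"]
        v_statistics[tenant]["bytes"] += entry["bytes"]
    return dict(v_statistics)

# Two tenants sharing one physical switch: tenant-A owns VLAN 100,
# tenant-B owns VLAN 200.
pstats = [
    {"vlan": 100, "packets": 10, "bytes": 15000},
    {"vlan": 200, "packets": 4, "bytes": 6000},
    {"vlan": 100, "packets": 2, "bytes": 3000},
]
print(virtualize_statistics(pstats, {100: "tenant-A", 200: "tenant-B"}))
# → {'tenant-A': {'packets': 12, 'bytes': 18000}, 'tenant-B': {'packets': 4, 'bytes': 6000}}
```

In a P4 realization, the same logic would run on the switch itself as a match on the tenant classifier feeding per-tenant counters, so the hypervisor would no longer need to disaggregate shared counters after the fact.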


REFERENCES

[1] G. Yang, H. Jin, M. Kang, G. J. Moon, and C. Yoo, “Network monitoring for SDN virtual networks,” in IEEE INFOCOM 2020 - IEEE Conference on Computer Communications. IEEE, 2020, pp. 1261–1270.
[2] B. Wang, Z. Qi, R. Ma, H. Guan, and A. V. Vasilakos, “A survey on data center networking for cloud computing,” Computer Networks, vol. 91, pp. 528–547, 2015.
[3] T. Koponen, K. Amidon, P. Balland, M. Casado, A. Chanda, B. Fulton, I. Ganichev, J. Gross, P. Ingram, E. Jackson et al., “Network virtualization in multi-tenant datacenters,” in 11th USENIX Symposium on Networked Systems Design and Implementation (NSDI 14). USENIX Association, 2014, pp. 203–216.
[4] D. Firestone, “VFP: A virtual switch platform for host SDN in the public cloud,” in 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI 17). USENIX Association, 2017, pp. 315–328.
[5] G. Yang, B.-y. Yu, H. Jin, and C. Yoo, “Libera for programmable network virtualization,” IEEE Communications Magazine, vol. 58, no. 4, pp. 38–44, 2020.
[6] Z. B. Pfaff, B. Lantz, B. Heller, C. Barker, D. Cohn, D. Talayco, D. Erickson, E. Crabbe, G. Gibb, G. Appenzeller, J. Tourrilhes, J. Pettit, K. Yap, L. Poutievski, M. Casado, M. Takahashi, M. Kobayashi, and N. McKeown, “Openflow switch specification,” Open Networking Foundation, vol. 28, pp. 1–56, 2011.
[7] P. Bosshart, D. Daly, G. Gibb, M. Izzard, N. McKeown, J. Rexford, C. Schlesinger, D. Talayco, A. Vahdat, G. Varghese et al., “P4: Programming protocol-independent packet processors,” ACM SIGCOMM Computer Communication Review, vol. 44, no. 3, pp. 87–95, 2014.
[8] B. Ahlgren, C. Dannewitz, C. Imbrenda, D. Kutscher, and B. Ohlman, “A survey of information-centric networking,” IEEE Communications Magazine, vol. 50, no. 7, pp. 26–36, 2012.
[9] M. Jammal, T. Singh, A. Shami, R. Asal, and Y. Li, “Software defined networking: State of the art and research challenges,” Computer Networks, vol. 72, pp. 74–98, 2014.
[10] P. Costa, M. Migliavacca, P. Pietzuch, and A. L. Wolf, “NaaS: Network-as-a-service in the cloud,” in 2nd USENIX Workshop on Hot Topics in Management of Internet, Cloud, and Enterprise Networks and Services (Hot-ICE 12). USENIX Association, 2012.
[11] R. Sherwood, G. Gibb, K.-K. Yap, G. Appenzeller, M. Casado, N. McKeown, and G. M. Parulkar, “Can the production network be the testbed?” in Proceedings of the 9th USENIX Conference on Operating Systems Design and Implementation, vol. 10. USENIX Association, 2010, pp. 365–378.
[12] D. Drutskoy, E. Keller, and J. Rexford, “Scalable network virtualization in software-defined networks,” IEEE Internet Computing, vol. 17, no. 2, pp. 20–27, 2012.
[13] A. Al-Shabibi, M. De Leenheer, M. Gerola, A. Koshibe, G. Parulkar, E. Salvadori, and B. Snow, “OpenVirteX: Make your virtual SDNs programmable,” in Proceedings of the third workshop on Hot topics in software defined networking. Association for Computing Machinery, 2014, pp. 25–30.
[14] “POX controller.” Accessed: Nov. 17, 2020. [Online]. Available: https://noxrepo.github.io/pox-doc/html/
[15] P. Berde, M. Gerola, J. Hart, Y. Higuchi, M. Kobayashi, T. Koide, B. Lantz, B. O’Connor, P. Radoslavov, W. Snow et al., “ONOS: towards an open, distributed SDN OS,” in Proceedings of the third workshop on Hot topics in software defined networking. Association for Computing Machinery, 2014, pp. 1–6.
[16] J. Medved, R. Varga, A. Tkacik, and K. Gray, “Opendaylight: Towards a model-driven SDN controller architecture,” in Proceeding of IEEE International Symposium on a World of Wireless, Mobile and Multimedia Networks 2014. IEEE, 2014, pp. 1–6.
[17] G. Yang, B.-y. Yu, W. Jeong, and C. Yoo, “FlowVirt: Flow rule virtualization for dynamic scalability of programmable network virtualization,” in 2018 IEEE 11th International Conference on Cloud Computing (CLOUD). IEEE, 2018, pp. 350–358.
[18] G. Yang, B.-Y. Yu, S.-M. Kim, and C. Yoo, “LiteVisor: A network hypervisor to support flow aggregation and seamless network reconfiguration for VM migration in virtualized software-defined networks,” IEEE Access, vol. 6, pp. 65 945–65 959, 2018.
[19] B.-y. Yu, G. Yang, H. Jin, and C. Yoo, “WhiteVisor: Support of white-box switch in SDN-based network hypervisor,” in 2019 International Conference on Information Networking (ICOIN). IEEE, 2019, pp. 242–247.
[20] M. Li, C. Chen, C. Hua, and X. Guan, “CFlow: A learning-based compressive flow statistics collection scheme for SDNs,” in ICC 2019 - 2019 IEEE International Conference on Communications (ICC). IEEE, 2019, pp. 1–6.
[21] J. Suh, T. T. Kwon, C. Dixon, W. Felter, and J. Carter, “OpenSample: A low-latency, sampling-based measurement platform for commodity SDN,” in 2014 IEEE 34th International Conference on Distributed Computing Systems. IEEE, 2014, pp. 228–237.
[22] A. Tootoonchian, M. Ghobadi, and Y. Ganjali, “OpenTM: Traffic matrix estimator for openflow networks,” in Passive and Active Measurement, A. Krishnamurthy and B. Plattner, Eds. Springer Berlin Heidelberg, 2010, pp. 201–210.
[23] Z. Su, T. Wang, Y. Xia, and M. Hamdi, “FlowCover: Low-cost flow monitoring scheme in software defined networks,” in 2014 IEEE Global Communications Conference, 2014, pp. 1956–1961.
[24] S. R. Chowdhury, M. F. Bari, R. Ahmed, and R. Boutaba, “Payless: A low cost network monitoring framework for software defined networks,” in 2014 IEEE Network Operations and Management Symposium (NOMS). IEEE, 2014, pp. 1–9.
[25] A. Roy, D. Bansal, D. Brumley, H. K. Chandrappa, P. Sharma, R. Tewari, B. Arzani, and A. C. Snoeren, “Cloud datacenter SDN monitoring: Experiences and challenges,” in Proceedings of the Internet Measurement Conference 2018, ser. IMC ’18. Association for Computing Machinery, 2018, pp. 464–470.
[26] A. R. Curtis, J. C. Mogul, J. Tourrilhes, P. Yalagandula, P. Sharma, and S. Banerjee, “DevoFlow: Scaling flow management for high-performance networks,” SIGCOMM Comput. Commun. Rev., vol. 41, no. 4, pp. 254–265, 2011.
[27] P.-W. Tsai, C.-W. Tsai, C.-W. Hsu, and C.-S. Yang, “Network monitoring in software-defined networking: A review,” IEEE Systems Journal, vol. 12, no. 4, pp. 3958–3969, 2018.
[28] Z. Bozakov and P. Papadimitriou, “AutoSlice: Automated and scalable slicing for software-defined networks,” in Proceedings of the 2012 ACM Conference on CoNEXT Student Workshop, ser. CoNEXT Student ’12. Association for Computing Machinery, 2012, pp. 3–4.
[29] H. Yamanaka, E. Kawai, S. Ishii, and S. Shimojo, “AutoVFlow: Autonomous virtualization for wide-area openflow networks,” in 2014 Third European Workshop on Software Defined Networks, 2014, pp. 67–72.
[30] A. Blenk, A. Basta, and W. Kellerer, “HyperFlex: An SDN virtualization architecture with flexible hypervisor function allocation,” in 2015 IFIP/IEEE International Symposium on Integrated Network Management (IM), 2015, pp. 397–405.
[31] X. Jin, J. Gossels, J. Rexford, and D. Walker, “Covisor: A compositional hypervisor for software-defined networks,” in 12th USENIX Symposium on Networked Systems Design and Implementation (NSDI 15). USENIX Association, 2015, pp. 87–101.
[32] G. Yang, Y. Yoo, M. Kang, H. Jin, and C. Yoo, “Bandwidth isolation guarantee for SDN virtual networks,” in IEEE INFOCOM 2021 - IEEE Conference on Computer Communications. IEEE, 2021.
[33] A. Blenk, A. Basta, M. Reisslein, and W. Kellerer, “Survey on network virtualization hypervisors for software defined networking,” IEEE Communications Surveys & Tutorials, vol. 18, no. 1, pp. 655–685, 2015.
[34] N. M. K. Chowdhury and R. Boutaba, “A survey of network virtualization,” Computer Networks, vol. 54, no. 5, pp. 862–876, 2010.
[35] I. Alam, K. Sharif, F. Li, Z. Latif, M. M. Karim, S. Biswas, B. Nour, and Y. Wang, “A survey of network virtualization techniques for internet of things using SDN and NFV,” ACM Comput. Surv., vol. 53, no. 2, Apr. 2020.
[36] M. Yu, L. Jose, and R. Miao, “Software defined traffic measurement with OpenSketch,” in Presented as part of the 10th USENIX Symposium on Networked Systems Design and Implementation (NSDI 13). USENIX Association, 2013, pp. 29–42.
[37] X. T. Phan and K. Fukuda, “SDN-Mon: Fine-grained traffic monitoring framework in software-defined networks,” Journal of Information Processing, vol. 25, pp. 182–190, 2017.
[38] N. L. M. van Adrichem, C. Doerr, and F. A. Kuipers, “OpenNetMon: Network monitoring in openflow software-defined networks,” in 2014 IEEE Network Operations and Management Symposium (NOMS), 2014, pp. 1–8.
[39] H. Tahaei, R. Salleh, S. Khan, R. Izard, K.-K. R. Choo, and N. B. Anuar, “A multi-objective software defined network traffic measurement,” Measurement, vol. 95, pp. 317–327, 2017.


[40] E. F. Castillo, O. M. C. Rendon, A. Ordonez, and L. Zambenedetti Granville, “IPro: An approach for intelligent SDN monitoring,” Computer Networks, vol. 170, p. 107108, 2020.
[41] T. Benson, A. Anand, A. Akella, and M. Zhang, “MicroTE: Fine grained traffic engineering for data centers,” in Proceedings of the Seventh Conference on Emerging Networking Experiments and Technologies. Association for Computing Machinery, 2011.
[42] C. Yu, C. Lumezanu, Y. Zhang, V. Singh, G. Jiang, and H. V. Madhyastha, “Flowsense: Monitoring network utilization with zero measurement cost,” in International Conference on Passive and Active Network Measurement. Springer, 2013, pp. 31–41.
[43] A. Basta, A. Blenk, S. Dudycz, A. Ludwig, and S. Schmid, “Efficient loop-free rerouting of multiple SDN flows,” IEEE/ACM Transactions on Networking, vol. 26, no. 2, pp. 948–961, 2018.
[44] B.-y. Yu, G. Yang, K. Lee, and C. Yoo, “AggFlow: Scalable and efficient network address virtualization on software defined networking,” in Proceedings of the 2016 ACM Workshop on Cloud-Assisted Networking. Association for Computing Machinery, 2016, pp. 1–6.
[45] B. Lantz, B. Heller, and N. McKeown, “A network in a laptop: rapid prototyping for software-defined networks,” in Proceedings of the 9th ACM SIGCOMM Workshop on Hot Topics in Networks. Association for Computing Machinery, 2010, pp. 1–6.
[46] V. Gueant, “iperf - the tcp, udp and sctp network bandwidth measurement tool,” Iperf.fr, 2017, Accessed: Aug. 06, 2020. [Online]. Available: https://iperf.fr/
[47] C. Kim, A. Sivaraman, N. Katta, A. Bas, A. Dixit, and L. J. Wobker, “In-band network telemetry via programmable dataplanes,” in ACM SIGCOMM Demo, 2015.
[48] “Cisco nexus 9000 series nx-os programmability guide, release 9.2(x) - inband network telemetry [cisco nexus 9000 series switches],” 2020, Accessed: Nov. 02, 2020. [Online]. Available: https://www.cisco.com/c/en/us/td/docs/switches/datacenter/nexus9000/sw/92x/programmability/guide/b-cisco-nexus-9000-series-nx-os-programmability-guide-92x/b-cisco-nexus-9000-series-nx-os-programmability-guide-92x chapter 0100001.html
[49] T. P. W. Group, “In-band network telemetry (int) dataplane specification,” 2020, Accessed: Oct. 24, 2020. [Online]. Available: https://github.com/p4lang/p4-applications/blob/master/docs/INT v2 1.pdf
[50] S. Han, S. Jang, H. Choi, H. Lee, and S. Pack, “Virtualization in programmable data plane: A survey and open challenges,” IEEE Open Journal of the Communications Society, vol. 1, pp. 527–534, 2020.
[51] C. Zhang, J. Bi, Y. Zhou, and J. Wu, “HyperVDP: High-performance virtualization of the programmable data plane,” IEEE Journal on Selected Areas in Communications, vol. 37, no. 3, pp. 556–569, 2019.
[52] M. Saquetti, G. Bueno, W. Cordeiro, and J. R. Azambuja, “P4VBox: Enabling P4-based switch virtualization,” IEEE Communications Letters, vol. 24, no. 1, pp. 146–149, 2019.
[53] P. Zheng, T. Benson, and C. Hu, “P4visor: Lightweight virtualization and composition primitives for building and testing modular programs,” in Proceedings of the 14th International Conference on Emerging Networking Experiments and Technologies. Association for Computing Machinery, 2018, pp. 98–111.
[54] R. Alvizu, S. Troia, G. Maier, and A. Pattavina, “Matheuristic with machine-learning-based prediction for software-defined mobile metro-core networks,” Journal of Optical Communications and Networking, vol. 9, no. 9, pp. D19–D30, Sep 2017.
[55] J. Xie, F. R. Yu, T. Huang, R. Xie, J. Liu, C. Wang, and Y. Liu, “A survey of machine learning techniques applied to software defined networking (SDN): Research issues and challenges,” IEEE Communications Surveys & Tutorials, vol. 21, no. 1, pp. 393–430, 2019.
[56] Z. Tao, Q. Xia, Z. Hao, C. Li, L. Ma, S. Yi, and Q. Li, “A survey of virtual machine management in edge computing,” Proceedings of the IEEE, vol. 107, no. 8, pp. 1482–1499, 2019.
[57] Y. Tokusashi, H. T. Dang, F. Pedone, R. Soulé, and N. Zilberman, “The case for in-network computing on demand,” in Proceedings of the Fourteenth EuroSys Conference 2019. Association for Computing Machinery, 2019.
[58] M. Kang, G. Yang, Y. Yoo, and C. Yoo, “TensorExpress: In-network communication scheduling for distributed deep learning,” in 2020 IEEE 13th International Conference on Cloud Computing (CLOUD). IEEE, 2020, pp. 25–27.
[59] A. Sapio, M. Canini, C.-Y. Ho, J. Nelson, P. Kalnis, C. Kim, A. Krishnamurthy, M. Moshref, D. Ports, and P. Richtarik, “Scaling distributed machine learning with in-network aggregation,” in 18th USENIX Symposium on Networked Systems Design and Implementation (NSDI 21). USENIX Association, 2021, pp. 785–808.

Gyeongsik Yang [M] received his B.S., M.S., and Ph.D. degrees in computer science from Korea University, Seoul, Republic of Korea, in 2015, 2017, and 2019, respectively. He worked as a research intern at Microsoft Research Asia in 2018. He is currently a research professor at Korea University. His research interests include network virtualization, distributed deep learning, data center systems, and SDN.

Yeonho Yoo received his B.S. degree in computer science from Kookmin University, Seoul, Republic of Korea, in 2017. He is currently pursuing his M.S. degree with Korea University, Seoul, Republic of Korea. His current research interests include network virtualization, distributed deep learning, and SDN.

Minkoo Kang received his B.S. and M.S. degrees in computer science from Korea University in 2019 and 2021, respectively. He is currently a researcher at KIST. His research interests include programmable dataplane, data center networking, and distributed deep learning.

Heesang Jin received his B.S. and M.S. degrees in computer science from Kookmin University, Seoul, Republic of Korea, in 2018 and Korea University, Seoul, Republic of Korea, in 2020, respectively. He is currently a researcher at ETRI. His research interests include network virtualization, traffic engineering, blockchain, and SDN.

Chuck Yoo [M] received his B.S. and M.S. degrees in electronic engineering from Seoul National University, and M.S. and Ph.D. degrees in computer science from the University of Michigan, Ann Arbor. He worked as a researcher at Sun Microsystems. Since 1995, he has been at the College of Informatics at Korea University, where he is currently a professor. His research interests include server/network virtualization and operating systems.
