2018 IEEE 26th International Conference on Network Protocols

NetVision: Towards Network Telemetry as a Service


Zhengzheng Liu∗†‡, Jun Bi∗†‡, Yu Zhou∗†‡, Yangyang Wang∗†‡, Yunsenxiao Lin∗†‡
∗ Institute for Network Sciences and Cyberspace, Tsinghua University
† Department of Computer Science, Tsinghua University
‡ Beijing National Research Center for Information Science and Technology (BNRist)

Abstract—In-band Network Telemetry (INT) can provide fine-grained and accurate device-level telemetry metrics. Nonetheless, INT can track only a small ratio of devices and links, and embedding telemetry data into normal packets brings high overhead and high operation complexity. Hence, we present NetVision, a powerful proactive network telemetry platform with high coverage and high scalability.

I. INTRODUCTION

With the rapid growth of devices and protocols, networks become rather sophisticated. A great number of network failures such as misconfigurations, hardware malfunction, and software bugs occur very frequently. Therefore, operators are eager for an efficient telemetry approach to promptly detect and locate the common and complex network issues (e.g., high latency, TCP incast, load imbalance, routing black hole).

Fig. 1. Architecture and workflow of NetVision.

Prior to the emergence of programmable data planes, operators had to probe the network indirectly through terminals at the network edge, which yields delayed and imprecise telemetry data. In-band Network Telemetry (INT) [1], built on programmable data planes, relieves this dilemma to a great extent. In INT, normal packets contain header fields interpreted as telemetry instructions. These instructions tell INT-capable devices which data to collect and write into normal packets as they traverse the network. By this means, INT can directly capture much more fine-grained and accurate device-level telemetry metrics (e.g., hop latency and queue length).

In spite of many benefits, INT has some inherent drawbacks as well. First, INT's detection scope is limited, which makes it hard to obtain a comprehensive network view. That is because telemetry paths and metrics have to be preassigned by operators and cannot be altered at runtime. As a result, INT can only monitor certain telemetry data of specific paths and is very likely to miss some important network failures. In short, INT can only track a small ratio of devices and links (low coverage). Moreover, because INT monitors the network with normal packets, it adds considerable telemetry traffic overhead. Meanwhile, the payload ratio drops considerably because telemetry instructions and data are encapsulated into each normal packet. In addition, each telemetry path requires two cooperating edge switches to communicate with an INT monitor. One encapsulates telemetry instructions and the other extracts telemetry data. The synchronization and coordination between the two edge switches are very complicated. That is to say, INT brings high telemetry overhead and high deployment and maintenance complexity (low scalability).

To solve the low coverage and low scalability problems of INT, we propose NetVision, a powerful proactive network telemetry platform with high coverage and high scalability. Instead of passively capturing normal packets, NetVision actively sends a suitable amount and format of probes to support on-demand analysis with low overhead. Owing to the simple and flexible routing control offered by Segment Routing (SR), we can customize the probe path by changing SR labels at runtime. In that way, we can easily achieve a comprehensive network view with carefully designed probe paths. A single vantage point is enough for telemetry through cycled probe paths, without much cooperation complexity. What's more, designed telemetry instructions can tell the device which data is desired instead of all data. Finally, device-level telemetry metrics can be offered by customizing packet processing logic based on programmable data planes.

Our contributions are as follows: (1) We propose an efficient proactive network telemetry platform with high coverage and high scalability. (2) We provide a suite of network telemetry primitives that bring simplicity and convenience to operators. (3) We design a dedicated duplex-stack probe, comprising a forwarding stack and a telemetry stack, for flexible forwarding and for recording telemetry data, respectively. (4) To balance probe overhead against coverage, we also offer a probe path algorithm.

II. DESIGN

A. Design Overview

As shown in Figure 1, operators can specify telemetry requirements to NetVision and gain telemetry results without manipulating the underlying infrastructure. The NetVision telemetry platform consists of four main components: the telemetry antenna, the telemetry coordinator, the telemetry analyzer, and the telemetry service provider. The general procedure for using the platform is as follows. Telemetry applications (e.g., traffic engineering and network visualization) submit various telemetry policies to NetVision through the telemetry service API exposed by the telemetry service provider. The telemetry service provider then allocates telemetry tasks to the telemetry coordinator, which is responsible for generating the appropriate amount and contents of probes. Afterward, the telemetry antenna injects probes into the underlying vantage server and collects the returning probes. Finally, the telemetry analyzer analyzes the received probes and passes the analyzed telemetry results to the telemetry service provider, through which applications gain the desired telemetry data.

B. Telemetry Primitives

To simplify telemetry policy enforcement for operators, we provide a suite of convenient and expressive telemetry primitives, including telemetry metadata and query primitives. Telemetry metadata comprises ports, timestamps, latencies and so forth for switch nodes and paths. Besides, we design query primitives that apply the metadata to tasks such as RTT measurement, forwarding loop detection, and congestion detection. As shown in Figure 2, we offer some typical applications of the telemetry service API: end-to-end latency measurement, real-time packet transmission rate calculation, and link or node packet black hole discovery.

| Application | Telemetry Service API | Description |
| --- | --- | --- |
| End-to-end Latency Measurement | Q = PathQuery("1:1", "2:1").Select("PathRTT").Where("PathTrace=[1:1,3:1,3:2,2:2,2:1]") | Measure the path RTT between port 1 of switch 1 and port 1 of switch 2 along a path that conforms to the given path trace. |
| Link Black Hole Discovery | Q = PathQuery("*:*", "*:*").Select("Path").Where("PathLength==1 and PassedProbes==0").Period("10ms") | To discover a link black hole during a period of 10 ms, we specify that the path length equals 1 and determine whether the link is a black hole through the count of passed probes. |
| Real-time Packet Transmission Rate Calculation | Q = NodeQuery("1:1").Select("InPktRate").Period("5s") | Calculate the average ingress packet transmission rate of port 1 of switch 1. We also support packet count, byte count, byte rate, packet reception or dropout rate, hop latency and port utilization. |
| Node Black Hole Discovery | Q = NodeQuery("*:*").Select("SwitchId").Where("OutPktCount==0").Period("5ms") | Locate the switch ID of the node black hole by discovering the port that transmits no packets during a period of 5 ms. |

Fig. 2. Applications of the telemetry service API.

C. Duplex-stack Probe

As Fig. 3 shows, the probe mainly contains two label stacks. The SR stack comprises an outport label list and the list length, for flexible probe forwarding. The INT stack also comprises a label list and the list length, for telemetry records. An INT label is composed of a switch ID and a metadata bitmap determining the metadata types, followed by a telemetry metadata value list. During probe transmission, SR labels are popped to forward probes while INT labels are pushed to record telemetry data.

Fig. 3. Duplex-stack probe format.

D. Probe Generating Algorithm

To reduce probe overhead as far as possible, we utilize a simple probe path algorithm. Because the label capacity supported by realistic programmable devices is limited and networks can be enormous, we partition one big network into several smaller networks based on the label capacity and choose a vantage point for each one. We can then collect and aggregate telemetry information from each vantage point. Within each smaller network, we treat each link as two directed edges in opposite directions. Every node in the resulting directed graph has equal in-degree and out-degree, so, according to Euler's theorem of graph theory, a connected graph of this kind has an Euler circuit. We can calculate the circuit by Hierholzer's algorithm in linear time, O(E) [2], and use it as our probe path.

III. EVALUATION

We use Mininet to simulate a fat-tree topology (Fig. 4). H22 continuously responds to periodic HTTP requests from H11, while H32 occasionally sends a traffic burst to H22. Using probe timestamps in T1, we can obtain latency without switch time synchronization. Meanwhile, thanks to SR, we can make the return path the same as the forward path, which avoids the asymmetric routing problem. With these two advantages, more accurate and nearly instantaneous latency can be offered. As Fig. 5 shows, the queueing latencies of A2 and T2 follow the same trend as the HTTP request latency, while that of T1 shows no obvious change. We can infer that A2 and T2 are on the shared path.

Fig. 4. HTTP latency topology. [Figure: P4 switches C1-C2, A1-A4, and T1-T4 connecting hosts H11, H12, H21, H22, H31, H32, H41, and H42; the legend distinguishes normal traffic from burst traffic.]

Fig. 5. Latency telemetry. [Figure: (a) HTTP Request Latency (ms) versus Time (seconds); (b) Queueing Latency (ms) versus Time (seconds) for T2, A2, and T1.]

ACKNOWLEDGEMENT

This research is supported by the National Key R&D Program of China (2017YFB0801701) and the National Science Foundation of China (No.61472213). Jun Bi is the corresponding author.

REFERENCES

[1] C. Kim et al., “In-band network telemetry,” https://github.com/p4lang/p4-applications/blob/master/docs/INT.pdf, 2018.
[2] H. Fleischner, “X.1 Algorithms for Eulerian trails,” Eulerian Graphs and Related Topics: Part 1 (Annals of Discrete Mathematics), vol. 2, no. 50, pp. 1–13, 1991.

978-1-5386-6043-0/18/$31.00 ©2018 IEEE. DOI 10.1109/ICNP.2018.00036
Authorized licensed use limited to: Wenzhou University. Downloaded on November 27,2021 at 07:45:38 UTC from IEEE Xplore. Restrictions apply.
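As an illustration of the query primitives in Section II-B, the chained calls listed in Fig. 2 can be modeled with a small fluent builder. The following is a minimal Python sketch, not NetVision's implementation: the `Query` class, its fields, and the `describe` helper are hypothetical, and only the names `PathQuery`, `NodeQuery`, `Select`, `Where`, and `Period` are taken from Fig. 2.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Query:
    """Hypothetical record of one telemetry query; not NetVision's own class."""
    kind: str                     # "path" or "node"
    targets: Tuple[str, ...]      # "switch:port" strings, "*" acts as a wildcard
    metric: Optional[str] = None
    condition: Optional[str] = None
    period: Optional[str] = None

    # Method names follow the capitalised style used in Fig. 2.
    def Select(self, metric: str) -> "Query":
        self.metric = metric
        return self

    def Where(self, condition: str) -> "Query":
        self.condition = condition
        return self

    def Period(self, period: str) -> "Query":
        self.period = period
        return self

    def describe(self) -> str:
        """Render the query as a readable one-line summary."""
        parts = [f"{self.kind} query on {' -> '.join(self.targets)}",
                 f"select {self.metric}"]
        if self.condition is not None:
            parts.append(f"where {self.condition}")
        if self.period is not None:
            parts.append(f"period {self.period}")
        return ", ".join(parts)


def PathQuery(src: str, dst: str) -> Query:
    """Start a query over paths between two switch ports."""
    return Query(kind="path", targets=(src, dst))


def NodeQuery(port: str) -> Query:
    """Start a query over a single switch port."""
    return Query(kind="node", targets=(port,))


# The link black hole query from Fig. 2, built step by step.
Q = (PathQuery("*:*", "*:*")
     .Select("Path")
     .Where("PathLength==1 and PassedProbes==0")
     .Period("10ms"))
print(Q.describe())
```

Returning `self` from each method is what allows the one-expression chaining style of Fig. 2; how the real platform parses and schedules such queries is not described in the paper.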

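The pop-and-push behaviour of the duplex-stack probe described in Section II-C can be sketched as follows. This is an illustrative Python model under stated assumptions, not the data-plane logic of NetVision: the bit positions `HOP_LATENCY` and `QUEUE_LENGTH`, the `requested` field, and the `process_at_switch` helper are hypothetical, and the explicit list-length fields of the probe are represented implicitly by the lengths of the Python lists.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical bit assignments; the paper does not specify the bitmap layout.
HOP_LATENCY = 0b01
QUEUE_LENGTH = 0b10
METADATA_BITS = (HOP_LATENCY, QUEUE_LENGTH)


@dataclass
class IntLabel:
    switch_id: int
    bitmap: int          # which metadata types the values belong to
    values: List[int]    # one value per set bit, lowest bit first


@dataclass
class Probe:
    sr_stack: List[int]  # output ports still to be used, next hop first
    requested: int       # metadata bitmap wanted by the operator (assumption)
    int_stack: List[IntLabel] = field(default_factory=list)


def process_at_switch(probe: Probe, switch_id: int,
                      metadata: Dict[int, int]) -> int:
    """Pop one SR label, push one INT label, and return the output port."""
    out_port = probe.sr_stack.pop(0)
    # Record only the metadata types whose bit is set in the request.
    values = [metadata[bit] for bit in METADATA_BITS if probe.requested & bit]
    probe.int_stack.append(IntLabel(switch_id, probe.requested, values))
    return out_port


# A probe that leaves switch 1 on port 2 and then switch 3 on port 1,
# asking only for hop latency.
probe = Probe(sr_stack=[2, 1], requested=HOP_LATENCY)
ports = [
    process_at_switch(probe, 1, {HOP_LATENCY: 5, QUEUE_LENGTH: 7}),
    process_at_switch(probe, 3, {HOP_LATENCY: 9, QUEUE_LENGTH: 0}),
]
```

In this model the SR stack shrinks by one label per hop while the INT stack grows by one, so the probe returns to the vantage point with an empty forwarding stack and one telemetry record per traversed switch.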

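The probe path computation in Section II-D can be illustrated with a short Python sketch of Hierholzer's algorithm on the graph in which every link becomes two opposite directed edges. It assumes a connected topology given as a list of undirected links; the function name `probe_circuit` and the example switch names are hypothetical, and the partitioning of a large network by label capacity is not modeled.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple


def probe_circuit(links: Iterable[Tuple[Hashable, Hashable]],
                  start: Hashable) -> List[Hashable]:
    """Return an Euler circuit that crosses every link once in each direction.

    Assumes the undirected topology is connected and contains `start`.
    """
    # Each undirected link contributes one directed edge per direction,
    # so every node has equal in-degree and out-degree.
    out_edges: Dict[Hashable, List[Hashable]] = defaultdict(list)
    for u, v in links:
        out_edges[u].append(v)
        out_edges[v].append(u)

    # Iterative Hierholzer: follow unused edges until stuck, then backtrack.
    stack = [start]
    circuit: List[Hashable] = []
    while stack:
        node = stack[-1]
        if out_edges[node]:
            stack.append(out_edges[node].pop())
        else:
            circuit.append(stack.pop())
    circuit.reverse()
    return circuit


# A triangle of three switches; the vantage point attaches to s1.
links = [("s1", "s2"), ("s2", "s3"), ("s3", "s1")]
circuit = probe_circuit(links, "s1")
print(circuit)
```

Each directed edge is pushed and popped once, which gives the linear O(E) running time cited in the text, and the resulting circuit has 2 × |links| + 1 nodes because it starts and ends at the vantage point.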