
Test Coverage for Network Configurations

Xieyang Xu1 , Weixin Deng1 , Ryan Beckett2 , Ratul Mahajan1,3 , and David Walker4
1 University of Washington 2 Microsoft 3 Intentionet 4 Princeton University
arXiv:2209.12870v1 [cs.NI] 26 Sep 2022

Abstract— We develop NetCov, the first tool to reveal which network configuration lines are being tested by a suite of network tests. It helps network engineers improve test suites and thus increase network reliability. A key challenge in its development is that many network tests test the data plane instead of testing the configurations (control plane) directly. We must be able to efficiently infer which configuration elements contribute to tested data plane elements, even when such contributions are non-local (on remote devices) or non-deterministic. NetCov uses an information flow graph based model that precisely captures various forms of contributions and a scalable method to lazily infer contributions. Using it, we show that an existing test suite for Internet2 (a nation-wide backbone network in the USA) covers only 26% of the configuration lines. The feedback from NetCov makes it easy to define new tests that improve coverage. For Internet2, adding just three such tests covers an additional 17% of the lines.

1 Introduction

As critical infrastructure, networks must be highly reliable, but unfortunately network outages are quite common. A primary culprit is networks' reliance on complex, low-level commands embedded in their configuration. The configurations dictate how routers select best paths and forward traffic. Day-to-day updates to them are error-prone, leading to outages that knock off important online services (e.g., banking), ground airplanes, and disable critical communication (e.g., emergency calls) [3, 31, 32, 36, 42].

To improve network reliability, automatic testing and verification of configurations is becoming commonplace. Today, network operators have at their disposal many tools with increasing sophistication that can scale to large networks and check various aspects of network behavior [5, 23, 37, 46, 47]. However, using such tools is not sufficient by itself; one must also use them effectively. Outages can occur despite automated testing when the test suite is poor and does not cover key aspects of network configuration. This was the case with the massive Facebook outage during which Facebook, WhatsApp, Instagram, and Oculus were unavailable for six hours [33]. Current tools have pushed the limits of what can be tested but left open the question of what needs to be tested.

Without tool support, it is difficult for engineers to know if they are effectively testing network configurations. In industrial networks with hundreds of thousands of lines of configuration, engineers' understanding of network behavior and dependencies is necessarily incomplete. It is even harder to evolve an existing test suite after the network evolves because engineers likely do not know what the old test suite is or is not testing for the updated network.

Recent work has proposed data plane coverage [44] to reveal testing gaps. It shows which data plane elements, such as forwarding rules, are exercised by a test suite. However, a well-tested data plane does not imply well-tested configurations. Data plane elements are the output of the network's configurations (which define its control plane) and the current operating environment (failures, external routing information). Testing a given data plane only tests configuration elements that are exercised in that particular environment. Other configuration elements are not tested. We demonstrate this empirically via a scenario where testing all data plane elements leaves over half of the configuration lines untested.

We develop configuration coverage to provide comprehensive and precise feedback to network engineers on test suite quality. Our goal is to identify exactly which configuration lines are being tested and which ones are not. Further, we want to consider all configuration elements, not only those that contribute to the current data plane. The precise nature of this feedback (untested configuration lines) helps improve tests—add tests that target untested lines—which in turn can improve network reliability. This is similar to how code coverage tools help improve tests and software quality [9, 11, 22].

A major challenge we face is that many network tests do not exercise configurations directly. Instead, they reason about the data plane elements produced by configurations. We need to infer the configuration elements that contribute to a tested data plane element. This inference is complicated because contributions can be non-local and non-deterministic. In a distributed control plane, a piece of tested routing information may have been propagated and transformed multiple times along its path, and both local and non-local configurations may have contributed to its existence. For example, the path attributes of a BGP route are shaped by routing policies on each and every hop that it traversed. Further, not all contributions are deterministic. For instance, any one of possibly multiple sub-prefixes can lead to the route of an aggregate prefix. We must scalably account for local and non-local contributions and for non-deterministic contributions.

Our solution is to model the contribution between configuration elements and data plane elements as an information flow graph (IFG). An IFG is a directed acyclic graph (DAG) where vertices denote network elements and edges denote contributions. In addition to direct contributions from configuration elements to data plane elements, we also model contributions between data plane elements (from predecessors to successors). For instance, a BGP route contributes to the BGP message derived from it.
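To make the coverage idea concrete, here is a minimal sketch (ours, not NetCov's implementation) of contribution tracking as a DAG: each fact records the facts that contributed to it, and the configuration elements covered by a tested fact are those reachable by walking the edges backwards. The fact names are invented, loosely mirroring the paper's two-router example.

```python
from collections import defaultdict

class IFG:
    """Toy information flow graph: add_edge(u, v) records that fact u
    contributes to fact v."""
    def __init__(self):
        self.parents = defaultdict(set)  # fact -> set of contributing facts

    def add_edge(self, u, v):
        self.parents[v].add(u)

    def covered_config(self, tested_facts, is_config):
        """Configuration elements reachable backwards from any tested fact."""
        seen, stack, covered = set(), list(tested_facts), set()
        while stack:
            f = stack.pop()
            if f in seen:
                continue
            seen.add(f)
            if is_config(f):
                covered.add(f)
            stack.extend(self.parents[f])
        return covered

# Hypothetical facts: a tested RIB entry at R1 depends on a BGP message
# from R2, which depends on configuration lines on both routers.
g = IFG()
g.add_edge("cfg:R1:bgp-peer", "msg:R2->R1")
g.add_edge("cfg:R2:export-policy", "msg:R2->R1")
g.add_edge("msg:R2->R1", "rib:R1:10.10.1.0/24")

covered = g.covered_config(["rib:R1:10.10.1.0/24"],
                           lambda f: f.startswith("cfg:"))
```

Note how the R2 configuration line is covered even though the tested entry lives on R1, which is the non-local contribution the introduction describes.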
Indirect contributions are thus modeled by multi-hop paths in the DAG. When contributions exhibit non-determinism, we use special disjunctive nodes to organize possible DAG paths that may contribute to a given data plane element.

We build a tool called NetCov based on this model. It annotates which configuration lines and logical elements are tested by a given test suite and produces aggregated coverage statistics. To efficiently map tested data plane elements to the set of contributing configuration elements, it materializes the IFG lazily, instead of tracking contributions proactively (during data plane generation). This design avoids the cost of computing and storing contributions for transient or untested data plane elements. NetCov is open-sourced on GitHub [1].

We evaluate NetCov on Internet2, a nation-wide backbone network in the USA, and on synthetic data center networks. We show that test suites proposed in prior work can have poor coverage. For instance, we found that the three tests proposed by Bagpipe [41] covered only 26% of the configuration lines of Internet2. We also show how surfacing untested configuration elements suggests new tests that will improve coverage. By adding just three such tests to the Internet2 test suite based on NetCov's feedback, we could improve coverage to 43%, and more similar tests can be added to further increase coverage. NetCov performs reasonably well. The time to compute coverage is 1.2 hours for the largest network that we study, which has over 2 million forwarding rules. This time is an order of magnitude less than the time to execute tests.

Stepping back, we note that networking is not alone in its reliance on configuration. Today, a lot of infrastructure and distributed applications are deployed by composing existing components using configuration (e.g., infrastructure deployment using Terraform, and application deployment using containers and service meshes). These configurations are central to correct behavior, which is why there is an intense focus on testing them properly [21, 40]. As for networks, there are no tools to help engineers discover how well the configurations are tested. The techniques developed in our work, the IFG-based contribution tracking and its lazy traversal, can provide a starting point toward better testing of infrastructure and distributed application configuration as well.

2 Background on Network Testing

In networks with distributed control planes, each device runs one or more routing protocol (e.g., BGP, OSPF) instances. Each instance exchanges routing messages with its neighboring instances. Routing messages contain attributes of paths that the sender is using to various destinations. A routing instance may learn multiple paths to the same destination via different neighbors. It selects the best one (or multiple best ones if multipath routing is enabled) based on its policy and stores that path in its protocol RIB (routing information base). Multiple routing protocol instances on a device may have best paths to the same destination. The device selects the best one(s) based on the relative preference of the protocols and stores the selection in its main RIB. Information in the main RIB is used to forward packets.¹

¹ In reality, for fast forwarding, routers have a forwarding information base (FIB), which maps each main RIB destination to its outgoing interface, by recursively resolving next hop information (which may be an IP address). The difference between main RIB and FIB is not material for our work, and we use the term main RIB for the table that has forwarding information.

Network engineers can control many aspects of the computation above using device configuration. This includes the routing protocol instances that are running; the peering between instances; the destination prefixes that are announced by each routing protocol instance; how routing messages are transformed prior to sending (export policy) and upon reception (import policy); and the preference function for best path selection. Naturally, thus, how the network forwards packets is intimately dependent on device configurations.

Given the importance of configurations to correct network behavior, network engineers use automatic testing and verification to find bugs and gain confidence in their correctness. Network tests come in two flavors. Data plane tests analyze the computed data plane state (i.e., RIBs), e.g., checking that node A can reach B and that a route to a particular destination is present at node C. Control plane tests directly analyze device configuration, e.g., checking that the import policy blocks routing messages for private address space (such as 10.0.0.0/8) and that BGP peerings are correctly configured.

3 Configuration Coverage: Overview

Network engineers today create data and control plane tests based on past outages and their knowledge of which behaviors are important to test. There are no tools to provide feedback on how well they are testing configurations and which aspects of the configuration are untested. We aim to build such a tool. Given the complexity of real-world networks, it is difficult for humans to know if they have covered all important elements of configurations. As with software, high coverage is necessary but not sufficient for a good test suite. In addition to exercising all key behaviors, the tests must also properly assert that those behaviors match intent. This latter task is not our focus.

We now outline how we compute which configuration elements are covered by a suite of data and control plane tests.

3.1 Defining coverage

We deem a configuration element to be covered if it i) is tested directly by a control plane test; or ii) contributes to the production of a data plane element tested by a data plane test. For now, assume that contributions are deterministic. We discuss non-deterministic contributions in the next section.

Figure 1 illustrates configuration coverage as a result of a data plane test. It shows parts of the two routers' configuration.
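Before walking through the configurations, the best-route selection that produces such RIBs can be sketched as a toy computation: each protocol RIB offers routes per prefix, and the device keeps the route from the most-preferred protocol. The route format and preference values here are invented for illustration (lower preference wins, as on many vendors); they are not NetCov's model.

```python
# Illustrative administrative preferences; not any specific vendor's defaults.
PROTOCOL_PREFERENCE = {"connected": 0, "static": 1, "ospf": 110, "bgp": 170}

def build_main_rib(protocol_ribs):
    """protocol_ribs: dict protocol -> {prefix: next_hop}.
    Returns main RIB: {prefix: (next_hop, source_protocol)}."""
    main_rib = {}
    for protocol, rib in protocol_ribs.items():
        for prefix, next_hop in rib.items():
            best = main_rib.get(prefix)
            if best is None or \
                    PROTOCOL_PREFERENCE[protocol] < PROTOCOL_PREFERENCE[best[1]]:
                main_rib[prefix] = (next_hop, protocol)
    return main_rib

# A connected interface prefix beats a BGP-learned route to the same prefix.
rib = build_main_rib({
    "bgp": {"10.10.1.0/24": "192.168.1.2", "10.20.0.0/16": "192.168.1.2"},
    "connected": {"10.10.1.0/24": "eth1"},
})
```

This mirrors how, in the example, 10.10.1.0/24 enters R2's main RIB from the "connected" protocol RIB for eth1.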
R1's configuration defines one interface (Lines 1-2) and one BGP peer (192.168.1.2, which is R2's address), and it specifies the import and export policy to use. The import policy (R2-to-R1 at Lines 6-11) denies routing messages for a particular prefix and sets the preference for another.

R2's configuration defines two interfaces, a BGP peer (R1), and routing policies. At Line 13, it states that the prefix 10.10.1.0/24 should be announced to BGP peers iff it is in the main RIB.² In our example, 10.10.1.0/24 will be in the main RIB as it corresponds to eth1's prefix. (Address statements like Line 4 encode the IP address and prefix length. For eth1, given the address 10.10.1.1 and prefix length of 24, the prefix is 10.10.1.0/24.) Routers add interface prefixes to the "connected" protocol RIB, from where those prefixes can enter the main RIB. The resulting RIBs on the two routers are shown in the figure. Each entry includes the next hop and source routing protocol ("conn" = connected).

² Different router vendors have different semantics for BGP network statements. We are assuming Cisco semantics.

Figure 1: An example network with routing tables and configurations. The highlighted configuration lines are covered when the route to 10.10.1.0/24 is tested at R1.

Suppose the entry for 10.10.1.0/24 at R1 was tested by a data plane test. The covered configuration elements are highlighted. On R1, the BGP peer configuration and import policy binding (Lines 3-4) are covered because the tested entry came via that peering and passed through that policy. Parts of the routing policy R2-to-R1 relevant to the tested state (Lines 6, 9-11) are also covered. The interface definition (Lines 1-2) is covered because it enables the BGP peering to be established. In contrast, the export policy R1-to-R2 and unexercised parts of R2-to-R1 (Lines 7-8) are not covered.

There are covered configuration elements at R2 as well. These include the interface definitions—eth0 enables the BGP edge and 10.10.1.0/24 was announced due to eth1—and the BGP peering, the export policy, and the BGP network statement.

Alternative definitions of coverage. One may consider an alternative definition of coverage that disregards non-local configuration elements. But we posit that including non-local elements is more meaningful. These elements, such as the BGP network statement on R2's Line 13, are just as key to the existence of 10.10.1.0/24 at R1 as the local elements.

Another definition of coverage is based on mutation [4]: a configuration line is covered if its mutation alters the test result. This definition will report an additional class of configuration elements as covered—configuration elements that de-prioritize (or reject) the competitors of the tested data plane element. Mutation-based coverage tends to be significantly harder to compute [24], and its results can be hard to interpret. In developing the first tool in this space, we decided to focus on a simpler, more direct definition of coverage. We will explore more sophisticated definitions in the future.

3.2 Our approach

While it is straightforward to identify configuration elements covered by a control plane test, it is not for data plane tests. Data plane tests analyze the "output" of the control plane, and we need a scalable way to compute which configuration elements contributed to tested data plane state. The relationship between these inputs and outputs is complex. How a particular RIB entry comes about relies on many configuration elements across multiple devices. The need to map tested outputs to the input space sets the computation of configuration coverage apart from data plane coverage and software coverage, for both of which the coverage domain is the same as the test domain.

To motivate our approach to solving this problem, let us first sketch two strawman approaches. One potential approach is to express control plane computation declaratively, e.g., in Datalog. This enables identification of contributing inputs for a given output using a form of backward reasoning [43, 48]. However, network control plane computations can be quite complex (e.g., non-monotonic behaviors [16, 34]). While declarative encodings may work in special cases [27], it is hard to get high-fidelity, performant encodings for the general case. That is why most control plane analysis tools use an imperative approach [12, 29, 30, 46].³

³ Batfish [12], a widely used control plane analysis tool, originally used Datalog to encode network control planes but switched to imperative simulations due to expressiveness and performance challenges.

Another potential approach is to use simulation-based forward reasoning, i.e., simulate the control plane (imperatively) and track which configuration elements feed into each part of the data plane state. However, this approach has scalability limitations. Network simulation is time and memory intensive [12, 30, 46], and it would become significantly worse if it needed to track all necessary information along each hop.
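Whatever inference strategy is chosen, the end product of configuration coverage is per-device and overall line statistics, as in code coverage tools. That final aggregation step is simple; a sketch under an invented report format (sets of covered line numbers per device):

```python
def coverage_report(total_lines, covered_lines):
    """total_lines: dict device -> set of config line numbers.
    covered_lines: dict device -> set of covered line numbers.
    Returns {device: (covered, total)} plus an "overall" entry."""
    report = {}
    all_covered = all_total = 0
    for device, lines in total_lines.items():
        covered = covered_lines.get(device, set()) & lines
        report[device] = (len(covered), len(lines))
        all_covered += len(covered)
        all_total += len(lines)
    report["overall"] = (all_covered, all_total)
    return report

# Numbers loosely follow the Figure 1 example: some policy lines on each
# router remain untested.
rep = coverage_report(
    {"R1": set(range(1, 12)), "R2": set(range(1, 14))},
    {"R1": {1, 2, 3, 4, 6, 9, 10, 11}, "R2": {1, 2, 3, 4, 5, 13}},
)
```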
Figure 2: Subset of the IFG for the Figure 1 example, to track configuration elements on both routers that contribute to the tested RIB entry (F1). Colors denote different fact types: data plane state (white), configuration elements (yellow), auxiliary facts (green).

Our approach is based on two observations. First, for the purposes of computing coverage, we do not need a full computational model of the control plane. We need to only track which configuration elements contribute to tested data plane state (i.e., taint analysis [38]), not the exact input-output relationship; and we need to reason only about the stable state (i.e., the state of devices once they have settled on best paths), not the transient states. Data plane testing [19, 23, 25, 26, 45] assumes that the analyzed state is stable. Our second observation is that the stable state contains enough information for us to infer contributions of configuration elements after the fact, based on the semantics of the control plane. This inference is vastly cheaper than tracking contributions towards all data plane state entries, independent of whether they are tested.

To model and infer contributions to the stable state, we use an information flow graph (IFG). Figure 2 shows a simplified subset of the IFG for the Figure 1 example. Each node is a fact and each arrow denotes direct information flow from tail to head. IFGs have three types of facts: i) data plane state; ii) configuration elements; and iii) auxiliary facts that help connect the previous two types.

The main RIB entry 10.10.1.0/24 at R1 (F1) is derived from the corresponding BGP RIB entry (F5), which in turn is derived from the BGP message from R2 (F10). This message exists because of the BGP edge between R1 and R2 (F13), the source message sent by R2 (F11), and the relevant configuration element within the import policy (F20). R2 sent the BGP message because of the same BGP edge (F13), its export policy elements (F22), and the BGP RIB entry (F7). This BGP RIB entry exists because of the configuration element (F23) and the RIB entry (F3), which exists because of the connected route (F8). The BGP edge (F13) exists because of the configuration elements that define the peering (F16, F17) and paths between R2 and R1 that enable the BGP session to be established. The paths depend on the RIB entries (F2 and F4, respectively), the contributions to which can be similarly traced. In this manner, the IFG captures all configuration elements that led to the tested RIB entry (F1).

We do not track IFG dependencies proactively but infer them on demand based on control plane semantics, using a mix of backward-forward reasoning. Backward inference infers the parent (tail) of an edge from its child (head). The information in child nodes is not enough to fully recover the parent nodes, but is often enough to select them from the known stable state. For instance, we can compute the BGP RIB entry F5 from the main RIB entry F1—the main RIB entry indicates that its source routing protocol is BGP, and we thus look up the BGP RIB for 10.10.1.0/24.

Lookup-based inference does not always work. For instance, given a BGP message which has passed through an import policy, we cannot compute backwards which policy terms of the import policy were exercised (F10 ← F20). Another parent of F10, the pre-import BGP message (F11), cannot be looked up either because it is not part of the input and needs to be computed on the fly. To address these limitations, we combine backward and forward inference. When a parent cannot be directly looked up, we first look up the prerequisites of the parent. For instance, we can look up F7 based on F10. Next, we use targeted simulations to compute non-existing facts and to select relevant facts exercised in a control plane process or data plane process. For instance, given the BGP route at R2 (F7), we simulate its processing through the export policy, which allows us to derive the pre-import BGP message (F11) and find the policy term exercised during the export process (F22). Once F11 is computed, we conduct another targeted simulation to discover the policy term exercised in the import process (F20). Unlike a full control plane simulation, these targeted simulations are fast. They have limited scope (e.g., best path selection is not simulated) and are done only for messages of interest, not all messages.

By combining backward and forward inference atop the stable-state IFG, we can scalably discover all covered configuration elements. We describe this approach in detail next.
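The targeted simulation step above can be pictured as running a single route through an ordered routing policy and recording which clause fired, which is exactly the information coverage tracking needs. Everything here (the clause format, the attribute names) is invented for illustration and is not NetCov's policy model:

```python
# Toy route policy: ordered clauses of (match_prefix, action, local_pref).
# Simulating one route returns the transformed route (None if denied) plus
# the index of the clause that was exercised.
def simulate_policy(route, clauses):
    for i, (match_prefix, action, local_pref) in enumerate(clauses):
        if match_prefix == "any" or route["prefix"] == match_prefix:
            if action == "deny":
                return None, i
            out = dict(route)
            if local_pref is not None:
                out["local_pref"] = local_pref
            return out, i
    return None, None  # implicit deny: no clause exercised

policy = [
    ("10.30.0.0/16", "deny", None),
    ("10.10.1.0/24", "permit", 200),
    ("any", "permit", None),
]
route_out, hit = simulate_policy({"prefix": "10.10.1.0/24",
                                  "local_pref": 100}, policy)
```

The exercised clause index (`hit`) plays the role of the policy-term facts F20/F22 in the narration: it identifies which configuration element the simulated message flowed through.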
4 Design of NetCov

NetCov takes as input the configuration files and data plane state (protocol RIBs, main RIB, and active routing edges) of the network. The data plane state may be pulled from the live network or produced by a control plane analysis tool [12, 30, 46]. In addition, NetCov takes as input what is tested: data plane entries that are tested by data plane tests, and configuration elements that are tested by control plane tests. This information is produced by network testing tools [12, 44].

Based on these inputs, NetCov computes which configuration elements are covered. The core of this computation is efficiently mapping a data plane fact to the configuration elements that contribute to it. We describe this computation next.

4.1 Information flow model

IFGs are directed acyclic graphs whose nodes denote network facts and edges denote information flow between facts. Table 1 shows the types of network facts modeled by NetCov and the information flow between different types.

  Category          Network fact               Information flow
  Configuration     Configuration element (c)  none
  Data plane state  Main RIB entry (f)         fi ← rj;  fi ← rj, fk
                    Protocol RIB entry (r)     ri ← mj;  ri ← cj;  ri ← fj, ck;  ri ← {rj1, ...}, ck
                    ACL entry (a)              ai ← {ci1, ...}
  Auxiliary         Routing message (m)        mi ← rj, ek, {cl1, ...};  mi ← mj, ek, {cl1, ...}
                    Routing edge (e)           ei ← {cj1, ...};  ei ← {cj1, ...}, {pk1, ...}
                    Path (p)                   pi ← {fj1, ...}, {ak1, ...}

Table 1: Information flow model: types of facts and all possible dependencies for each type. {t, ...} denotes a set of facts.

Our model has three types of facts: configuration elements, data plane state, and auxiliary facts. Data plane state has three subtypes: main RIB entries, protocol RIB entries, and access control list (ACL) entries. Auxiliary facts help concisely capture information flow dependencies from configurations to data plane state. They have three subtypes: routing edges, routing messages, and paths that carry routing messages. Routing message facts represent messages between routing protocol instances across devices as well as within a device, i.e., redistribution [10]. This uniform treatment is a modeling convenience. In reality, explicit messages are not exchanged during redistribution (though redistribution is subject to routing policies akin to messages between cross-device routing instances).

The last column of Table 1 shows how information flows among different types of facts. A main RIB entry stems from a protocol RIB entry and optionally another main RIB entry (when its next hop is an IP address whose corresponding output interface needs further resolution). A protocol RIB fact stems from a routing message (for protocols such as BGP), a configuration element (for connected interfaces and static routes), a main RIB entry accompanied by a configuration element (such as when a BGP network statement populates a main RIB entry into the BGP RIB), or a set of RIB entries accompanied by a configuration element (for aggregate routes). ACL facts stem from configuration facts and have no other dependencies. Routing messages stem from a RIB fact or another message (e.g., a post-import-policy message depends on the pre-import-policy message), and they also depend on routing edges and routing policy configurations. Inter-device routing edges stem from paths that enable sessions to be established and configuration facts that define peerings; intra-device routing edges stem from configuration facts that define redistribution. Finally, path facts depend on main RIB facts and ACL facts that impact routing traffic along the way.

4.2 Inferring the IFG on demand

Based on the information flow model, NetCov uses a backward-forward inference framework to lazily materialize the IFG from any set of facts whose coverage needs to be tracked. The framework can be abstracted as a set of inference rules and an iterative construction algorithm. Each inference rule is a function that takes a materialized IFG node as input and materializes a set of its ancestor nodes, as well as the edges that allow the ancestors to reach the input node. These nodes and edges are merged into the materialized IFG by the construction algorithm. The implementation of these functions uses one or both of lookup-based inference and simulation-based inference. Let us elaborate.

Lookup-based inference. The computation of a control plane is lossy. For instance, while a main RIB entry may be derived from a BGP RIB entry, we cannot infer the complete BGP RIB entry from the main RIB entry because BGP-specific attributes (e.g., AS-path) are not preserved in the main RIB.

To address this challenge, our inference leverages the stable state. It first infers a subset of attributes based on control plane semantics. This partial inference provides enough information for us to look up the complete entry in the stable state.

Figure 3 shows the simplified function to infer the BGP RIB entry that led to a main RIB entry. Based on control plane semantics, if a main RIB entry indicates its source protocol to be BGP, it must have stemmed from a BGP RIB entry on the same router with the same prefix and nexthop attributes (Lines 5-7).
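The fact records that such lookups operate on can be modeled as simple immutable record types, one per row of Table 1. The field choices below are our own illustration, not NetCov's actual schema; being frozen (hashable) lets facts be deduplicated in sets when the IFG is merged.

```python
from dataclasses import dataclass

# Minimal record types for the fact categories of Table 1 (fields invented).
@dataclass(frozen=True)
class ConfigElement:       # one line/clause of a device configuration
    host: str
    line: int

@dataclass(frozen=True)
class MainRibEntry:        # fi <- rj [, fk]
    host: str
    prefix: str
    nexthop: str
    protocol: str

@dataclass(frozen=True)
class ProtocolRibEntry:    # ri <- mj | cj | (fj, ck) | ({rj1, ...}, ck)
    host: str
    protocol: str
    prefix: str
    status: str

@dataclass(frozen=True)
class RoutingEdge:         # ei <- {cj1, ...} [, {pk1, ...}]
    send_host: str
    recv_host: str
    protocol: str

f1 = MainRibEntry("R1", "10.10.1.0/24", "192.168.1.2", "bgp")
```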
1 def infer_from_main_rib_entry(f, stable_state): 1 def infer_from_bgp_message(m, stable_state):
2 if not (f is MainRibEntry and f.protocol == 2 if not (m is BgpMsg and m.is_post_import):
,→ 'bgp'): 3 return []
3 return [] 4 bgp_edge = stable_state.bgp_edges.lookup(
4 bgp_entry = stable_state.bgp_rib.lookup( 5 recv_host=m.host
5 host=f.host, 6 send_ip=m.nexthop
6 prefix=f.prefix, 7 )
7 nexthop=f.nexthop, 8 origin_entry = stable_state.bgp_rib.lookup(
8 status='BEST' 9 host=bgp_edge.send_host,
9 ) 10 prefix=r.prefix,
10 return [(bgp_entry, f)] 11 status='BEST'
12 )
Figure 3: Rule to infer BGP RIB entry from main RIB entry. 13 pre_import_msg, export_clauses =
,→ policy_simulation(
5-7). Besides, the BGP RIB entry should have been selected 14 input=origin_entry,
as the best route (Line 8). Such information is enough to 15 policy=bgp_edge.export_policy
uniquely identify the parent within the known stable state. 16 )
The return value (Line 10) is a list of tuples denoting the IFG 17 _, import_clauses = policy_simulation(
edges materialized by this rule. 18 input=pre_import_msg,
19 policy=bgp_edge.import_policy
Simulation-based inference. Lookup-based inference is not 20 )
enough to materialize the IFG. Some facts are not present 21 return [(pre_import_msg, m), (bgp_edge, m)] +
in the stable state (e.g., routing messages), and some facts 22 [(cl, m) for cl in import_clauses] +
do not contain enough information to uniquely identify their 23 [(origin_entry, pre_import_msg), (bgp_edge,
parents. We use local simulations to complement lookup- ,→ pre_import_msg)] +
based inference. But simulations can only be performed in the 24 [(cl, pre_import_msg) for cl in export_clauses]
forward direction, i.e., to compute a fact using simulations,
we first need to know its parent. We use a generalized version
of lookup-based inference to discover grandparent facts of a Figure 4: Rule to infer ancestors of a post-import BGP mes-
known fact, and then use simulations with the grandparents sage.
to infer their children (i.e., parents of the original fact).
the dirty nodes derived from the previous iteration (Line 8).
Figure 4 shows the simplified inference rule that infers the The new nodes and edges inferred during such process are
ancestors of a post-import BGP message. Line 13 demon- collected and merged (with deduplication) into the IFG (Line
strates the use of simulation-based forward inference to com- 9-14). The computation repeats until no new facts can be
pute a missing parent fact on the fly. The two prerequisites to derived in an iteration.
simulate the BGP message–the grandparent BGP RIB entry
(origin_entry) and the BGP edge–are discovered via lookup-
based backward inference, on Line 8 and Line 4 respectively. The simulation returns the derived BGP message after applying the routing policy, as well as the policy clauses exercised during the process. The second forward simulation (Line 17) is to discover the policy clauses that are hit during the import process. The return value includes the inferred IFG edges that connect to the input node m as well as ones that connect to the parent pre_import_msg. The former corresponds to the information flow mi ← mj, ek, {cl1, ...} in Table 1 and the latter corresponds to mi ← rj, ek, {cl1, ...}.

IFG construction. Next, we detail IFG materialization using inference rules. Assume for now that the information flow is deterministic; the next section discusses how we handle non-determinism.

As shown in Algorithm 1, the IFG initially contains only the nodes representing the tested data plane state facts from the input and does not have any edges (Line 2). It is then iteratively expanded by applying inference rules on existing nodes. In each iteration, all inference rules are applied to the nodes discovered in the previous iteration (the dirty nodes in Algorithm 1).

4.3 Handling uncertainty

There are situations where it is not certain which stable state facts contribute to a given fact. One such scenario is BGP aggregation, where a prefix (e.g., 10.10.0.0/16) is added to the RIB iff at least one of its more specific prefixes (e.g., 10.10.1.0/24) is present. When multiple more specifics are present, we do not know which one triggered the aggregate. Another such scenario is when multiple paths are available for a routing edge to be established, which can happen when the network uses multipath routing. Here, we do not know which path is actually used by routing messages.

We model such uncertainty using disjunctive nodes in the IFG. A disjunctive node points to the parent fact (e.g., the aggregated RIB fact), and the multiple contributors to the parent point to this node. See Figure 5(a) for an example where a BGP aggregate could be triggered by either of two more specific prefixes. When our inference rules encounter uncertainty during IFG materialization, they produce a disjunctive node and attach all contributors to it as children.
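The iterative expansion described above, including the disjunctive nodes used for uncertain contributions, can be sketched in Python. This is an illustrative toy rather than NetCov's implementation; the `aggregate_rule` below and its fact encoding are hypothetical.

```python
def build_ifg(tested_facts, rules):
    """Worklist-style IFG materialization: start from tested facts and
    repeatedly apply inference rules to newly discovered (dirty) nodes."""
    nodes, edges = set(tested_facts), set()
    dirty = set(tested_facts)
    while dirty:
        fresh = set()
        for fact in dirty:
            for rule in rules:
                for parent, child in rule(fact):
                    for n in (parent, child):
                        if n not in nodes:
                            nodes.add(n)
                            fresh.add(n)
                    edges.add((parent, child))
        dirty = fresh
    return nodes, edges

def aggregate_rule(fact):
    """When an aggregate has several possible triggers, attach them all
    to a disjunctive node instead of picking one arbitrarily."""
    if fact != ("rib", "10.10.0.0/16"):
        return []
    disj = ("disj", fact)  # disjunctive node: any contributor suffices
    contributors = [("rib", "10.10.1.0/24"), ("rib", "10.10.2.0/24")]
    return [(disj, fact)] + [(c, disj) for c in contributors]

nodes, edges = build_ifg({("rib", "10.10.0.0/16")}, [aggregate_rule])
assert (("disj", ("rib", "10.10.0.0/16")), ("rib", "10.10.0.0/16")) in edges
assert len(nodes) == 4  # aggregate, disjunctive node, two contributors
```

The loop mirrors Algorithm 1: only nodes added in the previous iteration are re-examined, so already-materialized parts of the graph are never reprocessed.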
Algorithm 1: IFG lazy materialization
  Input: Initial nodes {vi}; Inference rules {φi : v ↦ {(ui, vi)}}
  Output: Materialized IFG (V, E)
  Data: Stable data plane state (main RIB and protocol RIBs); Routing edges; Configuration elements
   1  Procedure BuildIFG({vi}, {φi})
   2    V, E ← {vi}, ∅
   3    V′ ← {vi}                     // dirty nodes of previous iteration
   4    while |V′| > 0 do
   5      V″ ← ∅                      // dirty nodes of current iteration
   6      foreach c ∈ V′ do
   7        foreach φ ∈ {φi} do
   8          E′ ← φ(c)
   9          foreach (ui, vi) ∈ E′ do
  10            if ui ∉ V then
  11              V ← V ∪ {ui}, V″ ← V″ ∪ {ui}
  12            if vi ∉ V then
  13              V ← V ∪ {vi}, V″ ← V″ ∪ {vi}
  14            if (ui, vi) ∉ E then E ← E ∪ {(ui, vi)}
  15      V′ ← V″
  16    return (V, E)

Figure 5: Modeling uncertainty. (a) BGP aggregate (F1) has two potential contributors. (b) F5 is weakly covered but F6 and F7 are strongly covered. (c) The predicates of IFG nodes.

When computing coverage, we must account for uncertainty because the notion of coverage is different. Assume in Figure 5(a) that F1 was tested. If the uncertainty were not there, and F2 and F3 directly pointed to F1, the configuration elements that led to F2 and F3 would have been critical to the outcome. But with the uncertainty, those configuration elements are not critical. The configuration elements that led to F2 could disappear without impacting F1.

To model such possibilities, we introduce the notion of weak coverage. A configuration element is weakly covered if it contributes to a tested fact but its contribution is not critical. In Figure 5(b), assume that F1 is the tested fact. Here, F5 is weakly covered; F1 can be derived without any contribution from F5 because F3 can be derived via F6, which is enough to derive the disjunctive child of F1. F6 is strongly covered; without it, neither F2 nor F3 can be derived and thus the disjunctive node cannot be derived. F7 is also strongly covered because it contributes to F4, which is essential to F1.

After materializing the IFG, NetCov labels each covered configuration element as strong or weak. The label is determined as follows. We first assign a Boolean variable to each configuration element in the IFG. Next, we build a Boolean predicate for each IFG node on top of these variables. The predicate of a fact depends on the predicates of its ancestors in the IFG: a normal node depends on the conjunction of its immediate parents, and a disjunctive node depends on the disjunction of its parents. Therefore the predicate of any IFG node is ultimately composed of the variables associated with the configuration elements that lead to it, denoted as Γ(v) = F(x1, ..., xn). Figure 5(c) shows the predicates of the IFG nodes in Figure 5(b). We represent these Boolean predicates using Binary Decision Diagrams (BDDs) [8] and build BDD predicates by traversing the IFG. Once the predicates are built, we test graph reachability and logical necessity between each pair of configuration facts and tested facts. Necessity ¬xi ⇒ ¬Γ(v) is equivalent to unsatisfiability of ¬xi ∧ Γ(v). While (un)satisfiability is NP-complete in general, it is efficient in our case: it can be reduced to computing the cofactor Γ(v)|xi=0 and testing whether the cofactor is constant false, both of which are efficient using BDD operations.

We further reduce the size of BDD predicates by precluding configuration facts that can reach tested facts via a path with no disjunctive node, such as node F7 in Figure 5(b). These configuration facts must be strongly covered, so their necessity does not need to be tested. Moreover, their validity variables can be replaced with constant true when building BDD predicates, which does not affect the strong/weak classification of other configuration elements. We empirically find this heuristic to be effective in reducing the number of variables used for weak coverage computation.

5 Implementation

We implemented NetCov with 4,000 lines of Python code. A total of 18 lambdas (Python functions) encode the IFG inference rules. NetCov uses Batfish [6] to extract configuration elements from configuration files and to run targeted simulations, and it uses CUDD [35] for BDD operations. NetCov supports all major router vendors supported by Batfish, including Arista, Cisco, and Juniper. It builds a vendor-neutral representation of configuration elements using vendor-specific information provided by Batfish. The types of configuration elements currently analyzed by NetCov are listed in Table 2.

  Type                 | Purpose
  Interface            | Interface and its settings (e.g., addresses)
  BGP peer             | BGP peer settings (e.g., IP address, AS number)
  BGP peer group       | BGP peer settings inherited by one or more peers
  Route policy clause  | One clause in an export or import route policy
  Prefix list          | List of prefixes, used in route policy clauses
  Community list       | List of BGP communities for route policy clauses
  AS-path list         | List of AS-path expressions for route policy clauses

Table 2: Configuration elements analyzed by NetCov.

NetCov may not consider all components of a device's configuration. One category of such components is device management configuration (e.g., login settings), which does not impact data or control plane functionality. The second category is control plane components that are not currently modeled by NetCov. This includes IPv6 (which is not modeled by Batfish currently) and routing protocols other than BGP (e.g., OSPF). The presence of unconsidered components does not imply that NetCov cannot be used for that network. As we show in the next section, NetCov provides helpful coverage information for the parts that are considered.
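The strong/weak determination can be illustrated with a self-contained toy. NetCov represents Γ(v) as a BDD (via CUDD) and computes the cofactor Γ(v)|xi=0; the exhaustive enumeration below stands in for that BDD operation, and the example predicate only mirrors the flavor of Figure 5, not its exact graph.

```python
from itertools import product

def var(name):
    return lambda env: env[name]

def conj(*ps):  # normal IFG node: conjunction of its parents
    return lambda env: all(p(env) for p in ps)

def disj(*ps):  # disjunctive IFG node: disjunction of its parents
    return lambda env: any(p(env) for p in ps)

def is_strongly_covered(x, gamma, variables):
    """x is strongly covered iff the cofactor gamma|x=False is constant
    false, i.e., the tested fact cannot be derived without x."""
    others = [v for v in variables if v != x]
    for bits in product([False, True], repeat=len(others)):
        env = dict(zip(others, bits), **{x: False})
        if gamma(env):  # still derivable without x => only weakly covered
            return False
    return True

# Toy predicate: the tested fact needs F7 and (F5-and-F6 or F6 alone),
# which simplifies to F7 and F6, so F5's contribution is not critical.
gamma = conj(var("F7"), disj(conj(var("F5"), var("F6")), var("F6")))
names = ["F5", "F6", "F7"]
assert not is_strongly_covered("F5", gamma, names)  # weakly covered
assert is_strongly_covered("F6", gamma, names)
assert is_strongly_covered("F7", gamma, names)
```

Enumeration is exponential in the number of variables, which is why NetCov relies on BDD cofactoring instead.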
After constructing the IFG, which yields information on which configuration elements are covered, NetCov computes which lines are covered. Each element typically spans multiple configuration lines, and when an element is covered, NetCov deems all of those lines covered.

Based on element and line coverage, NetCov produces three main outputs. The first is a coverage report at the granularity of individual lines (or elements). We produce this report in the lcov format, which is supported by common code coverage tools and enables users to visualize coverage results as annotations on configuration files. See Figure 6(a) for an example. The second is coverage aggregated at the file level, generated with the help of GNU LCOV [17]. See Figure 6(b) for an example. The third output is coverage aggregated by the type of configuration element, which shows what fraction of elements of each type are covered.

Figure 6: Example NetCov outputs. (a) Line-level coverage. Green background denotes covered lines, and red denotes uncovered lines. Some lines are collapsed for simplicity. (b) File-level aggregate coverage. The overall coverage is at top right, and the coverage for individual files (devices) is in the table.
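The lcov emission step can be sketched as follows. SF, DA, LF, LH, and end_of_record are standard lcov tracefile records, but the input shape (a map from file name to per-line coverage) is a hypothetical simplification of NetCov's actual writer.

```python
import io

def write_lcov(coverage, out):
    """Emit per-file line coverage in lcov's tracefile format so that
    standard tools (e.g., genhtml) can render annotated sources.

    `coverage` maps a config file name to {line_number: covered?}.
    """
    for source, lines in coverage.items():
        out.write(f"SF:{source}\n")
        for line in sorted(lines):
            out.write(f"DA:{line},{1 if lines[line] else 0}\n")  # hit count
        out.write(f"LF:{len(lines)}\n")           # lines found
        out.write(f"LH:{sum(lines.values())}\n")  # lines hit
        out.write("end_of_record\n")

buf = io.StringIO()
write_lcov({"configs/atla.cfg": {1: True, 2: False, 3: True}}, buf)
report = buf.getvalue()
```

Because the format is line-oriented, uncovered configuration lines simply get a hit count of 0 and show up highlighted in any lcov-aware viewer.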
These outputs help users uncover testing gaps and improve their test suites in different ways. The aggregate results help identify systematic gaps such as "router A is poorly covered" or "routing policy clauses are poorly covered." The line-level results help them zoom in to specific gaps and develop tests that target them. The case study in the next section demonstrates this test suite improvement process.

6 Case Studies

We present case studies of using NetCov on two disparate networks, one a wide-area backbone and the other a datacenter. In each case, using realistic test suites, we show that NetCov provides insight into what is and is not covered and how these insights help improve the test suites.

6.1 Case Study I: The Internet2 backbone

Internet2 is a nation-wide network that connects over 60,000 US educational, research, and government institutions. The routing design of Internet2 is typical of backbone networks. It has 10 BGP routers spread across the country. The routers are organized as a single autonomous system (AS), and they establish an iBGP full mesh on top of internal reachability provided by the IS-IS protocol. The Internet2 routers connect to 279 external BGP peers and heavily use route import and export policies. The import policy for an external peer has multiple policy statements, some specific to the peer and some shared within the same peer group. Peer-specific policies tend to specify a list of allowed prefixes from the peer, and others are used for sanity checking, preference setting, etc. Export policies are similarly structured.

Internet2's configurations that we study have 96,672 lines (in Juniper's JunOS format) across all routers. Of these, NetCov's coverage computation considers 64,886 lines. The bulk of the unconsidered lines correspond to device management, IPv6, and the IS-IS protocol.

We do not have the data plane state of Internet2, which is needed to run data plane tests. We approximate it using Route Views [39], a repository of BGP routes from over two hundred ASes worldwide. This data helps approximate the BGP messages that external peers of Internet2 send to it. Consider a peer with AS number X. If we find a prefix P in RouteViews with AS-path [A, X, Y], we assume that the peer sends P to Internet2 with AS-path [X, Y]. The existence of AS-path [A, X, Y] means that AS A must have a route to P with AS-path [X, Y], which it announces to its neighbors. If we find multiple AS-paths for a prefix, we pick the one where X is closest to the origin AS (the last entry in the path).

We use these BGP messages that each peer sends to Internet2 as inputs to simulate Internet2's control plane using Batfish. The data plane state produced by this simulation is a coarse approximation of the real version, but it suffices to meet our goals of running data plane tests and characterizing configuration coverage.
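This approximation heuristic can be sketched in Python; the input shape (a map from prefix to observed AS-paths) is a hypothetical simplification of a RouteViews dump, and the AS numbers are from the documentation range.

```python
def infer_peer_announcements(route_views, peer_asn):
    """Approximate the routes a peer (AS `peer_asn`) announces to the
    network: from an observed AS-path [A, X, Y] assume X announces the
    prefix with path [X, Y]; among multiple candidates, pick the one
    where X is closest to the origin AS (the last entry of the path)."""
    announcements = {}
    for prefix, paths in route_views.items():
        # Keep the suffix of each observed path starting at the peer.
        candidates = [p[p.index(peer_asn):] for p in paths if peer_asn in p]
        if candidates:
            # Shortest suffix = peer closest to the origin AS.
            announcements[prefix] = min(candidates, key=len)
    return announcements

rv = {"192.0.2.0/24": [[64500, 64496, 64499], [64501, 64496]]}
assert infer_peer_announcements(rv, 64496) == {"192.0.2.0/24": [64496]}
```

Here AS 64496 appears in both observed paths, and the heuristic keeps the shorter suffix, i.e., the announcement where the peer is nearest the origin.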
6.1.1 Test suite coverage

To study how NetCov analyzes coverage for realistic test suites, we use the test suite proposed in Bagpipe [41]. It has three tests to validate Internet2's BGP configuration.

• BlockToExternal: ensure that BGP routes with the BTE community are not announced to any external (eBGP) peer.

• NoMartian: ensure that incoming BGP messages from external peers for prefixes in the private address space ("Martian") are rejected.

• RoutePreference: ensure that if multiple routes to the same prefix are accepted from multiple external neighbors, the selected route belongs to the most preferred neighbor. The neighbor's preference depends on commercial relationship [13]. Customers are most preferred, followed by peers, and then providers.

We implemented these tests using Batfish. BlockToExternal and NoMartian are control plane tests. BlockToExternal evaluates all BGP export policies on a set of BGP routes carrying the BTE community and asserts that the result is rejection. We generate the test cases by sampling BGP routes from the data plane state and attaching the BTE community to them. NoMartian evaluates all BGP import policies on a set of BGP routes destined for Martian addresses and asserts that the results are rejection. RoutePreference is a data plane test. It focuses on destination prefixes available via multiple neighbors and asserts that their local preferences reflect commercial relationships. We use CAIDA data [28] to infer the commercial relationships between Internet2 and its BGP neighbors.

After running this test suite on Internet2, we find that it covers only 26.1% of configuration lines across all devices. Only a tiny fraction of configuration lines (0.5%) are weakly covered, so we do not separate weak/strong coverage for this case study; we will do that in the next one.

To help understand what is and is not covered in more detail, NetCov enables network engineers to look at the data from multiple perspectives. Figure 6(b) shows per-device coverage. We see notable variation across devices, from 11.8% to 40.5%. As we show below, the test suite has systematic gaps, and the cross-device variation stems from different devices having different fractions of covered configuration elements.

Figure 7 shows the coverage broken down by the type of configuration elements. For simplicity, we create four buckets of element types, as shown in the legend. The bottom bar shows the fraction of reachable configuration lines in each bucket. The "Test Suite" bar shows the covered fraction of those lines, and the top three bars show the coverage of the individual tests. The total coverage of the individual tests is 0.6%, 0.9%, and 24.7%, respectively. BlockToExternal and NoMartian cover only one type of configuration element (routing policies), and even within this type, they cover a small fraction. RoutePreference covered all four buckets, but its overall coverage is still limited.

Figure 7: Coverage of the initial test suite, broken down by individual test and configuration type.

Finally, NetCov reports that 27.9% of configuration lines are "dead code" that will never be exercised. They include defined BGP peer groups with no members and defined routing policies that are never used for any peer.4

4 Per best practices, these lines should be deleted. Or, at a minimum, they should be tested lest someone start using an unused, erroneous policy. When it comes to testing, such lines can never be exercised by data plane tests, though control plane tests may be written for them.

With 69% of BGP configurations, 85% of interfaces, 88% of routing policies, and 57% of route attribute match lists being completely untested, this test suite is clearly under-testing the network. This leaves the network vulnerable to bugs in untested configuration elements. Prior to NetCov, it was not possible for network engineers to get any insight into the quality of their test suite. It was also not possible for them to get help toward systematically improving tests. We demonstrate this test suite improvement process next.

6.1.2 Coverage-guided test development

NetCov's feedback enables a test suite development process that lets users systematically improve coverage, which helps test more critical aspects of the network and prevent outages. This process is iterative. In each iteration, the user first identifies specific testing gaps and then creates new tests to target those gaps. We demonstrate the process using three iterations that focus on different types of gaps.

Iteration 1. We saw that the routing policy coverage of the NoMartian test is low (Figure 7) even though it checks the import policies of all external peers. To investigate, we look at the structure of Internet2's import policies and find that routers have a policy named SANITY-IN which is shared by the majority of external neighbors. Figure 6(a) shows this policy with annotated coverage. Each router has an independent copy of this policy, but the copies and the coverage results are identical across routers. Of the five clauses in the policy, the clause block-martians starting at line 6,896 is the only clause that is covered. This coverage result confirms that the NoMartian test did its job, and more importantly, it revealed a systematic testing gap: the other four classes of forbidden routes are not being tested.
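The Martian-rejection property that NoMartian checks can be sketched in plain Python. The prefix list here is an illustrative subset, not Internet2's actual deny list, and the toy policy stands in for evaluating a router's real import policy with Batfish.

```python
import ipaddress

# An illustrative subset of "Martian" (non-routable) spaces, not
# Internet2's actual deny list.
MARTIANS = [ipaddress.ip_network(n) for n in
            ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16", "127.0.0.0/8")]

def is_martian(prefix):
    net = ipaddress.ip_network(prefix)
    return any(net.subnet_of(m) for m in MARTIANS)

def no_martian(import_policy, announcements):
    """Check that the import policy rejects every Martian announcement.
    `import_policy` is any callable mapping a prefix to "accept"/"reject"."""
    return all(import_policy(p) == "reject"
               for p in announcements if is_martian(p))

# Toy policy standing in for evaluating a router's real import policy.
policy = lambda prefix: "reject" if is_martian(prefix) else "accept"
assert no_martian(policy, ["10.1.0.0/16", "172.20.0.0/16", "11.0.0.0/8"])
```

The same shape extends to the other forbidden route classes: each class needs its own predicate and its own assertion over the evaluated import policies.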
Once we know the gap, the solution suggests itself. We added a new test, SanityIn, to enforce that the other four classes of received BGP messages should be rejected. After adding this test, we used NetCov to confirm that this testing gap had been addressed. Routing policy coverage improved by 0.6%, and all five terms of SANITY-IN were covered by the new test suite. The quantitative improvement is low because SANITY-IN is just one of many policies in the network. With feedback from NetCov, network engineers can identify testing gaps in other routing policies and add more tests in a similar way.5

5 Automatic test generation based on coverage feedback will further help engineers. We will investigate this in the future.

Iteration 2. The BGP peer configuration coverage of the RoutePreference test in Figure 7 is surprisingly low, given that all external BGP peers are supposed to be checked. Upon further investigation, we find that the uncovered peers have permitted prefix lists that do not overlap with other peers' lists, which left these peers untested.

We added a new test, PeerSpecificRoute, to check that BGP announcements received from external peers are accepted if their prefixes are in a peer-specific prefix list. This test improved BGP peer coverage from 32% to 46%. The remaining untested BGP peers are either not allowed to send BGP routes to Internet2 or are intended for other internal uses, such as monitoring and management. This test also improved prefix-list coverage from 45% to 63%. The remaining untested prefix lists are mostly (30% out of 37%) ones that are defined but never referenced.

Iteration 3. The low coverage of interface configuration in Figure 7 reveals another testing gap. RoutePreference is the only test in the initial test suite that checks interface configurations, and it considers only one category of interfaces: ones that are used to establish the tested BGP edges. Many other interfaces remain untested, including but not limited to ones that are associated with untested BGP edges and other routing protocols, and ones that are unused.

We added a new PingMesh-style [18] test, InterfaceReachability, to check that the IPv4 addresses assigned to interfaces are reachable from each router in the network. This test increased interface coverage from 15% to 53%. The remaining untested interfaces do not have IPv4 addresses assigned.

Figure 8 summarizes the coverage improvement for the three iterations of test improvement in our study. After only three iterations, the overall coverage improved from 26% to 43%. This final coverage number is far from perfect, but our goal was not to develop the ideal test suite for Internet2; we wanted to demonstrate how coverage information helps develop new tests. Networks are complex, and we should not expect to get the job done with six tests. Many more tests are likely needed. With NetCov, network engineers now have a tool to develop new tests that meaningfully improve coverage.

Figure 8: Coverage improvement with test suite iterations.

6.2 Case Study II: Datacenter networks

We study coverage for data center networks, which have a different topology and routing design. We create synthetic fat-tree [2] networks with routers across three tiers. The leaf routers at the bottom tier connect to hosts. Aggregation routers at the middle tier connect to leaf routers in a pod and to spine routers at the top tier. The spine routers connect to the wide area network (WAN). The WAN is not part of the tested network. Each leaf router is assigned a /24 prefix, which is advertised inside the data center through eBGP. Spine routers receive a default route (prefix 0.0.0.0/0) from the WAN via eBGP and propagate it to lower tiers. At each spine router, the entire address space of the network is summarized into a /8 prefix and announced to the WAN. Multipath routing (ECMP) is enabled with the maximum number of paths set to 4. Routing policies are configured only at spine routers, to whitelist the default route received from WAN peers. We synthesize the configurations of these networks in Cisco IOS format.

We study a test suite of three tests inspired by prior work on data center network validation [18, 23].

• DefaultRouteCheck: ensure that each router has the default route.

• ToRPingmesh: ensure that each leaf router's assigned subnet is reachable from all other leaf routers.

• ExportAggregate: ensure that each spine router exports the aggregate route to the WAN.

Figure 9: Coverage of the synthetic datacenter network, broken down by individual test and configuration type.

Figure 9 shows the coverage result when the network has a
total of 80 routers. Given the uniformity of the network and the test suite, coverage results are similar for other network sizes. The total coverage of the individual tests is 81.5%, 82.1%, and 80.7%, respectively, and the three tests together cover 85.3% of configuration lines. We find that these tests cover largely the same configuration elements (interfaces and BGP peerings between the data center routers) despite checking for seemingly different network behaviors. This result indicates that seemingly distinct network tests can be redundant in terms of configuration testing.

The coverage of ExportAggregate shows a large proportion of weak coverage. This is because a spine router has routes to all leaf routers, so all leaf subnets contribute to the tested aggregate route, albeit weakly. Separating out weak coverage here avoids false negatives for testing gaps: the aggregate routes would be there even if some of the BGP peerings or interfaces were misconfigured; therefore, testing the aggregate routes provides a weaker endorsement that the covered BGP peerings and interfaces are bug-free.

By looking at the uncovered configuration lines reported by NetCov, we learn that most correspond to host-facing interfaces on leaf routers. Adding tests that target those interfaces improves this test suite and eliminates testing gaps. We omit the results of this iteration.

7 Performance Evaluation

We benchmark the performance of NetCov on both types of networks we studied above. Our test machine has two Intel Xeon CPUs (16 cores each, 3.1 GHz), 384 GiB of DRAM, and runs Ubuntu 18.04.

Figure 10(a) shows the time to compute coverage for each test in §6.1 and for the full test suite. It breaks out the time spent on simulations and strong/weak labeling, and, for reference, also shows the test execution time. We see that coverage computation is reasonably fast. The full test suite takes only 99.4 seconds. In comparison, test execution takes 2,358 seconds. The total coverage computation time is less than the sum for the individual tests because facts tested by multiple tests are tracked only once. The graph also shows that simulations and strong/weak labeling are a minority component, which means that most of the time is spent on walking the IFG and doing lookups in the stable state for backward inference.

Figure 10(b) shows test execution and coverage computation time for the test suite in §6.2, as a function of the data center network size. Coverage computation takes 4,413 seconds on the largest network, which has 2,040,624 RIB entries. This time is less than 9% of the time to execute the test suite. While substantial, we deem it acceptable in practice. Configuration coverage analysis can be run in the background, as code coverage often is. NetCov does not slow down test execution, which is on the critical path to finding configuration errors and updating the network.

However, the time to compute coverage increases rapidly with network size. This is because the number of RIB entries grows quadratically, and so does the number of vertices in the IFG. We find that the average time to materialize an IFG node does not change substantially because all computation is local to the node. The scaling trends suggest that to scale NetCov to much larger networks, we need a concurrent implementation of IFG materialization. Our current implementation is single-threaded (as the Python interpreter is single-threaded).

8 Comparison to Data Plane Coverage

We demonstrate the unique value of control plane coverage by comparing it to data plane coverage. Following Yardstick [44], we quantify data plane coverage as the proportion of main RIB (forwarding) rules exercised. Figure 11 shows the comparison for different cases. Figure 11(a) shows the comparison for Internet2 for all tests in §6.1 and a hypothetical data plane test that inspects all main RIB rules. Figure 11(b) shows the comparison for the fat-tree tests in §6.2.

Besides the obvious advantage that only control plane coverage can support control plane tests (the graphs show 0% data plane coverage for these tests), there are two main advantages to using control plane coverage to guide network test development. First, it reveals testing gaps that cannot be revealed by data plane coverage. Tests with high data plane coverage do not necessarily have high control plane coverage, as we can see in the last row of Figure 11(a). Covering 100% of the data plane state covered only 41% of the configuration. If engineers were to improve test quality under the guidance of only data plane coverage, they would not know that 59% of the configurations remain untested. The reason for this disagreement is that some configuration lines are exercised only under specific environments (failures, routing messages). For instance, list-filtered route policies apply to BGP messages within a specific range and will only be exercised when such messages appear in the environment.

Second, testing more data plane state can sometimes be redundant in covering configurations, when the tests hit the same configuration elements. For example, the DefaultRouteCheck test in Figure 11(b) has only 1.8% data plane coverage because it tests only default routes, which are a small fraction of all main RIB routes. However, because correct propagation of default routes involves many BGP peerings and interfaces in the network, this test has extensive configuration coverage (87%). The ToRPingmesh test covers much more data plane state (88%), but adding it atop DefaultRouteCheck has little value because this state is derived from almost the same set of configuration lines. We do not necessarily imply that engineers should drop one of these tests, as there may be other reasons to keep both. Our observations are about their value toward configuration coverage.
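The data plane coverage metric used in this comparison can be sketched as follows; the rule representation, (router, prefix) pairs, is hypothetical.

```python
def data_plane_coverage(all_rib_rules, exercised_rules):
    """Yardstick-style data plane coverage: the fraction of main RIB
    (forwarding) rules exercised by a test suite."""
    if not all_rib_rules:
        return 0.0
    return len(set(exercised_rules) & set(all_rib_rules)) / len(all_rib_rules)

# Hypothetical RIB: (router, prefix) pairs.
rib = {("r1", "0.0.0.0/0"), ("r1", "10.0.1.0/24"), ("r2", "10.0.2.0/24")}
assert round(data_plane_coverage(rib, {("r1", "0.0.0.0/0")}), 2) == 0.33
```

The metric says nothing about which configuration lines produced the exercised rules, which is exactly the gap that NetCov's configuration coverage fills.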

Figure 10: Time to compute coverage, broken down into simulations, strong/weak labeling, and other, with test execution time shown for reference. (a) Internet2. (b) Fat-tree networks.

Figure 11: Comparing control plane and data plane coverage. (a) Internet2. (b) Fat-tree with k=10.

9 Related Work

Our work builds on four lines of research.

Code coverage. We borrow from the software domain the idea of using code coverage to reveal testing gaps, quantify test suite quality, and help engineers improve their test suites [4, 15, 20]. Our coverage analysis techniques, however, are specialized to the operation of network configurations.

Data plane coverage. Yardstick introduced data plane coverage metrics [44] that quantify the proportion of data plane elements, such as forwarding rules and paths, that are exercised by network tests. Configuration coverage goes further and maps tested data plane components to the configuration elements that contribute to them. It provides more direct feedback because network engineers author configurations, not data plane state, and it supports testing of configuration elements that are not exercised by the current data plane state.

Network testing and verification. A range of tools can analyze properties of network data and control planes [7, 12, 14, 18, 19, 23, 25, 26, 45, 46]. NetCov borrows ideas from verification tools to concisely model the network, e.g., focusing on stable state and routing protocol instances [7, 14]. However, NetCov targets a different problem (revealing what is tested versus enabling testing of new properties) and uses different techniques.

Network provenance. Provenance systems can track causal dependencies of events in distributed systems. Provenance systems like ExSPAN [48] materialize provenance graphs by tracing system execution in the forward direction. Negative provenance systems can reason about missing events [43] and materialize provenance graphs lazily using backward inference. NetCov too uses a graph-based model. However, it is unique in accommodating network configurations in a provenance model, and this model, tailored to the stable-state assumption, is more succinct. Further, it combines backward and forward inference to overcome the limitations of using only one type of inference.

10 Summary

NetCov is a tool that analyzes which configuration lines are tested by a suite of network tests. It uses an information flow model based on control plane semantics to track which configuration lines contribute to tested data plane state. It accounts for non-local and non-deterministic contributions, and, for performance, it discovers the graph lazily. Our experiments showed that NetCov successfully reveals coverage gaps for real-world networks and test suites, and that these tests can have surprisingly low coverage, e.g., 26% of configuration lines for Internet2. They also showed how its feedback helps improve coverage.

This work does not raise any ethical issues.
References

[1] Netcov: test coverage for network configurations. https://github.com/UWNetworksLab/netcov.

[2] Mohammad Al-Fares, Alexander Loukissas, and Amin Vahdat. A scalable, commodity data center network architecture. In Proceedings of SIGCOMM ’08, pages 63–74. ACM, 2008.

[3] Mae Anderson. Time warner cable says outages largely resolved. http://www.seattletimes.com/business/time-warner-cable-says-outages-largely-resolved, 2014.

[4] James H Andrews, Lionel C Briand, Yvan Labiche, and Akbar Siami Namin. Using mutation analysis for assessing and comparing testing coverage criteria. IEEE Transactions on Software Engineering, 32(8):608–624, 2006.

[5] John Backes, Sam Bayless, Byron Cook, Catherine Dodge, Andrew Gacek, Alan J Hu, Temesghen Kahsai, Bill Kocik, Evgenii Kotelnikov, Jure Kukovec, et al. Reachability analysis for aws-based networks. In International Conference on Computer Aided Verification, pages 231–241. Springer, 2019.

[6] Batfish: Network configuration analysis tool. https://github.com/batfish/batfish.

[7] Ryan Beckett, Aarti Gupta, Ratul Mahajan, and David Walker. A general approach to network configuration verification. In Proceedings of SIGCOMM ’17, pages 155–168. ACM, 2017.

[8] Karl S Brace, Richard L Rudell, and Randal E Bryant. Efficient implementation of a bdd package. In Proceedings of the 27th ACM/IEEE Design Automation Conference, pages 40–45, 1991.

[9] Larry Brader, Howie Hilliker, and Alan Wills. Testing for Continuous Delivery with Visual Studio 2012. Microsoft, 2013.

[10] Cisco Systems, Inc. Configure protocol redistribution for routers. https://www.cisco.com/c/en/us/support/docs/ip/enhanced-interior-gateway-routing-protocol-eigrp/8606-redist.html.

[11] Codecov. Codecov: The leading code coverage solution. https://about.codecov.io/, 2021.

[12] Ari Fogel, Stanley Fung, Luis Pedrosa, Meg Walraed-Sullivan, Ramesh Govindan, Ratul Mahajan, and Todd Millstein. A general approach to network configuration analysis. In Proceedings of NSDI 15, pages 469–483. USENIX Association, 2015.

[13] Lixin Gao and Jennifer Rexford. Stable internet routing without global coordination. IEEE/ACM Transactions on Networking, 9(6):681–692, 2001.

[14] Aaron Gember-Jacobson, Raajay Viswanathan, Aditya Akella, and Ratul Mahajan. Fast control plane analysis using an abstract representation. In Proceedings of SIGCOMM ’16, pages 300–313. ACM, 2016.

[15] Milos Gligoric, Alex Groce, Chaoqiang Zhang, Rohan Sharma, Mohammad Amin Alipour, and Darko Marinov. Comparing non-adequate test suites using coverage criteria. In Proceedings of the 2013 International Symposium on Software Testing and Analysis, ISSTA 2013, pages 302–313, 2013.

[16] Timothy G Griffin, F Bruce Shepherd, and Gordon Wilfong. The stable paths problem and interdomain routing. IEEE/ACM Transactions on Networking, 10(2):232–243, 2002.

[17] GNU Guix. lcov – code coverage tool that enhances gnu gcov. https://guix.gnu.org/en/packages/lcov-1.15/.

[18] Chuanxiong Guo, Lihua Yuan, Dong Xiang, Yingnong Dang, Ray Huang, Dave Maltz, Zhaoyi Liu, Vin Wang, Bin Pang, Hua Chen, Zhi-Wei Lin, and Varugis Kurien. Pingmesh: A large-scale system for data center network latency measurement and analysis. In Proceedings of SIGCOMM ’15, pages 139–152. ACM, 2015.

[19] Alex Horn, Ali Kheradmand, and Mukul Prasad. Delta-net: Real-time network verification using atoms. In Proceedings of NSDI 17, pages 735–749. USENIX Association, 2017.

[20] Monica Hutchins, Herb Foster, Tarak Goradia, and Thomas Ostrand. Experiments on the effectiveness of dataflow- and control-flow-based test adequacy criteria. In Proceedings of the 16th International Conference on Software Engineering, pages 191–200. IEEE, 1994.

[21] Istio. Diagnose your configuration with istioctl analyze. https://istio.io/latest/docs/ops/diagnostic-tools/istioctl-analyze/.

[22] Marko Ivanković, Goran Petrović, René Just, and Gordon Fraser. Code coverage at google. In Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pages 955–963. ACM, 2019.

[23] Karthick Jayaraman, Nikolaj Bjørner, Jitu Padhye, Amar Agrawal, Ashish Bhargava, Paul-Andre C Bissonnette, Shane Foster, Andrew Helwer, Mark Kasten, Ivan Lee, Anup Namdhari, Haseeb Niaz, Aniruddha Parkhi, Hanukumar Pinnamraju, Adrian Power, Neha Milind Raje, and Parag Sharma. Validating datacenters at scale. In Proceedings of SIGCOMM ’19, pages 200–213. ACM, 2019.

[24] Yue Jia and Mark Harman. An analysis and survey of the development of mutation testing. IEEE Transactions on Software Engineering, 37(5):649–678, 2010.

[25] Peyman Kazemian, George Varghese, and Nick McKeown. Header space analysis: Static checking for networks. In Proceedings of NSDI 12, pages 113–126. USENIX Association, 2012.

[26] Ahmed Khurshid, Xuan Zou, Wenxuan Zhou, Matthew Caesar, and P Brighten Godfrey. Veriflow: Verifying network-wide invariants in real time. In Proceedings of NSDI 13, pages 15–27. USENIX Association, 2013.

[27] Nuno P Lopes and Andrey Rybalchenko. Fast bgp simulation of large datacenters. In International Conference on Verification, Model Checking, and Abstract Interpretation, pages 386–408. Springer, 2019.

[28] Matthew Luckie, Bradley Huffaker, Amogh Dhamdhere, Vasileios Giotsas, and KC Claffy. As relationships, customer cones, and validation. In Proceedings of IMC ’13, pages 243–256, 2013.

[29] Santhosh Prabhu, Kuan Yen Chou, Ali Kheradmand, Brighten Godfrey, and Matthew Caesar. Plankton: Scalable network configuration verification through model checking. In Proceedings of NSDI 20, pages 953–967. USENIX Association, 2020.

[30] Bruno Quoitin and Steve Uhlig. Modeling the routing of an autonomous system with c-bgp. IEEE Network, 19(6):12–19, 2005.

[31] Steve Ragan. Bgp errors are to blame for monday’s twitter outage, not ddos attacks. https://www.csoonline.com/article/3138934/security/bgp-errors-are-to-blame-for-monday-s-twitter-outage-not-ddos-attacks.html, 2016.

[32] Deon Roberts. It’s been a week and customers are still mad at bb&t. https://www.charlotteobserver.com/news/business/banking/article202616124.html, 2018.

[33] Deon Roberts. Facebook says its outage was caused by a cascade of errors. https://www.nytimes.com/2021/10/05/technology/facebook-outage-cause.html, 2021.

[34] Joao Luis Sobrinho. Network routing with path vector protocols: Theory and applications. In Proceedings of the 2003 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications, pages 49–60, 2003.

[35] Fabio Somenzi. Cudd: Cu decision diagram package release 2.5.0. University of Colorado at Boulder, 2012.

[36] Yevgeniy Sverdlik. United says it outage resolved, dozen flights canceled monday. https://www.datacenterknowledge.com/archives/2017/01/23/united-says-it-outage-resolved-dozen-flights-canceled-monday, 2017.

[37] Bingchuan Tian, Xinyi Zhang, Ennan Zhai, Hongqiang Harry Liu, Qiaobo Ye, Chunsheng Wang, Xin Wu, Zhiming Ji, Yihong Sang, Ming Zhang, Da Yu, Chen Tian, Haitao Zheng, and Ben Y. Zhao. Safely and automatically updating in-network acl configurations with intent language. In Proceedings of SIGCOMM ’19, pages 214–226. ACM, 2019.

[38] Omer Tripp, Marco Pistoia, Stephen J Fink, Manu Sridharan, and Omri Weisman. Taj: effective taint analysis of web applications. ACM Sigplan Notices, 44(6):87–97, 2009.

[39] Route Views. University of oregon route views project. http://www.routeviews.org/routeviews/, 1997.

[40] Rosemary Wang. Testing hashicorp terraform. https://www.hashicorp.com/blog/testing-hashicorp-terraform.

[41] Konstantin Weitz, Doug Woos, Emina Torlak, Michael D Ernst, Arvind Krishnamurthy, and Zachary Tatlock. Scalable verification of border gateway protocol configurations with an smt solver. In Proceedings of OOPSLA 2016, pages 765–780. ACM, 2016.

[42] Zach Whittaker. T-mobile hit by phone calling, text message outage. https://techcrunch.com/2020/06/15/t-mobile-calling-outage/, 2020.

[43] Yang Wu, Mingchen Zhao, Andreas Haeberlen, Wenchao Zhou, and Boon Thau Loo. Diagnosing missing events in distributed systems with negative provenance. ACM SIGCOMM Computer Communication Review, 44(4):383–394, 2014.

[44] Xieyang Xu, Ryan Beckett, Karthick Jayaraman, Ratul Mahajan, and David Walker. Test coverage metrics for the network. In Proceedings of SIGCOMM ’21, pages 775–787. ACM, 2021.

[45] Hongkun Yang and Simon S. Lam. Real-time verification of network properties using atomic predicates. IEEE/ACM Trans. Netw., 24(2):887–900, April 2016.

[46] Fangdan Ye, Da Yu, Ennan Zhai, Hongqiang Harry Liu, Bingchuan Tian, Qiaobo Ye, Chunsheng Wang, Xin Wu, Tianchen Guo, Cheng Jin, Duncheng She, Qing Ma, Biao Cheng, Hui Xu, Ming Zhang, Zhiliang Wang, and Rodrigo Fonseca. Accuracy, scalability, coverage: A practical configuration verifier on a global wan. In Proceedings of SIGCOMM ’20, pages 599–614. ACM, 2020.

[47] Hongyi Zeng, Shidong Zhang, Fei Ye, Vimalkumar Jeyakumar, Mickey Ju, Junda Liu, Nick McKeown, and Amin Vahdat. Libra: Divide and conquer to verify forwarding tables in huge networks. In Proceedings of NSDI 14, pages 87–99. USENIX Association, 2014.

[48] Wenchao Zhou, Micah Sherr, Tao Tao, Xiaozhou Li, Boon Thau Loo, and Yun Mao. Efficient querying and maintenance of network provenance at internet-scale. In Proceedings of SIGMOD ’10, pages 615–626. ACM, 2010.
