CodedBulk: Inter-Datacenter Bulk Transfers Using Network Coding
(a) Inter-datacenter bulk transfer example. (b) Single-path solution (suboptimal). (c) Multi-path solution (suboptimal). (d) Steiner-tree solution (suboptimal). (e) Optimal non-coded solution (computed by hand; efficient algorithms to compute such an optimal solution are not known). (f) CodedBulk solution (optimal).
Figure 1: Understanding benefits of network coding. (a) An instance of the inter-datacenter bulk transfer problem on the Internet2 topology [5],
with one source (marked by a circle with a ?) and three destinations (marked by circles). The network model is as described in §2.1. (b, c, d)
Existing solutions based on single-path, multi-path and Steiner arborescence packing can be suboptimal (detailed discussion in §2.2). (e) An
optimal solution with Steiner arborescence packing (computed by hand); today, computing such a solution requires brute force search which is
unlikely to scale to inter-datacenter deployment sizes (tens to hundreds of datacenters) [23, 25]. (f) CodedBulk, using network coding, not only
achieves optimal throughput but also admits efficient algorithms to compute the corresponding network codes. More discussion in §2.2.
2.2 Network coding background

Suppose a source wants to send a large file to a single destination, and that it is allowed to use as many paths as possible. If there are no other flows in the network, the maximum achievable throughput (the amount of data received by the destination per unit time) is given by the well-known max-flow min-cut theorem—the achievable throughput is equal to the capacity of the min-cut between the source and the destination in the induced graph. The corresponding problem for a source sending a file to multiple destinations was an open problem for decades. In 2000, a now celebrated paper [9] established that, for a multicast transfer, the maximum achievable throughput is equal to the minimum of the min-cuts between the source and the individual destinations. This is also optimal. For general directed graphs, achieving this throughput is not possible using solutions where intermediate nodes simply forward or mirror the incoming data—it necessarily requires intermediate nodes to perform certain computations over the incoming data before forwarding it [9, 30, 33]. For our network model that captures full-duplex links, network coding achieves optimal throughput (since it subsumes solutions that do not perform coding); however, it is currently not known whether optimal throughput can be achieved without network coding [10, 34]. Figure 1 demonstrates the space of existing solutions, using a bulk transfer instance from our evaluation (§4) on the Internet2 topology. We present additional discussion and examples in [46].

Single-path (also referred to as multiple unicast) solutions, where the source transfers data along a single path to each individual destination, can be suboptimal because they neither utilize all the available network bandwidth, nor do they allow intermediate nodes to forward/mirror data to other destinations. Multi-path solutions, where the source transfers data along all edge-disjoint paths to each individual destination (paths across destinations do not need to be edge disjoint), can be suboptimal because they do not allow intermediate nodes to forward/mirror data to other destinations.

The current state-of-the-art solutions for our network model are based on Steiner tree (or, more precisely, Steiner arborescence) packing [7, 18, 34]. These solutions use multiple paths, and allow intermediate nodes to mirror and forward the data; however, they can be suboptimal because the problem of computing an optimal Steiner tree (or arborescence) packing is NP-hard, and approximation algorithms need to be used [12]. To demonstrate the limitations of existing Steiner packing solutions, consider the example shown in Figure 1(d): here, once the shown Steiner tree is constructed, no additional Steiner trees can be packed in a manner that achieves higher throughput. Figure 1(e) demonstrates the complexity of computing an optimal solution (which we constructed by hand)—to achieve the optimal solution shown in the figure, one must explore intermediate solutions that use a suboptimal Steiner tree (shown in blue). Today, computing such an optimal solution requires a brute-force approach, which is unlikely to scale to inter-datacenter network sizes. Thus, we must use approximate suboptimal solutions; to the best of our knowledge, the state-of-the-art algorithms for computing approximate Steiner packing solutions for our network model do not even admit polylogarithmic approximation factors [11, 21].

Network coding avoids the aforementioned limitations of existing solutions by allowing intermediate nodes to perform certain computations, which subsume forwarding and mirroring, on data (as shown in the Figure 1(f) example)—it utilizes multiple paths, guarantees optimal throughput, and admits efficient computation of network codes that achieve optimal throughput [24]. Thus, while designing optimal non-coded solutions for bulk transfers remains an open problem, we can efficiently achieve throughput optimality for inter-datacenter bulk transfers today using network coding.

(a) Forward (f → f). (b) Mirror (f → f on each outgoing link). (c) Code-and-Forward (f1, …, fk → f1 ⊕ · · · ⊕ fk). (d) Code-and-Mirror (f1, …, fk → f1 ⊕ · · · ⊕ fk on each outgoing link).
Figure 2: Four basic coding functions available at each intermediate node to implement the network code generated by CodedBulk.
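To make the four functions in Figure 2 concrete, the following is a minimal sketch of how they could be realized over fixed-size packets. The byte-vector packet representation and the function names are illustrative assumptions for this sketch (not CodedBulk's interfaces), and the coding shown is the plain XOR of the figure; general network codes may combine flows over larger finite fields.

// Minimal sketch (illustrative, not CodedBulk's code) of the four coding
// primitives in Figure 2, over fixed-size packets represented as byte
// vectors. Forward and mirror copy data; code-and-forward and
// code-and-mirror XOR one packet from each of the k incoming flows.
#include <cstddef>
#include <cstdint>
#include <vector>

using Packet = std::vector<uint8_t>;

// (a) Forward: emit the incoming packet on one outgoing link.
Packet forward(const Packet& in) { return in; }

// (b) Mirror: emit copies of the incoming packet on several outgoing links.
std::vector<Packet> mirror(const Packet& in, size_t num_links) {
  return std::vector<Packet>(num_links, in);
}

// (c) Code-and-forward: XOR one packet from each of the k incoming flows
// into a single coded packet (all packets assumed to have the same size).
Packet code_and_forward(const std::vector<Packet>& flows) {
  Packet coded(flows.front().size(), 0);
  for (const Packet& p : flows)
    for (size_t i = 0; i < coded.size(); ++i)
      coded[i] ^= p[i];
  return coded;
}

// (d) Code-and-mirror: the same coded packet, copied to several links.
std::vector<Packet> code_and_mirror(const std::vector<Packet>& flows,
                                    size_t num_links) {
  return std::vector<Packet>(num_links, code_and_forward(flows));
}

With XOR coding, a node or destination that already holds all but one of the coded flows can recover the remaining one by XORing what it has into the coded packet.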
(a) Reason: Interactive traffic. (b) Reason: Non-uniform delay. (c) Reason: Multiple bulk transfers.
Figure 3: Understanding the asymmetric link problem. (a) Due to sporadic high-priority interactive traffic (e.g., the one shown in red), different links may have different (time-varying) bandwidths; (b) if network links have significantly different round trip times, naïvely implementing traditional network coding would require a large amount of fast data plane storage to buffer data that arrives early at nodes; (c) multiple concurrent bulk transfers, especially those that end up sharing links, make it hard to efficiently realize traditional network coding solutions that assume a single bulk transfer at all times. Detailed discussion in §3.1.
information for each flow, and the computations done at each intermediate node for the flows arriving at that node. These codes can be expressed as a combination of the four basic functions shown in Figure 2.

3. Once the network code is computed, the controller installs the network code on each node that participates in the bulk transfer. We discuss, in §3.3, a mechanism to implement the forwarding and routing functions that requires no changes in existing inter-datacenter WAN infrastructure.

4. Once the code is installed, the controller notifies the source. The source partitions the bulk transfer file into multiple subfiles (defined by the code) and then initiates the bulk transfer using CodedBulk, as described in the remainder of the paper. For instance, for the example of Figure 1(f), the source divides the file into two subfiles (A and B) of equal sizes and transmits them using the code shown in the figure (see the sketch following this list). Each intermediate node independently performs CodedBulk's hop-by-hop flow control mechanism. Importantly, a "hop" here refers to a datacenter on the network topology graph. CodedBulk assumes that interactive traffic is always sent with the highest priority, and needs two additional priority levels.

5. Once the bulk transfer is complete, the source notifies the controller. The controller periodically uninstalls the inactive codes from all network nodes.

The core of CodedBulk's mechanisms is to efficiently enable the fourth step. We describe these in the next section.
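The following minimal sketch shows the byte-level operations behind the two-subfile example of step 4 and Figure 1(f). The function names, the zero-padding convention, and the use of in-memory byte vectors are assumptions made for the sketch, not CodedBulk's interfaces.

// Minimal sketch (illustrative, not CodedBulk's code) of splitting a file
// into two equal-sized subfiles A and B, and of the XOR used to produce
// and decode A xor B for the code of Figure 1(f).
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using Bytes = std::vector<uint8_t>;

// Split a file into two equal-sized subfiles A and B (padding with one
// zero byte if the file length is odd, for simplicity).
std::pair<Bytes, Bytes> partition_in_two(Bytes file) {
  if (file.size() % 2 != 0) file.push_back(0);
  const size_t half = file.size() / 2;
  return {Bytes(file.begin(), file.begin() + half),
          Bytes(file.begin() + half, file.end())};
}

// XOR two equal-sized blocks; used both to produce A xor B at an
// intermediate node and to recover B = (A xor B) xor A at a destination.
Bytes xor_blocks(const Bytes& x, const Bytes& y) {
  Bytes out(x.size());
  for (size_t i = 0; i < x.size(); ++i) out[i] = x[i] ^ y[i];
  return out;
}

A destination that receives A along one set of paths and A ⊕ B along another recovers B by XORing the two, and reassembles the file by concatenating A and B.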
3 CodedBulk Design

We describe the core techniques in CodedBulk design and implementation. We start by building an in-depth understanding of the asymmetric link problem (§3.1). We then describe how CodedBulk resolves the asymmetric link problem using a custom-designed hop-by-hop flow control mechanism (§3.2). Finally, we discuss the virtual link abstraction that enables implementation of CodedBulk without any modifications in the underlying transport- and network-layer protocols (§3.3).

3.1 Understanding fundamental barriers

We start by building an in-depth understanding of the asymmetric link bandwidth problem, and how it renders techniques in the network coding literature infeasible in practice. We use Figure 3 for the discussion in this subsection.

Asymmetric links due to sporadic interactive traffic. Inter-datacenter WANs transfer both latency-sensitive interactive traffic (e.g., user commits, like emails and documents) and bandwidth-intensive bulk traffic [23, 25]. While interactive traffic is low-volume, it is unpredictable and is assigned higher priority. This leads to two main challenges. First, links may have different bandwidths available at different times for bulk transfers (as shown in Figure 3(a)). Second, the changes in available bandwidth may be at much finer-grained timescales than the round trip times between geo-distributed datacenters.

Traditional network coding literature does not consider the case of interactive traffic. An obvious way to use traditional network coding solutions for non-uniform link bandwidths is to use traffic shaping to perform network coding on the minimum of the available bandwidth across all links. For instance, in the example of Figure 3(a), if the average load induced by interactive traffic is 0.1× the link bandwidth, then one can use network coded transfers only on 0.9× the bandwidth. However, the two challenges discussed above make this solution hard, if not infeasible: bandwidths are time-varying, making static rate allocation hard; and bandwidth changing at much finer-grained timescales than geographic round trip times makes it hard to do dynamic rate allocation.

Asymmetric links due to non-uniform delay. Traditional network coding solutions, at least the practically feasible ones [24], require computations on data arriving from multiple flows in a deterministic manner: packets that need to be coded are pre-defined (during code construction) so as to allow the destinations to decode the original data correctly. To achieve this, existing network coding solutions make one of two assumptions: either the latency from the source to each individual node is uniform; or, there is unbounded storage at intermediate nodes to buffer packets from multiple flows. Neither of these assumptions may hold in practice. The delay from the source to individual intermediate nodes can vary by hundreds of milliseconds in a geo-distributed setting (Figure 3(b)). Keeping packets buffered during such delays would require an impractical amount of high-speed storage for high-bandwidth inter-datacenter WAN links: if links are operating at terabits per second of bandwidth, each intermediate node would require hundreds of gigabits or more of storage.
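To make the storage estimate above concrete, here is a back-of-the-envelope calculation with assumed numbers (a 1 Tbps link and a 200 ms difference in delay; both values are illustrative, not measurements from our deployment):

\[ 1\,\text{Tbps} \times 0.2\,\text{s} = 2 \times 10^{11}\,\text{bits} = 200\,\text{Gb} \approx 25\,\text{GB} \]

of fast storage would be needed at a single intermediate node just to hold one flow while it waits for its coding partner.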
Figure 4: Understanding hop-by-hop flow control for a single bulk transfer. (left) If the outgoing link has enough bandwidth to sustain the rate of incoming traffic (flow F in this example), then all buffers will remain unfilled and flow control will not be instantiated; (center) the same scenario as in the left figure holds as long as two conditions hold: (1) both flows that need to be coded at some node v send at the same rate; and (2) the outgoing link has enough bandwidth to sustain the rate of incoming traffic; (right) if two flows need to be coded at some node v, and one of the flows F1 is sending at a higher rate, then the Rx buffer for F1 will fill up faster than it can be drained (due to v waiting for packets of F2) and flow control to the downstream node of F1 will be triggered, resulting in rate reduction for flow F1. Detailed discussion in §3.2.
Asymmetric links due to simultaneous bulk transfers. Traditional network coding literature considers only the case of a single bulk transfer. Designing throughput-optimal network codes for multiple concurrent bulk transfers is a long-standing open problem. We do not solve this problem; instead, we focus on optimizing throughput for individual bulk transfers while ensuring that the network runs at high utilization. Achieving these two goals simultaneously turns out to be hard, due to each individual bulk transfer observing different delays (between the respective source and intermediate nodes) and different available link bandwidths due to interactive traffic. Essentially, as shown in Figure 3(c), supporting multiple simultaneous bulk transfers requires additional mechanisms for achieving high network utilization.

3.2 CodedBulk's hop-by-hop flow control

Network coding, by its very nature, breaks the end-to-end semantics of traffic between a source-destination pair, thus necessitating treating the traffic as a set of flows between the intermediate nodes or hops. Recall that a "hop" here refers to a (resource-rich) datacenter on the network graph. To ensure that we do not lose packets at intermediate nodes in spite of the fact that they have limited storage, we rely on a hop-by-hop flow control mechanism—a hop pushes back on the previous hop when its buffers are full. This pushback can be implicit (e.g., TCP flow control) or explicit.

Hop-by-hop flow control is an old idea, dating back to the origins of congestion control [41, 45]. However, our problem is different: traditional hop-by-hop flow control mechanisms operate on individual flows—each downstream flow depends on precisely one upstream flow; in contrast, CodedBulk operates on "coded flows" that may require multiple upstream flows to be encoded at intermediate nodes. Thus, a flow being transmitted at a low rate can affect the overall performance of the transfer (since other flows that need to be encoded with this flow will need to lower their rate as well). This leads to a correlated rate control problem. For instance, in Figure 2(c) and Figure 2(d), flows f1 to fk must converge to the same rate so that the intermediate node can perform coding operations correctly without buffering a large number of packets. To that end, CodedBulk's hop-by-hop flow control mechanism maintains three invariants:

• All flows within the same bulk transfer that need to be encoded at any node must converge to the same rate;
• All flows from different bulk transfers competing on the congested link bandwidth must converge to a max-min fair bandwidth allocation;
• The network is deadlock-free.

CodedBulk maintains these invariants using a simple idea: careful partitioning of buffer space to flows within and across bulk transfers. The key insight here, which follows from early work on buffer sharing [45], is that for large enough buffers, two flows congested on a downstream link will converge to a rate that corresponds to the fair share of the downstream link bandwidth. We describe the idea of CodedBulk's hop-by-hop flow control mechanism using two scenarios: a single isolated bulk transfer and multiple concurrent bulk transfers.

Single bulk transfer. First consider the two simpler cases of forward (Figure 2(a)) and mirror (Figure 2(b)). These cases are exactly analogous to traditional congestion control protocols, and hence do not require any special mechanism for buffer sharing. The main challenge comes from Code-and-Forward (Figure 2(c)) and Code-and-Mirror (Figure 2(d)). For these cases, the invariant we require is that the flows being used to compute the outgoing data converge to the same rate, since otherwise packets belonging to the flows sending at a higher rate will need to be buffered at the node, requiring high storage. This is demonstrated in Figure 4, center and right figures.
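The sketch below illustrates one way the coupling described in this subsection could be realized at an intermediate node: one bounded receive queue per upstream flow to be coded, a coded packet emitted only when every queue has data, and pushback on any upstream hop whose queue is full, which throttles faster senders to the rate of the slowest. The class and its interface are assumptions for the sketch, not CodedBulk's code; CodedBulk itself realizes the pushback through the virtual link abstraction of §3.3 without modifying the underlying network stack.

// Minimal sketch (illustrative) of correlated rate control at a node that
// codes several upstream flows: emit only when every flow has data, and
// push back on any flow whose bounded queue is full.
#include <cstddef>
#include <cstdint>
#include <deque>
#include <optional>
#include <vector>

using Packet = std::vector<uint8_t>;

class CodedFlowCoupler {
 public:
  CodedFlowCoupler(size_t num_flows, size_t queue_capacity)
      : queues_(num_flows), capacity_(queue_capacity) {}

  // Returns false if the queue for `flow` is full: the caller should
  // propagate this as a pushback (e.g., stop reading that flow's socket,
  // letting the transport's flow control stall the upstream hop).
  bool on_packet(size_t flow, Packet p) {
    if (queues_[flow].size() >= capacity_) return false;  // pushback
    queues_[flow].push_back(std::move(p));
    return true;
  }

  // Emits one coded packet (XOR across flows) once every queue has data;
  // otherwise returns std::nullopt and the node simply waits.
  std::optional<Packet> try_emit() {
    for (const auto& q : queues_)
      if (q.empty()) return std::nullopt;
    Packet coded(queues_[0].front().size(), 0);
    for (auto& q : queues_) {
      const Packet& p = q.front();
      for (size_t i = 0; i < coded.size(); ++i) coded[i] ^= p[i];
      q.pop_front();
    }
    return coded;
  }

 private:
  std::vector<std::deque<Packet>> queues_;
  size_t capacity_;
};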
Figure 5: If concurrent bulk transfers use completely different outgoing links (left) or use the same outgoing link but with enough bandwidth (center), the hop-by-hop flow control mechanism does not get triggered. However, if the outgoing link is bandwidth-bottlenecked and one of the bulk transfers is sending at a higher rate (say the red one), then the buffers for the red flows will fill up faster than the buffers for the blue flows; at this point, the hop-by-hop flow control mechanism will send a pushback to the downstream nodes of the red flows, resulting in a reduced rate for the red flows. Detailed discussion in §3.2.
Figure 7: The figure demonstrates the virtual link abstraction used by CodedBulk to implement its hop-by-hop flow control mechanism without any modifications in the underlying network stack.

from the two flows. Blocking calls require expensive coordination between two buffers since the node requires data from both flows to be available before it can make progress. Non-blocking calls cannot be used either—the call will return the data from one of the flows, but this data cannot be operated upon until the data from the other flow(s) is also available. The fundamental challenge here is that we need efficient ways to block on multiple flows, and return the call only when data is available in all flows that need to be coded.

It may be tempting to have a shared buffer across different flows that need to be coded together. The problem, however, is that shared buffers will lead to deadlocks [41]—if one of the flows is sending data at a much higher rate than the other flows, it will end up saturating the buffer space, the other flows will starve, and consequently the flow that filled up the buffer will also not make progress, since it waits to receive data from the other flows it is to be coded with. As discussed in §3.2, non-zero buffer allocation to each individual flow is a necessary condition for avoiding deadlocks in hop-by-hop flow control mechanisms.

Virtual links (see Figure 7). CodedBulk assigns each individual bulk transfer a virtual link per outgoing physical link; each virtual link has a single virtual transmit buffer vTx and as many virtual receive buffers vRx as the number of flows to be coded together for that outgoing link. For instance, consider four incoming flows in a bulk transfer, F1, F2, F3, F4, such that F1 ⊕ F2 is forwarded on one of the outgoing physical links, and F2 ⊕ F3 ⊕ F4 is forwarded on another outgoing physical link. Then, CodedBulk creates two virtual links, each having one vTx; the first virtual link has two vRx (one for F1 packets and another for F2 packets) and the second virtual link has three vRx (one for each of F2, F3 and F4 packets). Virtual links are created when the controller installs the network codes, since knowledge of the precise network code to be used for the bulk transfer is necessary to create virtual links. As new codes are installed, CodedBulk reallocates the space assigned to each vTx and vRx, within and across virtual links, to ensure that all virtual buffers have non-zero size.

Using these virtual links resolves the aforementioned challenge with blocking and non-blocking calls. Indeed, either of the calls can now be used since the "correlation" between the flows is now captured at the virtual link rather than at the flow control layer. Data from the incoming socket buffers for
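To make the vTx/vRx bookkeeping concrete, here is a minimal data-structure sketch under assumed identifiers (integer flow and link ids); it mirrors the F1–F4 example above but is not CodedBulk's actual code.

// Minimal sketch (illustrative, not CodedBulk's code) of per-bulk-transfer
// virtual links: one vTx per outgoing physical link, and one vRx per
// incoming flow that must be coded onto that link.
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

struct Buffer {
  std::vector<uint8_t> data;
  size_t capacity = 0;  // reallocated so every virtual buffer stays non-zero
};

struct VirtualLink {
  std::vector<int> coded_flow_ids;  // flows XORed onto this outgoing link
  std::map<int, Buffer> vrx;        // one vRx per coded flow id
  Buffer vtx;                       // single transmit buffer
};

// All virtual links a node maintains for one bulk transfer, keyed by the
// outgoing physical link id.
using BulkTransferState = std::map<int, VirtualLink>;

// Hypothetical helper reproducing the example above: F1 xor F2 on one
// physical link, F2 xor F3 xor F4 on another.
BulkTransferState example_virtual_links() {
  BulkTransferState s;
  s[0].coded_flow_ids = {1, 2};
  s[1].coded_flow_ids = {2, 3, 4};
  for (auto& [link_id, vlink] : s) {
    for (int flow_id : vlink.coded_flow_ids) vlink.vrx[flow_id].capacity = 1;
    vlink.vtx.capacity = 1;
  }
  return s;
}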
4 Evaluation

We implement CodedBulk in C++ and use TCP Cubic as the underlying transport protocol. We use default TCP socket buffers, with interactive traffic sent at a higher priority than bulk transfers (using the TCP differentiated services field), set using the standard Linux socket API. To enforce priority scheduling, we use Linux tc at each network interface.

We now evaluate the CodedBulk implementation over two real geo-distributed cloud testbeds. We start by describing the experiment setup (§4.1). We then discuss the results for the CodedBulk implementation over a variety of workloads with varying choice of source and destination nodes for individual bulk transfers, interactive traffic load, number of concurrent bulk transfers, and number of destinations in individual bulk transfers (§4.2). Finally, we present the scalability of our CodedBulk prototype implementation in software and hardware (§4.3).

4.1 Setup

Testbed details. To run our experiments, we use two testbeds that are built as an overlay on geo-distributed datacenters from Amazon AWS. Our testbeds use 13 and 9 geo-distributed datacenters organized around the B4 [25] and Internet2 [5] topologies, respectively. The datacenter locations are chosen to closely emulate the two topologies and the corresponding geographical distances and latencies. Within each datacenter, we take a high-end server; for every link in the corresponding topology, we establish a connection between the servers in the corresponding datacenters using the inter-datacenter connectivity provided by Amazon AWS. To reduce the cost of experimentation, we throttle the bandwidth between each pair of servers to 200 Mbps for our experiments. The precise details of the inter-datacenter connectivity provided by Amazon AWS (whether it uses the public Internet or dedicated inter-datacenter links) are not publicly known. We run all the experiments for each individual figure within a short period of time; while the inter-datacenter links provided by Amazon AWS may be shared and may cause interference, we observe fairly consistent inter-datacenter bandwidth during our experiments. We use a server in one of the datacenters to act as the centralized controller (to compute and install network codes on all servers across our testbed).

Workloads. As mentioned earlier, the benefits of network coding depend on the underlying network topology, the number of destinations in individual bulk transfers, the location of the source and the set of destinations in each bulk transfer, the number of concurrent transfers, and the interactive traffic load. While there are no publicly available datasets or workloads for inter-datacenter bulk transfers, several details are known. For
[Figure 9 plots (two panels): Aggregate Throughput (Mbps) vs. Number of Multicast Sources (1–13 and 1–9); legend: Single-Path, Multi-Path, Steiner Arborescence, CodedBulk.]
Figure 9: Performance of various bulk transfer mechanisms for varying number of concurrent bulk transfers. CodedBulk improves the bulk
transfer throughput by 1.6 − 4×, 1.3 − 2.8× and 1.2 − 2.5× when compared to single-path, multi-path, and Steiner arborescence based
mechanisms, respectively (discussion in §4.2).
4.2 Geo-distributed Testbed Experiments

We compare CodedBulk with the three baselines for varying interactive traffic loads, varying numbers of concurrent bulk transfers, and varying numbers of replicas per bulk transfer.

Varying interactive traffic load. Figure 8 presents the achievable throughput for each scheme with varying interactive traffic load. For this experiment, we use 3-way replication and 6 concurrent transfers (to capture the case of Facebook, Netflix, Azure SQL server and CloudBasic SQL server as discussed above), and vary the interactive traffic load from 0.05−0.2× of the link bandwidth.

As expected, the throughput for all mechanisms decreases as the interactive traffic load increases. Note that, in corner-case scenarios, the multi-path mechanism can perform slightly worse than the single-path mechanism for multiple concurrent bulk transfers due to increased interference across multiple flows sharing a link, which in turn results in increased convergence time for TCP (see [46] for a concrete example). Overall, CodedBulk improves the bulk traffic throughput over single-path, multi-path and Steiner arborescence mechanisms by 1.9−2.2×, 1.4−1.6× and 1.2−1.6×, respectively, depending on the interactive traffic load and the network topology. Single-path mechanisms perform poorly because they do not exploit all the available bandwidth in the network. Both multi-path and Steiner arborescence based mechanisms exploit the available bandwidth as much as possible. However, multi-path mechanisms suffer since they do not allow intermediate nodes to mirror and forward data to the destinations. Steiner arborescence further improves upon multi-path mechanisms by allowing intermediate nodes to mirror and forward data, but it suffers because the approximation algorithm often leads to suboptimal solutions. CodedBulk's gains over multi-path and Steiner arborescence mechanisms are, thus, primarily due to CodedBulk's efficient realization of network coding—it not only uses all the available links, but also computes the optimal coding strategy (unlike the Steiner arborescence mechanism, which uses an approximation algorithm). The Steiner arborescence mechanism performs better on the Internet2 topology because of its sparsity—fewer links in the network means a Steiner arborescence solution is more likely to be the same as the network coding solution due to fewer opportunities to perform coding. Nevertheless, CodedBulk outperforms the Steiner arborescence based mechanism by 1.4×.

Varying number of concurrent bulk transfers. Figure 9 shows the performance of the four mechanisms with varying numbers of concurrent transfers. For this evaluation, we use the same setup as earlier—3-way replication, multiple runs with each run selecting different sources and sets of destinations, etc.—with the only difference being that we fix the interactive traffic load to 0.1 and vary the number of concurrent bulk transfers. With a larger number of concurrent bulk transfers, Steiner arborescence mechanisms slightly outperform multi-path due to improved arborescence construction. Nevertheless, CodedBulk provides benefits across all sets of experiments, achieving 1.2−2.5× improvements over Steiner arborescence based mechanisms. The gains are more prominent for the B4 topology and for fewer concurrent transfers, since CodedBulk gets more opportunities to perform network coding at intermediate nodes in these scenarios.

Varying number of destinations/replicas per bulk transfer. Figure 10 shows the performance of the four mechanisms with varying numbers of destinations/replicas for individual bulk transfers. For this evaluation, we use the same setup as Figure 8—6 concurrent bulk transfers, multiple runs with each run selecting different sources and sets of destinations, etc.—with the only difference being that we fix the interactive traffic load to 0.1 and vary the number of destinations/replicas per bulk transfer from 2 to the maximum allowable replicas for the individual topologies. Notice that the results show the aggregate throughput per destination.
[Figure 10 plots (two panels): throughput (Mbps) vs. Number of Destinations (2–12 and 2–8); legend: Single-Path, Multi-Path, Steiner Arborescence, CodedBulk.]
Figure 10: Performance of various bulk transfer mechanisms for varying number of destinations/replicas per bulk transfer. CodedBulk improves
the bulk transfer throughput over single-path and multi-path mechanisms by 1.8 − 4.3× and 1.4 − 2.9×, respectively, depending on the number
of destinations in each bulk transfer and depending on the topology. CodedBulk outperforms Steiner arborescence mechanisms by up to 1.7×
when the number of destinations is not too large. When each bulk transfer creates as many replicas as the number of datacenters in the network,
CodedBulk performs comparably with Steiner arborescence. Note that the aggregate bulk throughput reduction is merely because each source
is transmitting to increasingly many destinations, but the metric only captures the average throughput per destination. Discussion in §4.2.
As the number of destinations per bulk transfer increases, the per-destination throughput decreases for all schemes (although, as expected, the sum of the throughput over all destinations increases). Note that whether multi-path outperforms single-path depends on the number of destinations and on the topology; moreover, the relative gains of CodedBulk improve as the number of destinations increases. The comparison with the Steiner arborescence based mechanism is more nuanced. CodedBulk achieves improved performance compared to the Steiner arborescence based mechanism when the number of destinations is less than 10 for the B4 topology, and less than 6 for the Internet2 topology. The performance difference is minimal for larger numbers of destinations. The reason is that for larger numbers of replicas/destinations, each source is multicasting to almost all other nodes in the network; in such cases, the benefits of coding reduce when compared to forwarding and mirroring of data at intermediate nodes and at the destination nodes as in the Steiner arborescence based mechanism. Thus, the benefits of CodedBulk may be more prominent when the number of replicas is a bit smaller than the total number of datacenters in the network.

Figure 11: CodedBulk's implementation performs network coding for as much as 31 Gbps worth of traffic using a commodity 16-core server, achieving roughly linear coding throughput scalability with the number of cores.

Element   Used     Available   Utilization
LUT       69052    433200      15.94%
BRAM      1365     1470        92.86%

Table 1: Resource utilization of CodedBulk implementation on Xilinx Virtex-7 XC7VX690T FPGA (250 MHz clock). Our implementation provides up to 31.25 Gbps throughput with 15.94% LUTs and 92.86% BRAMs. No DSP is needed in our design.

with a single 16-core server being able to perform network coding at line rate for as much as 31 Gbps worth of traffic.
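The roughly linear per-core scaling reported in Figure 11 is plausible because coding operations on different packets are independent. The sketch below is illustrative only (not the benchmark code behind Figure 11): it XOR-codes a batch of packet pairs across worker threads with no shared state, assuming equal-length batches and equal-sized packet pairs.

// Minimal sketch: code each pair (a[i], b[i]) into out[i], splitting the
// batch across num_threads workers; threads write disjoint indices.
#include <cstdint>
#include <thread>
#include <vector>

using Packet = std::vector<uint8_t>;

void code_batch(const std::vector<Packet>& a, const std::vector<Packet>& b,
                std::vector<Packet>& out, unsigned num_threads) {
  out.assign(a.size(), Packet());
  std::vector<std::thread> workers;
  for (unsigned t = 0; t < num_threads; ++t) {
    workers.emplace_back([&, t] {
      for (size_t i = t; i < a.size(); i += num_threads) {
        Packet coded(a[i].size());
        for (size_t j = 0; j < coded.size(); ++j) coded[j] = a[i][j] ^ b[i][j];
        out[i] = std::move(coded);
      }
    });
  }
  for (auto& w : workers) w.join();
}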
[3] Creating and using active geo-replication – Azure SQL database. https://docs.microsoft.com/en-us/azure/azure-sql/database/active-geo-replication-overview.

[4] Geo-replication/multi-AR. http://cloudbasic.net/documentation/geo-replication-active/.

[5] The Internet2 network. https://internet2.edu/.

[6] Mapping Netflix: Content delivery network spans 233 sites. http://datacenterfrontier.com/mapping-netflix-content-delivery-network/.

[7] Steiner tree problem. https://en.wikipedia.org/wiki/Steiner_tree_problem.

[8] Using replication across multiple data centers. https://docs.oracle.com/cd/E19528-01/819-0992/6n3cn7p3l/index.html.

[9] R. Ahlswede, N. Cai, S.-Y. R. Li, and R. W. Yeung. Network information flow. IEEE Transactions on Information Theory, 46(4):1204–1216, 2000.

[10] M. Braverman, S. Garg, and A. Schvartzman. Coding in undirected graphs is either very helpful or not helpful at all. In ITCS, 2017.

[11] M. Charikar, C. Chekuri, T.-Y. Cheung, Z. Dai, A. Goel, S. Guha, and M. Li. Approximation algorithms for directed Steiner problems. Journal of Algorithms, 33(1):73–91, 1999.

[12] J. Cheriyan and M. R. Salavatipour. Hardness and approximation results for packing Steiner trees. Algorithmica, 45(1):21–43, 2006.

[13] J. C. Corbett, J. Dean, M. Epstein, A. Fikes, C. Frost, J. J. Furman, S. Ghemawat, A. Gubarev, C. Heiser, P. Hochschild, et al. Spanner: Google's globally distributed database. ACM Transactions on Computer Systems, 31(3):8, 2013.

[14] R.-J. Essiambre and R. W. Tkach. Capacity trends and limits of optical communication networks. Proceedings of the IEEE, 100(5):1035–1055, 2012.

[18] M. X. Goemans and Y.-S. Myung. A catalog of Steiner tree formulations. Networks, 23(1):19–28, 1993.

[19] A. Gupta, F. Yang, J. Govig, A. Kirsch, K. Chan, K. Lai, S. Wu, S. G. Dhoot, A. R. Kumar, A. Agiwal, S. Bansali, M. Hong, J. Cameron, M. Siddiqi, D. Jones, J. Shute, A. Gubarev, S. Venkataraman, and D. Agrawal. Mesa: Geo-replicated, near real-time, scalable data warehousing. In VLDB, 2014.

[20] J. Hansen, D. E. Lucani, J. Krigslund, M. Médard, and F. H. Fitzek. Network coded software defined networking: Enabling 5G transmission and storage networks. IEEE Communications Magazine, 53(9):100–107, 2015.

[21] M. Hauptmann and M. Karpiński. A compendium on Steiner tree problems. Inst. für Informatik, 2013.

[22] T. Ho, M. Médard, R. Koetter, D. R. Karger, M. Effros, J. Shi, and B. Leong. A random linear network coding approach to multicast. IEEE Transactions on Information Theory, 52(10):4413–4430, 2006.

[23] C.-Y. Hong, S. Kandula, R. Mahajan, M. Zhang, V. Gill, M. Nanduri, and R. Wattenhofer. Achieving high utilization with software-driven WAN. In SIGCOMM, 2013.

[24] S. Jaggi, P. Sanders, P. A. Chou, M. Effros, S. Egner, K. Jain, and L. M. Tolhuizen. Polynomial time algorithms for multicast network code construction. IEEE Transactions on Information Theory, 51(6):1973–1982, 2005.

[25] S. Jain, A. Kumar, S. Mandal, J. Ong, L. Poutievski, A. Singh, S. Venkata, J. Wanderer, J. Zhou, M. Zhu, J. Zolla, U. Hölzle, S. Stuart, and A. Vahdat. B4: Experience with a globally-deployed software defined WAN. In SIGCOMM, 2013.

[26] X. Jin, Y. Li, D. Wei, S. Li, J. Gao, L. Xu, G. Li, W. Xu, and J. Rexford. Optimizing bulk transfers with software-defined optical WAN. In SIGCOMM, 2016.

[27] S. Kandula, I. Menache, R. Schwartz, and S. R. Babbula. Calendaring for wide area networks. In SIGCOMM, 2014.

[28] S. Katti, S. Gollakota, and D. Katabi. Embracing wireless interference: Analog network coding. In SIGCOMM, 2012.

[29] S. Katti, H. Rahul, W. Hu, D. Katabi, M. Médard, and J. Crowcroft. XORs in the air: Practical wireless network coding. In SIGCOMM, 2006.

[30] R. Koetter and M. Médard. An algebraic approach to network coding. IEEE/ACM Transactions on Networking, 11(5):782–795, 2003.

[31] N. Laoutaris, M. Sirivianos, X. Yang, and P. Rodriguez. Inter-datacenter bulk transfers with NetStitcher. In SIGCOMM, 2011.

[32] N. Laoutaris, G. Smaragdakis, R. Stanojevic, P. Rodriguez, and R. Sundaram. Delay tolerant bulk data transfers on the Internet. IEEE/ACM Transactions on Networking, 21(6):1852–1865, 2013.

[33] S.-Y. R. Li, R. W. Yeung, and N. Cai. Linear network coding. IEEE Transactions on Information Theory, 49(2):371–381, 2003.

[36] E. Magli, M. Wang, P. Frossard, and A. Markopoulou. Network coding meets multimedia: A review. IEEE Transactions on Multimedia, 15(5):1195–1212, 2013.

[37] P. P. Mishra and H. Kanakia. A hop by hop rate-based congestion control scheme. In SIGCOMM, 1992.

[38] M. Noormohammadpour, S. Kandula, C. S. Raghavendra, and S. Rao. Efficient inter-datacenter bulk transfers with mixed completion time objectives. Computer Networks, 164:106903, 2019.

[39] M. Noormohammadpour, C. S. Raghavendra, S. Kandula, and S. Rao. QuickCast: Fast and efficient inter-datacenter transfers using forwarding tree cohorts. In INFOCOM, 2018.

[40] M. Noormohammadpour, C. S. Raghavendra, S. Rao, and S. Kandula. DCCast: Efficient point to multipoint transfers across datacenters. In HotCloud, 2017.

[41] C. Özveren, R. Simcoe, and G. Varghese. Reliable and efficient hop-by-hop flow control. In SIGCOMM, 1994.

[42] G. Ramamurthy and B. Sengupta. A predictive hop-by-hop congestion control policy for high speed networks. In INFOCOM, 1993.

[43] A. Roy, H. Zeng, J. Bagga, G. Porter, and A. C. Snoeren. Inside the social network's (datacenter) network. In SIGCOMM, 2015.

[44] J. K. Sundararajan, D. Shah, M. Médard, M. Mitzenmacher, and J. Barros. Network coding meets TCP. In INFOCOM, 2009.

[45] A. S. Tanenbaum and D. Wetherall. Computer Networks, 5th Edition. Pearson, 2011.

[46] S.-H. Tseng, S. Agarwal, R. Agarwal, H. Ballani, and A. Tang. CodedBulk: Inter-datacenter bulk transfers using network coding. Technical report, https://github.com/SynergyLab-Cornell/codedbulk.

[49] Y. Wu, Z. Zhang, C. Wu, C. Guo, Z. Li, and F. C. Lau. Orchestrating bulk data transfers across geo-distributed datacenters. IEEE Transactions on Cloud Computing, 5(1):112–125, 2017.

[50] Z. Wu, M. Butkiewicz, D. Perkins, E. Katz-Bassett, and H. V. Madhyastha. SPANStore: Cost-effective geo-replicated storage spanning multiple cloud services. In SOSP, 2013.

[51] Y. Yi and S. Shakkottai. Hop-by-hop congestion control over a wireless multi-hop network. IEEE/ACM Transactions on Networking, 15(1):133–144, 2007.

[52] H. Zhang, K. Chen, W. Bai, D. Han, C. Tian, H. Wang, H. Guan, and M. Zhang. Guaranteeing deadlines for inter-data center transfers. IEEE/ACM Transactions on Networking, 25(1):579–595, 2017.