
Linear Network Coding for Robust Function Computation and Its Applications in Distributed Computing

Hengjia Wei, Min Xu and Gennian Ge

arXiv:2409.10854v1 [cs.IT] 17 Sep 2024

Abstract

We investigate linear network coding in the context of robust function computation, where a sink node is tasked
with computing a target function of messages generated at multiple source nodes. In a previous work, a new distance
measure was introduced to evaluate the error tolerance of a linear network code for function computation, along
with a Singleton-like bound for this distance. In this paper, we first present a minimum distance decoder for these
linear network codes. We then focus on the sum function and the identity function, showing that in any directed
acyclic network there are two classes of linear network codes for these target functions, respectively, that attain the
Singleton-like bound. Additionally, we explore the application of these codes in distributed computing and design a
distributed gradient coding scheme in a heterogeneous setting, optimizing the trade-off between straggler tolerance,
computation cost, and communication cost. This scheme can also defend against Byzantine attacks.

Index Terms

Linear network coding, network function computation, sum networks, error correction, gradient coding

This project was supported by the National Key Research and Development Program of China under Grant 2020YFA0712100, the National Natural Science Foundation of China under Grant 12231014 and Grant 12371523, Beijing Scholars Program, and the Pengcheng National Laboratory project under Grant PCL2024AS103.
H. Wei (e-mail: [email protected]) is with the Peng Cheng Laboratory, Shenzhen 518055, China. He is also with the School of Mathematics and Statistics, Xi’an Jiaotong University, Xi’an 710049, China, and the Pazhou Laboratory (Huangpu), Guangzhou 510555, China.
M. Xu (e-mail: [email protected]) is with the School of Statistics and Data Science, LPMC & KLMDASR, Nankai University, Tianjin 300071, China.
G. Ge (e-mail: [email protected]) is with the School of Mathematical Sciences, Capital Normal University, Beijing 100048, China.

I. INTRODUCTION

Network coding allows nodes within a network to encode the messages they receive and then transmit the
processed outputs to downstream nodes. In contrast to simple message routing, network coding has the potential
to achieve a higher information rate, which has garnered significant attention over the past two decades. When
the encoding function at each network node is linear, the scheme is referred to as linear network coding. Li et
al. [17] investigated the multicast problem, where a source node aims to send messages to multiple sink nodes,
and showed that a linear network coding approach using a finite alphabet is sufficient to achieve the maximum
information rate. Koetter and Médard [14] introduced an algebraic framework for linear network coding. Jaggi et
al. [12] demonstrated that there is a polynomial-time algorithm for constructing maximum-rate linear network codes,
provided that the field size is at least as large as the number of sink nodes.
Network communications can encounter various types of errors, such as random errors from channel noise, erasure
errors due to traffic congestion, and malicious attacks by adversaries. Error correction in network communications
is more complex than in traditional point-to-point communications, as even a single error on one link can spread
to all downstream links, potentially corrupting all messages received by a sink node. Cai and Yeung [4], [5], [34]
addressed this issue by integrating network coding with error correction and introduced a new coding technique
called network error-correction coding, which mitigates errors by adding redundancy in the spatial domain rather
than the temporal domain. In [4], [31], [34], three well-known bounds from classical coding theory, including the
Singleton bound, are extended to network error-correction coding. Various methods have been proposed in [6], [7],
[18], [31], [35] to construct linear network codes that meet the Singleton bound.
This work focuses on linear network coding for robust function computation. In this scenario, a sink node is
required to compute a target function of source messages which are generated at multiple source nodes, while
accounting for the possibility that communication links may be corrupted by errors. Each intermediate node can
encode the messages it receives and transmit the resulting data to downstream nodes. Multiple communication
links between any two nodes are permitted, under the reasonable assumption that each link has a limited (unit)
communication capacity. The computing rate of a network code is defined as the average number of times the target
function can be computed without error per use of the network. The maximum achievable computing rate is known
as the robust computing capacity, or the computing capacity in the error-free case. Some upper bounds on computing capacity
were provided in [2], [3], [8], [11] for the error-free case, with achievability demonstrated for certain network
topologies and target functions. These were recently extended in [29] to account for robust computing capacity.
For general network topologies and target functions, characterizing the (robust) computing capacity is a challeng-
ing problem. In this paper, we focus on linear target functions, specifically f(x) = x · T, where T ∈ F_q^{s×l}. In the
error-free model, it has been proved that linear network coding can achieve the computing capacity for an arbitrary
directed acyclic network when l ∈ {1, s}, see [2], [22]. However, for 2 ≤ l ≤ s − 1, determining the computing
rate for a generic network remains an open problem. For robust computing, the authors of this paper proposed a
new distance measure in [29] to assess the error tolerance of a linear network code and derived a Singleton-like
bound on this distance. In the same paper, we also demonstrated that this bound is tight when the target function
is the sum of source messages and the network is a three-layer network. However, it is still unclear whether this
Singleton-like bound can be achieved in general networks.
In this paper, we continue the study on linear network coding for robust function computation and design linear
network codes that meet the Singleton bounds. Additionally, we explore the applications of these codes in distributed
computing, where a computation task is divided into smaller tasks and distributed across multiple worker nodes.
Our contributions are as follows:
1) In Section III we present a decoder for linear network codes designed for robust computing. While this decoder
is based on the minimum distance principle and may involve high time complexity, it offers valuable insights
into the workings of robust network function computation.
2) In Section IV and Section V, we consider the sum function and the identity function, respectively. For these
two target functions, we demonstrate the existence of linear network codes in any directed acyclic network
with distances meeting the Singleton bound, assuming the field size is sufficiently large. Using these codes, in
Section VI we derive some lower bounds on the robust computing capacity for f(x) = x · T with T ∈ F_q^{s×l}. In

particular, when l = 1 or s, we show that (scalar) linear network coding can either achieve the cut-set bound
on robust computing capacity or match its integral part, respectively.
3) Section VII establishes a connection between robust computing in a three-layer network and a straggler problem
in the context of distributed computing, where a straggler refers to a worker node that performs significantly
slower than other nodes. By applying linear network codes for the sum function, we design a distributed gradient
coding scheme in a heterogeneous setting, optimizing the trade-off between straggler tolerance, computation
cost, and communication cost.

II. PRELIMINARY

A. Network function computation model

Let G = (V, E) be a directed acyclic graph with a finite vertex set V and an edge set E, where multiple edges
are allowed between two vertices. For any edge e ∈ E, we use tail(e) and head(e) to denote the tail node and the
head node of e. For any vertex v ∈ V, let In(v) = {e ∈ E | head(e) = v} and Out(v) = {e ∈ E | tail(e) = v},
respectively.
In this paper, a network N over G contains a set of source nodes S = {σ1 , σ2 , . . . , σs } ⊆ V and a sink node
γ ∈ V \ S. Such a network is denoted by N = (G, S, γ). Without loss of generality, we assume that every source
node has no incoming edges. We further assume that there exists a directed path from every node u ∈ V \ {γ} to
γ in G. Then it follows from the acyclicity of G that the sink node γ has no outgoing edges.
In the network function computing problem, the sink node γ needs to compute a target function f of the form

f : A^s → O,

where A and O are finite alphabets, and the i-th argument of f is generated at the source node σi . Let k and n be
two positive integers, and let B be a finite alphabet. A (k, n) network function computing code (or network code
for short) C over B enables the sink node γ to compute the target function f k times by transmitting at most n
symbols in B on each edge in E, i.e., using the network at most n times.
In this paper, we focus on the problem of computing linear functions by linear codes. We assume that A = B =
O = Fq and the target function has the form f (x) = x·T for some s×l matrix T over Fq , where 1 ≤ l ≤ s. Without
loss of generality, we further assume that T has full column rank, namely, its columns are linearly independent.
Suppose that every source node σi generates a vector xi = (xi1 , · · · , xik ) of length k over A. Denote the vector of
all the source messages by xS ≜ (x1 , · · · , xs ). Computing the target function k times implies that the sink node
requires
f (xS ) ≜ xS · (T ⊗ Ik ),

where Ik is the k × k identity matrix and ⊗ is the Kronecker product.


A (k, n) network code is called linear if the message transmitted by each edge e is a linear combination of the
messages received by tail(e). In this paper, we mainly study the case of n = 1, that is, the message transmitted by
each edge is an element of Fq . Such a network code is known as a scalar network code. Specifically, in a (k, 1)
linear network code over Fq, the message ue ∈ Fq transmitted via edge e has the form

u_e = \begin{cases} \sum_{j=1}^{k} x_{ij} k_{(i,j),e}, & \text{if tail(e) = σ_i for some i;} \\ \sum_{d∈In(tail(e))} u_d k_{d,e}, & \text{otherwise,} \end{cases} \qquad (1)

where k_{(i,j),e}, k_{d,e} ∈ Fq, and k_{(i,j),e} is zero if e is not an outgoing edge of the source node σi ∈ S, and k_{d,e}
is zero if e is not an outgoing edge of head(d). So, each ue can be written as a linear combination of the source
messages:

ue = xS · fe,

where fe ∈ F_q^{sk}.

Let y ∈ F_q^{|In(γ)|} be the message vector received by the sink node γ. Denote F ≜ (fe : e ∈ In(γ)). Then

y = xS · F.

The matrix F is called the global encoding matrix. Denote K ≜ (k_{d,e})_{d∈E,e∈E} and Bi ≜ (k_{(i,j),e})_{j∈[k],e∈E}, where
i = 1, 2, . . . , s and [k] denotes the set {1, 2, . . . , k}. In this paper, K is referred to as the transfer matrix and the Bi's are
referred to as the source encoding matrices. For a subset of links ρ ⊆ E, let Aρ = (A_{d,e})_{d∈ρ,e∈E}, where

A_{d,e} = \begin{cases} 1, & \text{if } d = e; \\ 0, & \text{otherwise}. \end{cases}

Since the network is finite and acyclic, it is easy to see that the global encoding matrix

F = \begin{pmatrix} B_1 \\ B_2 \\ \vdots \\ B_s \end{pmatrix} (I − K)^{-1} A_{In(γ)}^{\top}. \qquad (2)

If there is a decoding function φ : \prod_{e∈In(γ)} F_q^n → F_q^{kl} such that for all xS ∈ F_q^{sk},

φ(xS · F) = xS · (T ⊗ Ik),

then we say this (scalar) linear network code enables the sink node to compute the target function with rate k.
It is of particular interest to determine the maximum computing rate for a given network and a specific target
function. An upper bound for this rate can be derived using the network’s cut. To proceed, we first introduce
some necessary concepts. An edge sequence (e1, e2, · · · , en) is called a path from node u to node v if tail(e1) =
u, head(en) = v and tail(e_{i+1}) = head(e_i) for i = 1, 2, · · · , n − 1. For a vertex v and a path P, we say v ∈ P if
there is an edge e ∈ P such that tail(e) = v or head(e) = v. For two nodes u, v ∈ V, a cut of them is an edge
set C such that every path from u to v contains at least one edge in C. If C is a cut of γ and some source node
σi , then we simply call it a cut of the network. Let Λ(N) be the collection of all cuts of the network N. For a cut
C ∈ Λ(N), define

IC ≜ {σ ∈ S | there is no path from σ to γ after deleting the edges in C from E}.

Lemma II.1 ([29, Corollary II.1]). Given a network N and a linear target function f(x) = x · T with T ∈ F_q^{s×l}.
If there exists a (k, n) network code C computing f with rate k/n, then necessarily

k/n ≤ \min_{C∈Λ(N)} \frac{|C|}{\mathrm{Rank}(T_{I_C})},

where T_{I_C} is the |I_C| × l submatrix of T which is obtained by choosing the rows of T indexed by I_C.

For the sum function, i.e., f(x) = \sum_{i=1}^{s} x_i, Ramamoorthy demonstrated in [23, Theorem 2] that the upper bound
presented in Lemma II.1 can be achieved using linear network coding. For the identity function, i.e., T = Is, Rasala
Lehman and Lehman showed in [16, Theorem 4.2] that this bound can be attained simply through routing. For a
target function f(x) = x · T with l = s − 1, Appuswamy and Franceschetti [1] explored the achievability of a
computing rate of one and showed that the condition 1 ≤ \min_{C∈Λ(N)} |C|/\mathrm{Rank}(T_{I_C}) in Lemma II.1 is also sufficient.
For general linear target functions with 2 ≤ l ≤ s − 2, the achievability of the bound in Lemma II.1 remains an
open problem.
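For small instances, the bound in Lemma II.1 can be evaluated by exhaustive search. The sketch below is our illustration (the toy network, the prime field, and T are arbitrary choices, not taken from the literature): it enumerates all edge subsets, keeps those that are cuts of the network, and minimizes |C|/Rank(T_{I_C}), with ranks computed by Gaussian elimination over F_p.

from itertools import combinations

def rank_mod_p(rows, p):
    """Rank of a matrix (given as a list of rows) over the prime field F_p."""
    m = [list(r) for r in rows]
    rk = 0
    for c in range(len(m[0]) if m else 0):
        piv = next((r for r in range(rk, len(m)) if m[r][c] % p), None)
        if piv is None:
            continue
        m[rk], m[piv] = m[piv], m[rk]
        inv = pow(m[rk][c] % p, -1, p)
        m[rk] = [x * inv % p for x in m[rk]]
        for r in range(len(m)):
            if r != rk and m[r][c] % p:
                f = m[r][c]
                m[r] = [(x - f * y) % p for x, y in zip(m[r], m[rk])]
        rk += 1
    return rk

def cut_set_bound(edges, sources, sink, T, p):
    """min over cuts C of |C| / Rank(T_{I_C}); edges is a list of (tail, head)."""
    def reaches_sink(removed, src):
        stack, seen = [src], {src}
        while stack:
            u = stack.pop()
            if u == sink:
                return True
            for i, (t, h) in enumerate(edges):
                if i not in removed and t == u and h not in seen:
                    seen.add(h)
                    stack.append(h)
        return False

    best = None
    for r in range(1, len(edges) + 1):
        for C in combinations(range(len(edges)), r):
            # I_C: sources cut off from the sink after deleting C
            I_C = [i for i, s in enumerate(sources) if not reaches_sink(set(C), s)]
            if I_C:  # C is a cut of the network
                ratio = len(C) / rank_mod_p([T[i] for i in I_C], p)
                best = ratio if best is None else min(best, ratio)
    return best

# Toy network: two sources, one relay v, sink g; identity target over F_3.
edges = [("s1", "v"), ("s2", "v"), ("v", "g"), ("s1", "g"), ("s2", "g")]
print(cut_set_bound(edges, ["s1", "s2"], "g", [[1, 0], [0, 1]], p=3))  # 1.5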

B. Robust network function computation model

Let ue ∈ Fq be the message that is supposed to be transmitted by a link e. If there is an error in e, the message
transmitted by e, denoted by ũe, can be written as ũe = ue + ze for some ze ∈ Fq. We treat ze as a message,
called an error message, and the vector z = (ze : e ∈ E) is referred to as an error vector. An error pattern ρ is a set
of links in which errors occur. We say an error vector z matches an error pattern ρ if ze = 0 for all e ∉ ρ.
According to (1), ũe has the following form:

ũ_e = \begin{cases} \sum_{j=1}^{k} x_{ij} k_{(i,j),e} + z_e, & \text{if tail(e) = σ_i for some i;} \\ \sum_{d∈In(tail(e))} ũ_d k_{d,e} + z_e, & \text{otherwise}. \end{cases}

It can also be written as a linear combination of the source messages and the errors, i.e.,

ũe = (xS, z) · f̃e,

where f̃e is known as the extended global encoding vector. Let ỹ ∈ F_q^{|In(γ)|} be the message vector received by the sink
node γ. Denote F̃ ≜ (f̃e : e ∈ In(γ)). Then

ỹ = (xS, z) · F̃,

and F̃ is called the extended global encoding matrix. We may write

F̃ = \begin{pmatrix} F \\ G \end{pmatrix}, \qquad (3)

where F is the global encoding matrix and G is an |E| × |In(γ)| matrix over Fq which satisfies

G = (I − K)^{-1} A_{In(γ)}^{\top}. \qquad (4)

For a linear network code C which can compute the function f(x) = x · T with rate k, we say it is robust to τ
erroneous links if

xS F + zG ≠ x'S F + z'G

for any xS, x'S ∈ F_q^{sk} and z, z' ∈ F_q^{|E|} with xS(T ⊗ Ik) ≠ x'S(T ⊗ Ik) and wtH(z), wtH(z') ≤ τ.
Denote

Φ ≜ {x · F | x(T ⊗ Ik) ≠ 0, x ∈ F_q^{sk}}

and

∆(ρ) ≜ {z · G | z ∈ F_q^{|E|} matching the error pattern ρ}.

Note that 0 ∉ Φ. The minimum distance of the network code C which computes the function f(x) = x · T with
rate k is defined as

dmin(C, T, k) ≜ min{|ρ| | Φ ∩ ∆(ρ) ≠ ∅}. \qquad (5)

The following result shows that the minimum distance defined above can be used to measure the error tolerance
of C for the target function f (x) = x · T .

Theorem II.1 ( [29, Theorem IV.1]). Let τ be a positive integer. For a linear network code C with target function
f (x) = x·T and computing rate k, it is robust to any error z with wtH (z) ≤ τ if and only if dmin (C, T, k) ≥ 2τ +1.

In [29], the authors derived a Singleton-like bound on dmin (C, T, k).

Theorem II.2 ([29, Theorem IV.2]). Given a network N and a target function f(x) = x · T. Let k be a positive
integer. If there is a linear network code C computing f with rate k, then

dmin(C, T, k) ≤ \min_{C∈Λ(N)} {|C| − k · Rank(T_{I_C}) + 1},

where T_{I_C} is the |I_C| × l submatrix of T corresponding to the source nodes in I_C.

It has been shown in [29, Theorem IV.4] that in a multi-edge tree network, for any linear target function f(x) =
x · T , this bound can be achieved if the field size is large enough.
In this paper, we focus on the cases where l ∈ {1, s} and explore the achievability of the Singleton-like bound
in arbitrary directed acyclic networks. For l = 1, the target function can be expressed as f(x) = \sum_i t_i x_i. W.l.o.g.,
we assume that each ti ≠ 0. Then Rank(T_{I_C}) = 1 for every cut C ∈ Λ(N). We use min-cut(u, v) to denote the
size of the minimal cut between two nodes u and v. The Singleton-like bound then reads:

dmin(C, T, k) ≤ \min_{σ_i∈S} {min-cut(σ_i, γ) − k + 1}.

Since computing the function f(x) = \sum_i t_i x_i can be reduced to computing the sum by multiplying each source
message xi by a scalar ti, it suffices to consider the sum function, i.e., T = 1.
In [29], we studied a three-layer network with s source nodes in the first layer, N intermediate nodes in the
second layer, and a single sink node in the third layer. Each source node is connected to some intermediate nodes
in the second layer, and all intermediate nodes are connected to the sink node in the third layer. It is proven in [29]
that for the sum function and any arbitrary three-layer network, the Singleton-like bound can be achieved as long
as the field size is larger than the number of intermediate nodes.

Theorem II.3 ([29, Theorem IV.3]). Let N be a three-layer network. Let c* = \min_{σ_i∈S} |Out(σ_i)| be the minimum
out-degree of the source nodes¹. Assume that q − 1 ≥ N. Then there is a linear network code C over Fq which can
compute the sum of the source messages with rate k and minimum distance

dmin(C, 1, k) = c* − k + 1.

In Section IV, we generalize this result and show that the Singleton-like bound can be achieved for the sum
function in any directed acyclic network, provided that the field size is sufficiently large.
In Section V, we study the case of l = s. Since T ∈ F_q^{s×s} has full rank, we have Rank(T_{I_C}) = |I_C| for any
C ∈ Λ(N). The Singleton-like bound then reads

dmin(C, T, k) ≤ \min_{C∈Λ(N)} {|C| − k · |I_C| + 1}.

We will show that for the identity function, i.e., T = I, this bound is achievable. Since the sink node can compute
x · T as long as it recovers x, this conclusion also applies to any invertible matrix T ∈ F_q^{s×s}.
Given a network N with a target function f and an error-tolerant capability τ , the robust computing capacity is
defined as

C(N, f, τ ) ≜ sup{k/n | there is a (k, n) network code that can compute f against τ errors}.

For τ = 0, the robust computing capacity is also referred to as computing capacity. Some upper bounds on robust
computing capacity have been derived in [29]. We will use the linear network codes presented in Section IV and
Section V, along with the time-sharing technique, to derive some lower bounds on the robust computing capacity
for any linear target function f (x) = x · T . Notably, when l = 1, the lower bound meets the upper bound; when
l = s, the lower bound can achieve the integral part of the upper bound.

¹In a three-layer network, we have |Out(σ_i)| = min-cut(σ_i, γ).



III. DECODING FOR ROBUST NETWORK FUNCTION COMPUTATION

In this section, we present a minimum distance decoder to illustrate the mechanism of robust network function
computation. We first define a new metric. Let C be a linear network code for a network N to compute a target
function f(x) = x · T. For two vectors y1, y2 ∈ F_q^{|In(γ)|}, their distance with respect to C, denoted by dC(y1, y2),
is defined as

dC(y1, y2) ≜ min{wtH(z) | z · G = y1 − y2},

where G is the |E| × |In(γ)| submatrix of the extended global encoding matrix of C defined in Eq. (4). Noting
that the rows of G corresponding to the incoming links of γ form an identity matrix, dC(y1, y2) is well-defined.
It is straightforward to verify that dC(y1, y2) is indeed a metric.
Intuitively, dC(y1, y2) represents the minimum number of communication links in which an adversary must inject
errors to transform the network output y1 into y2. In [29], the distance of a linear network computing code was
defined using dC(·, ·), and it was shown in [29, Lemma IV.1] that this definition is equivalent to the one provided
in (5). Specifically, we have the following equality:

dmin(C, T, k) = min{dC(xF, x'F) | x, x' ∈ F_q^{sk} and x(T ⊗ Ik) ≠ x'(T ⊗ Ik)}. \qquad (6)

Now, we can present the decoder. Let

A ≜ {x(T ⊗ Ik) | x ∈ F_q^{sk}}

be the set of all possible computing results. For each a ∈ A, denote

Φa ≜ {xF | x(T ⊗ Ik) = a}.

Given a received message ỹ, if there is a unique a ∈ A such that Φa contains at least one vector y with dC(y, ỹ) ≤ τ,
the decoder then outputs a as the computing result; otherwise, it outputs an "error".
The following theorem justifies this decoding method.

Theorem III.1. Let C be a linear network code for a network N with dmin(C, T, k) ≥ 2τ + 1. If there are at most τ
erroneous links in the network, then the decoder above outputs the correct computing result.

Proof: For a received message ỹ, since there are at most τ errors, ỹ can be written as ỹ = xF + zG for some
vectors x ∈ F_q^{sk} and z ∈ F_q^{|E|} with wtH(z) ≤ τ. Let y = xF; then

dC(y, ỹ) ≤ τ. \qquad (7)

This shows that the correct computing result a = x(T ⊗ Ik) satisfies the condition that Φa contains a vector y with
dC(y, ỹ) ≤ τ.
For another vector y′ = x′ F such that x′ (T ⊗ Ik ) ̸= x(T ⊗ Ik ), by the triangle inequality, we have that

dC (y′ , ỹ) ≥ dC (y′ , y) − dC (y, ỹ).



By (6), we have dC (y, y′ ) ≥ dmin (C, T, k). Hence,

dC (y′ , ỹ) ≥ dC (y, y′ ) − dC (y, ỹ) ≥ dmin (C, T, k) − dC (y, ỹ) ≥ τ + 1. (8)

Eq. (8) implies that for any a′ ̸= a, Φa′ does not contain any vector y′ with dC (y′ , ỹ) ≤ τ . Thus, the decoder can
output the correct computing result.
It is worth noting that the set Φa corresponds to a codeword in conventional coding theory, and the code C
encodes the computing result a into Φa. The sink node receives an erroneous copy ỹ of a vector y of Φa. Since
vectors from different Φa's have a large distance (with respect to C), the minimum distance decoder allows us to
decode a, rather than y, from the received ỹ.
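On toy instances this decoder can be run directly. The following sketch is our illustration (F, G, and T ⊗ Ik are to be supplied as integer matrices over a prime field F_q; both searches are exhaustive, so it is feasible only for very small parameters):

from itertools import product

def d_C(y1, y2, G, q):
    # d_C(y1, y2) = min Hamming weight of z with z * G = y1 - y2 over F_q
    target = [(a - b) % q for a, b in zip(y1, y2)]
    best = None
    for z in product(range(q), repeat=len(G)):
        if all(sum(zi * G[i][j] for i, zi in enumerate(z)) % q == t
               for j, t in enumerate(target)):
            w = sum(1 for zi in z if zi)
            best = w if best is None else min(best, w)
    return best

def md_decode(y_recv, F, T_kron, G, q, tau):
    # Output the unique a such that some y in Phi_a has d_C(y, y_recv) <= tau;
    # output "error" if no such a exists or a is not unique.
    candidates = set()
    for x in product(range(q), repeat=len(F)):
        a = tuple(sum(xi * T_kron[i][j] for i, xi in enumerate(x)) % q
                  for j in range(len(T_kron[0])))
        y = [sum(xi * F[i][j] for i, xi in enumerate(x)) % q
             for j in range(len(F[0]))]
        d = d_C(y, y_recv, G, q)
        if d is not None and d <= tau:
            candidates.add(a)
    return candidates.pop() if len(candidates) == 1 else "error"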
We now shift our focus to addressing link outages, i.e., links failing to transmit messages. Recall that if there is
no error, the message transmitted by a link e should be

u_e = \sum_{d∈In(tail(e))} u_d k_{d,e}. \qquad (9)

Hence, if outages occur in a subset of links ρ, in order to transmit the messages received by tail(e), the node
can set ud = 0 for d ∈ In(tail(e)) ∩ ρ and then use (9) to encode the messages. In this way, the outages can be
translated to the errors defined in Subsection II-B. Therefore, a code with dmin (C, T, k) ≥ 2τ + 1 is also robust to
τ outages.
In the following, we assume that the sink node is aware of the locations of outages. Under this assumption, similar
to conventional codes with a minimum Hamming distance d that can correct d − 1 erasure errors, the network code
C can tolerate dmin (C, T, k) − 1 outages.

Theorem III.2. Let C be a linear network code with dmin (C, T, k) ≥ τo + 1. If there are at most τo outages in the
network and their locations are known to the sink node, then C is robust against these outages.

Proof: Let ρo denote the set of links where the outages occur, where |ρo| ≤ τo. The received messages at the
sink node can be expressed as

ỹ = xF + zG + zo,

where x ∈ F_q^{sk}, z ∈ F_q^{|E|} is an imaginary error vector matching ρo, and zo ∈ {0, ⋆}^{|In(γ)|} is an indicator vector
with the symbol ⋆ representing an outage in the incoming links of γ. We define ⋆ + x = ⋆ for all x ∈ Fq. The
nonzero entries of z are chosen such that for every d ∈ ρo\In(γ) the message received by head(d) is ũd = 0.
Now, suppose to the contrary that there are another x' ∈ F_q^{sk} and z' ∈ F_q^{|E|} matching ρo such that

x · (T ⊗ Ik) ≠ x' · (T ⊗ Ik)

and

xF + zG + zo = x'F + z'G + zo.

Noting that G contains an |In(γ)| × |In(γ)| identity matrix, we have

(x − x')F = (u' − u)G

for some vectors u, u' ∈ F_q^{|E|}, both of which match ρo. By (5), the size of the support of u' − u is at least
dmin(C, T, k). However, since both u and u' match ρo, the support of u' − u is contained in ρo. It follows
that |ρo| ≥ dmin(C, T, k) ≥ τo + 1, which contradicts the assumption that |ρo| ≤ τo.
The proof of the theorem above leads to the following decoder for the outages. Let A be the set of all possible
computing results. For each a ∈ A and a subset ρ ⊆ E, denote

Φ_{a,ρ} ≜ {xF + zG + zρ | x(T ⊗ Ik) = a, z matching ρ},

where zρ ∈ {0, ⋆}^{In(γ)} indicates the links of In(γ) ∩ ρ. Then for a received message ỹ ∈ (Fq ∪ {⋆})^{In(γ)} and the
set of outage locations ρo, the proof above shows there is a unique a ∈ A such that ỹ ∈ Φ_{a,ρo}.

Remark III.1. If the outages occur only in the incoming links of γ, then the message received at the sink node,
ỹ = x · F + zo, is an erroneous copy of the vector y = x · F with |ρo| erasures. In this case, we can use the
Hamming metric to design a simpler decoder. Let a ∈ A be such that y ∈ Φa. For any y' ∈ Φa' with a' ≠ a, the
Hamming distance between y and y' satisfies

dH(y, y') \overset{(*)}{≥} dC(y, y') \overset{(**)}{≥} dmin(C, T, k) > |ρo|,

where the inequality (*) holds because dC(y, y') = min{wtH(z) | z · G = y − y'} and the rows of G corresponding
to the incoming links of γ form an identity matrix, and the inequality (**) follows from Eq. (6). Hence, for the
received message ỹ, there is a unique a ∈ A such that Φa contains a vector y that matches ỹ on all the non-star
components. The decoder then outputs this a as the computing result.
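A direct implementation of this erasure decoder, in the same brute-force style as before, might look as follows (our sketch; the symbol ⋆ is represented by None, and outages are assumed to occur only on In(γ)):

from itertools import product

def erasure_decode(y_recv, F, T_kron, q):
    # Entries of y_recv are field elements or None (an outage '*').
    # Return the unique a whose Phi_a contains a vector agreeing with
    # y_recv on all non-star positions, else "error".
    candidates = set()
    for x in product(range(q), repeat=len(F)):
        y = [sum(xi * F[i][j] for i, xi in enumerate(x)) % q
             for j in range(len(F[0]))]
        if all(r is None or r == yj for r, yj in zip(y_recv, y)):
            a = tuple(sum(xi * T_kron[i][j] for i, xi in enumerate(x)) % q
                      for j in range(len(T_kron[0])))
            candidates.add(a)
    return candidates.pop() if len(candidates) == 1 else "error"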

IV. COMPUTING THE SUM FUNCTION AGAINST ERRORS

The problem of network function computation, particularly the sum function, has been extensively studied in
the literature [15], [20]–[24], [26], [28] under the assumption that there are no errors in the network. When there
is only one sink node, linear network coding can achieve the computing capacity for an arbitrary directed acyclic
network and the sum function. This coding scheme is derived from the equivalence between computing the sum
and multicasting source messages in the reverse network. For a network N, its reverse network Nr is obtained by
reversing the direction of the links and interchanging the roles of source nodes and sink nodes. In the scenario where
no link errors occur, computing the sum in N is equivalent to multicasting source messages to all the sink nodes
in the reverse network Nr. Specifically, if there is a linear network code for Nr that can multicast the messages
generated at the source node to all the sink nodes with an information rate h, then by reversing the network and
dualizing all local behaviors, one can obtain a linear network code for N that computes the sum of sources with
a computing rate h, as demonstrated in [15, Theorem 5]. Conversely, the same principle applies in the reverse
direction. By leveraging this connection, it can be shown that the cut-set bound on the computing capacity can be
attained for a generic network by linear network coding.
However, if there are errors in the links, this equivalence no longer holds. As shown in [29, Example IV.1], the
dual of a linear network code that is resilient to a single error in the multicast problem cannot correctly compute
the sum in the reverse network when an error occurs. In this section, we will show that although this equivalence
does not hold, the cut-set bound on the robust computing capacity still can be achieved by linear network coding.
Our approach is to modify the dual of a linear network code for the multicast problem without error tolerance to
obtain a linear network code, denoted by C, capable of computing the sum with a certain level of error tolerance,
albeit at a lower computing rate of k. Surprisingly, the distance of this code attains the Singleton bound, that is,

dmin(C, 1, k) = \min_{C∈Λ(N)} {|C| − k + 1}.

The proposed coding scheme consists of the following two parts:


1) Internal coding. In this part, we design the transfer matrix of C, which describes the encoding functions at
all internal nodes, i.e., the nodes in V\(S ∪ {γ}). Consider a multicast problem in the reverse network without
link errors. It is well known that if the field size is larger than the number of sink nodes, then a linear network
code exists which can multicast messages at a rate of h ≜ \min_{C∈Λ(N)} {|C|}, e.g., see [12]. Let K denote the
transfer matrix of this code. We use the transpose K^⊤ as the transfer matrix of the computing code C. Note
that the matrix G of C can then be determined via (4).
2) External/source coding. In this part, we carefully design the source encoding matrices Bi such that the
distance of the proposed network coding scheme C achieves the Singleton-like bound. Recall that

dmin(C, T, k) = min{|ρ| | Φ ∩ ∆(ρ) ≠ ∅},

where

Φ = {x · F | x(T ⊗ Ik) ≠ 0, x ∈ F_q^{sk}}

and

∆(ρ) = {z · G | z ∈ F_q^{|E|} matching the error pattern ρ}.

Given a computing rate k which is smaller than h, we first choose a subspace W of F_q^{|In(γ)|} which intersects
each ∆(ρ) trivially, where |ρ| ≤ h − k. Noting that F is determined by K and the Bi's via (2), which in turn
determines Φ, we then design the source encoding matrices Bi to ensure that Φ is contained in W. In this
manner, the condition in (5) is fulfilled, and so, the distance of the code achieves the upper bound.
The internal coding part is straightforward, while the source coding part is more intricate. We first use an example
to illustrate our approach.

Example IV.1. Consider the network shown on the left of Fig. 1. It has two source nodes σ1 , σ2 and a sink node
γ. The min-cut capacity between each source σi and γ is 3. Let q ≥ 3 be a prime power and Fq be a finite field
of size q. We are going to design a linear network coding scheme over Fq which can compute the sum x1 + x2 at
a rate of 1 even when one link is corrupted.

Fig. 1. The network on the left is a sum-network, where each source σi generates a message xi and the sink node wants to compute the sum
x1 + x2. The network on the right is a multicast network, along with a coding scheme which achieves the maximum communication rate 3.

Fig. 2. A coding scheme designed to compute x1 + x2 over Fq, where q is odd.


Consider the reverse network shown on the right of Fig. 1, together with a multicast coding scheme with an
information rate of 3. Note that in this scheme every internal node simply adds up the received messages and
transmits the result to the downstream nodes. Thus, its transfer matrix K is a binary matrix and Kd,e = 1 if and
only if head(d) = tail(e). In the sum network, we also require every internal node to add up the received messages
and transmit the result to the downstream nodes, so that the transfer matrix is K ⊤ . Once the transfer matrix is
fixed, the submatrix G in the extended global encoding matrix F̃ is determined. In this instance, G consists of 12
rows, where the vectors (1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 0, 1) appear three times each.

Fig. 3. A coding scheme designed to compute x1 + x2 over Fq, where q is even.

For the source encoding, we proceed with two cases. First assume that q is odd. In this case, we encode x1 into
−x1 , 2x1 and x1 at σ1 , and encode x2 into x2 , x2 and x2 at σ2 . The entire coding scheme C is illustrated in
Fig. 2. Then the global encoding matrix

F = \begin{pmatrix} 1 & 1 & 2 \\ 1 & 1 & 2 \end{pmatrix}.

If there are no link errors, the sink node γ receives y = (x1, x2) · F = (x1 + x2, x1 + x2, 2(x1 + x2)), and so, it
can compute the sum x1 + x2. Furthermore, note that

Φ = {(x1, x2) · F | x1 + x2 ≠ 0, x = (x1, x2) ∈ F_q^2} = {x · (1, 1, 2) | x ∈ Fq\{0}}

and that the vector (1, 1, 2) is not contained in any subspace spanned by at most two vectors of

{(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 0, 1)}.

It follows that for any ρ ⊂ E with |ρ| = 2, we have that

Φ ∩ ∆(ρ) = ∅.

Hence, the proposed coding scheme has distance at least 3, and so, can tolerate one link error.
For the case where q is an even prime power, let ω be a primitive element of F*_q. We encode x1 into ωx1, x1
and (1 + ω)x1 at σ1, and encode x2 into x2, x2 and ωx2 at σ2. The entire coding scheme C is illustrated in Fig. 3.
Then the global encoding matrix

F = \begin{pmatrix} 1 & 1 & 1+ω \\ 1 & 1 & 1+ω \end{pmatrix}.

By the same argument as above, one can show that the distance of the code is 3.

In the following, we give a decoding algorithm which is simpler than the minimum distance decoder presented in
Section III. Suppose that there is a link error and q is odd. The sink node γ gets ỹ = y + z, where y = (x1, x2) · F =
(x1 + x2, x1 + x2, 2(x1 + x2)) and z is contained in one of the following spaces:

⟨(1, 0, 0)⟩, ⟨(0, 1, 0)⟩, ⟨(0, 0, 1)⟩, ⟨(1, 0, 1)⟩.

Assume that ỹ = (ỹ1, ỹ2, ỹ3) and y = (y1, y2, y3). Consider the following decoder:

D(ỹ) = \begin{cases} ỹ1, & \text{if } 2ỹ1 = ỹ3; \\ ỹ2, & \text{otherwise}. \end{cases}

If 2ỹ1 = ỹ3, then z must be contained in ⟨(0, 1, 0)⟩. Thus, ỹ1 = y1 = x1 + x2. If 2ỹ1 ≠ ỹ3, necessarily ⟨(0, 1, 0)⟩
does not contain z, and so, ỹ2 = y2 = x1 + x2. Hence, D(ỹ) can decode the sum x1 + x2.
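This decoder is easy to check exhaustively. The sketch below (ours) verifies it over F_5, i.e., for every message pair and every error supported on a single row of G:

from itertools import product

q = 5                                   # any odd prime; F_5 for the check
F_rows = [(1, 1, 2), (1, 1, 2)]         # global encoding matrix above
error_dirs = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 0, 1)]  # distinct rows of G

def D(y):                               # the decoder of Example IV.1
    return y[0] if (2 * y[0]) % q == y[2] else y[1]

for x1, x2 in product(range(q), repeat=2):
    y = tuple(sum((x1, x2)[i] * F_rows[i][j] for i in range(2)) % q
              for j in range(3))
    for d in error_dirs:
        for c in range(q):              # all single-link errors (c = 0: no error)
            y_err = tuple((yj + c * dj) % q for yj, dj in zip(y, d))
            assert D(y_err) == (x1 + x2) % q
print("D recovers x1 + x2 under any single-link error")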

In general, the source encoding process can be outlined as follows. Once the internal encoding is fixed, the
submatrix G in the extended global encoding matrix F̃ is determined. To achieve the Singleton bound, we aim to
find a k × |In(γ)| matrix² D such that
(P1) D contains a k × k identity matrix I_k as a submatrix;
(P2) the row space of D intersects trivially with any space spanned by at most h − k rows of G, where h =
\min_{C∈Λ(N)} {|C|}.
Our internal encoding allows us to design the source encoding such that

x_S · F = \left( \sum_{i=1}^{s} x_i \right) · D. \qquad (10)

This equality, combined with properties (P1) and (P2), ensures that the proposed code can achieve the Singleton-like
bound; a small search sketch for D follows.
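For small parameters such a D can be found by randomized search. The sketch below is our illustration (it reuses rank_mod_p, Gaussian elimination over a prime field, from the sketch after Lemma II.1): it enforces (P1) by construction and tests (P2) via the rank condition rank([D; G_ρ]) = k + rank(G_ρ), which holds iff the row space of D meets ⟨G_ρ⟩ trivially; subsets of size exactly h − k suffice, since any smaller span is contained in a larger one.

import random
from itertools import combinations

def find_D(G, h, k, p, tries=10000):
    """Random search for a k x h matrix D satisfying (P1) and (P2);
    G is given as a list of rows over F_p."""
    size = h - k
    for _ in range(tries):
        # (P1): D = [I_k | random block], so D contains I_k by construction
        D = [[1 if j == i else (random.randrange(p) if j >= k else 0)
              for j in range(h)] for i in range(k)]
        if all(rank_mod_p(D + [G[i] for i in rho], p)
               == k + rank_mod_p([G[i] for i in rho], p)
               for rho in combinations(range(len(G)), size)):
            return D  # row space of D meets every such span trivially
    return None

For instance, over F_5 with the G of Example IV.1 (h = 3, k = 1), any D = (1, a, b) with a ≠ 0 and b ∉ {0, 1} passes this test, including (1, 1, 2).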
To guarantee the existence of the matrix D, we introduce the following lemma.

Lemma IV.1. Let h, k and E be fixed positive integers such that 0 < h − k ≤ E. Let q be a sufficiently large
prime power. Then we have that

\left( \binom{h}{k}_q − q^{k(h−k)} \right) \binom{E}{h−k} < \binom{h}{k}_q,

where \binom{h}{k}_q ≜ \prod_{i=0}^{k−1} \frac{q^h − q^i}{q^k − q^i} is the Gaussian coefficient.

Proof: Let

A ≜ (q^h − 1)(q^h − q) · · · (q^h − q^{k−1}),

B ≜ (q^h − q^{h−k})(q^h − q^{h−k+1}) · · · (q^h − q^{h−1}),

C ≜ (q^k − 1)(q^k − q) · · · (q^k − q^{k−1}).

Then

A = q^{kh} − \left( \sum_{i=0}^{k−1} q^i \right) q^{h(k−1)} + O(q^{k−1+k−2} q^{h(k−2)}) = q^{kh} − O(q^{kh−(h−k)−1})

and

B = q^{kh} − q^{kh−1} + O(q^{kh−2}).

It follows that

A − B = q^{kh−1} + O(q^{kh−2}).

Hence, for fixed k, h, E and sufficiently large q, we have that

\binom{E}{h−k} (A − B) < A.

Dividing both sides by C and noting that \binom{h}{k}_q = A/C and q^{k(h−k)} = B/C, we then get

\left( \binom{h}{k}_q − q^{k(h−k)} \right) \binom{E}{h−k} < \binom{h}{k}_q.

²In Example IV.1, we have D = (1, 1, 2).
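The asymptotics can be checked numerically. The following sketch (ours; the parameter choices are arbitrary) computes the Gaussian coefficient exactly with integer arithmetic and tests the inequality:

from math import comb

def gaussian_binom(h, k, q):
    # [h k]_q = prod_{i=0}^{k-1} (q^h - q^i) / (q^k - q^i), always an integer
    num = den = 1
    for i in range(k):
        num *= q**h - q**i
        den *= q**k - q**i
    return num // den

def lemma_iv1_holds(h, k, E, q):
    g = gaussian_binom(h, k, q)
    return (g - q**(k * (h - k))) * comb(E, h - k) < g

# h = 4, k = 2, E = 10: the inequality starts to hold around q = 47
for q in (2, 7, 31, 43, 47, 101):
    print(q, lemma_iv1_holds(4, 2, 10, q))   # False, ..., False, True, True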

Theorem IV.1. Let N be an arbitrary directed acyclic network and k be a positive integer such that k ≤
\min_{C∈Λ(N)} {|C|}. If q is sufficiently large, then there is a linear network code C over Fq which can compute
the sum of the source messages with

dmin(C, 1, k) = \min_{C∈Λ(N)} {|C| − k + 1}.

Proof: Denote h ≜ \min_{C∈Λ(N)} {|C|}. We assume that |Out(σ_i)| = h for each source node σ_i ∈ S and
|In(γ)| = h for the sink node γ, by adding auxiliary source and sink nodes and connecting each of them to the
original one by h links. Let Nr denote the reverse network of N. Consider a multicast problem over Nr, where
each node σ_i demands h messages generated at the node γ. We use a superscript r to distinguish the notations for
Nr from the ones for N. For example, Er denotes the set of links of Nr; Out^r(γ) denotes the set of outgoing links
of γ in Nr, which can be obtained by reversing each link of In(γ).
Since

\min_{C∈Λ(Nr)} {|C|} = \min_{C∈Λ(N)} {|C|} = h,

when q > s, there is a linear solution to this multicast problem. In other words, there are matrices B =
(k_{i,e})_{i∈[h],e∈Er} and K = (k_{d,e})_{d∈Er,e∈Er} over Fq, where k_{i,e} = 0 if e is not an outgoing edge of γ in Nr
and k_{d,e} = 0 if e is not an outgoing edge of head(d) in Nr, such that

B · (I − K)^{-1} · \left( A_{In^r(σ_1)}^{\top}, A_{In^r(σ_2)}^{\top}, · · · , A_{In^r(σ_s)}^{\top} \right) = (F_1, F_2, . . . , F_s) \qquad (11)

for some full rank matrices F_i ∈ F_q^{h×h}, where I is the |Er| × |Er| identity matrix.
Now, we direct our attention to the network N. We use the transpose of K to give the local encoding coefficients
for each node u ∈ V\(S ∪ {γ}), namely, for each link e ∉ ∪_{σ_i∈S} Out(σ_i), let

u_e = \sum_{d∈In(tail(e))} k_{e',d'} u_d,

where e' and d' are the corresponding links of e and d in Er, respectively.

We now present the local encoding coefficients for each source node σ_i such that the distance of the proposed
code attains the Singleton-like bound, i.e., dmin(C, 1, k) = \min_{C∈Λ(N)} {|C| − k + 1} = h − k + 1.
Recall that

∆(ρ) = {z · G | z ∈ F_q^{|E|} matching the error pattern ρ}

can be treated as a subspace of F_q^h that is generated by |ρ| rows of G, where

G = (I − K^⊤)^{-1} A_{In(γ)}^{\top}.

Given an (h−k)-dimensional subspace U of F_q^h, the number of k-dimensional subspaces which intersect U trivially
is

\frac{(q^h − q^{h−k})(q^h − q^{h−k+1}) · · · (q^h − q^{h−1})}{(q^k − 1)(q^k − q) · · · (q^k − q^{k−1})} = q^{k(h−k)}.

Since q is sufficiently large, according to Lemma IV.1, we have that

\left( \binom{h}{k}_q − q^{k(h−k)} \right) \binom{|E|}{h−k} < \binom{h}{k}_q.

It follows that there is a k-dimensional subspace W of F_q^h such that

W ∩ ∆(ρ) = {0} \qquad (12)

for any ρ ⊆ E with |ρ| ≤ h − k. Let D ∈ F_q^{k×h} be a matrix whose rows form a basis of W such that it contains
the k × k identity matrix I_k as a submatrix.
Noting that the support set of B is contained in the support set of A_{Out^r(γ)}, let B̃ be an h × h matrix over Fq
such that B = B̃ · A_{Out^r(γ)}. Since the F_i's are invertible, by (11), B̃ is also invertible. For each i = 1, 2, . . . , s, let

F'_i ≜ F_i^⊤ · (B̃^⊤)^{-1} ∈ F_q^{h×h}

and

E_i ≜ D · (F'_i)^{-1} · A_{Out(σ_i)}. \qquad (13)

Then E_i ∈ F_q^{k×|E|}. Moreover, its columns which are indexed by the edges not in Out(σ_i) are all-zero vectors.
Hence, each E_i can be used to describe the local encoding coefficients at σ_i.
Let C be the coding scheme described by the E_i's and K^⊤. Now, we show that C can compute the sum function and
dmin(C, 1, k) = h − k + 1. We transpose both sides of (11). Noting that A_{In^r(σ_i)} = A_{Out(σ_i)}, A_{Out^r(γ)} = A_{In(γ)}
and B^⊤ = A_{Out^r(γ)}^{\top} B̃^⊤, we have that

\begin{pmatrix} A_{Out(σ_1)} \\ A_{Out(σ_2)} \\ \vdots \\ A_{Out(σ_s)} \end{pmatrix} · (I − K^⊤)^{-1} · A_{In(γ)}^{\top} · B̃^⊤ = \begin{pmatrix} F_1^⊤ \\ F_2^⊤ \\ \vdots \\ F_s^⊤ \end{pmatrix}. \qquad (14)

Multiplying both sides of (14) with (B̃^⊤)^{-1}, we have that

\begin{pmatrix} A_{Out(σ_1)} \\ A_{Out(σ_2)} \\ \vdots \\ A_{Out(σ_s)} \end{pmatrix} · (I − K^⊤)^{-1} · A_{In(γ)}^{\top} = \begin{pmatrix} F'_1 \\ F'_2 \\ \vdots \\ F'_s \end{pmatrix}. \qquad (15)

Then for the network code C, its global encoding matrix is

F = \begin{pmatrix} E_1 \\ E_2 \\ \vdots \\ E_s \end{pmatrix} · (I − K^⊤)^{-1} · A_{In(γ)}^{\top},

whose i-th block of rows is, by (13), D · (F'_i)^{-1} · A_{Out(σ_i)} · (I − K^⊤)^{-1} · A_{In(γ)}^{\top}, which by (15) equals
D · (F'_i)^{-1} · F'_i = D. Hence

F = \begin{pmatrix} D \\ D \\ \vdots \\ D \end{pmatrix}.

Therefore, for a vector x_S = (x_1, x_2, . . . , x_s) ∈ F_q^{sk}, we have that

x_S · F = \sum_{i=1}^{s} (x_i · D) = \left( \sum_{i=1}^{s} x_i \right) · D. \qquad (16)

Since D contains the identity matrix I_k, the sink node can receive \sum_{i=1}^{s} x_i, i.e., the coding scheme can compute
the sum k times. Furthermore,

Φ = \left\{ x_S · F \;\middle|\; x_S = (x_1, x_2, . . . , x_s) ∈ F_q^{sk}, \sum_{i=1}^{s} x_i ≠ 0 \right\} \overset{(16)}{=} \{x · D | x ∈ F_q^k \setminus \{0\}\},

which consists of the nonzero vectors of the subspace W. Hence, for any ρ ⊆ E with |ρ| ≤ h − k, by (12), we
have that

Φ ∩ ∆(ρ) = ∅,

namely,

dmin(C, 1, k) ≥ h − k + 1.

V. COMPUTING THE IDENTITY FUNCTION AGAINST ERRORS

In this section, we examine the case of l = s and show that the Singleton-like bound can still be achieved.
As discussed in Section II, it suffices to focus on the identity function. In this scenario, the computing problem
involves transmitting multiple source messages to a single sink node. In the error-free model, the achievability of the
cut-set bound on communication capacity is established by considering an augmented network, where an auxiliary
source node is connected to each original source node σi by Ri links. Here, the Ri's represent information rates
that satisfy the condition imposed by the cut-set bound. A solution that only uses message routing for the unicast
problem in this augmented network yields a solution for the multi-message transmitting problem in the original
network, as shown in [16, Theorem 4.2]. However, for the error correction problem, this approach may fail: in
an error-correcting scheme for unicasting in the augmented network, the outgoing messages from different source
nodes may be dependent as they originate from the auxiliary node, whereas the topology of the original network
requires messages from different source nodes to be independent.
Our proof utilizes the ideas and concepts presented in [9], [35], where linear network error-correction codes for
the multicast problem were studied. We begin by introducing some notation. Let ρ, ρ′ ⊆ E be two error patterns.
We say ρ′ dominates ρ if ∆(ρ) ⊆ ∆(ρ′ ) for any linear network code. This relation is denoted by ρ ≺ ρ′ . Let
Rank(ρ) be the rank of an error pattern ρ, which is defined as

Rank(ρ) ≜ min{|ρ′ | | ρ ≺ ρ′ }.

For a positive integer δ, let R(δ) be a collection of error patterns which is defined as

R(δ) ≜ {ρ | |ρ| = Rank(ρ) = δ}.

When T is the identity matrix, the Singleton-like bound in Theorem II.2 reads:

dmin(C, I, k) ≤ \min_{C∈Λ(N)} {|C| − k|I_C| + 1}.

Denote δ ≜ \min_{C∈Λ(N)} {|C| − k|I_C|}. In order to prove that this bound is achievable, we need to show that there
exists a linear network code with dmin(C, I, k) > δ. Due to the definition of dmin(C, I, k), it suffices to prove that
for every ρ ∈ R(δ), Φ ∩ ∆(ρ) = ∅. The proof can be divided into two steps. First, we will show that for every
ρ ∈ R(δ), there are sk + δ edge-disjoint paths, where k paths are from σi (for each i ∈ [s]) to γ, and δ paths are
from ρ to γ. Second, for every ρ ∈ R(δ), we define a dynamic set CUT_ρ and update the global encoding vectors
of the edges in this set until all global encoding vectors have been updated.
of the edges in this set until all global encoding vectors have been updated.

Let N be a directed acyclic network. For an error pattern ρ ⊆ E, we construct a new network, denoted by Nρ ,
which is obtained by adding a new node σρ and creating a new link e′ = (σρ , head(e)) for each e ∈ ρ.

Lemma V.1. The rank of an error pattern ρ ⊆ E is equal to the size of the minimum cut between σρ and γ in the
network Nρ .

Proof: For an arbitrary linear network code of N, we can define a linear network code of Nρ by letting
ke′ ,d = ke,d for all e ∈ ρ and d ∈ Out(head(e)) and keeping all the other local encoding coefficients. Let G and
G′ be the submatrices of the extended global encoding matrices for N and Nρ , respectively. Then the row labeled
by e′ in G′ is equal to the row labeled by e in G. It follows that ∆(ρ) = ∆(ρ′ ), where ρ′ ≜ {e′ | e ∈ ρ}. Let Cσρ ,γ
be a minimum cut between σρ and γ. Since every path from σρ to γ must pass through Cσρ ,γ , the row labeled by
e′ must be a linear combination of the rows of G′ that are labeled by the links in Cσρ ,γ . Hence,

∆(ρ) = ∆(ρ′ ) ⊆ ∆(Cσρ ,γ ),

which implies that Rank(ρ) ≤ |Cσρ ,γ |.


On the other hand, for any linear network code, we have

Rank(ρ) = min{|ρ′ | | ρ ≺ ρ′ }

≥ min{dim(∆(ρ′ )) | ρ ≺ ρ′ }

≥ dim(∆(ρ)).

Take an arbitrary set of |Cσρ ,γ | edge-disjoint paths from σρ to γ. Construct a linear network code by setting the
local encoding coefficient kd,e = 1 if d, e belongs to the same path, and kd,e = 0 otherwise. For this particular
linear code, it is obvious that dim(∆(ρ)) = |Cσρ ,γ |. Therefore, we have Rank(ρ) ≥ |Cσρ ,γ |.

Lemma V.2. Let ρ ∈ R(δ) be an error pattern. In the network Nρ , we add a new source node σ ′ , together with
δ links from σ ′ to σρ , and k links from σ ′ to σi for each 1 ≤ i ≤ s. Then the size of the minimum cut between σ ′
and γ in this new network is equal to sk + δ.

Proof: Since |Out(σ ′ )| = sk + δ, the size of the minimum cut between σ ′ and γ is at most sk + δ. To show it
is at least this number, we consider an arbitrary cut C separating σ ′ and γ. Let C1 ≜ C ∩ Out(σ ′ ) and C2 ≜ C\C1 .
We proceed with the following cases.
1) If IC2 = ∅ and C2 is not a cut between γ and σρ , then it must be the case that C1 = Out(σ ′ ). Hence,
|C| ≥ |C1 | ≥ |Out(σ ′ )| = sk + δ.
2) If IC2 = ∅ and C2 is a cut between γ and σρ, then ∪_{i=1}^{s} In(σi) ⊆ C1, and |C2| ≥ Rank(ρ) = δ (by
Lemma V.1). It follows that |C| = |C1| + |C2| ≥ sk + δ.
3) If IC2 ̸= ∅, then C2 is a cut of the original network N. It follows that |C2 | ≥ k|IC2 | + δ, as δ =
minC∈Λ(N) {|C|−k|IC |}. Note that C2 only separates γ from the source nodes in IC2 . Then ∪σi ∈S\IC2 In(σi ) ⊆
C1 , and so, |C1 | ≥ sk − |IC2 |k. Hence, |C| = |C1 | + |C2 | ≥ sk + δ.

Using this lemma, we can prove the following result.

Corollary V.1. For every error pattern ρ ∈ R(δ), there are (sk + δ) edge-disjoint paths, where δ paths are from
ρ to γ, each starting from a link in ρ, and k paths are from σi to γ for each 1 ≤ i ≤ s.

Proof: For every ρ ∈ R(δ), consider the network in Lemma V.2. Since the size of the minimum cut between
σ' and γ is sk + δ, there are that many edge-disjoint paths from σ' to γ. Then we remove all the edges that are
not in the original network N from these paths. The resulting paths are the desired ones.
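The number of edge-disjoint paths promised by Corollary V.1 is a max-flow value and can be computed with any unit-capacity flow routine; the actual paths then follow from a standard flow decomposition. A minimal sketch (ours), using BFS augmenting paths:

from collections import defaultdict, deque

def num_edge_disjoint_paths(edges, s, t):
    # Max flow with unit capacity per edge = number of edge-disjoint s-t
    # paths (Menger); BFS augmenting paths on the residual graph.
    cap = defaultdict(int)
    adj = defaultdict(set)
    for u, v in edges:
        cap[(u, v)] += 1          # parallel edges accumulate capacity
        adj[u].add(v)
        adj[v].add(u)             # residual direction
    flow = 0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow
        v = t
        while parent[v] is not None:   # push one unit along the found path
            u = parent[v]
            cap[(u, v)] -= 1
            cap[(v, u)] += 1
            v = u
        flow += 1

Running it from the auxiliary node σ' of Lemma V.2 to γ returns sk + δ; deleting the auxiliary links from the decomposed paths yields the paths in N required by Corollary V.1.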
To present our coding scheme, we need more notation. Let f̃e ∈ F_q^{sk+|E|} be the extended global encoding vector
of link e defined as in Section II. The components of f̃e can be indexed by the set [sk] ∪ E, that is,

f̃e = (f̃e(d) : d ∈ [sk] ∪ E).

For an error pattern ρ ⊆ E and an extended global encoding vector f̃e, we define three vectors as follows.
1) f̃e^ρ ∈ F_q^{sk+|ρ|} is the vector obtained from f̃e by removing all components f̃e(d) where d ∉ [sk] ∪ ρ.
2) fe^ρ ∈ F_q^{sk+|E|} is the vector obtained from f̃e by replacing all components f̃e(d), where d ∉ [sk] ∪ ρ, with 0.
3) fe^{ρ^c} ∈ F_q^{sk+|E|} is the vector obtained from f̃e by replacing all components f̃e(d), where d ∈ [sk] ∪ ρ, with 0.
The following theorem shows the attainability of the Singleton-like bound when T is an identity matrix.

Theorem V.1. Let N be a directed acyclic network. If the field size q ≥ |R(δ)|, then there is a linear network code
C for N such that dmin(C, I, k) = δ + 1, where δ = \min_{C∈Λ(N)} {|C| − k|I_C|}.

Proof: We extend the network N by assigning k imaginary message channels {d_{(i−1)k+1}, d_{(i−1)k+2}, · · · , d_{ik}}
to each source node σi and one imaginary error channel e' to the tail of each edge e ∈ E. We denote this new
network as Ñ. For each ρ ∈ R(δ), let Pρ be a set of (sk + δ) edge-disjoint paths satisfying the property in
Corollary V.1. We denote the set of links on the paths in Pρ as Eρ.
We define a dynamic set of links CUT_ρ for each ρ ∈ R(δ), and initialize it as

CUT_ρ = {d_i | 1 ≤ i ≤ sk} ∪ {e' | e ∈ ρ},

where e' is the imaginary error channel of e. For all e ∈ E, we initialize f̃e = 0; for all d ∈ {d_i | 1 ≤ i ≤ sk} ∪ E',
we initialize f̃d = 1_d, where 1_d denotes the binary unit vector with the entry labeled by d being '1'. For a set of
vectors V, we use ⟨V⟩ to denote the linear space that is spanned by the vectors in V. For any subset A ⊆ {d_i | 1 ≤
i ≤ sk} ∪ E ∪ E', we define four vector spaces as follows:

L̃(A) ≜ ⟨{f̃e | e ∈ A}⟩,

L̃ρ(A) ≜ ⟨{f̃e^ρ | e ∈ A}⟩,

L^ρ(A) ≜ ⟨{fe^ρ | e ∈ A}⟩,

L^{ρ^c}(A) ≜ ⟨{fe^{ρ^c} | e ∈ A}⟩.

Note that the initialization above implies that L̃ρ(CUT_ρ) = F_q^{sk+|ρ|}.
Next, we update f̃e and CUT_ρ from upstream to downstream until CUT_ρ ⊆ In(γ) for all ρ ∈ R(δ). For a link
e ∈ E, denote i = tail(e). If e ∉ ∪_{ρ∈R(δ)} Eρ, let f̃e = 1_e, and CUT_ρ remains unchanged. If e ∈ ∪_{ρ∈R(δ)} Eρ, we
choose a vector g̃e such that

g̃e ∈ L̃(In(i) ∪ {e'}) \setminus ∪_{ρ : e∈Eρ} \left( L^ρ(CUT_ρ\{e_ρ}) + L^{ρ^c}(In(i) ∪ {e'}) \right),

where e_ρ is the previous link of e in Pρ, and the addition represents the sum of two vector spaces. The existence
of such a g̃e will be shown later. Next, we choose f̃e such that

f̃e = \begin{cases} g̃e + 1_e, & \text{if } g̃e(e) = 0; \\ g̃e(e)^{-1} · g̃e, & \text{otherwise}. \end{cases}

For the dynamic set CUT_ρ, if e_ρ ∈ CUT_ρ, update CUT_ρ = (CUT_ρ\{e_ρ}) ∪ {e}. Otherwise, CUT_ρ remains
unchanged.
After updating f̃e for all e ∈ E, we have CUT_ρ ⊆ In(γ) for every ρ ∈ R(δ).
To show the existence of g̃e, it is equivalent to show that for q ≥ |R(δ)|,

\left| L̃(In(i) ∪ {e'}) \setminus ∪_{ρ : e∈Eρ} \left( L^ρ(CUT_ρ\{e_ρ}) + L^{ρ^c}(In(i) ∪ {e'}) \right) \right| > 0.

Let ℓ = dim(L̃(In(i) ∪ {e'})). For every ρ satisfying e ∈ Eρ, we have e_ρ ∈ In(i) ∪ {e'}. Then f̃_{e_ρ} ∈ L̃(In(i) ∪
{e'}). However, f̃_{e_ρ} ∉ L^ρ(CUT_ρ\{e_ρ}) + L^{ρ^c}(In(i) ∪ {e'}). This is because f̃_{e_ρ} = f_{e_ρ}^ρ + f_{e_ρ}^{ρ^c}, where f_{e_ρ}^ρ ∉
L^ρ(CUT_ρ\{e_ρ}) and f_{e_ρ}^{ρ^c} ∈ L^{ρ^c}(In(i) ∪ {e'}). Therefore,

dim\left( L̃(In(i) ∪ {e'}) ∩ \left( L^ρ(CUT_ρ\{e_ρ}) + L^{ρ^c}(In(i) ∪ {e'}) \right) \right) ≤ ℓ − 1. \qquad (17)

Thus, we have

\left| L̃(In(i) ∪ {e'}) \setminus ∪_{ρ : e∈Eρ} \left( L^ρ(CUT_ρ\{e_ρ}) + L^{ρ^c}(In(i) ∪ {e'}) \right) \right|

= \left| L̃(In(i) ∪ {e'}) \right| − \left| L̃(In(i) ∪ {e'}) ∩ \left( ∪_{ρ : e∈Eρ} \left( L^ρ(CUT_ρ\{e_ρ}) + L^{ρ^c}(In(i) ∪ {e'}) \right) \right) \right| \qquad (18)

> q^ℓ − \sum_{ρ∈R(δ)} q^{ℓ−1} \qquad (19)

≥ q^{ℓ−1}(q − |R(δ)|) ≥ 0.

Note that (18) ≥ (19) due to (17); moreover, if the equality did hold, then necessarily |R(δ)| = 1, which is
impossible since δ < |C| for any C ∈ Λ(N).
Finally, we need to show that the encoding coefficients f̃e give rise to a linear network code C with dmin(C, I, k) =
δ + 1. We will prove this by showing that during the updating process, dim(L̃ρ(CUT_ρ)) = sk + δ for all ρ ∈ R(δ),
which in turn implies that ∆(ρ) ∩ Φ = ∅, as finally CUT_ρ ⊆ In(γ).
In the initialization, we have dim(L̃ρ(CUT_ρ)) = sk + δ. Consider a link e ∈ E, and assume that all links before e
have been updated and dim(L̃ρ(CUT_ρ)) = sk + δ. Recall that

g̃e ∈ L̃(In(i) ∪ {e'}) \setminus ∪_{ρ : e∈Eρ} \left( L^ρ(CUT_ρ\{e_ρ}) + L^{ρ^c}(In(i) ∪ {e'}) \right).

It follows that g̃e^ρ and {f̃d^ρ | d ∈ CUT_ρ\{e_ρ}} are linearly independent for any ρ with e ∈ Eρ. Suppose to the
contrary that g̃e^ρ and {f̃d^ρ | d ∈ CUT_ρ\{e_ρ}} are linearly dependent for some ρ. Then ge^ρ ∈ L^ρ(CUT_ρ\{e_ρ}).
Note that ge^{ρ^c} ∈ L^{ρ^c}(In(i) ∪ {e'}) as g̃e ∈ L̃(In(i) ∪ {e'}). Thus, g̃e = ge^ρ + ge^{ρ^c} is a vector in the sum space
L^ρ(CUT_ρ\{e_ρ}) + L^{ρ^c}(In(i) ∪ {e'}), which contradicts the choice of g̃e. Now, we show that f̃e^ρ and {f̃d^ρ | d ∈
CUT_ρ\{e_ρ}} are also linearly independent.
1) If g̃e(e) ≠ 0, since g̃e^ρ and {f̃d^ρ | d ∈ CUT_ρ\{e_ρ}} are linearly independent and f̃e = g̃e(e)^{-1} g̃e, the statement
follows directly.
2) If g̃e(e) = 0, we claim that e ∉ ρ for any ρ ∈ R(δ) such that e ∈ Eρ. Suppose to the contrary that e ∈ ρ;
then e_ρ = e'. Therefore, we have f̃_{e_ρ} = 1_e and f̃d(e) = 0 for d ∈ CUT_ρ\{e_ρ}. Since g̃e(e) = 0 and
dim(L̃ρ(CUT_ρ)) = sk + δ, we have g̃e^ρ ∈ L̃ρ(CUT_ρ\{e_ρ}). This implies that g̃e is a vector in the sum space
L^ρ(CUT_ρ\{e_ρ}) + L^{ρ^c}(In(i) ∪ {e'}), which also contradicts the choice of g̃e. From the claim, it follows
that f̃e^ρ = g̃e^ρ, which in turn implies that f̃e^ρ and {f̃d^ρ | d ∈ CUT_ρ\{e_ρ}} are linearly independent.
Therefore, after updating CUT_ρ by replacing e_ρ with e, we still have dim(L̃ρ(CUT_ρ)) = sk + δ.

VI. BOUNDS ON ROBUST COMPUTING CAPACITY

In this section, we consider the robust computing capacity for linear target functions. First we have the following
cut-set bound.

Lemma VI.1. Let N be a directed acyclic network, f(x) = x · T be a linear function with T ∈ F_q^{s×l}, and τ be a
positive integer. Then

C(N, f, τ) ≤ \min_{C∈Λ(N)} \frac{|C| − 2τ}{\mathrm{Rank}(T_{I_C})}. \qquad (20)

Proof: This follows directly from [29, Theorem III.1] and the discussion preceding [29, Corollary II.1].
When l ∈ {1, s}, we have the following result.

Theorem VI.1. Let N be an arbitrary directed acyclic network and τ be a positive integer such that 2τ <
\min_{C∈Λ(N)} {|C|}. If q is sufficiently large, then we have

C(N, 1, τ) ≥ \min_{C∈Λ(N)} {|C| − 2τ}

and

C(N, I, τ) ≥ \min_{C∈Λ(N)} \left\lfloor \frac{|C| − 2τ}{|I_C|} \right\rfloor.

Proof: For T = 1, let k = \min_{C∈Λ(N)} {|C| − 2τ}. Theorem IV.1 shows that there is a linear network code
with

dmin(C, 1, k) = \min_{C∈Λ(N)} {|C| − k + 1} = 2τ + 1.

Similarly, for T = I, let k = \min_{C∈Λ(N)} \lfloor (|C| − 2τ)/|I_C| \rfloor. Then Theorem V.1 shows that there is a linear network code
with

dmin(C, I, k) = \min_{C∈Λ(N)} {|C| − k|I_C| + 1} ≥ 2τ + 1.

By Theorem II.1, these codes are resilient to τ errors.


Note that the above results show that linear network coding can achieve the cut-set bound for l = 1 and achieve
the integral part of the cut-set bound for l = s.
When 1 < l < s, the linear network coding scheme for the sum function, along with a time-sharing technique,
can be used to derive a lower bound on the robust computing capacity for a generic linear function x · T.

Theorem VI.2. Let N be an arbitrary directed acyclic network and τ be a positive integer such that 2τ <
\min_{C∈Λ(N)} {|C|}. Let T ∈ F_q^{s×l} be a matrix of full column rank. If q is sufficiently large, then

C(N, T, τ) ≥ \min_{C∈Λ(N)} \frac{|C| − 2τ}{l}. \qquad (21)

Proof: Let w' = \min_{C∈Λ(N)} {|C| − 2τ}. For 1 ≤ i ≤ l, let Ti denote the i-th column of T. We construct
a coding scheme for T that uses the network l times. In the i-th use of N, the sink node computes x · Ti. By
Theorem VI.1, there is a linear coding scheme that can compute the function x · Ti w' times while tolerating τ
errors. Therefore, our scheme is able to reliably compute the target function f(x) w' times by using the network l
times, which establishes the result.

VII. APPLICATIONS IN DISTRIBUTED COMPUTING

In this section, we explore the applications of linear network codes for robust function computation within the
context of distributed computing, with a particular focus on the gradient coding problem [10], [19], [25], [27], [32].
Consider a data set D = {(x_i, y_i)}_{i=1}^{D} with each tuple (x_i, y_i) ∈ F^p × F. Numerous machine learning problems
wish to solve problems of the following form:

β* = \arg\min_{β∈F^p} \sum_{i=1}^{D} L(x_i, y_i; β) + λR(β),

where L(·) is a loss function and R(·) is a regularization function. The most commonly used approach to solving
this problem involves gradient-based iterative methods. Let

g^{(t)} ≜ \sum_{i=1}^{D} ∇L(x_i, y_i; β^{(t)}) ∈ F^p

be the gradient of the loss function at the t-th step. Then the updates to the model are of the form:

β^{(t+1)} = h_R(β^{(t)}, g^{(t)}),

where h_R(·) is a gradient-based optimizer which also depends on R(·). As the size of the data set increases,
computing the gradient g^{(t)} can become a bottleneck. One potential solution is to parallelize the computation by
distributing the tasks across multiple workers.

Assume that there are n worker nodes, denoted by W1, W2, · · · , Wn, and the data set D is partitioned into K
data subsets, denoted by D1, D2, · · · , DK. The partial gradient vector g_i^{(t)} is defined as

g_i^{(t)} ≜ \sum_{(x,y)∈D_i} ∇L(x, y; β^{(t)}).

Then

g^{(t)} = g_1^{(t)} + g_2^{(t)} + · · · + g_K^{(t)}.

The master node initially assigns the data subsets D1, D2, · · · , DK to the worker nodes. Let Zi denote the set of
indices corresponding to the data subsets stored by worker node Wi. Each worker Wi computes the partial gradients
{g_j^{(t)} | j ∈ Zi} based on its assigned data subsets and then transmits a coded message f_i(g_j^{(t)} : j ∈ Zi) ∈ F^{p/m}
to the master node, where f_i is a linear function that encodes the partial gradients in Zi, and m is referred to as
the communication reduction factor. Due to stragglers (worker nodes slowed down by unpredictable factors such as
network latency), the master node may not receive all the coded messages, but rather n − τ of them. It must then
decode the sum of partial gradients g^{(t)} from these received messages. The primary problem is designing a gradient
coding scheme that includes data assignment and message encoding/decoding to increase the straggler tolerance τ while
minimizing communication cost and computation cost. The communication cost can be parameterized by 1/m while
the computation cost can be parameterized by the number of worker nodes to which each data subset is assigned. Since
the encoding functions f_i are time invariant, we omit the superscript (t) in the rest of this paper for simplicity of
notation.
The authors in [27] characterized the trade-off between the straggler tolerance and the computation cost when the communication reduction factor is m = 1. A gradient coding scheme achieving this trade-off was also proposed. It consists of a cyclic data assignment, where each worker node Wi is assigned Di, Di+1, . . . , Di+w (indices taken modulo K) for some fixed w, together with a random code construction. Subsequently, a deterministic code construction based on cyclic MDS codes was proposed in [25] to replace the random construction of [27]. For general m ≥ 1, the authors in [32] characterized the optimal trade-off between straggler tolerance, computation cost and communication cost. Among other results, they proved the following converse bound.

Lemma VII.1 ([33, Appendix A]). In a gradient coding scheme with n worker nodes and K data subsets, with communication reduction factor m and straggler tolerance τs, every data subset must be assigned to at least τs + m worker nodes.

A gradient coding scheme that achieves the converse bound was also proposed, where each data subset is assigned
to exactly τs + m worker nodes, and each worker node is assigned τs + m data subsets.
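When K = n, a cyclic assignment realizes this scheme directly; the Python sketch below, with illustrative parameters, assigns each worker τs + m consecutive subsets, so every subset lands on exactly τs + m workers.

```python
# Cyclic data assignment with K = n: worker W_i stores the tau_s + m
# consecutive subsets D_i, D_{i+1}, ..., D_{i + tau_s + m - 1} (mod n).

def cyclic_assignment(n, tau_s, m):
    """Return (Z_1, ..., Z_n); Z_i is the index set of worker W_i."""
    width = tau_s + m
    return [{(i + j) % n for j in range(width)} for i in range(n)]

Z = cyclic_assignment(n=6, tau_s=2, m=1)
# Each subset D_j is assigned to exactly tau_s + m = 3 workers:
for j in range(6):
    assert sum(j in Zi for Zi in Z) == 3
```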
In the literature on gradient coding, it is typically assumed that the system is homogeneous, meaning all worker
nodes have the same storage capacity and computation speed. Consequently, in all the aforementioned works, each
worker node is assigned the same number of data subsets. In this section, we consider a heterogeneous scenario
where the worker nodes have varying storage capacities and computation speeds. Intuitively, worker nodes with
lower storage capacity and computation speed should be assigned less data to avoid slowing down the overall
computation time.
In the following, we first show that for an arbitrary data assignment, network coding can be used to design the
encoding functions fi ’s for the worker nodes, enabling the gradient coding scheme to achieve the converse bound
stated in Lemma VII.1. Then, we show how to design the data assignment to accommodate the heterogeneous
scenario.
For a given data assignment Z = {Zi | 1 ≤ i ≤ n}, we can construct a three-layer network N(Z) as follows. The nodes in the first layer are labeled by the data subsets D1, D2, . . . , DK, the nodes in the middle layer are labeled by the worker nodes W1, W2, . . . , Wn, and the sink node corresponds to the master node. There is a link from a node labeled by Di to a node labeled by Wj if and only if Di is assigned to Wj. Additionally, there is a link from each node labeled by Wj to the sink node. We treat each partial gradient gi as a message generated at the node Di. Since any worker node storing Di can compute coded messages of gi, the problem of designing a gradient coding scheme can be reduced to designing a network coding scheme that enables the sink node to compute the sum g = Σ_{i=1}^{K} gi even if there are τs outages in the incoming links to the sink node.
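The construction of N(Z) is mechanical; a minimal Python sketch follows, where the edge-list representation and the string labels are illustrative choices.

```python
# Build the three-layer network N(Z) from a data assignment Z.
# Layers: source nodes D_1..D_K, worker nodes W_1..W_n, and one sink.

def build_three_layer_network(Z, K):
    """Z: list of index sets; Z[i-1] holds the subsets stored by W_i."""
    assert all(1 <= j <= K for Zi in Z for j in Zi)
    edges = []
    for i, Zi in enumerate(Z, start=1):
        for j in sorted(Zi):              # link D_j -> W_i iff j in Z_i
            edges.append((f"D{j}", f"W{i}"))
        edges.append((f"W{i}", "sink"))   # every worker feeds the master
    return edges

# Hypothetical assignment: W1 stores {1,2}, W2 stores {2,3}, W3 stores {1,3}.
print(build_three_layer_network([{1, 2}, {2, 3}, {1, 3}], K=3))
```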

Proposition VII.1. Let τs and m be positive integers. For the three-layer network N(Z ), suppose that there is a
linear network code C which can compute the sum function with dmin (C, 1, m) ≥ τs + 1. Then there is a gradient
coding scheme, incorporating Z as the data assignment, with straggler tolerance τs and communication reduction
factor m.

Proof: For each partial gradient gj, we write it as³ gj = (gj(1), gj(2), . . . , gj(p/m)), where each gj(ℓ) ∈ F^m. Let F be the global encoding matrix of C. Since there are K source nodes and n incoming links to the sink node, we have F ∈ F^{(mK)×n}. For 1 ≤ i ≤ n, let fi be the column of F that corresponds to the link from the node labeled by Wi to the sink node. Noting that the nonzero entries of fi correspond to the source nodes labeled by Dj with j ∈ Zi, we define the gradient encoding function for the worker Wi as

    fi(gj : j ∈ Zi) ≜ ((g1(ℓ), g2(ℓ), . . . , gK(ℓ)) · fi : 1 ≤ ℓ ≤ p/m) ∈ F^{p/m}.

Since dmin(C, 1, m) ≥ τs + 1, according to Remark III.1, even if there are τs outages in the incoming links of the master node, it can still decode the sum Σ_{i=1}^{K} gi(ℓ) for all 1 ≤ ℓ ≤ p/m, and hence Σ_{i=1}^{K} gi.
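For concreteness, the sketch below implements this encoding and one possible decoding step. It works over the reals for readability (the codes in this paper live over a finite field), and solving for the selector matrix U is just one way to extract the block-wise sum from the surviving columns; it succeeds whenever the distance condition of Proposition VII.1 holds.

```python
import numpy as np

# Sketch of gradient encoding/decoding from a global encoding matrix F of
# shape (m*K) x n, whose column f_i is the local encoder of worker W_i.
# grads[j, l] holds the block g_{j+1}(l+1) in F^m; worker i sends, for each
# block index l, the inner product of (g_1(l), ..., g_K(l)) with f_i.

def encode(F, grads, i):
    """grads: array of shape (K, p/m, m) holding the blocks g_j(l)."""
    K, L, m = grads.shape
    concat = grads.transpose(1, 0, 2).reshape(L, K * m)  # row l = concat_l
    return concat @ F[:, i]                              # message in F^{p/m}

def decode_sum(F, msgs, surviving, m, K):
    """Recover (sum_j g_j(l))_l from the surviving workers' messages by
    solving F_S A = U; column k of U picks coordinate k of every block."""
    U = np.tile(np.eye(m), (K, 1))                       # (m*K) x m selector
    A, *_ = np.linalg.lstsq(F[:, surviving], U, rcond=None)
    Y = np.stack([msgs[i] for i in surviving], axis=1)   # (p/m) x |S|
    return Y @ A                                         # (p/m) x m block sums
```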

Theorem VII.1. Let τs and m be positive integers. Let Z be a data assignment with n worker nodes and K data
subsets such that every data subset is assigned to at least τs + m worker nodes. Then there is a gradient coding
scheme incorporating Z with communication reduction factor m and straggler tolerance τs .

Proof: Since each data subset is assigned to at least τs + m worker nodes, the minimum degree of the source nodes in N(Z) is at least τs + m. By Theorem II.3, there is a linear network code C with dmin(C, 1, m) = τs + 1, provided that the field F is sufficiently large. The conclusion then follows from Proposition VII.1.

³ We treat the partial gradient gj as a row vector.



In order to design a data assignment accommodating the heterogeneous scenario, we use the approach in [30]. For a collection of worker nodes W1, W2, . . . , Wn, we use a vector r = (r1, r2, . . . , rn) ∈ Q^n and a vector s = (s1, s2, . . . , sn) ∈ Q^n to represent the storage capacities and computation speeds, respectively, where ri is the fraction of the data that can be stored at Wi and si is the fraction of the data that Wi can process per unit time.
For a collection of data subsets D = {D1, D2, . . . , DK} and a data assignment Z = {Z1, Z2, . . . , Zn}, the vector µ = (µ1, µ2, . . . , µn) ∈ Q^n, where

    µi ≜ (Σ_{j∈Zi} |Dj|) / (Σ_{j=1}^{K} |Dj|),

is the computation load vector. Our goal is to minimize the overall computation time

    c(D, Z) ≜ max_{1≤i≤n} µi/si,

while ensuring that each data subset is assigned to at least τs + m worker nodes.
This problem can be formulated as the following optimization problem:

    minimize_{D,Z}   c(D, Z)                                        (22)

    subject to   µi ≤ ri   for all 1 ≤ i ≤ n,                       (23)

                 |{i | j ∈ Zi}| ≥ τs + m   for all 1 ≤ j ≤ K.       (24)

To solve this problem, we adopt the approach in [30] and decompose it into two sub-problems. The first one is the
following relaxed convex optimization problem to find the optimal computation load vector µ ∗ :

    minimize_µ   max_{1≤i≤n} µi/si

    subject to   µi ≤ ri   for all 1 ≤ i ≤ n,

                 Σ_{i=1}^{n} µi ≥ τs + m.

The solution to this problem can be found in [30, Theorem 1]. The second problem is to find a data assignment scheme Z, together with a data partition D, whose computation load vector is µ∗ and for which (24) holds. This is solved in [30, Section V], using the fact that Σ_{i=1}^{n} µ∗i ≥ τs + m.
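As an illustration, the relaxed problem can also be solved numerically by bisecting on the completion time c: a budget c is feasible exactly when Σi min(ri, si·c) ≥ τs + m. The sketch below implements this heuristic; it is one way to obtain µ∗, not the closed-form characterization of [30, Theorem 1].

```python
# Bisection on the completion time c for the relaxed convex program:
# minimize max_i mu_i/s_i s.t. mu_i <= r_i and sum_i mu_i >= tau_s + m.
# With budget c, the largest admissible load of W_i is min(r_i, s_i * c).

def optimal_load_vector(r, s, tau_s, m, iters=60):
    """r, s: storage and speed fractions of the n workers (lists of floats).
    Returns (c*, mu*) for the relaxed problem, up to bisection accuracy."""
    demand = tau_s + m
    assert sum(r) >= demand, "total storage cannot cover tau_s + m"
    lo, hi = 0.0, max(ri / si for ri, si in zip(r, s))  # hi is feasible
    for _ in range(iters):
        c = (lo + hi) / 2
        if sum(min(ri, si * c) for ri, si in zip(r, s)) >= demand:
            hi = c          # feasible: try to finish faster
        else:
            lo = c          # infeasible: need more time
    mu = [min(ri, si * hi) for ri, si in zip(r, s)]
    return hi, mu

# Hypothetical heterogeneous system: 4 workers, tau_s = 1, m = 1.
c_star, mu_star = optimal_load_vector(r=[0.6, 0.5, 0.8, 0.7],
                                      s=[1.0, 0.5, 2.0, 1.5], tau_s=1, m=1)
```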
Recently, the gradient coding problem was extended in [13] to compute a linearly separable function f , which
can be written as
f (D1 , D2 , . . . , DK ) = g(f1 (D1 ), f2 (D2 ), . . . , fK (DK )),

where g is a linear map defined by l linear combinations of the fi(Di)'s.
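As a small numerical example, the sketch below evaluates such a function with K = 3 and l = 2; the subset functions fi and the matrix T defining g are hypothetical placeholders.

```python
import numpy as np

# Sketch of a linearly separable target with K = 3 subsets and l = 2.
# The subset functions f_i (here: sums) and the matrix T defining the
# linear map g are hypothetical placeholders.
subsets = [np.array([1., 2.]), np.array([3., 4.]), np.array([5., 6.])]
f_vals = np.array([Di.sum() for Di in subsets])   # intermediate f_i(D_i)

T = np.array([[1., 0.],
              [1., 1.],
              [1., 2.]])
print(f_vals @ T)   # f(D_1, D_2, D_3) = g(f_1(D_1), ..., f_K(D_K)) = [21. 29.]
```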


Given the straggler tolerance τs , the authors in [13] examined the specific case where the computation cost
is minimum, and proposed novel schemes along with converse bounds for the optimal communication cost. The
proposed scheme is optimal under the constraint of cyclic data assignment. However, it is unknown whether this
scheme remains optimal if this constraint is removed. Therefore, it is of particular interest to investigate coding
schemes for other data assignments.


Using the same reasoning as in Proposition VII.1, this problem can be translated into a robust function computation problem in a three-layer network with the target function f(x) = x · T, where T ∈ F^{s×l}. However, in a generic three-layer network, this problem remains open when 2 ≤ l ≤ s − 1.
In this section, we have treated stragglers as communication outages and used linear network codes with distance at least τs + 1 to mitigate their impact. It is worth noting that this approach can also be applied to defend against Byzantine attacks, where some worker nodes send misleading or incorrect messages to the master node, causing computation errors. In this case, we treat the incorrect messages as erroneous links and assume that there are at most τb malicious nodes. Theorem II.1 guarantees that a linear network code with distance at least 2τb + 1 can effectively counter such attacks. Similar to Proposition VII.1, we have the following result, whose proof is analogous and thus omitted here.

Proposition VII.2. Let τb and m be positive integers. For the three-layer network N(Z ), suppose there exists a
linear network code C that computes the target function f (x) = x · T with dmin (C, T, m) ≥ 2τb + 1. Then there is
a coding scheme, incorporating Z as the data assignment, which has communication reduction factor m and can
tolerate up to τb malicious nodes.

R EFERENCES

[1] R. Appuswamy and M. Franceschetti, “Computing linear functions by linear coding over networks,” IEEE Transactions on Information
Theory, vol. 60, no. 1, pp. 422–431, 2014.
[2] R. Appuswamy, M. Franceschetti, N. Karamchandani, and K. Zeger, “Network coding for computing: Cut-set bounds,” IEEE Transactions
on Information Theory, vol. 57, no. 2, pp. 1015–1030, 2011.
[3] ——, “Linear codes, target function classes, and network computing capacity,” IEEE Transactions on Information Theory, vol. 59, no. 9,
pp. 5741–5753, 2013.
[4] N. Cai and R. Yeung, “Network error correction, II: Lower bounds,” Communications in Information and Systems, vol. 6, no. 1, pp. 37–54,
2006.
[5] ——, “Network coding and error correction,” in Proceedings of the IEEE Information Theory Workshop, 2002, pp. 119–122.
[6] X. Guang, F.-W. Fu, and Z. Zhang, “Construction of network error correction codes in packet networks,” IEEE Transactions on Information
Theory, vol. 59, no. 2, pp. 1030–1047, 2013.
[7] X. Guang and R. W. Yeung, “A revisit of linear network error correction coding,” IEEE Journal on Selected Areas in Information Theory,
vol. 4, pp. 514–523, 2023.
[8] X. Guang, R. W. Yeung, S. Yang, and C. Li, “Improved upper bound on the network function computing capacity,” IEEE Transactions
on Information Theory, vol. 65, no. 6, pp. 3790–3811, 2019.
[9] X. Guang and Z. Zhang, Linear network error correction coding. Springer, 2014.
[10] W. Halbawi, N. Azizan, F. Salehi, and B. Hassibi, “Improving distributed gradient descent using Reed-Solomon codes,” in 2018 IEEE International Symposium on Information Theory (ISIT), 2018, pp. 2027–2031.
[11] C. Huang, Z. Tan, S. Yang, and X. Guang, “Comments on cut-set bounds on network function computation,” IEEE Transactions on
Information Theory, vol. 64, no. 9, pp. 6454–6459, 2018.
[12] S. Jaggi, P. Sanders, P. A. Chou, M. Effros, S. Egner, K. Jain, and L. Tolhuizen, “Polynomial time algorithms for multicast network code
construction,” IEEE Transactions on Information Theory, vol. 51, no. 6, pp. 1973–1982, Jun. 2005.
[13] K. Wan, H. Sun, M. Ji, and G. Caire, “Distributed linearly separable computation,” IEEE Transactions on Information Theory, vol. 68, no. 2, pp. 1259–1278, 2022.
[14] R. Koetter and M. Médard, “An algebraic approach to network coding,” IEEE/ACM Trans. Netw., vol. 11, no. 5, pp. 782–795, Oct. 2003.
[15] R. Koetter, M. Effros, T. Ho, and M. Médard, “Network codes as codes on graphs,” in Proceedings of CISS, 2004.

[16] A. R. Lehman and E. Lehman, “Complexity classification of network information flow problems,” in Proceedings of the Fifteenth Annual ACM-SIAM Symposium on Discrete Algorithms, 2004, pp. 142–150.
[17] S. Y. R. Li, R. W. Yeung, and N. Cai, “Linear network coding,” IEEE Transactions on Information Theory, vol. 49, no. 2, pp. 371–381,
Feb. 2003.
[18] R. Matsumoto, “Construction algorithm for network error-correcting codes attaining the Singleton bound,” IEICE Trans. Fundam., vol. E90-A, no. 9, pp. 1729–1735, 2007.
[19] E. Ozfatura, D. Gündüz, and S. Ulukus, “Speeding up distributed gradient descent by utilizing non-persistent stragglers,” in 2019 IEEE
International Symposium on Information Theory (ISIT), 2019, pp. 2729–2733.
[20] B. Rai, B. K. Dey, and S. Shenvi, “Some bounds on the capacity of communicating the sum of sources,” in Proceedings of the 2010 IEEE
Information Theory Workshop (ITW2010), Cairo, Egypt, 2010, pp. 119–122.
[21] B. K. Rai and N. Das, “Sum-networks: Min-cut = 2 does not guarantee solvability,” IEEE Communications Letters, vol. 17, no. 11, pp.
2144–2147, 2013.
[22] B. K. Rai and B. K. Dey, “On network coding for sum-networks,” IEEE Transactions on Information Theory, vol. 58, no. 1, pp. 50–63,
2012.
[23] A. Ramamoorthy, “Communicating the sum of sources over a network,” in 2008 IEEE International Symposium on Information Theory,
2008, pp. 1646–1650.
[24] A. Ramamoorthy and M. Langberg, “Communicating the sum of sources over a network,” IEEE Journal on Selected Areas in
Communications, vol. 31, no. 4, pp. 655–665, 2013.
[25] N. Raviv, I. Tamo, R. Tandon, and A. G. Dimakis, “Gradient coding from cyclic MDS codes and expander graphs,” IEEE Transactions on Information Theory, vol. 66, no. 12, pp. 7475–7489, 2020.
[26] S. Shenvi and B. K. Dey, “A necessary and sufficient condition for solvability of a 3s/3t sum-network,” in 2010 IEEE International
Symposium on Information Theory, 2010, pp. 1858–1862.
[27] R. Tandon, Q. Lei, A. G. Dimakis, and N. Karampatziakis, “Gradient coding: Avoiding stragglers in distributed learning,” in International
Conference on Machine Learning. PMLR, 2017, pp. 3368–3376.
[28] A. Tripathy and A. Ramamoorthy, “Sum-networks from incidence structures: Construction and capacity analysis,” IEEE Transactions on
Information Theory, vol. 64, no. 5, pp. 3461–3480, 2018.
[29] H. Wei, M. Xu, and G. Ge, “Robust network function computation,” IEEE Transactions on Information Theory, vol. 69, no. 11, pp.
7070–7081, 2023.
[30] N. Woolsey, R.-R. Chen, and M. Ji, “Coded elastic computing on machines with heterogeneous storage and computation speed,” IEEE
Transactions on Communications, vol. 69, no. 5, pp. 2894–2908, 2021.
[31] S. Yang, R. W. Yeung, and C. K. Ngai, “Refined coding bounds and code constructions for coherent network error correction,” IEEE
Transactions on Information Theory, vol. 57, no. 3, pp. 1409–1424, 2011.
[32] M. Ye and E. Abbe, “Communication-computation efficient gradient coding,” in International Conference on Machine Learning. PMLR,
2018, pp. 5610–5619.
[33] ——, “Communication-computation efficient gradient coding,” arXiv:1802.03475, 2018.
[34] R. Yeung and N. Cai, “Network error correction, I: Basic concepts and upper bounds,” Communications in Information and Systems, vol. 6,
no. 1, pp. 19–36, 2006.
[35] Z. Zhang, “Linear network error correction codes in packet networks,” IEEE Transactions on Information Theory, vol. 54, no. 1, pp.
209–218, 2008.
