Capacity of Dynamical Storage Systems
Ohad Elishco and Alexander Barg
Abstract
We introduce a dynamical model of node repair in distributed storage systems wherein the storage nodes
are subjected to failures according to independent Poisson processes. The main parameter that we study is the
time-average capacity of the network in the scenario where a fixed subset of the nodes supports a higher repair
bandwidth than the other nodes. The sequence of node failures generates random permutations of the nodes
in the encoded block, and we model the state of the network as a Markov random walk on permutations of
n elements. As our main result we show that the capacity of the network can be increased compared to the
static (worst-case) model of the storage system, while maintaining the same (average) repair bandwidth, and
we derive estimates of the increase. We also quantify the capacity increase in the case that the repair center
has information about the sequence of the recently failed storage nodes.
I. Introduction
The problem of node repair based on erasure coding for distributed storage aims at optimizing the tradeoff
of network traffic and storage overhead. In this form it was established by [8] from the perspective of network
coding. This model was generalized in various ways such as concurrent failure of several nodes [6], heterogeneous
architecture [2], [17], cooperative repair [13], and others. The existing body of works focuses on the failure of
a node (or several nodes) and the ensuing reconstruction process, but puts less emphasis on the time evolution
of the entire network and the inherent stochastic nature of the node failures. The static point of view of the
system and of node repair leads to schemes based on the worst case scenario in the sense that the amount
of data to be stored is known in advance, the amount of data each node transmits is known, and the repair
capacity is determined by the least advantageous state of the network. Switching to evolving networks makes
it possible to define and study the average amount of data moved through the network to accomplish repair,
and may give a slightly more comprehensive view of the system.
Several models of storage systems have been considered in the literature. The basic model of [8] assumes
that the amount of data that each node transmits to the repair center is fixed. The analysis of the network
traffic and storage overhead relies on [1] which quantifies the maximum total amount of data (or flow) that
can arrive at a specific point, but does not specify the exact amount of data that each node should transmit at
each time instant. To use the communication bandwidth more efficiently, we assume the amount of data that
each node transmits changes over time, while the total amount of communicated information averaged over
multiple repair cycles is fixed.
A similar idea appears, although not explicitly, in [16], where the authors propose to perform repair of several
failed nodes within one repair cycle with the purpose of decreasing the network traffic. The decrease can occur if
the information sent over a particular link can be used for repair of more than one node, thereby decreasing the
repair bandwidth. This scheme, which the authors called “lazy repair,” views the link capacity as a resource in
network optimization, which in general terms is similar to the underlying premises of our study. A related, more
general model of storage that accounts for time evolution of the system, given in [15], attempts to optimize
tradeoffs between storage durability, overhead, repair bandwidth, and access latency. Coding for minimizing
latency has been considered on its own in a separate line of works starting with [12]. We refer to [4] for an
overview of the literature where access latency is considered in the framework of queueing theory.
This paper was presented in part at the 2019 IEEE International Symposium on Information Theory (ISIT), Paris, France, July
2019.
O. Elishco is with the Institute for Systems Research, University of Maryland, College Park, MD 20742, email [email protected]. His
research is supported by NSF grant CCF-1814487.
A. Barg is with the Department of Electrical and Computer Engineering and the Institute for Systems Research, University of Maryland,
College Park, MD 20742, and also with the Institute for Problems of Information Transmission (IITP), Russian Academy of Sciences,
127051 Moscow, Russia. Email: [email protected]. His research was supported by NSF grants CCF-1618603 and CCF-1814487.
To further motivate the dynamical model, recall that cloud storage systems such as Microsoft Azure or
Google file system encode information in blocks. The information to be stored is accumulated until a block is
full, and then the block is sealed: the information is encoded, and the encoded fragments are distributed to
storage nodes [7], [14]. This implies that a storage node contains encoded fragments from several different
blocks and that the sets of storage nodes corresponding to different blocks may intersect partially. Therefore, a
storage node may participate in recovery of several failed nodes simultaneously, which implies that the capacity
of the link between the node and the repair center can be considered as a shared resource.
In this work we take the first steps toward defining a dynamical model of the network with random failures. The
prevalent system model assumes homogeneous storage under which the links from the nodes to the repair center
all have the same capacity. We immediately observe that the dynamical approach does not yield an advantage
in the operation or analysis of this model. For this reason we study storage systems such that the network is
formed of two disjoint groups of nodes with unequal (average) communication costs, which was proposed in
the static case in [2]. We show that, under the assumption of uniform failure probability of the nodes, it is
possible to increase the size of the file stored in the system while maintaining the same network traffic. This
means that, while in [2] the node transmits the same amount of data each time that there is a failure, in our
model the same node transmits the same amount of data on average over a sequence of repair cycles
(i.e., over time). In addition, we provide a simple scheme that increases the size of the stored file compared to the
static model, and study state-aware dynamical networks in which the repair center has causal knowledge of
the sequence of the failed nodes. The idea of time averaging is motivated by the assumption that the network
exhibits some type of ergodic behavior whereby the expected capacity can be related to the minimum cut averaged
over time in a sample path of the network evolution.
In Section II we present the dynamical model and give a formal definition of the storage capacity. The evolution
of the network is formalized as a random walk on the set of node permutations. Using this representation, we
argue that it suffices to limit oneself to discrete time. We also prove the basic relationship between the storage
capacity of a continuous-time network and the time-average min-cut of the corresponding discrete-time network
(Sec. II-D). The main results of this paper are collected in Sec. III where we derive estimates of the average
capacity of the fixed-cost storage model. We examine two approaches toward estimating the capacity. The
first of them is related to a specific transmission protocol while the second relies on an averaging argument.
In Section IV we analyze state-aware networks and extend the ideas of the previous section to obtain a lower
bound on their capacity. Finally, in Section V we consider the case of different failure probabilities of the nodes
and establish a partial result regarding a lower bound on capacity.
data. Since the file is encoded with an erasure-correcting code, the DC can retrieve the file by contacting a
subset of the storage nodes.
Let us give a formal description of our storage network model. A storage network is a pair (N , β) where N
is a triple N = (V, DC, CU) in which V is a set of n nodes (storage units) V = {v1 , . . . , vn }, DC is the data
collector node, and CU is the centralized computing unit node. The real nonnegative vector β = (β1 , . . . , βn )
gives the maximum average amount of data communicated from vi for the node repair, and will be discussed
in more detail below.
Every node $v_i$, $i \in [n] := \{1, 2, \dots, n\}$, has the ability to store up to α symbols over some finite alphabet F.
To store a file of size M we divide it into k information blocks viewed as (M/k)-dimensional vectors over F.
These information blocks are then encoded with an (n, k) code C. The coordinates of the codeword are vectors
over F, and each coordinate is stored in its own storage node in V. To read the file, the DC accesses at least
k nodes, obtaining the information stored in them, and retrieves the original file.
The time evolution of the storage network is related to a random process of node failures. We begin our
study assuming that the time is continuous starting at t = 0, when the encoded file is stored in the network.
The time instances t1 , t2 , . . . indicate consecutive node failures. Let s = (s1 , s2 , . . . ) ∈ V ∞ be the sequence
of failed nodes, where sj is the node that fails at time tj . We assume that in order to restore the data to a
failed node (reconstruct the node), the CU contacts a group of storage nodes, called helper nodes, accesses
some of the data stored on them, and uses this data to accomplish the recovery. In this work we assume that
CU contacts all the nodes except the failed node, i.e., we assume that the number of helper nodes is n − 1.
Further, we assume that the definition of the storage network includes a set of parameters βi , i = 1, . . . , n,
where βi is the maximum amount of information that is downloaded from vi to CU for node repair, averaged
over the time instances ti . Specifically, it is assumed that node vi provides hj (vi ) symbols of F for the repair
of node $s_j$ and that $\limsup_{\ell\to\infty}\frac{1}{\ell}\sum_{j=1}^{\ell} h_j(v_i) \le \beta_i$ for all i. The case of $h_j(v_i) = \beta_i$ for all i, j will play a special role, and we denote the corresponding weight assignment by $h^*$.
3: Suppose we are given the graph $X_{j-1}$, $j \ge 2$. Suppose that $s_j = v_{i_j}$ and consider the corresponding node $v_{i_j}^{j'}$
in $X_{j-1}$ for some $j' < j$. Define a new node $v_{i_j}^{j}$ and define $X_j(V_j, E_j)$ as follows:
$$V_j = V_{j-1} \cup \{CU_j, v_{i_j}^{j}\}$$
$$E_j = E_{j-1} \cup \{(u, CU_j) : u \in A_{j-1}\setminus\{v_{i_j}^{j'}\}\} \cup \{(CU_j, v_{i_j}^{j})\}.$$
The set of active nodes of $X_j$ is defined as $A_j = (A_{j-1}\setminus\{v_{i_j}^{j'}\}) \cup \{v_{i_j}^{j}\}$.
The sequence of information flow graphs is an important tool used to represent the time evolution of the network.
Each graph in the sequence accounts for a new node failure, and also records the information regarding all the
past failures that occurred from time t = 0 up to the time $t_{j+1}$. For a given j, the information for the repair
of $s_j$ is communicated over the edges in the graph $X_j$, wherein the edge $(v_i^{\ell}, CU_j)$ carries $h_j(v_i)$ symbols of F,
where the index ℓ < j corresponds to the last instance when the node $v_i$ failed.
We will sometimes write (N , β, s, t, h) to denote a network (N , β) with the sequence of failed nodes s, the
sequence of failure times t, and a sequence of functions {hj }j .
In our model, the evolution of the network is random. We represent this evolution by assuming that the failure
of each node is a Poisson arrival process with rate λ, and these arrivals occur independently for different nodes.
The interarrival time between two failures of a specific node v ∈ V is an exponential random variable with pdf
λe −λt . Since node failures are independent, the overall rate of node failures in the system is a Poisson process
with parameter nλ. This implies that we can formulate the network time evolution as follows. Let $(X_1, X_2, \dots)$
be a sequence of i.i.d. random variables with pdf $f_X(t) = n\lambda e^{-n\lambda t}$. Let $T = (T_1, T_2, \dots)$ be the sequence of
failure times defined as $T_j = \sum_{i=1}^{j} X_i$ for $j \in \mathbb{N}$, and let $S = (S_1, S_2, \dots)$ be the sequence of failed nodes
defined as a sequence of i.i.d. random variables distributed uniformly over [n]. Note that with probability zero
the values Tj can be infinite. Denote by µ1 the infinite direct power of the uniform distribution on V and by
µ2 the infinite power of the exponential distribution on [0, ∞). We will assume that the sequence (S, T ) is
distributed according to µ1 × µ2 .
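The failure process is straightforward to simulate. The following sketch (our own illustration, not part of the paper; all identifiers are ours) samples the pair (S, T) from the distribution µ1 × µ2 described above.

```python
import random

def simulate_failures(n, lam, num_failures, seed=0):
    """Sample (S, T): i.i.d. uniform failed nodes, Poisson(n*lam) failure times."""
    rng = random.Random(seed)
    S, T, t = [], [], 0.0
    for _ in range(num_failures):
        t += rng.expovariate(n * lam)   # interarrival time ~ Exp(n*lam)
        T.append(t)
        S.append(rng.randrange(n))      # failed node, uniform over [n]
    return S, T

S, T = simulate_failures(n=20, lam=1.0, num_failures=5)
print(list(zip(S, T)))
```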
We now define the storage capacity.
Definition 2 Let (N , β, s, t) be a storage network and let H denote the set of all sequences of functions h
that satisfy the constraints given by β. The storage capacity (or just capacity) of (N , β, s, t), denoted by
cap(N ), is defined as
$$\mathrm{cap}(N) = \sup_{h\in H} \mathrm{cap}_h(N).$$
The random evolution of the network makes the sequence of failed nodes a sequence of random variables
which we henceforth denote by S. This makes cap(N ) a random variable as well. As such, we will analyze the
expected value of the storage capacity which is defined as follows.
Definition 3 Let (N , β, S, t) be a (random) storage network. The expected capacity is defined as
$$\overline{\mathrm{cap}}(N) \triangleq E[\mathrm{cap}(N)].$$
For any realization s of S, the storage capacity of a network can be calculated using the sequence of
information flow graphs {Xj }j . Indeed, let (N , β, s, t, h) be a storage network with a corresponding sequence of
information flow graphs {Xj }j . For a time t ∈ [tj , tj +1 ], j ∈ N, let Dt denote a selection of k ′ = n − d + 1 nodes
from Aj (from which the entire file can be retrieved). As shown in [8], the storage capacity of the network is
equal to the minimum cut between ṽ and Dt . In other words, the maximum file size that can be reliably stored
in the network and retrieved at time t is equal to the minimum cut between ṽ and Dt .
In this work, we consider the time-average minimum cut (as defined below) and hence we define the minimum
cut differently. Let us denote by $C_t^h(D_t)$ the value of the minimum cut in $X_j$ between $\bigl(\bigcup_{i=-1}^{j-1} A_i\bigr)\setminus A_j$ and $D_t$
under the weight assignment h. Further, let $C_t^h$ denote the minimum cut over all selections $D_t$,
$$C_t^h = C_t^h(A_j) \triangleq \min_{D_t\subseteq A_j,\ |D_t|=k'} C_t^h(D_t). \qquad (2)$$
When $h = h^*$ we will sometimes write $C_t$ instead of $C_t^{h^*}$. In this definition we again assume that the DC is not
aware of the state of the network, i.e., the order of the failed nodes, and the minimum accounts for the worst
case. If the DC can choose which nodes to contact, the minimum should be replaced with a maximum.
Definition 4 Let (N , β, s, t, h) be a storage system. Define the average cut as
$$C_{\mathrm{avg}}^h(N) \triangleq \limsup_{t\to\infty}\frac{1}{t}\int_0^t C_\tau^h\, d\tau.$$
Note that the average cut is a function of s. Hence, if the network is random, s = S, then the average cut
is a random variable. As shown in the following lemma, for a storage system (N , β, s, t, h∗), the average cut
$C_{\mathrm{avg}}^{h^*}(N)$ can be used to bound below the capacity of the network N .
Lemma 1 Let (N , β, s, t) be a storage system. Assume also that for any $n-k'$ nodes $v_{i_1}, \dots, v_{i_{n-k'}} \in V$,
$$\sum_{j=1}^{n-k'} \beta_{i_j} \ge \max_{\pi\in S_n} C_t^{h^*}(\pi) - C_{\mathrm{avg}}^{h^*}. \qquad (3)$$
Then $\mathrm{cap}(N) \ge C_{\mathrm{avg}}^{h^*}(N)$.
This proves the finiteness claim. The uniform distribution of πt0 follows by symmetry.
Since $t_0 < \infty$ a.s., the time $t' = t - t_0$ is well defined. Consider a continuous-time Markov chain $X(t')$ with
the state space $S_n$ constructed as follows. Let $l \in [n]$ and let $\tau_l = (l, n, n-1, \dots, l+1)$ be a permutation (in
cycle notation) that moves entry l to the last position and shifts everything to the right of l one step to the
left. Then $P(\pi\to\sigma) = \frac{1}{n}$ if and only if $\sigma = \tau_l\circ\pi$ for some l, and $P(\pi\to\sigma) = 0$ for all other pairs π, σ.
Let $N(t')$ be the number of nodes that failed until time $t'$. This is a Poisson counting process with rate nλ,
i.e., $N(t') \sim \mathrm{Poi}(n\lambda)$. At time $t' = 0$, $X(0)$ is chosen uniformly at random. For $t' > 0$ define $X(t') = \pi_{N(t')}$,
where π with an integer index is defined above before Example 2. Due to the memoryless property of the
exponential distribution, we obtain that $X(t')$ is indeed a Markov chain.
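In discrete time, a transition of this walk simply moves the failed node to the last position. A minimal sketch (ours, following the conventions just described):

```python
import random

def step(pi, rng):
    """One transition of the walk on permutations: a uniformly random node
    fails and moves to the last position; everything to its right shifts left."""
    failed = rng.randrange(len(pi))        # uniform failed node
    j = pi.index(failed)                   # its current position
    return pi[:j] + pi[j+1:] + [failed]    # this is tau_l composed with pi

rng = random.Random(1)
pi = list(range(5))                        # pi_0 = id
for _ in range(3):
    pi = step(pi, rng)
print(pi)
```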
Next note that X(t ′ ) is positive recurrent since the discrete-time chain on Sn defined by the kernel P is
recurrent and the expected return time to a state in X(t ′ ) is finite for any state in Sn . For a positive recurrent
continuous-time Markov chain, the limiting probability distribution µ is unique, exists almost surely, and is given
by
$$\mu(\pi) = \lim_{\tau\to\infty}\frac{1}{\tau}\int_0^\tau \mathbf{1}_\pi(X(t'))\, dt' = \frac{1}{n\lambda E[(\pi\to\pi)]} \qquad (5)$$
where (π → π) is the time to return to state π starting from π (See, for example, [10, p. 332]). In our model,
E [(π → π)] does not depend on π. In words, (5) implies that for t large enough, the time that the network
spends in each state is almost the same. We use this fact next to find an upper bound for the capacity.
Lemma 3 Let (N , β, S, t, h) be a storage network, where S is a (random) sequence of failed nodes and h is a
weight function satisfying the constraints given by β. Assume also that hj is a function of the last failed node,
i.e., if S j = vℓ then hj = hvℓ . Then almost surely
$$\mathrm{cap}(N) \le \frac{k'(2n-k'-1)}{2}\cdot\frac{1}{n}\sum_{i=1}^{n}\beta_i. \qquad (6)$$
Proof: The capacity of N is equal to the minimum weight under h of a cut between ṽ and DCt where DCt
can connect to any set of k ′ nodes from At . Assume that the set of weight functions is given by {hv }v ∈V . Since
there is a weight function for every node v , we will denote by hv (u) the weight that hv assigns to the edge
(u, CU). Let tj0 be the first time instance by which all the nodes have failed at least once. According to Lemma
2, tj0 is almost surely finite. By (5) we may assume that all permutations appear as associated permutations
with equal probability.
Let $D := \{v_{i_1}, \dots, v_{i_{k'}}\} \subset V$ and assume that the associated permutation π is such that $\pi^{-1}(i_1) \le \pi^{-1}(i_2) \le
\dots \le \pi^{-1}(i_{k'})$. Then the weight of the cut between $\tilde v$ and D is at most [8]
$$C_t^h(D) \le \sum_{\ell=1}^{k'}\Bigl(\sum_{v\in V\setminus v_{i_\ell}} h_{v_{i_\ell}}(v) - \sum_{r=1}^{\ell-1} h_{v_{i_\ell}}(v_{i_r})\Bigr). \qquad (7)$$
Since cap(N ) is the minimum weight value of a cut, we can bound it above by the average weight:
$$\mathrm{cap}(N) \le \frac{1}{n!\binom{n}{k'}}\sum_{D\subseteq V,\ |D|=k'}\ \sum_{\pi_t\in S_n} C_t^h(D), \qquad (8)$$
For the moment let us fix D and consider how many times the term hv (u) appears on the right-hand side of
(8) as we substitute Cth (D) from (7) and evaluate the sum on πt . If both u, v ∈ D then this term appears for
those πt in which v appears after u and does not (is canceled in (7)) if v precedes u. Thus, overall this term
appears n!/2 times. If v ∈ D and u 6∈ D then no cancellations occur, and the term hv (u) appears n! times.
Further, there are $\binom{n-2}{k'-2}$ choices of D for the first of these options and $\binom{n-2}{k'-1}$ for the second one of them.
Thus, for each pair of nodes $u, v\in V$ the term $h_u(v)$ appears in (8)
$$\binom{n-2}{k'-2}\frac{n!}{2} + \binom{n-2}{k'-1}\, n! = \binom{n-2}{k'-2}\frac{n!}{2}\cdot\frac{2n-k'-1}{k'-1}$$
times. Substituting this into (8) and performing cancellations, we obtain that
$$\mathrm{cap}(N) \le \frac{k'(2n-k'-1)}{2n(n-1)}\sum_{i=1}^{n}\sum_{j\ne i} h_{v_i}(v_j).$$
Since h is a weight function and since the nodes fail with equal probability, we obtain that $\sum_{i\ne j} h_{v_i}(v_j) = (n-1)\beta_j$, $j = 1, \dots, n$. Thus,
$$\sum_{i=1}^{n}\sum_{j\ne i} h_{v_i}(v_j) = (n-1)\sum_{j=1}^{n}\beta_j.$$
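For concreteness, the resulting bound (6) is a one-line computation; the sketch below (our own code) evaluates it for the n1 = 1 network used later in Section V, reproducing the value 177.45β2 quoted there.

```python
def upper_bound_cap(n, k_prime, betas):
    """Bound (6): cap(N) <= k'(2n - k' - 1)/2 * (1/n) * sum(betas)."""
    return k_prime * (2 * n - k_prime - 1) / 2 * sum(betas) / n

# Section V example: n = 20, k' = 13, one node with beta1 = 2, nineteen with beta2 = 1
print(upper_bound_cap(20, 13, [2.0] + [1.0] * 19))   # 177.45 (in units of beta_2)
```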
For a discrete-time network (N , β, s, t, h) we will sometimes omit the time notation t. Also, when a weight
function is not specified we will omit the weight function notation and write (N , β, s). The following lemma
shows that for a random discrete-time storage network with h = h∗, the limit superior in this definition is
almost surely a limit.
Lemma 4 Let (N , β, S, h∗) be a random discrete-time storage network with $S = (S_i, i \ge 1)$ a sequence of
independent RVs uniformly distributed on [n]. Then
$$E\bigl[C_{\mathrm{avg}}^{h^*}(N)\bigr] = \lim_{l\to\infty}\frac{1}{l}\sum_{t=1}^{l} E\bigl[C_t^{h^*}(N)\bigr].$$
Proof: Let $t_0$ be the first time instance by which all the nodes have failed at least once. Note that $t_0$ is a
stopping time and each failed node is chosen uniformly and independently. Referring to the Coupon collector's
problem [11, p. 210], we obtain that $\Pr(t_0 > cn\log n) \le n^{1-c}$ for every $c > 1$. Thus, $t_0$ is finite almost surely.
By symmetry, $\pi_{t_0}$ is distributed uniformly on the set of all permutations. Moreover, since $S_i$ is chosen uniformly
and independently, for $t > t_0$ we have that $\Pr(\pi_t = \pi \mid \pi_0^{t-1}) = \Pr(\pi_t = \pi \mid \pi_{t-1})$, so the sequence $\{\pi_t\}$ is a
Markov chain, which is irreducible and aperiodic. Because of this, a limiting distribution µ exists, and is unique
and positive. Hence, as t grows, $\Pr(\pi_t) \to \mu(\pi_t)$. Together with the fact that $C_t^{h^*}$ is uniformly bounded from
above for all t, we obtain that the limit $\lim_{l\to\infty}\frac{1}{l}\sum_{t=1}^{l} E[C_t^{h^*}(N)]$ exists.
Now define $X_t = \frac{1}{t}\sum_{i=1}^{t} C_i^{h^*}(N)$ and note that $X_t$ is a function of S. Following the previous discussion, for
almost every S, the sequence $X_t$ converges. Since $X_t$ is non-negative and upper bounded for every t, by the
dominated convergence theorem we have $\lim_{t\to\infty} E[X_t] = E[\lim_{t\to\infty} X_t]$ (the last limit exists a.s.), which is the
desired result.
Since t0 is almost surely finite and since πt is an ergodic Markov chain, defining the initial state to be π0 = id
does not affect the expected capacity. Hence, from now on we assume π0 = id.
The problem of finding the limiting distribution of our Markov chain on Sn is similar to the classic question
of the mixing time for the card shuffling problem called Top in at random shuffle. We use the following result
from [3, Thm.1].
Theorem 1 (Aldous and Diaconis) Consider a deck of n cards. At time t = 1, 2, . . . take the top card and
insert it in the deck at a random position. Let Qt denote the distribution after t such shuffles and let U be
the uniform distribution on the set of all permutations $S_n$. Then for all $c > 0$ and $n \ge 2$, the total variation
distance satisfies
$$\|Q_{n\log n + cn} - U\|_{TV} \le e^{-c}. \qquad (9)$$
To connect this result to our problem, we note that choosing the next failed node uniformly at random
corresponds to selecting a random card from the deck and putting it at the bottom. The mixing time of
this chain is stochastically equivalent to the mixing time of the Top in at random shuffle, and we obtain the
following lemma.
Lemma 5 Let N be a storage network with |V | = n > 2 nodes and let S be a random sequence of failed
nodes. Consider the sequence of associated permutations $(\pi_t, t \ge 0)$ where $\pi_0 = \mathrm{id}$. Then for any $c > 0$, $n \ge 2$,
and any $\pi\in S_n$,
$$\Bigl|\Pr(\pi_{n\log n + cn} = \pi) - \frac{1}{n!}\Bigr| \le e^{-c}.$$
Proof: Let $T \ge 1$ be a value of the time. Consider the time-reversed sequence $\tilde\pi_t = \pi_{T-t}$, $t \le T$. The
evolution of the sequence $\tilde\pi_t$ is described as follows: for any t take the last symbol $\pi_t(n)$ and insert it randomly
in the middle. Observe that $\Pr(\pi_T = \pi) = \Pr(\tilde\pi_T = \mathrm{id} \mid \tilde\pi_0 = \pi)$. Now use (9) and the definition of $\|\cdot\|_{TV}$ to
claim that for $T = n\log n + cn$, $c > 0$, $\bigl|\Pr(\tilde\pi_T = \mathrm{id} \mid \tilde\pi_0 = \pi) - \frac{1}{n!}\bigr| \le e^{-c}$.
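The mixing statement can be probed numerically. The following Monte-Carlo sketch (ours, not from the paper) runs the bottom-in-at-random walk for n log n + cn steps and estimates the total variation distance to the uniform distribution; up to sampling noise, the estimate stays below e^{-c}, in line with Lemma 5.

```python
import math, random, itertools
from collections import Counter
from math import factorial

n, c, trials = 5, 3.0, 200_000
steps = int(n * math.log(n) + c * n)

counts = Counter()
rng = random.Random(0)
for _ in range(trials):
    perm = list(range(n))                 # pi_0 = id
    for _ in range(steps):
        j = rng.randrange(n)              # uniformly random failed node...
        perm.append(perm.pop(j))          # ...moves to the last position
    counts[tuple(perm)] += 1

tv = 0.5 * sum(abs(counts.get(p, 0) / trials - 1 / factorial(n))
               for p in itertools.permutations(range(n)))
print(f"TV estimate {tv:.4f} vs bound e^-c = {math.exp(-c):.4f}")
```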
We now show that the value of the average cut in the random continuous time network can be obtained
from the value of the average cut in the random discrete-time network almost surely.
Lemma 6 Let (N1, β, S, t, h∗) be a continuous-time storage network and let (N2, β, S, h∗) be a discrete-time
storage network. Then
$$C_{\mathrm{avg}}^{h^*}(N_1) \overset{a.s.}{=} C_{\mathrm{avg}}^{h^*}(N_2).$$
Proof: First note that Lemmas 4 and 5 imply that if (N2, β, S, h∗) is a random discrete-time storage
network, then $C_{\mathrm{avg}}^{h^*}(N_2)$ is almost surely a constant, which is equal to $\frac{1}{n!}\sum_{\pi\in S_n} C_t(\pi)$. On the other hand, if
(N1, β, S, t, h∗) is a continuous-time storage network, then we can write
$$C_{\mathrm{avg}}^{h^*}(N_1) = \limsup_{\tau\to\infty}\frac{1}{\tau}\int_0^\tau C_t^{h^*}(N_1)\,dt \qquad (10)$$
$$= \limsup_{\tau\to\infty}\frac{1}{\tau}\int_{t_0}^\tau \sum_{\pi\in S_n}\mathbf{1}_\pi(\pi_t)\, C_t^{h^*}(N_1)\,dt$$
$$= \limsup_{\tau\to\infty}\frac{1}{\tau}\sum_{\pi\in S_n}\int_{t_0}^\tau \mathbf{1}_\pi(\pi_t)\, C_t^{h^*}(\pi)\,dt$$
where $t_0$ is the first time instance by which all the nodes have failed. Moreover, since $C_t^{h^*}(\pi)$ is a function of
π and not a function of t, we denote $C_t^{h^*}(\pi)$ by $C^{h^*}(\pi)$ and obtain
$$C_{\mathrm{avg}}^{h^*}(N_1) = \limsup_{\tau\to\infty}\frac{1}{\tau}\sum_{\pi\in S_n}\int_{t_0}^\tau \mathbf{1}_\pi(\pi_t)\, C^{h^*}(\pi)\,dt$$
$$= \sum_{\pi\in S_n} C^{h^*}(\pi)\lim_{\tau\to\infty}\frac{1}{\tau}\int_{t_0}^\tau \mathbf{1}_\pi(X(t'))\,dt'$$
$$\overset{a.s.}{=} \sum_{\pi\in S_n}\frac{C^{h^*}(\pi)}{n\lambda E[(\pi\to\pi)]}$$
where the last equality follows from (5). Since $E[(\pi\to\pi)]$ does not depend on $\pi\in S_n$, we obtain that
$\frac{1}{n\lambda E[(\pi\to\pi)]} = \frac{1}{n!}$, which in turn implies that $C_{\mathrm{avg}}^{h^*}(N_1) = C_{\mathrm{avg}}^{h^*}(N_2)$ almost surely.
We obtain the following statement which forms a basis of our subsequent derivations.
Theorem 2 Let (N1, β, S, t) be a continuous-time storage network. Then ((µ1 × µ2)-a.s.)
$$\mathrm{cap}(N_1) \ge \frac{1}{n!}\sum_{\pi_t\in S_n} C_t^{h^*}(\pi_t).$$
Proof: From Lemma 1 we have that for any realization s such that every node fails infinitely often,
$\mathrm{cap}(N_1) \ge C_{\mathrm{avg}}^{h^*}(N_1)$. According to Lemma 2, there exists a finite $t_0$ by which all the nodes have failed at least
once, and by (5) the stationary distribution of the permutations is uniform. This implies that almost surely
all the nodes fail infinitely often. According to Lemma 6, if (N2, β, S, h∗) is a discrete-time storage network,
then almost surely $C_{\mathrm{avg}}^{h^*}(N_1) = C_{\mathrm{avg}}^{h^*}(N_2)$. From Lemmas 4 and 5, $C_{\mathrm{avg}}^{h^*}(N_2)$ is almost surely a constant, which is
equal to $\frac{1}{n!}\sum_{\pi_t\in S_n} C_t^{h^*}(\pi_t)$. Hence, almost surely, $C_{\mathrm{avg}}^{h^*}(N_2) = E\bigl[C_{\mathrm{avg}}^{h^*}(N_2)\bigr] = \frac{1}{n!}\sum_{\pi_t\in S_n} C_t^{h^*}(\pi_t)$. Altogether
these statements imply the claim of the theorem.
From this point, unless stated otherwise, we restrict ourselves to discrete-time networks.
where β1 > β2 > 0. Let C be the minimum cut of N in the static case (i.e., the worst-case weight of the cut):
$$C \triangleq \min_{\pi\in S_n}\{C_t^{h^*}(\pi)\} = \min_{t\ge 0,\ s\in V^\infty}\{C_t^{h^*}\}. \qquad (11)$$
Let $a \triangleq k' - n_1$ and let us assume that $a > 0$, because otherwise the file reconstruction problem is trivially solved
by contacting k′ nodes in U. The minimum cut is given by the following result from [2] (we cite it using our
assumptions of a > 0 and large α).
Lemma 7 Let (N , β, s, h∗) be a fixed-cost storage network. Then
$$C = \frac{n_1(n_1-1)}{2}\beta_1 + \Bigl(n_2(n_1+a-1) - \frac{a(a+1)}{2}\Bigr)\beta_2. \qquad (12)$$
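As a small illustration (our own sketch, assuming the reconstruction of (12) shown above), the static min-cut can be evaluated directly; with the n1 = 1 parameters used in Section V it returns the value 150β2 quoted there.

```python
def static_min_cut(n1, n2, a, beta1, beta2):
    """Static minimum cut C of the fixed-cost network, eq. (12)."""
    return n1 * (n1 - 1) / 2 * beta1 \
        + (n2 * (n1 + a - 1) - a * (a + 1) / 2) * beta2

# Section V example: n1 = 1, n2 = 19, k' = 13 so a = 12, beta1 = 2*beta2
print(static_min_cut(1, 19, 12, 2.0, 1.0))   # 150.0 (in units of beta_2)
```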
In this section we consider a dynamical equivalent of the above model, where the sequence of node failures
S is random. Note that if n2 = 1 then k = n which implies that no coding is used in the storage network, so
we will assume that n2 > 2. To avoid boundary cases, we will also assume that n1 > 1 (the case of n1 = 1 is
not very interesting and can be handled using the same technique as below).
Expression (12) gives the size of the minimum cut in the static model of [8], and it also gives a lower bound
for the cut $C_t^{h^*}$ for all t and s in the dynamical model. We shall now demonstrate by example that by controlling
the transmission policy it is possible to increase the storage capacity of the (N , β, S, h) network compared to
(12).
The idea of the example is as follows. Let $s_j$ be the failed node. The number of symbols that node $v_i$ transmits
at time j for the repair depends on the jth failed node $s_j$. If $v_i \in U$, then for $s_j \in U$ the node $v_i$ transmits more
than β1 symbols ($h_j(v_i) > \beta_1$), while for $s_j \in L$ it transmits fewer than β1 symbols. If $v_i \in L$, then
$v_i$ always contributes β2 symbols.
Example 3 Let (N , β, S, h) be a storage network with n = 20, k′ = 13, $U = (v_1, \dots, v_{10})$, $L = (v_{11}, \dots, v_{20})$,
and β1 = 2β2. Assume that α is large enough (in this case taking $\alpha \ge 33.5\beta_2$ suffices). By (12), the value of
the minimum cut with h = h∗ is 214β2, and thus the maximum file size that can be stored is M = 214β2. The
task of node repair is accomplished by contacting 19 nodes.
Now we will show that under the dynamic model, it is possible to increase the file size by using the weight
function h defined as follows. Suppose that at time t (recall that time is discrete) a node $v\in U$ has failed, i.e.,
$S_t = v$ where $v\in U$, and define
$$h_t(v_i) = \begin{cases} \beta_2 & v_i\in L \\ \beta_1 + \frac{1}{20}\beta_2 & v_i\in U\setminus v \\ 0 & v_i = v. \end{cases}$$
If $S_t = v$ where $v\in L$, define
$$h_t(v_i) = \begin{cases} \beta_2 & v_i\in L\setminus v \\ \beta_1 - \frac{9}{200}\beta_2 & v_i\in U \\ 0 & v_i = v. \end{cases}$$
and it is obtained when $\pi = \mathrm{id}$ and the active nodes selected are $D_t = (v_1, v_2, \dots, v_{13})$. This shows an increase
over the static case estimate (12).
We now calculate the expected number of symbols a node transmits under h. Recall that in the random
model, each node has the same probability of failure, which in this case equals $\frac{1}{20}$. Let $t_0$ denote the first
time instance by which all the nodes have failed. For every $t > t_0$ we have that if $v_i\in U$ then
$$E[h_t(v_i)] = \frac{9}{20}\Bigl(\beta_1 + \frac{1}{20}\beta_2\Bigr) + \frac{10}{20}\Bigl(\beta_1 - \frac{9}{200}\beta_2\Bigr) < \beta_1$$
and if $v_i\in L$ then
$$E[h_t(v_i)] = \frac{9}{20}\beta_2 + \frac{10}{20}\beta_2 < \beta_2.$$
Therefore, the average amount of symbols each node transmits satisfies the constraints given by β.
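These averages can be checked mechanically; the sketch below (ours, exact rational arithmetic) reproduces the two expectations for Example 3.

```python
from fractions import Fraction as F

beta2 = F(1)
beta1 = 2 * beta2

def h(vi_in_U, failed_in_U, failed_is_self):
    """Transmission of node v_i under Example 3's weight function h."""
    if failed_is_self:
        return F(0)
    if vi_in_U:
        return beta1 + F(1, 20) * beta2 if failed_in_U else beta1 - F(9, 200) * beta2
    return beta2

# node in U: 9 other U failures, 10 L failures, 1 self failure (prob 1/20 each node)
EU = F(9, 20) * h(True, True, False) + F(10, 20) * h(True, False, False)
EL = F(9, 20) * h(False, False, False) + F(10, 20) * h(False, True, False)
print(EU, EU < beta1)   # 19/10 < beta1 = 2
print(EL, EL < beta2)   # 19/20 < beta2 = 1
```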
The above simple procedure is not optimal in terms of the file size M: as we show below, it is possible to
construct a different transmission scheme which allows for storage of a larger-size file. Note also that the upper
bound (6) gives $\mathrm{cap}(N) \le 235.5\beta_2$, while the improvement of (13) over (12) is relatively minor.
Example 3 provides a procedure to construct the weight function h such that the maximum file size can be
increased. Below we generalize this idea and also explore other ways of using time evolution to increase the
storage capacity of a fixed-cost network.
and
$$h_L(v_i) = \begin{cases} \beta_1 - \frac{n_1-1}{n_2}\varepsilon_1 & v_i\in U \\ \beta_2 & v_i\in L\setminus s_j \\ 0 & v_i = s_j \end{cases} \qquad (16)$$
and $0 \le \varepsilon_1 \le \beta_1$.
We now show that the weight function h satisfies the constraints given by β.
Lemma 8 Let (N , β, S, h) be a fixed-cost storage network with h as defined above. Then h satisfies the average
constraints given by β.
Proof: Fix a node $v_i\in U$ and for each time instance t, let us calculate the expected number of symbols
$v_i$ transmits. Recall that $v_{\pi_t(n)}$ denotes the node that failed at time t. Recall that the failures of the nodes are
uniformly distributed, so we obtain
$$\Pr(v_{\pi_t(n)} = v_j) = \begin{cases} \frac{1}{n} & \text{if } j = i \\ \frac{n_1-1}{n} & \text{if } j\in[n_1]\setminus i \\ \frac{n_2}{n} & \text{otherwise.}\end{cases}$$
Hence, the expected number of symbols that the node $v_i$ transmits is
$$\frac{n_1-1}{n} h_U(v_i) + \frac{n_2}{n} h_L(v_i) = \frac{n_1-1}{n}(\beta_1+\varepsilon_1) + \frac{n_2}{n}\Bigl(\beta_1 - \frac{n_1-1}{n_2}\varepsilon_1\Bigr) < \beta_1.$$
If $v_i\in L$ we have
$$\Pr(v_{\pi_t(n)} = v_j) = \begin{cases} \frac{1}{n} & \text{if } j = i \\ \frac{n_1}{n} & \text{if } j\in[n_1] \\ \frac{n_2-1}{n} & \text{otherwise.}\end{cases}$$
In this case the expected number of transmitted symbols equals
$$\frac{n_1}{n} h_U(v_i) + \frac{n_2-1}{n} h_L(v_i) = \frac{n_1}{n}\beta_2 + \frac{n_2-1}{n}\beta_2 < \beta_2.$$
Thus, on average the number of symbols is within the allotted bandwidth.
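The cancellation in the first display is easy to verify exactly; a tiny sketch (our own code, with a hypothetical helper name) follows.

```python
from fractions import Fraction as F

def expected_tx_U(n1, n2, beta1, eps1):
    """Expected transmission of a node in U under h_U, h_L; equals (n-1)/n * beta1."""
    n = n1 + n2
    hU = beta1 + eps1                       # another node of U failed
    hL = beta1 - F(n1 - 1, n2) * eps1       # a node of L failed
    return F(n1 - 1, n) * hU + F(n2, n) * hL

n1, n2, beta1 = 10, 10, F(2)
print(expected_tx_U(n1, n2, beta1, F(1, 2)),
      F(n1 + n2 - 1, n1 + n2) * beta1)      # both equal 19/10 < beta1
```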
The next two lemmas are used in the proof of Theorem 3 in order to estimate the minimum cut. The first
lemma shows that the minimum cut for any permutation $\pi_t$, $t > t_0$, is obtained when $D_t \supseteq U$. The second
lemma shows that the minimum cut is obtained for $\pi_t = \mathrm{id}$.
Lemma 9 Let (N , β, s, h) be a network with h as defined above. If assumption (14) is satisfied, then for $t > t_0$,
the value $C_t^h(N)$ is attained when $D_t \supseteq U$.
Proof: We formulate our question as a dynamic programming problem and provide an optimal policy for
node selection. Assume that πt is a fixed permutation that represents the order of the last n failed nodes. We
will consider the information flow graph Xt and show that the cut is minimized when all the nodes from U are
selected.
Consider a k′-step procedure which in each step selects one node from $A_t$. Each step entails a cost. Let $t' \le t$
and assume that node $v_{i_{t'}}^{t'} \in A_t$ was selected. The cost is defined as the added weight values of the in-edges
of $CU_{t'}$ that are not out-edges of previously selected nodes. Our goal is to choose k′ nodes that minimize the
total cost and hence minimize the cut between $\bigl(\bigcup_{j=-1}^{t-1} A_j\bigr)\setminus A_t$ and $DC_t$.
In order to simplify notation, we write πt = (u1 , u2 , . . . , un ), i.e., ul = vπt (l ) is the storage node that appears
in the l th position in πt . Moreover, with a slight abuse of notation, if uj failed at time t ′ we will write hj (ui )
instead of ht ′ (ui ). For κ 6 k ′ consider the sub-problem in step κ − 1, where the DCt has already chosen κ − 1
nodes (ui1 , . . . , uiκ−1 ) and we are to choose the last node. Assume that the chosen nodes are ordered according
to their appearance in the permutation, i.e., i1 6 i2 6 . . . 6 iκ−1 . Let uj1 , . . . , ujm ∈ U be nodes that were not
selected up to step κ − 1, i.e.,
{uj1 , . . . , ujm } ∩ {ui1 , . . . , uiκ−1 } = ∅,
and assume also that j1 6 j2 6 . . . 6 jm . We show that choosing uj1 accounts for the minimum cut. First, we
claim that choosing uj1 minimizes the cut over all other nodes from U. Denote by Cκ−1 the total cost (or the
cut) in step κ − 1. Fix $2 \le \ell\in[m]$ and note that since $j_1 \le j_\ell$, we can write
$$i_1 \le \dots \le i_{r_1} \le j_1 \le i_{r_1+1} \le \dots \le i_{r_\ell} \le j_\ell \le i_{r_\ell+1} \le \dots,$$
where the set of indices $\{i_1, \dots, i_{r_1}\}$ can be empty. Let $C(j_1)$ be the value of the cut once we add $u_{j_1}$ in the κth
step. The change from $C_{\kappa-1}$ is formed of the following components. First, we add the values of all the edges
from $U\setminus\{u_{j_1}\}$ to $u_{j_1}$ and from L to $u_{j_1}$, accounting for $(n_1-1)(\beta_1+\varepsilon_1) + n_2\beta_2$ symbols. Further, we remove the
values of all the edges from the nodes $u_{i_1}, \dots, u_{i_{r_1}}$ to $u_{j_1}$ and all the edges from $u_{j_1}$ to $u_{i_{r_1+1}}, \dots, u_{i_{\kappa-1}}$. Overall
we obtain
$$C(j_1) = C_{\kappa-1} + (n_1-1)(\beta_1+\varepsilon_1) + n_2\beta_2 - \sum_{q=1}^{r_1} h_{j_1}(u_{i_q}) - \sum_{q=r_1+1}^{\kappa-1} h_{i_q}(u_{j_1}). \qquad (17)$$
Similarly, let $C(j_\ell)$ be the value of $C_\kappa$ if in step κ we select the node $u_{j_\ell}$, $\ell \ge 2$. Following the same argument as
in (17), we obtain
$$C(j_\ell) = C_{\kappa-1} + (n_1-1)(\beta_1+\varepsilon_1) + n_2\beta_2 - \sum_{q=1}^{r_\ell} h_{j_\ell}(u_{i_q}) - \sum_{q=r_\ell+1}^{\kappa-1} h_{i_q}(u_{j_\ell}).$$
Since $h_{j_\ell}(u_i) = h_{j_1}(u_i)$ and $h_i(u_{j_1}) = h_i(u_{j_\ell})$ for all $i\in[n]$, we have
$$C(j_1) - C(j_\ell) = \sum_{q=r_1+1}^{r_\ell}\bigl(h_{j_\ell}(u_{i_q}) - h_{i_q}(u_{j_1})\bigr). \qquad (18)$$
Next we show that choosing $u_{j_1}$ yields a smaller cut than choosing any unselected node $u_{j_\ell}\in L$. We consider two cases:
1) Assume that $j_\ell < j_1$. Our goal is to show that the corresponding difference, formed as in (18), is nonpositive. Let $1 \le q \le r_\ell$. For $u_{i_q}\in U$ we have
$$h_{j_\ell}(u_{i_q}) - h_{j_1}(u_{i_q}) = \Bigl(\beta_1 - \frac{n_1-1}{n_2}\varepsilon_1\Bigr) - (\beta_1+\varepsilon_1) \le 0$$
and for $u_{i_q}\in L$ we have
$$h_{j_\ell}(u_{i_q}) - h_{j_1}(u_{i_q}) = \beta_2 - \beta_2 = 0.$$
Now let $r_1+1 \le q \le \kappa-1$. For $u_{i_q}\in U$ we have
$$h_{i_q}(u_{j_\ell}) - h_{i_q}(u_{j_1}) = \beta_2 - (\beta_1+\varepsilon_1)$$
and for $u_{i_q}\in L$ we have
$$h_{i_q}(u_{j_\ell}) - h_{i_q}(u_{j_1}) = \beta_2 - \Bigl(\beta_1 - \frac{n_1-1}{n_2}\varepsilon_1\Bigr),$$
both of which are non-positive by assumption (14).
The remaining terms contribute $\sum_{q=r_\ell+1}^{r_1}\bigl(h_{i_q}(u_{j_\ell}) - h_{j_1}(u_{i_q})\bigr)$ to the value of the cut. As before, for
$u_{i_q}\in U$ we have
$$h_{i_q}(u_{j_\ell}) - h_{j_1}(u_{i_q}) = \beta_2 - (\beta_1+\varepsilon_1) \le 0$$
by (14), and for $u_{i_q}\in L$ we have
$$h_{i_q}(u_{j_\ell}) - h_{j_1}(u_{i_q}) = \beta_2 - \beta_2 = 0.$$
Thus, $C(j_1) - C(j_\ell) \le 0$.
2) Assume that jℓ > j1 . This case is symmetric to the case jℓ < j1 and the analysis is similar.
By the principle of optimality in dynamic programming, which states that every optimal policy consists only
of optimal sub-policies [5, Ch. 1.3], we now conclude that the minimum cut is formed by first taking all the
nodes from U and then taking the remaining nodes from L.
Remark 2 Suppose that in forming the cut we have added all the nodes from U, and there are a more nodes (from
L) to select. To minimize the value of the cut, these nodes should be taken to be the a most recently failed
nodes from L. This is because choosing the most recently failed node $v_{\pi(n)}$ assures that as few as possible of
the previously selected nodes contain information from $v_{\pi(n)}$.
To justify this formally, consider the proof of Lemma 9. Indeed, if $u_{j_1}, u_{j_\ell}\in L$ with $j_1 < j_\ell$, then
$$C(j_1) - C(j_\ell) = \sum_{q=r_1+1}^{r_\ell}\bigl(h_{j_\ell}(u_{i_q}) - h_{i_q}(u_{j_1})\bigr)$$
Note that Lemma 7 is an immediate corollary of Lemmas 9 and 10. Indeed, taking the weight function h = h∗
implies that ε1 = 0, which satisfies assumption (14). Hence, the minimum cut is obtained when $\pi_t = \mathrm{id}$ and
$D_t \supseteq U$, and is equal to C.
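Lemma 9 also lends itself to exhaustive verification on a toy network. In the sketch below (our own code; the cut evaluation follows the bound (7), and the weight function is our reading of the definition preceding (16)), a minimizing selection always contains U.

```python
from itertools import combinations, permutations

n1, n2, k_prime = 2, 3, 3
n = n1 + n2
U = set(range(n1))                         # nodes 0..n1-1 form U
beta1, beta2, eps1 = 2.0, 1.0, 0.5         # (14): beta1 - beta2 >= (n1-1)/n2 * eps1

def h(failed, helper):
    """Symbols sent by `helper` for the repair of `failed`."""
    if failed in U:
        return beta1 + eps1 if helper in U else beta2
    return beta1 - (n1 - 1) / n2 * eps1 if helper in U else beta2

def cut(pi, D):
    """Cut value of selection D when pi lists nodes by order of last failure."""
    order = [v for v in pi if v in D]
    val = 0.0
    for i, v in enumerate(order):
        val += sum(h(v, u) for u in range(n) if u != v)   # download for v's repair
        val -= sum(h(v, u) for u in order[:i])            # edges inside the selection
    return val

for pi in permutations(range(n)):
    cuts = {D: cut(pi, set(D)) for D in combinations(range(n), k_prime)}
    best = min(cuts.values())
    best_with_U = min(v for D, v in cuts.items() if U <= set(D))
    assert abs(best - best_with_U) < 1e-9   # Lemma 9: a minimizer contains U
print("Lemma 9 verified on the toy instance")
```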
Let us prove Theorem 3.
Proof of Theorem 3: From Lemma 9 we obtain that there exists ε1 > 0 such that assumption (14) is
satisfied and such that at each time t, the selection $D_t$ that minimizes the cut contains U. Lemma 10 implies
that the minimum cut is obtained for $\pi_t = \mathrm{id}$. Taking $\pi_t = \mathrm{id}$ and $D_t = \{v_1, \dots, v_{k'}\}$, it is straightforward to
check that
$$C_t^h(D_t) = \sum_{j=1}^{n_1-1} j(\beta_1+\varepsilon_1) + n_1 n_2\beta_2 + \sum_{j=1}^{a}(n_2-j)\beta_2 = C_t^{h^*}(\pi_t) + \frac{n_1(n_1-1)}{2}\varepsilon_1 \qquad (19)$$
for every $\epsilon > 0$ there exists $t_\epsilon > t_0$ large enough such that $\bigl|\mu_\ell(\pi_t) - \frac{1}{|S_n^\ell|}\bigr| \le \epsilon$, and therefore the limit exists
almost surely.
For $t \ge t_\epsilon$ consider
$$\sum_{\pi_t\in S_n^\ell}\Pr(\pi_t \mid S_n^\ell)\, C_t(\pi_t) \ge \Bigl(\frac{1}{|S_n^\ell|}-\epsilon\Bigr)\sum_{\pi_t\in S_n^\ell} C_t(\pi_t) \ge \frac{1}{|S_n^\ell|}\sum_{\pi_t\in S_n^\ell} C_t(\pi_t) - \epsilon R,$$
where $R = \max_{\pi_t\in S_n} C_t(\pi_t)$. To bound this sum below we fix the last a entries of the permutation. Since for
h = h∗ (i.e., ε1 = 0) assumption (14) is satisfied, we can use Lemma 9, according to which $C_t(\pi_t)$ is
minimized if $n_1-\ell$ entries from U appear in the first $n_1-\ell$ positions, followed by $n_2-a+\ell$ entries from L (in
any order). Fix the first $n-a$ entries. Again according to Lemma 9, the minimum cut will be obtained when all
the ℓ nodes from U are in positions $n-a+1, n-a+2, \dots, n-a+\ell$, and according to Lemma 10 it is equal
to $C_{\min} := C + \ell^2(\beta_1-\beta_2)$. Also, the maximum cut will be obtained when all the ℓ nodes from U are located
in the last positions. This yields $C_{\max} := C + \ell a(\beta_1-\beta_2)$.
Let $\pi_t\in S_n^\ell$ be any permutation with $v_{\pi_t(i)}\in U$ for $i\in\{1, \dots, n_1-\ell\}$. We claim that
$$C_t(\pi_t) + C_t(\pi_t^c) = 2C + \ell(a+\ell)(\beta_1-\beta_2) = C_{\min} + C_{\max}. \qquad (21)$$
Indeed, assume $\pi_t = \pi$ and let D be a selection of k′ active nodes that minimizes the cut. By Lemma 9,
if there is at least one node from U in the last a places, the minimum cut will be obtained by selecting
the last a places as a part of D. Moreover, if $v_i\in U$ with $\pi^{-1}(i) = n-a+m$ for some $m\in[a]$ and
$f_\pi(v_i) = b$, then $\bigl|\{v_{\pi(1)}, \dots, v_{\pi(n-a+m)}\}\cap(D\cap L)\bigr| = b$. Together with the fact that $|D\cap L| = a$, this implies
that $\bigl|\{v_{\pi(n-a+1)}, \dots, v_{\pi(n-a+m)}\}\cap(D^c\cap L)\bigr| = b-\ell$. For $\pi^c$, we obtain that $(\pi^c)^{-1}(i) = n-m+1$ and
$\bigl|\{v_{\pi^c(n-m+1)}, \dots, v_{\pi^c(n)}\}\cap L\bigr| = b-\ell$, which means that $\bigl|\{v_{\pi^c(1)}, \dots, v_{\pi^c(n-m+1)}\}\cap(D\cap L)\bigr| = a-(b-\ell)$.
With a slight abuse of notation, for a node $v_i$ we write $\pi(v_i)$, $\pi^{-1}(v_i)$, and $(\pi^c)^{-1}(v_i)$ to denote $\pi(i)$, $\pi^{-1}(i)$,
and $(\pi^c)^{-1}(i)$, respectively. By Lemma 10 we have
$$C_t(\pi) = C + \sum_{v\in D\cap U} f_\pi(v)(\beta_1-\beta_2) \ge C + \sum_{\substack{v\in D\cap U \\ \pi^{-1}(v)\in\{n-a+1,\dots,n\}}} f_\pi(v)(\beta_1-\beta_2).$$
For $\pi^c$ we obtain
$$C_t(\pi^c) \ge C + \sum_{\substack{v\in D\cap U \\ (\pi^c)^{-1}(v)\in\{n-a+1,\dots,n\}}} f_{\pi^c}(v)(\beta_1-\beta_2) = C + \sum_{\substack{v\in D\cap U \\ (\pi^c)^{-1}(v)\in\{n-a+1,\dots,n\}}} \bigl(a-(f_\pi(v)-\ell)\bigr)(\beta_1-\beta_2).$$
Observe that $(S_n^\ell)_\ell$ partitions the set $S_n$, and we can continue as follows:
$$C_{\mathrm{avg}}^{h^*}(N) \overset{a.s.}{=} \lim_{t\to\infty}\frac{1}{t}\sum_{r=t_0}^{t}\sum_{\ell=0}^{\min\{a,n_1\}}\sum_{\pi\in S_n^\ell}\Pr(\pi_r = \pi \mid S_n^\ell)\Pr(\pi_r\in S_n^\ell)\, C_r(\pi)$$
$$= \lim_{t\to\infty}\frac{1}{t}\sum_{r=t_0}^{t}\sum_{\ell=0}^{\min\{a,n_1\}}\Pr(\pi_r\in S_n^\ell)\, E_{\mu_\ell}[C_r(N)]$$
By Lemma 11, for every $\epsilon > 0$, there is $t_\epsilon > t_0$ such that $\Bigl|\Pr(\pi_r\in S_n^\ell) - \frac{\binom{n_1}{\ell}\binom{n_2}{a-\ell}}{\binom{n}{a}}\Bigr| \le \epsilon$. Hence, for every $\epsilon > 0$,
$$C_{\mathrm{avg}}^{h^*}(N) \overset{a.s.}{\ge} \lim_{t\to\infty}\frac{1}{t}\Biggl(\sum_{r=t_\epsilon}^{t}\sum_{\ell=0}^{\min\{a,n_1\}}\frac{\binom{n_1}{\ell}\binom{n_2}{a-\ell}}{\binom{n}{a}}\, E_{\mu_\ell}[C_r(N)] - \sum_{r=t_0}^{t_\epsilon} n_1 R\Biggr),$$
By Lemma 1, the right-hand side of this inequality gives a lower bound on capacity. It can be transformed to
the expression on the right-hand side of (20) by repeated application of the Vandermonde convolution formula.
Thus, we have proved that the average minimum cut (and thus, the capacity) is almost surely bounded below
by an expression which is strictly greater than C, and accounting for the dynamics of the fixed-cost network
enables one to support storage of a larger file than in the static case of [2].
To summarize the results of this section, we have proved that
$$\mathrm{cap}(N) - C \overset{a.s.}{\ge} \max\Bigl\{\frac{n_1(n_1-1)}{2}\varepsilon_1,\ \frac{\beta_1-\beta_2}{2}\cdot\frac{an_1}{n}\Bigl(a+1+\frac{n_1-1}{n-1}(a-1)\Bigr)\Bigr\}, \qquad (23)$$
where the first of the bounds on the right is valid under assumption (14). To give numerical examples, let us
return to Example 3. Applying Theorem 4 to Example 3 yields $\mathrm{cap}(N) \ge 214\beta_2 + 3.7\beta_2$. At the same time,
Theorem 3 states that the storage capacity is bounded below by $214\beta_2 + \frac{9}{4}\beta_2$, showing that the choice of h is
not always optimal. Generally, the lower bound on capacity of Theorem 3 is $C + \frac{(n_1-1)n_2}{2n}(\beta_1-\beta_2)$, and the bound of
Theorem 4 is approximately $C + \frac{n_1 a^2}{2n}(\beta_1-\beta_2)$. Therefore, Theorem 4 provides a better bound on the storage capacity
when a is roughly above $\sqrt{n_2}$.
Since the storage capacity can be increased while the average amount of symbols each node vi transmits is
at most βi , after a long period of time (for large enough t), the total bandwidth that was used for repair in
the dynamical model is equal to the total bandwidth that was used for repair in the static model.
To conclude this section, we address the question regarding the accuracy of the derived bounds on $E\bigl[C_{\mathrm{avg}}^{h^*}(N)\bigr]$.
In the next proposition we derive an upper bound on this quantity.
Proposition 1 Let (N , β, S, h∗) be a storage network. We have (µ1-a.s.)
$$C_{\mathrm{avg}}^{h^*}(N) \le C + \frac{an_1(a+n_1)}{2n}(\beta_1-\beta_2).$$
Proof: Given $\pi\in S_n^\ell$, denote by $\bar\pi$ the permutation in which the first $n_2-a+\ell$ positions contain nodes
from L, the next $n_1-\ell$ positions contain nodes from U, and the last a positions are the same as in π. Lemma 9
and Remark 2 imply that $C_t(\bar\pi_t) \ge C_t(\pi_t)$. By (21) and by Lemma 10 we obtain
$$C_t(\bar\pi_t) + C_t(\bar\pi_t^c) = 2C + \ell(a+n_1)(\beta_1-\beta_2).$$
Hence,
$$C_t(\pi_t) + C_t(\pi_t^c) \le C_t(\bar\pi_t) + C_t(\bar\pi_t^c) = 2C + \ell(a+n_1)(\beta_1-\beta_2)$$
where the last equality follows from Lemma 11. By Vandermonde's identity we obtain that almost surely
$$C_{\mathrm{avg}}^{h^*}(N) \le C + \frac{an_1(a+n_1)}{2n}(\beta_1-\beta_2).$$
Proposition 1 and Theorem 4 jointly result in the following (a.s.) inequalities for the average cut of the
fixed-cost storage network:
$$\frac{an_1(\beta_1-\beta_2)}{2n}\Bigl(a+1+\frac{n_1-1}{n-1}(a-1)\Bigr) \le C_{\mathrm{avg}}^{h^*}(N) - C \le \frac{an_1(\beta_1-\beta_2)}{2n}(a+n_1) \qquad (24)$$
where C is given in Lemma 7. For the above example, we obtain for the gap between $C_{\mathrm{avg}}^{h^*}(N)$ and C an
upper bound of $9.75\beta_2$. Generally, the difference between the upper and lower bounds (discounting the common
multiplier) is $\frac{(n-a)(n_1-1)}{n-1}$. Of course this does not directly result in an upper bound on capacity of N , which
appears to be a difficult question (a loose upper bound was obtained in (6), which in the example gives a gap
of at most $21.5\beta_2$).
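The two-sided bound (24) is again a one-line computation; the following sketch (ours) evaluates it for Example 3 and reproduces the figures 3.7β2 and 9.75β2 mentioned above.

```python
n, n1, n2, k_prime = 20, 10, 10, 13
a = k_prime - n1
diff = 1.0                                    # beta1 - beta2 in units of beta2
common = a * n1 * diff / (2 * n)
lower = common * (a + 1 + (n1 - 1) / (n - 1) * (a - 1))
upper = common * (a + n1)
print(f"{lower:.2f} <= C_avg - C <= {upper:.2f}")   # 3.71 ... 9.75
```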
(cf. (2)). Although the memory property does not affect the storage capacity when $\beta_i = \beta_0$ for all $i\in[n]$, using
our idea of controlling the transmission policy enables us to increase the storage capacity. As a main result of
this section, we show that the capacity of the network can be increased over the non-causal model.
Recall our notation $[n] = U\cup L$, where $|U| = n_1$, $|L| = n_2$. Throughout this section we denote $\hat a \triangleq k'-n_2 > 0$.
The following lemma is a natural minimax analog of Lemma 7.
Then
$$C' = \sum_{i=1}^{\hat a}\bigl(n_1\beta_1 + n_2\beta_2 - i\beta_1\bigr) + \sum_{j=1}^{n_2}\bigl((n_1-\hat a)\beta_1 + n_2\beta_2 - j\beta_2\bigr). \qquad (27)$$
Lemma 13 can be obtained from the next lemma which is a modified version of Lemma 9, together with the
fact that every permutation appears as an associated permutation in (N , β, S) µ1 -almost surely.
Lemma 14 Let (N , β, s, h∗) be a storage network. For $t > t_0$, $C_t^{\max,h^*}(N)$ is obtained when $D_t \supseteq L$.
The proof of Lemma 14 is similar to the proof of Lemma 9 and is given in the appendix. Note that according
to Lemma 9, the selection that minimizes the cut at time t starts with the node from U that failed before the other
nodes in U.
Remark 4 Similarly to Remark 2, from the proof of Lemma 9 it follows that after choosing the nodes in L,
we should choose the remaining â nodes in the order reversed from the order of their failure, starting with the
most recently failed node.
For a network with memory (N , β, S) we denote the average (maximum) cut and the storage capacity by
$C_{\mathrm{avg}}^{\max,h}$ and $\mathrm{cap}_m(N)$, respectively. The main result of this section is stated in the following theorem.
Theorem 5 Let (N , β, S) be a (random) storage network with memory. We have (µ1-a.s.)
$$\mathrm{cap}_m(N) \ge C' + \frac{\beta_1-\beta_2}{2}\cdot\frac{n_1 n_2\hat a}{n}\Bigl(2 - \frac{\hat a-1}{n-1}\Bigr).$$
In this section we denote by Ŝℓn the set of all permutations over [n] with exactly ℓ elements from U in the last
â positions. To prove Theorem 5 we need the following lemma.
Lemma 15 Let (N , β, S, h∗) be a storage network with memory. Let $\pi_t$ be the permutation at time t and
assume that $\pi_t$ is distributed uniformly over $\hat S_n^\ell$. We have
$$E\bigl[C_t^{\max,h^*}(N)\bigr] \ge C' + \frac{1}{2}\ell(2n_2-\hat a+\ell)(\beta_1-\beta_2),$$
where C′ is given in (26).
Proof: For any permutation $\pi\in\hat S_n^\ell$, let $\bar\pi\in\hat S_n^\ell$ be a permutation in which the first $n_1-\ell$ positions contain
only nodes from U, and the last $\hat a$ positions are exactly as in π. Then by Lemma 14 and Remark 4 we have
$C_t^m(\pi_t) \ge C_t^m(\bar\pi_t)$. This implies that by fixing the last $\hat a$ positions in $\pi_t$, we can bound $C_t^{\max,h^*}(\pi_t)$ below by
$C_t^{\max,h^*}(\bar\pi_t)$. We claim that
$$C_t^{\max,h^*}(\pi_t) + C_t^{\max,h^*}(\pi_t^c) \ge 2C' + \ell(2n_2-\hat a+\ell)(\beta_1-\beta_2).$$
Note that if $\pi_t\in\hat S_n^\ell$ then $\pi_t^c\in\hat S_n^\ell$ as well. Hence, $C_t^{\max,h^*}(\pi_t) + C_t^{\max,h^*}(\pi_t^c) \ge C_t^{\max,h^*}(\bar\pi_t) + C_t^{\max,h^*}(\bar\pi_t^c)$.
By Lemma 10 we obtain
$$C_t^{\max,h^*}(\bar\pi_t) = C' + \sum_{v\in D_t\cap U} f_{\bar\pi_t}(v)(\beta_1-\beta_2) = C' + \sum_{\substack{v\in D_t\cap U \\ (\bar\pi_t)^{-1}(v)\in\{n-\hat a+1,\dots,n\}}} f_{\bar\pi_t}(v)(\beta_1-\beta_2)$$
Using the definition of $\bar\pi_t\in\hat S_n^\ell$ and Lemma 14, we now observe that $f_{\bar\pi_t}(v) = n_2-(\hat a-\ell)+b$. For $\bar\pi_t^c$ we have
$f_{\bar\pi_t^c}(v) = n_2-(\hat a-\ell)+(\hat a-\ell)-b = n_2-b$.
Overall we obtain
$$C_t^{\max,h^*}(\bar\pi_t) + C_t^{\max,h^*}(\bar\pi_t^c) = 2C' + \sum_{v\in D_t\cap U}\bigl(f_{\bar\pi_t}(v) + f_{\bar\pi_t^c}(v)\bigr)(\beta_1-\beta_2) = 2C' + \ell(2n_2-\hat a+\ell)(\beta_1-\beta_2)$$
which in turn implies that
$$C_t^{\max,h^*}(\pi_t) + C_t^{\max,h^*}(\pi_t^c) \ge 2C' + \ell(2n_2-\hat a+\ell)(\beta_1-\beta_2).$$
We conclude the proof by noticing that
$$E\bigl[C_t^{\max,h^*}(N)\bigr] = \sum_{\pi_t\in\hat S_n^\ell}\Pr(\pi_t)\, C_t^{\max,h^*}(\pi_t) = \frac{1}{|\hat S_n^\ell|}\sum_{\pi_t\in\hat S_n^\ell}\frac{1}{2}\Bigl(C_t^{\max,h^*}(\pi_t) + C_t^{\max,h^*}(\pi_t^c)\Bigr) \ge C' + \frac{1}{2}\ell(2n_2-\hat a+\ell)(\beta_1-\beta_2).$$
where the inequality follows from Lemma 12 and the last equality follows from Lemma 11 (with $\hat a$) and since
the stationary distribution of $\pi_t$ is the uniform distribution. The final expression is obtained by repeated use of
the Vandermonde convolution formula. The average cut bounds the storage capacity below since we can follow
the same arguments as in Lemma 1 with $C_t^{\max,h^*}$ instead of $C_t^{h^*}$.
For a numerical example we return to Example 3. If at each time t, $DC_t$ chooses the k′ nodes which yield
the maximum cut, then by Theorem 5 the storage capacity is $\mathrm{cap}_m(N) \ge C' + 13\frac{1}{3}\beta_2$, where $C' = 269\beta_2$. This is
much greater than the lower bound computed earlier for the non-causal case, and in fact it even breaks above
the static-case upper bound of (6).
As seen from Theorem 5, if β1 = β2 the bound below is equal to the storage capacity of the static model.
This comes as no surprise since the network is invariant under permutations of the storage nodes.
and
$$h_L(v_i) = \begin{cases} \beta_1 - \frac{q(n_1-1)}{pn_2}\varepsilon_1 & v_i\in U \\ \beta_2 & v_i\in L\setminus s_j \\ 0 & v_i = s_j \end{cases}$$
and 0 6 ε1 6 β1 . By a calculation similar to Lemma 8 it is straightforward to check that the constraints given
by β are satisfied. Moreover, the proof of Theorem 3 does not use the fact that the stationary distribution of the
associated permutations is uniform. Thus, from Lemma 9 and Lemma 10 we obtain the following statement.
Theorem 6 Let (N , β, S, h) be a fixed-cost storage network with weight function h as defined above. Assume
that the failure probability of a node from U is q > 0 and of a node from L is p > 0. Fix ε1 > 0 such that
$\beta_1-\beta_2 \ge \frac{qn(n_1-1)}{pn_2}\varepsilon_1$. The storage capacity is bounded below by
$$\mathrm{cap}(N) \overset{a.s.}{\ge} C + \frac{n_1(n_1-1)}{2}\varepsilon_1, \qquad (28)$$
where C is given in Lemma 7.
For a numerical example consider Example 3 with $q = \frac{1}{40}$ and $p = \frac{3}{40}$. Let us choose $\varepsilon_1 = \frac{pn_2}{qn(n_1-1)}\beta_2 = \frac{1}{6}\beta_2$.
From (28) we now obtain
$$\mathrm{cap}(N) \ge (214 + 7.5)\beta_2,$$
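The arithmetic behind this figure, as a short sketch (ours, exact fractions):

```python
from fractions import Fraction as F

n1 = n2 = 10
n = n1 + n2
q, p = F(1, 40), F(3, 40)                   # n1*q + n2*p = 1
eps1 = (p * n2) / (q * n * (n1 - 1))        # the choice in the text, units of beta2
print(eps1)                                  # 1/6
print(F(n1 * (n1 - 1), 2) * eps1)            # 15/2, i.e., cap >= C + 7.5*beta2 by (28)
```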
where as above, $C = 214\beta_2$ is the value of the min-cut in the static case. As elsewhere in this paper, the assumption
on ε1 introduced in the theorem limits the increase of the network capacity. Lifting the assumption suggests
following the path taken in Theorem 4 of Sec. III-B. To implement this idea, we need to find the stationary
distribution of the Markov random walk on $S_n$ that arises under our assumption. This is however not an easy
task, and the classic (asymptotic) results such as in [9] seem not to be of help here. We have succeeded in
performing the analysis in the simple case of $n = n_2+1$, i.e., of the "upper" set formed of a single node $U = \{u\}$,
and we present this result in the remainder of this section.
Suppose that the failed nodes in the sequence S are chosen independently and that Pr(S i = v ) = p if v ∈ L
and Pr(S i = v ) = q if v ∈ U. Assuming that p, q 6= 0, almost surely there exists a finite time t0 such that all
the nodes have failed at least once by t0 . Choosing the next failed node gives rise to a permutation on Sn , and
the conditional probabilities Pr(πt |πt−1 ) between the permutations are well defined and can be found explicitly.
The probabilities Pr(πt |πt−1 ) define an ergodic Markov chain with a unique stationary distribution ν.
Define a partition of Sn into n blocks Pi , i ∈ [n]. Let π ∈ Pi if and only if π −1 (u) = i . The partition (Pi )
defines an obvious equivalence relation on Sn , and |Pi | = (n − 1)! for all i .
It turns out that the stationary probabilities of equivalent permutations are the same, i.e., ν(π) depends only
on the block Pi ∋ π. The distribution ν is given in the next lemma.
For any real number r and natural number k we define $\binom{r}{k} = \frac{r(r-1)\cdots(r-k+1)}{k!}$, and put $\binom{r}{0} = 1$.
Lemma 16 Let (N , β, S) be a dynamical storage network with $n = n_1+n_2$ nodes, where $n_1 = 1$. Let $0 < q \le p$
and suppose that $S_i$, $i = 1, 2, \dots$ are independent random variables with $\Pr(S_i = v) = p$ if $v\in L$ and
$\Pr(S_i = v) = q$ if $v\in U$. Let $\pi\in P_i$ and define the distribution
$$\nu(\pi) = \frac{1-q}{(n-1)!}\binom{\frac{1}{p}-1}{n-2}^{-1}\binom{\frac{1}{p}-n-1+i}{i-1}.$$
Then ν is the stationary distribution of the Markov chain with state space $S_n$.
Proof: 1. We first note that for any t, $\Pr(\pi_{t+1}\mid\pi_t) = q$ if $\pi_{t+1}\in P_n$ and $\Pr(\pi_{t+1}\mid\pi_t) = p$ otherwise. This
implies that for a fixed $\pi_t$,
$$\sum_{i=1}^{n}\Pr(\pi_{t+1}\in P_i \mid \pi_t) = (n-1)p + q = 1.$$
Hence, $\frac{1}{p} \ge n-1$, which implies that all the binomial coefficients in ν(π) are positive. Moreover, since $(n-1)p =
1-q$, we obtain that if $\pi\in P_n$ then the expression for ν(π) simplifies as follows:
$$\nu(\pi) = \frac{1-q}{(n-1)!}\binom{\frac{1}{p}-1}{n-2}^{-1}\binom{\frac{1}{p}-1}{n-1} = \frac{1-q}{(n-1)!}\cdot\frac{\frac{1}{p}-n+1}{n-1} = \frac{q}{(n-1)!}. \qquad (29)$$
2. Let us check that ν is a probability vector. As already remarked, $\nu(\pi) > 0$ for all $\pi\in S_n$. Obviously, if
$\pi, \sigma\in P_i$ for some i, then $\nu(\pi) = \nu(\sigma) = \frac{1}{(n-1)!}\nu(\{P_i\})$.
By the definition of ν we have that $\nu(\{P_{i+1}\}) = \nu(\{P_i\})\bigl(1 + \frac{1-pn}{pi}\bigr)$ for all $i \le n-1$. Therefore,
$$\sum_{\pi\in S_n}\nu(\pi) = \sum_{i=1}^{n}\nu(\{P_i\}) = \sum_{i=1}^{n-1}\nu(\{P_i\}) + q = \nu(\{P_1\})\Bigl(1 + \sum_{j=1}^{n-2}\prod_{i=1}^{j}\Bigl(1 + \frac{1-pn}{pi}\Bigr)\Bigr) + q,$$
where we used (29) for the last block. Note that
$$\prod_{i=1}^{j}\Bigl(1 + \frac{1-pn}{pi}\Bigr) = \binom{j + \frac{1-pn}{p}}{j}.$$
Since for $\pi\in P_1$, $\nu(\{P_1\}) = (n-1)!\,\nu(\pi)$ and $\nu(\pi) = \frac{1-q}{(n-1)!}\binom{\frac{1}{p}-1}{n-2}^{-1}$, we have
$$\sum_{\pi\in S_n}\nu(\pi) = (n-1)!\binom{\frac{1}{p}-1}{n-2}\nu(\pi) + q = (1-q) + q = 1.$$
Finally let us show that ν is a stationary vector of the transition matrix. Fix t and consider the sum
$\sum_{\pi\in S_n}\nu(\pi)\Pr(\pi_{t+1} = \sigma \mid \pi_t = \pi)$. For $\sigma\in P_i$, this sum has exactly n non-zero terms, of which i are for
$\pi\in P_{i+1}$ and $n-i$ for $\pi\in P_i$. Therefore, if $\sigma\in P_i$, we obtain
$$\sum_{\pi\in S_n}\nu(\pi)\Pr(\pi_{t+1} = \sigma \mid \pi_t = \pi) = \frac{pi}{(n-1)!}\nu(\{P_{i+1}\}) + \frac{p(n-i)}{(n-1)!}\nu(\{P_i\}).$$
Since $\nu(\{P_{i+1}\}) = \nu(\{P_i\})\bigl(1 + \frac{1-pn}{pi}\bigr)$ for $i \le n-1$, we have
$$\sum_{\pi\in S_n}\nu(\pi)\Pr(\pi_{t+1} = \sigma \mid \pi_t = \pi) = \frac{p}{(n-1)!}\nu(\{P_i\})\Bigl(i\Bigl(1 + \frac{1-pn}{pi}\Bigr) + (n-i)\Bigr) = \frac{1}{(n-1)!}\nu(\{P_i\}) = \nu(\sigma).$$
Now assume that $\sigma\in P_n$. We obtain
$$\sum_{\pi\in S_n}\nu(\pi)\Pr(\pi_{t+1} = \sigma \mid \pi_t = \pi) = \frac{1}{(n-1)!}\sum_{i=1}^{n} q\,\nu(\{P_i\}) = \frac{q}{(n-1)!}\Bigl(\sum_{i=1}^{n-1}\nu(\{P_i\}) + \nu(\{P_n\})\Bigr). \qquad (30)$$
Using the fact that $\sum_{i=1}^{n-1}\nu(\{P_i\}) = 1-q$ jointly with (30), we conclude that
$$\sum_{\pi\in S_n}\nu(\pi)\Pr(\pi_{t+1} = \sigma \mid \pi_t = \pi) = (1-q+q)\frac{q}{(n-1)!} = \nu(\sigma),$$
which completes the proof.
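Lemma 16 can also be confirmed numerically. The sketch below (our own code; gbinom is a hypothetical helper implementing the generalized binomial coefficient defined above) checks normalization and the balance equations for a small instance.

```python
from math import factorial

def gbinom(r, k):
    """Generalized binomial coefficient r over k (real r, integer k >= 0)."""
    out = 1.0
    for i in range(k):
        out *= (r - i) / (i + 1)
    return out

n, p = 6, 0.15
q = 1 - (n - 1) * p                # = 0.25, so 1/p >= n - 1 as required

def nu(i):
    """Per-permutation stationary probability on block P_i (Lemma 16)."""
    return (1 - q) / factorial(n - 1) / gbinom(1/p - 1, n - 2) \
        * gbinom(1/p - n - 1 + i, i - 1)

# normalization over the n blocks of (n-1)! permutations each
assert abs(sum(factorial(n - 1) * nu(i) for i in range(1, n + 1)) - 1) < 1e-9
# balance at sigma in P_i, i < n: i preimages in P_{i+1}, n - i in P_i, each w.p. p
for i in range(1, n):
    assert abs(p * i * nu(i + 1) + p * (n - i) * nu(i) - nu(i)) < 1e-9
# balance at sigma in P_n: total inflow q/(n-1)! equals nu(n)
assert abs(q / factorial(n - 1) - nu(n)) < 1e-9
print("nu is stationary for n =", n)
```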
where C is given in (12). To argue that this expression can be used in the lower bound on cap(N ) similar to the
bound in Theorem 4 (or in (22)), we can repeat the arguments used in the proof of Lemma 12. Then a modified
version of (22) together with the above expression for ν gives a lower bound on the capacity.
To give a numerical example, assume that we have n = 20 with $n_2 = 19$, $p = \frac{4}{95}$ and $q = \frac{1}{5}$. Assume also
that $\beta_1 = 2\beta_2$ and k′ = 13 (which implies that a = 12). According to Lemma 7, the capacity in the static
model is $C = 150\beta_2$. Using the results in this section, we obtain that a.s.
$$\mathrm{cap}(N) \ge (0.022\cdot 150 + 155.4)\beta_2 = 158.7\beta_2.$$
Lemma 3 implies that $\mathrm{cap}(N) \le 177.45\beta_2$, and thus in the above example we have obtained a capacity
increase of more than 30% of the gap between the bounds.
Appendix
A. Proof of Lemma 14
Assume that πt is a fixed permutation and consider the information flow graph Xt . We consider a k ′ -step
procedure which in each step selects one node from At . Let t ′ 6 t and assume the node vitt ∈ At was selected.
The cost it entails is defined as the added weight values of the in-edges of CUt that are not out-edges of
previously selected nodes. our goal is to select k ′ nodes that maximizes the cut for πt .
In order to simplify notation, we write πt = (u1 , u2 , . . . , un ), i.e., ul = vπt (l ) is the storage node that appears
in the l th position in πt . Moreover, with a slight abuse of notation, if uj failed at time t ′ we will write hj (ui )
instead of ht ′ (ui ). For κ 6 k ′ , consider the sub-problem at step κ − 1, where the DCt has already chosen κ − 1
nodes (ui1 , . . . , uiκ−1 ) and we are to choose the last node. Assume that the chosen nodes are ordered according
to their appearance in the permutation, i.e., i1 6 i2 6 . . . 6 iκ−1 . Let uj1 , . . . , ujm ∈ L be nodes that were not
selected up to step κ − 1, i.e.,
{uj1 , . . . , ujm } ∩ {ui1 , . . . , uiκ−1 } = ∅,
and assume also that j1 6 j2 6 . . . 6 jm . We show that choosing uj1 accounts for the maximum cut.
First, we show that choosing uj1 maximizes the cut over all other nodes from L. Denote by Cκ−1 the total
cost (or the cut) in step κ − 1. Fix 2 6 ℓ ∈ [m] and note that since j1 6 jℓ we may write
$$i_1 \le \dots \le i_{r_1} \le j_1 \le i_{r_1+1} \le \dots \le i_{r_\ell} \le j_\ell \le i_{r_\ell+1} \le \dots,$$
where $j_1$ could also be 1. Let $C(j_1)$ be the cut value if $DC_t$ chooses $u_{j_1}$ in the κth step. The change
from $C_{\kappa-1}$ is formed of the following components. First, we add the values of all the edges from U to $u_{j_1}$ and
from $L\setminus\{u_{j_1}\}$ to $u_{j_1}$, accounting for $n_1\beta_1 + (n_2-1)\beta_2$ symbols. Next, for each node $u_{i_q}$ with $r_1 < q \le \kappa-1$,
we subtract $h^*_{i_q}(u_{j_1})$ from the cut value. Overall we obtain
$$C(j_1) = C_{\kappa-1} + n_1\beta_1 + (n_2-1)\beta_2 - \sum_{q=1}^{r_1} h^*_{j_1}(u_{i_q}) - \sum_{q=r_1+1}^{\kappa-1} h^*_{i_q}(u_{j_1}). \qquad (31)$$
Since $u_{j_1}, u_{j_\ell}\in L$, we obtain that $h^*_{j_\ell}(u_i) = h^*_{j_1}(u_i)$ and $h^*_i(u_{j_1}) = h^*_i(u_{j_\ell})$ for all $i\in[n]$, and thus
$$C(j_1) - C(j_\ell) = \sum_{q=r_1+1}^{r_\ell}\bigl(h^*_{j_\ell}(u_{i_q}) - h^*_{i_q}(u_{j_1})\bigr).$$
For $u_{i_q}\in U$, we obtain $h^*_{j_\ell}(u_{i_q}) - h^*_{i_q}(u_{j_1}) = \beta_1-\beta_2 \ge 0$, and for $u_{i_q}\in L$, we obtain $h^*_{j_\ell}(u_{i_q}) - h^*_{i_q}(u_{j_1}) = \beta_2-\beta_2 = 0$,
and so
$$C(j_1) - C(j_\ell) \ge 0.$$
Now we show that uj1 maximizes the cut over the selection of any node ujℓ from U. We divide the argument
into 2 cases:
1) Assume that $j_\ell < j_1$. Denote by $(i_1, \dots, i_{r_\ell}, j_\ell, i_{r_\ell+1}, \dots, i_{r_1}, j_1, \dots)$ the selected nodes and let $C(j_\ell), C(j_1)$
be the cut values if we choose $u_{j_\ell}$, $u_{j_1}$, respectively. We have
$$C(j_\ell) = C_{\kappa-1} + (n_1-1)\beta_1 + n_2\beta_2 - \sum_{q=1}^{r_\ell} h^*_{j_\ell}(u_{i_q}) - \sum_{q=r_\ell+1}^{\kappa-1} h^*_{i_q}(u_{j_\ell})$$
For $u_{i_q}\in U$ we have $h^*_{i_q}(u_{j_\ell}) - h^*_{j_1}(u_{i_q}) = 0$, and for $u_{i_q}\in L$ we have $h^*_{i_q}(u_{j_\ell}) - h^*_{j_1}(u_{i_q}) \ge 0$; we conclude
that $C(j_1) - C(j_\ell) \ge 0$.
2) Now assume jℓ > j1 . This case is symmetric to the case jℓ < j1 and relies on the same analysis. We omit
the details.
According to the principle of optimality [5, Ch. 1.3], every optimal policy consists only of optimal sub-policies,
and therefore we first need to choose all the nodes from L and then choose nodes from U. This completes the
proof.
References
[1] R. Ahlswede, N. Cai, S. Y. R. Li, and R. W. Yeung, “Network information flow,” IEEE Trans. Inform. Theory, vol. 46, no. 4,
pp. 1204–1216, Jul. 2000.
[2] S. Akhlaghi, A. Kiani, and M. R. Ghanavati, "Cost-bandwidth tradeoff in distributed storage systems," Computer Communications, vol. 33, no. 17, pp. 2105–2115, 2010.
[3] D. Aldous and P. Diaconis, “Shuffling cards and stopping times,” The American Mathematical Monthly, vol. 93, no. 5, pp.
333–348, 1986.
[4] A. Badita, P. Parag, and J.-F. Chamberland, “Latency analysis for distributed storage systems,” IEEE Trans. Inf. Theory,
vol. 65, no. 6, pp. 4683–4698, 2019.
[5] D. P. Bertsekas, Dynamic programming and optimal control. Belmont, MA: Athena Scientific, 2005.
[6] V. R. Cadambe, S. A. Jafar, H. Maleki, K. Ramchandran, and C. Suh, “Asymptotic interference alignment for optimal repair
of MDS codes in distributed storage.” IEEE Trans. Inf. Theory, vol. 59, no. 5, pp. 2974–2987, 2013.
[7] B. Calder, J. Wang, A. Ogus, N. Nilakantan, A. Skjolsvold, S. McKelvie, Y. Xu, S. Srivastav, J. Wu, H. Simitci et al., “Windows
Azure Storage: a highly available cloud storage service with strong consistency,” in Proceedings of the Twenty-Third ACM
Symposium on Operating Systems Principles. ACM, 2011, pp. 143–157.
[8] A. Dimakis, P. B. Godfrey, Y. Wu, M. J. Wainwright, and K. Ramchandran, “Network coding for distributed storage systems,”
IEEE Trans. Inf. Theory, vol. 56, no. 9, pp. 4539–4551, 2010.
[9] L. Flatto, A. Odlyzko, and D. Wales, “Random shuffles and group representations,” The Annals of Probability, vol. 13, no. 1,
pp. 154–178, 1985.
[10] R. G. Gallager, Stochastic processes: theory for applications. Cambridge University Press, 2013.
[11] G. Grimmett and D. Stirzaker, Probability and Random Processes, 3rd ed. Oxford Univ. Press, 2001.
[12] G. Joshi, Y. Liu, and E. Soljanin, “Coding for fast content download,” in Proc. 50th Annual Allerton Conf. Commun. Control
Comput., 2012, pp. 326–333.
[13] A. M. Kermarrec, N. L. Scouarnec, and G. Straub, “Repairing multiple failures with coordinated and adaptive regenerating
codes,” in Int. Symp. on Network Coding (NetCod). IEEE, 2011, pp. 1–6.
[14] O. Khan, R. C. Burns, J. S. Plank, W. Pierce, and C. Huang, “Rethinking erasure codes for cloud file systems: Minimizing I/O
for recovery and degraded reads.” in Proc. 2012 USENIX Conf. on File and Storage Technology (FAST), 2012, 14pp.
[15] M. Luby, R. Padovani, T. Richardson, L. Minder, and P. Aggarwal, “Liquid cloud storage,” arXiv:1705.07983, 2017.
[16] M. Silberstein, L. Ganesh, Y. Wang, L. Alvisi, and M. Dahlin, "Lazy means smart: Reducing repair bandwidth costs in erasure-coded distributed storage," in Proceedings of International Conference on Systems and Storage. ACM, 2014, pp. 1–7.
[17] J. Y. Sohn, B. Choi, S. W. Yoon, and J. Moon, “Capacity of clustered distributed storage,” IEEE Trans. Inf. Theory, vol. 65,
no. 1, pp. 81–107, 2019.