Capacity of Dynamical Storage Systems
Ohad Elishco and Alexander Barg
Abstract
We introduce a dynamical model of node repair in distributed storage systems wherein the storage nodes
are subjected to failures according to independent Poisson processes. The main parameter that we study is the
time-average capacity of the network in the scenario where a fixed subset of the nodes supports a higher repair
bandwidth than the other nodes. The sequence of node failures generates random permutations of the nodes
in the encoded block, and we model the state of the network as a Markov random walk on permutations of
n elements. As our main result we show that the capacity of the network can be increased compared to the
static (worst-case) model of the storage system, while maintaining the same (average) repair bandwidth, and
we derive estimates of the increase. We also quantify the capacity increase in the case that the repair center
has information about the sequence of the recently failed storage nodes.
I. Introduction
The problem of node repair based on erasure coding for distributed storage aims at optimizing the tradeoff
of network traffic and storage overhead. In this form it was established by [8] from the perspective of network
coding. This model was generalized in various ways such as concurrent failure of several nodes [6], heterogeneous
architecture [2], [17], cooperative repair [13], and others. The existing body of works focuses on the failure of
a node (or several nodes) and the ensuing reconstruction process, but puts less emphasis on the time evolution
of the entire network and the inherent stochastic nature of the node failures. The static point of view of the
system and of node repair leads to schemes based on the worst case scenario in the sense that the amount
of data to be stored is known in advance, the amount of data each node transmits is known, and the repair
capacity is determined by the least advantageous state of the network. Switching to evolving networks makes
it possible to define and study the average amount of data moved through the network to accomplish repair,
and may give a slightly more comprehensive view of the system.
Several models of storage systems have been considered in the literature. The basic model of [8] assumes
that the amount of data that each node transmits to the repair center is fixed. The analysis of the network
traffic and storage overhead relies on [1] which quantifies the maximum total amount of data (or flow) that
can arrive at a specific point, but does not specify the exact amount of data that each node should transmit at
each time instant. To use the communication bandwidth more efficiently, we assume the amount of data that
each node transmits changes over time, while the total amount of communicated information averaged over
multiple repair cycles is fixed.
A similar idea appears, although not explicitly, in [16], where the authors propose to perform repair of several
failed nodes within one repair cycle with the purpose of decreasing the network traffic. The decrease can occur if
the information sent over a particular link can be used for repair of more than one node, thereby decreasing the
repair bandwidth. This scheme, which the authors called “lazy repair,” views the link capacity as a resource in
network optimization, which in general terms is similar to the underlying premises of our study. A related, more
general model of storage that accounts for time evolution of the system, given in [15], attempts to optimize
tradeoffs between storage durability, overhead, repair bandwidth, and access latency. Coding for minimizing
latency has been considered on its own in a separate line of works starting with [12]. We refer to [4] for an
overview of the literature where access latency is considered in the framework of queueing theory.
This paper was presented in part at the 2019 IEEE International Symposium on Information Theory (ISIT), Paris, France, July
2019.
O. Elishco is with the Institute for Systems Research, University of Maryland, College Park, MD 20742, email [email protected]. His
research is supported by NSF grant CCF-1814487.
A. Barg is with the Department of Electrical and Computer Engineering and the Institute for Systems Research, University of Maryland,
College Park, MD 20742, and also with the Institute for Problems of Information Transmission (IITP), Russian Academy of Sciences,
127051 Moscow, Russia. Email: [email protected]. His research was supported by NSF grants CCF-1618603 and CCF-1814487.
To further motivate the dynamical model, recall that cloud storage systems such as Microsoft Azure or
Google file system encode information in blocks. The information to be stored is accumulated until a block is
full, and then the block is sealed: the information is encoded, and the encoded fragments are distributed to
storage nodes [7], [14]. This implies that a storage node contains encoded fragments from several different
blocks and that the sets of storage nodes corresponding to different blocks may intersect partially. Therefore, a
storage node may participate in recovery of several failed nodes simultaneously, which implies that the capacity
of the link between the node and the repair center can be considered as a shared resource.
In this work we take the first steps toward defining a dynamical model of the network with random failures. The
prevalent system model assumes homogeneous storage under which the links from the nodes to the repair center
all have the same capacity. We immediately observe that the dynamical approach does not yield an advantage
in the operation or analysis of this model. For this reason we study storage systems such that the network is
formed of two disjoint groups of nodes with unequal (average) communication costs, which was proposed in
the static case in [2]. We show that, under the assumption of uniform failure probability of the nodes, it is
possible to increase the size of the file stored in the system while maintaining the same network traffic. This
means that, while in [2] the node transmits the same amount of data each time that there is a failure, in our
model the same node transmits the same amount of data on average over a sequence of repair cycles
(i.e., over time). In addition, we provide a simple scheme that increases the size of the stored file compared to the
static model, and study state-aware dynamical networks in which the repair center has causal knowledge of
the sequence of the failed nodes. The idea of time averaging is motivated by the assumption that the network
exhibits some type of ergodic behavior whereby the expected capacity can be related to the minimum cut averaged
over time in a sample path of the network evolution.
In Section II we present the dynamical model and give a formal definition of the storage capacity. The evolution
of the network is formalized as a random walk on the set of node permutations. Using this representation, we
argue that it suffices to limit oneself to discrete time. We also prove the basic relationship between the storage
capacity of a continuous-time network and the time-average min-cut of the corresponding discrete-time network
(Sec. II-D). The main results of this paper are collected in Sec. III where we derive estimates of the average
capacity of the fixed-cost storage model. We examine two approaches toward estimating the capacity. The
first of them is related to a specific transmission protocol while the second relies on an averaging argument.
In Section IV we analyze state-aware networks and extend the ideas of the previous section to obtain a lower
bound on their capacity. Finally, in Section V we consider the case of different failure probabilities of the nodes
and establish a partial result regarding a lower bound on capacity.
data. Since the file is encoded with an erasure-correcting code, the DC can retrieve the file by contacting a
subset of the storage nodes.
Let us give a formal description of our storage network model. A storage network is a pair (N , β) where N
is a triple N = (V, DC, CU) in which V is a set of n nodes (storage units) V = {v1 , . . . , vn }, DC is the data
collector node, and CU is the centralized computing unit node. The real nonnegative vector β = (β1 , . . . , βn )
gives the maximum average amount of data communicated from vi for the node repair, and will be discussed
in more detail below.
Every node $v_i$, $i \in [n] := \{1, 2, \dots, n\}$, has the ability to store up to α symbols over some finite alphabet F.
To store a file of size M we divide it into k information blocks viewed as (M/k)-dimensional vectors over F.
These information blocks are then encoded with an (n, k) code C. The coordinates of the codeword are vectors
over F, and each coordinate is stored in its own storage node in V. To read the file, the DC accesses at least
k nodes, obtaining the information stored in them, and retrieves the original file.
The time evolution of the storage network is related to a random process of node failures. We begin our
study assuming that the time is continuous starting at t = 0, when the encoded file is stored in the network.
The time instances t1 , t2 , . . . indicate consecutive node failures. Let s = (s1 , s2 , . . . ) ∈ V ∞ be the sequence
of failed nodes, where sj is the node that fails at time tj . We assume that in order to restore the data to a
failed node (reconstruct the node), the CU contacts a group of storage nodes, called helper nodes, accesses
some of the data stored on them, and uses this data to accomplish the recovery. In this work we assume that
CU contacts all the nodes except the failed node, i.e., we assume that the number of helper nodes is n − 1.
Further, we assume that the definition of the storage network includes a set of parameters βi , i = 1, . . . , n,
where βi is the maximum amount of information that is downloaded from vi to CU for node repair, averaged
over the time instances ti . Specifically, it is assumed that node vi provides hj (vi ) symbols of F for the repair
of node $s_j$ and that $\limsup_{\ell\to\infty}\frac{1}{\ell}\sum_{j=1}^{\ell} h_j(v_i) \le \beta_i$ for all i. The case of $h_j(v_i) = \beta_i$ for all i, j will play a special role, and we denote the corresponding weight assignment by $h^*$.
3: Suppose we are given the graph $X_{j-1}$, $j \ge 2$. Suppose that $s_j = v_{i_j}$ and consider the corresponding node $v_{i_j}^{j'}$
in $X_{j-1}$ for some $j' < j$. Define a new node $v_{i_j}^{j}$ and define $X_j(V_j, E_j)$ as follows:
$$V_j = V_{j-1} \cup \{CU_j, v_{i_j}^{j}\}$$
$$E_j = E_{j-1} \cup \{(u, CU_j) : u \in A_{j-1}\setminus\{v_{i_j}^{j'}\}\} \cup \{(CU_j, v_{i_j}^{j})\}.$$
The set of active nodes of $X_j$ is defined as $A_j = (A_{j-1}\setminus\{v_{i_j}^{j'}\}) \cup \{v_{i_j}^{j}\}$.
The sequence of information flow graphs is an important tool used to represent the time evolution of the network.
Each graph in the sequence accounts for a new node failure, and also records the information regarding all the
past failures that occurred from time t = 0 up to the time $t_{j+1}$. For a given j, the information for the repair
of $s_j$ is communicated over the edges in the graph $X_j$, wherein the edge $(v_i^{\ell}, CU_j)$ carries $h_j(v_i)$ symbols of F,
where the index ℓ < j corresponds to the last instance when the node $v_i$ failed.
We will sometimes write (N , β, s, t, h) to denote a network (N , β) with the sequence of failed nodes s, the
sequence of failure times t, and a sequence of functions {hj }j .
In our model, the evolution of the network is random. We represent this evolution by assuming that the failure
of each node is a Poisson arrival process with rate λ, and these arrivals occur independently for different nodes.
The interarrival time between two failures of a specific node v ∈ V is an exponential random variable with pdf
λe −λt . Since node failures are independent, the overall rate of node failures in the system is a Poisson process
with parameter nλ. This implies that we can formulate the network time evolution as follows. Let $(X_1, X_2, \dots)$
be a sequence of i.i.d. random variables with pdf $f_X(t) = n\lambda e^{-n\lambda t}$. Let $T = (T_1, T_2, \dots)$ be the sequence of
failure times defined as $T_j = \sum_{i=1}^{j} X_i$ for $j \in \mathbb{N}$, and let $S = (S_1, S_2, \dots)$ be the sequence of failed nodes
defined as a sequence of i.i.d. random variables distributed uniformly over [n]. Note that with probability zero
the values Tj can be infinite. Denote by µ1 the infinite direct power of the uniform distribution on V and by
µ2 the infinite power of the exponential distribution on [0, ∞). We will assume that the sequence (S, T ) is
distributed according to µ1 × µ2 .
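The failure process is straightforward to simulate. The following sketch (our own illustration, not part of the paper; all identifiers are ours) samples the pair (S, T) from the distribution µ1 × µ2 described above.

```python
import random

def simulate_failures(n, lam, num_failures, seed=0):
    """Sample (S, T): i.i.d. uniform failed nodes, Poisson(n*lam) failure times."""
    rng = random.Random(seed)
    S, T, t = [], [], 0.0
    for _ in range(num_failures):
        t += rng.expovariate(n * lam)   # interarrival time ~ Exp(n*lam)
        T.append(t)
        S.append(rng.randrange(n))      # failed node, uniform over [n]
    return S, T

S, T = simulate_failures(n=20, lam=1.0, num_failures=5)
print(list(zip(S, T)))
```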
We now define the storage capacity.
Definition 2 Let (N , β, s, t) be a storage network and let H denote the set of all sequences of functions h
that satisfy the constraints given by β. The storage capacity (or just capacity) of (N , β, s, t), denoted by
cap(N ), is defined as
$$\mathrm{cap}(N) = \sup_{h\in H} \mathrm{cap}_h(N).$$
The random evolution of the network makes the sequence of failed nodes a sequence of random variables
which we henceforth denote by S. This makes cap(N ) a random variable as well. As such, we will analyze the
expected value of the storage capacity which is defined as follows.
Definition 3 Let (N , β, S, t) be a (random) storage network. The expected capacity is defined as
$$\overline{\mathrm{cap}}(N) \triangleq E[\mathrm{cap}(N)].$$
For any realization s of S, the storage capacity of a network can be calculated using the sequence of
information flow graphs {Xj }j . Indeed, let (N , β, s, t, h) be a storage network with a corresponding sequence of
information flow graphs {Xj }j . For a time t ∈ [tj , tj +1 ], j ∈ N, let Dt denote a selection of k ′ = n − d + 1 nodes
from Aj (from which the entire file can be retrieved). As shown in [8], the storage capacity of the network is
equal to the minimum cut between ṽ and Dt . In other words, the maximum file size that can be reliably stored
in the network and retrieved at time t is equal to the minimum cut between ṽ and Dt .
In this work, we consider the time-average minimum cut (as defined below) and hence we define the minimum
cut differently. Let us denote by $C_t^h(D_t)$ the value of the minimum cut in $X_j$ between $\bigl(\bigcup_{i=-1}^{j-1} A_i\bigr)\setminus A_j$ and $D_t$
under the weight assignment h. Further, let $C_t^h$ denote the minimum cut over all selections $D_t$,
$$C_t^h = C_t^h(A_j) \triangleq \min_{D_t\subseteq A_j,\ |D_t|=k'} C_t^h(D_t). \qquad (2)$$
When $h = h^*$ we will sometimes write $C_t$ instead of $C_t^{h^*}$. In this definition we again assume that the DC is not
aware of the state of the network, i.e., the order of the failed nodes, and the minimum accounts for the worst
case. If the DC can choose which nodes to contact, the minimum should be replaced with a maximum.
Definition 4 Let (N , β, s, t, h) be a storage system. Define the average cut as
$$C_{\mathrm{avg}}^h(N) \triangleq \limsup_{t\to\infty}\frac{1}{t}\int_0^t C_\tau^h\, d\tau.$$
Note that the average cut is a function of s. Hence, if the network is random, s = S, then the average cut
is a random variable. As shown in the following lemma, for a storage system (N , β, s, t, h∗), the average cut
$C_{\mathrm{avg}}^{h^*}(N)$ can be used to bound below the capacity of the network N .
Lemma 1 Let (N , β, s, t) be a storage system. Assume also that for any $n-k'$ nodes $v_{i_1}, \dots, v_{i_{n-k'}} \in V$,
$$\sum_{j=1}^{n-k'} \beta_{i_j} \ge \max_{\pi\in S_n} C_t^{h^*}(\pi) - C_{\mathrm{avg}}^{h^*}. \qquad (3)$$
Then $\mathrm{cap}(N) \ge C_{\mathrm{avg}}^{h^*}(N)$.
This proves the finiteness claim. The uniform distribution of πt0 follows by symmetry.
Since $t_0 < \infty$ a.s., the time $t' = t - t_0$ is well defined. Consider a continuous-time Markov chain $X(t')$ with
the state space $S_n$ constructed as follows. Let $l \in [n]$ and let $\tau_l = (l, n, n-1, \dots, l+1)$ be a permutation (in
cycle notation) that moves entry l to the last position and shifts everything to the right of l one step to the
left. Then $P(\pi\to\sigma) = \frac{1}{n}$ if and only if $\sigma = \tau_l\circ\pi$ for some l, and $P(\pi\to\sigma) = 0$ for all other pairs π, σ.
Let $N(t')$ be the number of nodes that failed until time $t'$. This is a Poisson counting process with rate nλ,
i.e., $N(t') \sim \mathrm{Poi}(n\lambda)$. At time $t' = 0$, $X(0)$ is chosen uniformly at random. For $t' > 0$ define $X(t') = \pi_{N(t')}$,
where π with an integer index is defined above before Example 2. Due to the memoryless property of the
exponential distribution, we obtain that $X(t')$ is indeed a Markov chain.
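In discrete time, a transition of this walk simply moves the failed node to the last position. A minimal sketch (ours, following the conventions just described):

```python
import random

def step(pi, rng):
    """One transition of the walk on permutations: a uniformly random node
    fails and moves to the last position; everything to its right shifts left."""
    failed = rng.randrange(len(pi))        # uniform failed node
    j = pi.index(failed)                   # its current position
    return pi[:j] + pi[j+1:] + [failed]    # this is tau_l composed with pi

rng = random.Random(1)
pi = list(range(5))                        # pi_0 = id
for _ in range(3):
    pi = step(pi, rng)
print(pi)
```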
Next note that X(t ′ ) is positive recurrent since the discrete-time chain on Sn defined by the kernel P is
recurrent and the expected return time to a state in X(t ′ ) is finite for any state in Sn . For a positive recurrent
continuous-time Markov chain, the limiting probability distribution µ is unique, exists almost surely, and is given
by
$$\mu(\pi) = \lim_{\tau\to\infty}\frac{1}{\tau}\int_0^\tau \mathbf{1}_\pi(X(t'))\, dt' = \frac{1}{n\lambda E[(\pi\to\pi)]} \qquad (5)$$
where (π → π) is the time to return to state π starting from π (See, for example, [10, p. 332]). In our model,
E [(π → π)] does not depend on π. In words, (5) implies that for t large enough, the time that the network
spends in each state is almost the same. We use this fact next to find an upper bound for the capacity.
Lemma 3 Let (N , β, S, t, h) be a storage network, where S is a (random) sequence of failed nodes and h is a
weight function satisfying the constraints given by β. Assume also that hj is a function of the last failed node,
i.e., if S j = vℓ then hj = hvℓ . Then almost surely
$$\mathrm{cap}(N) \le \frac{k'(2n-k'-1)}{2}\cdot\frac{1}{n}\sum_{i=1}^{n}\beta_i. \qquad (6)$$
Proof: The capacity of N is equal to the minimum weight under h of a cut between ṽ and DCt where DCt
can connect to any set of k ′ nodes from At . Assume that the set of weight functions is given by {hv }v ∈V . Since
there is a weight function for every node v , we will denote by hv (u) the weight that hv assigns to the edge
(u, CU). Let tj0 be the first time instance by which all the nodes have failed at least once. According to Lemma
2, tj0 is almost surely finite. By (5) we may assume that all permutations appear as associated permutations
with equal probability.
Let $D := \{v_{i_1}, \dots, v_{i_{k'}}\} \subset V$ and assume that the associated permutation π is such that $\pi^{-1}(i_1) \le \pi^{-1}(i_2) \le
\dots \le \pi^{-1}(i_{k'})$. Then the weight of the cut between $\tilde v$ and D is at most [8]
$$C_t^h(D) \le \sum_{\ell=1}^{k'}\Bigl(\sum_{v\in V\setminus v_{i_\ell}} h_{v_{i_\ell}}(v) - \sum_{r=1}^{\ell-1} h_{v_{i_\ell}}(v_{i_r})\Bigr). \qquad (7)$$
Since cap(N ) is the minimum weight value of a cut, we can bound it above by the average weight:
$$\mathrm{cap}(N) \le \frac{1}{n!\binom{n}{k'}}\sum_{D\subseteq V,\ |D|=k'}\ \sum_{\pi_t\in S_n} C_t^h(D), \qquad (8)$$
For the moment let us fix D and consider how many times the term hv (u) appears on the right-hand side of
(8) as we substitute Cth (D) from (7) and evaluate the sum on πt . If both u, v ∈ D then this term appears for
those πt in which v appears after u and does not (is canceled in (7)) if v precedes u. Thus, overall this term
appears n!/2 times. If v ∈ D and u 6∈ D then no cancellations occur, and the term hv (u) appears n! times.
Further, there are $\binom{n-2}{k'-2}$ choices of D for the first of these options and $\binom{n-2}{k'-1}$ for the second one of them.
Thus, for each pair of nodes $u, v\in V$ the term $h_u(v)$ appears in (8)
$$\binom{n-2}{k'-2}\frac{n!}{2} + \binom{n-2}{k'-1}\, n! = \binom{n-2}{k'-2}\frac{n!}{2}\cdot\frac{2n-k'-1}{k'-1}$$
times. Substituting this into (8) and performing cancellations, we obtain that
$$\mathrm{cap}(N) \le \frac{k'(2n-k'-1)}{2n(n-1)}\sum_{i=1}^{n}\sum_{j\ne i} h_{v_i}(v_j).$$
Since h is a weight function and since the nodes fail with equal probability, we obtain that $\sum_{i\ne j} h_{v_i}(v_j) = (n-1)\beta_j$, $j = 1, \dots, n$. Thus,
$$\sum_{i=1}^{n}\sum_{j\ne i} h_{v_i}(v_j) = (n-1)\sum_{j=1}^{n}\beta_j.$$
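For concreteness, the resulting bound (6) is a one-line computation; the sketch below (our own code) evaluates it for the n1 = 1 network used later in Section V, reproducing the value 177.45β2 quoted there.

```python
def upper_bound_cap(n, k_prime, betas):
    """Bound (6): cap(N) <= k'(2n - k' - 1)/2 * (1/n) * sum(betas)."""
    return k_prime * (2 * n - k_prime - 1) / 2 * sum(betas) / n

# Section V example: n = 20, k' = 13, one node with beta1 = 2, nineteen with beta2 = 1
print(upper_bound_cap(20, 13, [2.0] + [1.0] * 19))   # 177.45 (in units of beta_2)
```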
For a discrete-time network (N , β, s, t, h) we will sometimes omit the time notation t. Also, when a weight
function is not specified we will omit the weight function notation and write (N , β, s). The following lemma
shows that for a random discrete-time storage network with h = h∗, the limit superior in this definition is
almost surely a limit.
Lemma 4 Let (N , β, S, h∗) be a random discrete-time storage network with $S = (S_i, i \ge 1)$ a sequence of
independent RVs uniformly distributed on [n]. Then
$$E\bigl[C_{\mathrm{avg}}^{h^*}(N)\bigr] = \lim_{l\to\infty}\frac{1}{l}\sum_{t=1}^{l} E\bigl[C_t^{h^*}(N)\bigr].$$
Proof: Let $t_0$ be the first time instance by which all the nodes have failed at least once. Note that $t_0$ is a
stopping time and each failed node is chosen uniformly and independently. Referring to the Coupon collector's
problem [11, p. 210], we obtain that $\Pr(t_0 > cn\log n) \le n^{1-c}$ for every $c > 1$. Thus, $t_0$ is finite almost surely.
By symmetry, $\pi_{t_0}$ is distributed uniformly on the set of all permutations. Moreover, since $S_i$ is chosen uniformly
and independently, for $t > t_0$ we have that $\Pr(\pi_t = \pi \mid \pi_0^{t-1}) = \Pr(\pi_t = \pi \mid \pi_{t-1})$, so the sequence $\{\pi_t\}$ is a
Markov chain, which is irreducible and aperiodic. Because of this, a limiting distribution µ exists, and is unique
and positive. Hence, as t grows, $\Pr(\pi_t) \to \mu(\pi_t)$. Together with the fact that $C_t^{h^*}$ is uniformly bounded from
above for all t, we obtain that the limit $\lim_{l\to\infty}\frac{1}{l}\sum_{t=1}^{l} E[C_t^{h^*}(N)]$ exists.
Now define $X_t = \frac{1}{t}\sum_{i=1}^{t} C_i^{h^*}(N)$ and note that $X_t$ is a function of S. Following the previous discussion, for
almost every S, the sequence $X_t$ converges. Since $X_t$ is non-negative and upper bounded for every t, by the
dominated convergence theorem we have $\lim_{t\to\infty} E[X_t] = E[\lim_{t\to\infty} X_t]$ (the last limit exists a.s.), which is the
desired result.
Since t0 is almost surely finite and since πt is an ergodic Markov chain, defining the initial state to be π0 = id
does not affect the expected capacity. Hence, from now on we assume π0 = id.
The problem of finding the limiting distribution of our Markov chain on Sn is similar to the classic question
of the mixing time for the card shuffling problem called Top in at random shuffle. We use the following result
from [3, Thm.1].
Theorem 1 (Aldous and Diaconis) Consider a deck of n cards. At time t = 1, 2, . . . take the top card and
insert it in the deck at a random position. Let Qt denote the distribution after t such shuffles and let U be
the uniform distribution on the set of all permutations $S_n$. Then for all $c > 0$ and $n \ge 2$, the total variation
distance satisfies
$$\|Q_{n\log n + cn} - U\|_{TV} \le e^{-c}. \qquad (9)$$
To connect this result to our problem, we note that choosing the next failed node uniformly at random
corresponds to selecting a random card from the deck and putting it at the bottom. The mixing time of
this chain is stochastically equivalent to the mixing time of the Top in at random shuffle, and we obtain the
following lemma.
Lemma 5 Let N be a storage network with |V | = n > 2 nodes and let S be a random sequence of failed
nodes. Consider the sequence of associated permutations $(\pi_t, t \ge 0)$ where $\pi_0 = \mathrm{id}$. Then for any $c > 0$, $n \ge 2$,
and any $\pi\in S_n$,
$$\Bigl|\Pr(\pi_{n\log n + cn} = \pi) - \frac{1}{n!}\Bigr| \le e^{-c}.$$
Proof: Let $T \ge 1$ be a value of the time. Consider the time-reversed sequence $\tilde\pi_t = \pi_{T-t}$, $t \le T$. The
evolution of the sequence $\tilde\pi_t$ is described as follows: for any t take the last symbol $\pi_t(n)$ and insert it randomly
in the middle. Observe that $\Pr(\pi_T = \pi) = \Pr(\tilde\pi_T = \mathrm{id} \mid \tilde\pi_0 = \pi)$. Now use (9) and the definition of $\|\cdot\|_{TV}$ to
claim that for $T = n\log n + cn$, $c > 0$, $\bigl|\Pr(\tilde\pi_T = \mathrm{id} \mid \tilde\pi_0 = \pi) - \frac{1}{n!}\bigr| \le e^{-c}$.
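The mixing statement can be probed numerically. The following Monte-Carlo sketch (ours, not from the paper) runs the bottom-in-at-random walk for n log n + cn steps and estimates the total variation distance to the uniform distribution; up to sampling noise, the estimate stays below e^{-c}, in line with Lemma 5.

```python
import math, random, itertools
from collections import Counter
from math import factorial

n, c, trials = 5, 3.0, 200_000
steps = int(n * math.log(n) + c * n)

counts = Counter()
rng = random.Random(0)
for _ in range(trials):
    perm = list(range(n))                 # pi_0 = id
    for _ in range(steps):
        j = rng.randrange(n)              # uniformly random failed node...
        perm.append(perm.pop(j))          # ...moves to the last position
    counts[tuple(perm)] += 1

tv = 0.5 * sum(abs(counts.get(p, 0) / trials - 1 / factorial(n))
               for p in itertools.permutations(range(n)))
print(f"TV estimate {tv:.4f} vs bound e^-c = {math.exp(-c):.4f}")
```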
We now show that the value of the average cut in the random continuous time network can be obtained
from the value of the average cut in the random discrete-time network almost surely.
Lemma 6 Let (N1, β, S, t, h∗) be a continuous-time storage network and let (N2, β, S, h∗) be a discrete-time
storage network. Then
$$C_{\mathrm{avg}}^{h^*}(N_1) \overset{a.s.}{=} C_{\mathrm{avg}}^{h^*}(N_2).$$
Proof: First note that Lemmas 4 and 5 imply that if (N2, β, S, h∗) is a random discrete-time storage
network, then $C_{\mathrm{avg}}^{h^*}(N_2)$ is almost surely a constant, which is equal to $\frac{1}{n!}\sum_{\pi\in S_n} C_t(\pi)$. On the other hand, if
(N1, β, S, t, h∗) is a continuous-time storage network, then we can write
$$C_{\mathrm{avg}}^{h^*}(N_1) = \limsup_{\tau\to\infty}\frac{1}{\tau}\int_0^\tau C_t^{h^*}(N_1)\,dt \qquad (10)$$
$$= \limsup_{\tau\to\infty}\frac{1}{\tau}\int_{t_0}^\tau \sum_{\pi\in S_n}\mathbf{1}_\pi(\pi_t)\, C_t^{h^*}(N_1)\,dt$$
$$= \limsup_{\tau\to\infty}\frac{1}{\tau}\sum_{\pi\in S_n}\int_{t_0}^\tau \mathbf{1}_\pi(\pi_t)\, C_t^{h^*}(\pi)\,dt$$
where $t_0$ is the first time instance by which all the nodes have failed. Moreover, since $C_t^{h^*}(\pi)$ is a function of
π and not a function of t, we denote $C_t^{h^*}(\pi)$ by $C^{h^*}(\pi)$ and obtain
$$C_{\mathrm{avg}}^{h^*}(N_1) = \limsup_{\tau\to\infty}\frac{1}{\tau}\sum_{\pi\in S_n}\int_{t_0}^\tau \mathbf{1}_\pi(\pi_t)\, C^{h^*}(\pi)\,dt$$
$$= \sum_{\pi\in S_n} C^{h^*}(\pi)\lim_{\tau\to\infty}\frac{1}{\tau}\int_{t_0}^\tau \mathbf{1}_\pi(X(t'))\,dt'$$
$$\overset{a.s.}{=} \sum_{\pi\in S_n}\frac{C^{h^*}(\pi)}{n\lambda E[(\pi\to\pi)]}$$
where the last equality follows from (5). Since $E[(\pi\to\pi)]$ does not depend on $\pi\in S_n$, we obtain that
$\frac{1}{n\lambda E[(\pi\to\pi)]} = \frac{1}{n!}$, which in turn implies that $C_{\mathrm{avg}}^{h^*}(N_1) = C_{\mathrm{avg}}^{h^*}(N_2)$ almost surely.
We obtain the following statement which forms a basis of our subsequent derivations.
Theorem 2 Let (N1, β, S, t) be a continuous-time storage network. Then ((µ1 × µ2)-a.s.)
$$\mathrm{cap}(N_1) \ge \frac{1}{n!}\sum_{\pi_t\in S_n} C_t^{h^*}(\pi_t).$$
Proof: From Lemma 1 we have that for any realization s such that every node fails infinitely often,
$\mathrm{cap}(N_1) \ge C_{\mathrm{avg}}^{h^*}(N_1)$. According to Lemma 2, there exists a finite $t_0$ by which all the nodes have failed at least
once, and by (5) the stationary distribution of the permutations is uniform. This implies that almost surely
all the nodes fail infinitely often. According to Lemma 6, if (N2, β, S, h∗) is a discrete-time storage network,
then almost surely $C_{\mathrm{avg}}^{h^*}(N_1) = C_{\mathrm{avg}}^{h^*}(N_2)$. From Lemmas 4 and 5, $C_{\mathrm{avg}}^{h^*}(N_2)$ is almost surely a constant, which is
equal to $\frac{1}{n!}\sum_{\pi_t\in S_n} C_t^{h^*}(\pi_t)$. Hence, almost surely, $C_{\mathrm{avg}}^{h^*}(N_2) = E\bigl[C_{\mathrm{avg}}^{h^*}(N_2)\bigr] = \frac{1}{n!}\sum_{\pi_t\in S_n} C_t^{h^*}(\pi_t)$. Altogether
these statements imply the claim of the theorem.
From this point, unless stated otherwise, we restrict ourselves to discrete-time networks.
where β1 > β2 > 0. Let C be the minimum cut of N in the static case (i.e., the worst-case weight of the cut):
$$C \triangleq \min_{\pi\in S_n}\{C_t^{h^*}(\pi)\} = \min_{t\ge 0,\ s\in V^\infty}\{C_t^{h^*}\}. \qquad (11)$$
Let $a \triangleq k' - n_1$ and let us assume that $a > 0$, because otherwise the file reconstruction problem is trivially solved
by contacting k′ nodes in U. The minimum cut is given by the following result from [2] (we cite it using our
assumptions of a > 0 and large α).
Lemma 7 Let (N , β, s, h∗) be a fixed-cost storage network. Then
$$C = \frac{n_1(n_1-1)}{2}\beta_1 + \Bigl(n_2(n_1+a-1) - \frac{a(a+1)}{2}\Bigr)\beta_2. \qquad (12)$$
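As a small illustration (our own sketch, assuming the reconstruction of (12) shown above), the static min-cut can be evaluated directly; with the n1 = 1 parameters used in Section V it returns the value 150β2 quoted there.

```python
def static_min_cut(n1, n2, a, beta1, beta2):
    """Static minimum cut C of the fixed-cost network, eq. (12)."""
    return n1 * (n1 - 1) / 2 * beta1 \
        + (n2 * (n1 + a - 1) - a * (a + 1) / 2) * beta2

# Section V example: n1 = 1, n2 = 19, k' = 13 so a = 12, beta1 = 2*beta2
print(static_min_cut(1, 19, 12, 2.0, 1.0))   # 150.0 (in units of beta_2)
```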
In this section we consider a dynamical equivalent of the above model, where the sequence of node failures
S is random. Note that if n2 = 1 then k = n which implies that no coding is used in the storage network, so
we will assume that n2 > 2. To avoid boundary cases, we will also assume that n1 > 1 (the case of n1 = 1 is
not very interesting and can be handled using the same technique as below).
Expression (12) gives the size of the minimum cut in the static model of [8], and it also gives a lower bound
for the cut $C_t^{h^*}$ for all t and s in the dynamical model. We shall now demonstrate by example that by controlling
the transmission policy it is possible to increase the storage capacity of the (N , β, S, h) network compared to
(12).
The idea of the example is as follows. Let $s_j$ be the failed node. The number of symbols that node $v_i$ transmits
at time j for the repair depends on the jth failed node $s_j$. If $v_i \in U$, then for $s_j \in U$ the node $v_i$ transmits more
than β1 symbols ($h_j(v_i) > \beta_1$), while for $s_j \in L$ it transmits fewer than β1 symbols. If $v_i \in L$, then
$v_i$ always contributes β2 symbols.
Example 3 Let (N , β, S, h) be a storage network with n = 20, k′ = 13, $U = (v_1, \dots, v_{10})$, $L = (v_{11}, \dots, v_{20})$,
and β1 = 2β2. Assume that α is large enough (in this case taking $\alpha \ge 33.5\beta_2$ suffices). By (12), the value of
the minimum cut with h = h∗ is 214β2, and thus the maximum file size that can be stored is M = 214β2. The
task of node repair is accomplished by contacting 19 nodes.
Now we will show that under the dynamic model, it is possible to increase the file size by using the weight
function h defined as follows. Suppose that at time t (recall that time is discrete) a node $v\in U$ has failed, i.e.,
$S_t = v$ where $v\in U$, and define
$$h_t(v_i) = \begin{cases} \beta_2 & v_i\in L \\ \beta_1 + \frac{1}{20}\beta_2 & v_i\in U\setminus v \\ 0 & v_i = v. \end{cases}$$
If $S_t = v$ where $v\in L$, define
$$h_t(v_i) = \begin{cases} \beta_2 & v_i\in L\setminus v \\ \beta_1 - \frac{9}{200}\beta_2 & v_i\in U \\ 0 & v_i = v. \end{cases}$$
and it is obtained when $\pi = \mathrm{id}$ and the active nodes selected are $D_t = (v_1, v_2, \dots, v_{13})$. This shows an increase
over the static case estimate (12).
We now calculate the expected number of symbols a node transmits under h. Recall that in the random
model, each node has the same probability of failure, which in this case equals $\frac{1}{20}$. Let $t_0$ denote the first
time instance by which all the nodes have failed. For every $t > t_0$ we have that if $v_i\in U$ then
$$E[h_t(v_i)] = \frac{9}{20}\Bigl(\beta_1 + \frac{1}{20}\beta_2\Bigr) + \frac{10}{20}\Bigl(\beta_1 - \frac{9}{200}\beta_2\Bigr) < \beta_1$$
and if $v_i\in L$ then
$$E[h_t(v_i)] = \frac{9}{20}\beta_2 + \frac{10}{20}\beta_2 < \beta_2.$$
Therefore, the average amount of symbols each node transmits satisfies the constraints given by β.
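These averages can be checked mechanically; the sketch below (ours, exact rational arithmetic) reproduces the two expectations for Example 3.

```python
from fractions import Fraction as F

beta2 = F(1)
beta1 = 2 * beta2

def h(vi_in_U, failed_in_U, failed_is_self):
    """Transmission of node v_i under Example 3's weight function h."""
    if failed_is_self:
        return F(0)
    if vi_in_U:
        return beta1 + F(1, 20) * beta2 if failed_in_U else beta1 - F(9, 200) * beta2
    return beta2

# node in U: 9 other U failures, 10 L failures, 1 self failure (prob 1/20 each node)
EU = F(9, 20) * h(True, True, False) + F(10, 20) * h(True, False, False)
EL = F(9, 20) * h(False, False, False) + F(10, 20) * h(False, True, False)
print(EU, EU < beta1)   # 19/10 < beta1 = 2
print(EL, EL < beta2)   # 19/20 < beta2 = 1
```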
The above simple procedure is not optimal in terms of the file size M: as we show below, it is possible to
construct a different transmission scheme which allows for storage of a larger-size file. Note also that the upper
bound (6) gives $\mathrm{cap}(N) \le 235.5\beta_2$, while the improvement of (13) over (12) is relatively minor.
Example 3 provides a procedure to construct the weight function h such that the maximum file size can be
increased. Below we generalize this idea and also explore other ways of using time evolution to increase the
storage capacity of a fixed-cost network.
and
$$h_L(v_i) = \begin{cases} \beta_1 - \frac{n_1-1}{n_2}\varepsilon_1 & v_i\in U \\ \beta_2 & v_i\in L\setminus s_j \\ 0 & v_i = s_j \end{cases} \qquad (16)$$
and $0 \le \varepsilon_1 \le \beta_1$.
We now show that the weight function h satisfies the constraints given by β.
Lemma 8 Let (N , β, S, h) be a fixed-cost storage network with h as defined above. Then h satisfies the average
constraints given by β.
Proof: Fix a node $v_i\in U$ and for each time instance t, let us calculate the expected number of symbols
$v_i$ transmits. Recall that $v_{\pi_t(n)}$ denotes the node that failed at time t. Recall that the failures of the nodes are
uniformly distributed, so we obtain
$$\Pr(v_{\pi_t(n)} = v_j) = \begin{cases} \frac{1}{n} & \text{if } j = i \\ \frac{n_1-1}{n} & \text{if } j\in[n_1]\setminus i \\ \frac{n_2}{n} & \text{otherwise.}\end{cases}$$
Hence, the expected number of symbols that the node $v_i$ transmits is
$$\frac{n_1-1}{n} h_U(v_i) + \frac{n_2}{n} h_L(v_i) = \frac{n_1-1}{n}(\beta_1+\varepsilon_1) + \frac{n_2}{n}\Bigl(\beta_1 - \frac{n_1-1}{n_2}\varepsilon_1\Bigr) < \beta_1.$$
If $v_i\in L$ we have
$$\Pr(v_{\pi_t(n)} = v_j) = \begin{cases} \frac{1}{n} & \text{if } j = i \\ \frac{n_1}{n} & \text{if } j\in[n_1] \\ \frac{n_2-1}{n} & \text{otherwise.}\end{cases}$$
In this case the expected number of transmitted symbols equals
$$\frac{n_1}{n} h_U(v_i) + \frac{n_2-1}{n} h_L(v_i) = \frac{n_1}{n}\beta_2 + \frac{n_2-1}{n}\beta_2 < \beta_2.$$
Thus, on average the number of symbols is within the allotted bandwidth.
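The cancellation in the first display is easy to verify exactly; a tiny sketch (our own code, with a hypothetical helper name) follows.

```python
from fractions import Fraction as F

def expected_tx_U(n1, n2, beta1, eps1):
    """Expected transmission of a node in U under h_U, h_L; equals (n-1)/n * beta1."""
    n = n1 + n2
    hU = beta1 + eps1                       # another node of U failed
    hL = beta1 - F(n1 - 1, n2) * eps1       # a node of L failed
    return F(n1 - 1, n) * hU + F(n2, n) * hL

n1, n2, beta1 = 10, 10, F(2)
print(expected_tx_U(n1, n2, beta1, F(1, 2)),
      F(n1 + n2 - 1, n1 + n2) * beta1)      # both equal 19/10 < beta1
```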
The next two lemmas are used in the proof of Theorem 3 in order to estimate the minimum cut. The first
lemma shows that the minimum cut for any permutation $\pi_t$, $t > t_0$, is obtained when $D_t \supseteq U$. The second
lemma shows that the minimum cut is obtained for $\pi_t = \mathrm{id}$.
Lemma 9 Let (N , β, s, h) be a network with h as defined above. If assumption (14) is satisfied, then for $t > t_0$,
the value $C_t^h(N)$ is attained when $D_t \supseteq U$.
Proof: We formulate our question as a dynamic programming problem and provide an optimal policy for
node selection. Assume that πt is a fixed permutation that represents the order of the last n failed nodes. We
will consider the information flow graph Xt and show that the cut is minimized when all the nodes from U are
selected.
Consider a k′-step procedure which in each step selects one node from $A_t$. Each step entails a cost. Let $t' \le t$
and assume that node $v_{i_{t'}}^{t'} \in A_t$ was selected. The cost is defined as the added weight values of the in-edges
of $CU_{t'}$ that are not out-edges of previously selected nodes. Our goal is to choose k′ nodes that minimize the
total cost and hence minimize the cut between $\bigl(\bigcup_{j=-1}^{t-1} A_j\bigr)\setminus A_t$ and $DC_t$.
In order to simplify notation, we write πt = (u1 , u2 , . . . , un ), i.e., ul = vπt (l ) is the storage node that appears
in the l th position in πt . Moreover, with a slight abuse of notation, if uj failed at time t ′ we will write hj (ui )
instead of ht ′ (ui ). For κ 6 k ′ consider the sub-problem in step κ − 1, where the DCt has already chosen κ − 1
nodes (ui1 , . . . , uiκ−1 ) and we are to choose the last node. Assume that the chosen nodes are ordered according
to their appearance in the permutation, i.e., i1 6 i2 6 . . . 6 iκ−1 . Let uj1 , . . . , ujm ∈ U be nodes that were not
selected up to step κ − 1, i.e.,
{uj1 , . . . , ujm } ∩ {ui1 , . . . , uiκ−1 } = ∅,
and assume also that j1 6 j2 6 . . . 6 jm . We show that choosing uj1 accounts for the minimum cut. First, we
claim that choosing uj1 minimizes the cut over all other nodes from U. Denote by Cκ−1 the total cost (or the
cut) in step κ − 1. Fix $2 \le \ell\in[m]$ and note that since $j_1 \le j_\ell$, we can write
$$i_1 \le \dots \le i_{r_1} \le j_1 \le i_{r_1+1} \le \dots \le i_{r_\ell} \le j_\ell \le i_{r_\ell+1} \le \dots,$$
where the set of indices $\{i_1, \dots, i_{r_1}\}$ can be empty. Let $C(j_1)$ be the value of the cut once we add $u_{j_1}$ in the κth
step. The change from $C_{\kappa-1}$ is formed of the following components. First, we add the values of all the edges
from $U\setminus\{u_{j_1}\}$ to $u_{j_1}$ and from L to $u_{j_1}$, accounting for $(n_1-1)(\beta_1+\varepsilon_1) + n_2\beta_2$ symbols. Further, we remove the
values of all the edges from the nodes $u_{i_1}, \dots, u_{i_{r_1}}$ to $u_{j_1}$ and all the edges from $u_{j_1}$ to $u_{i_{r_1+1}}, \dots, u_{i_{\kappa-1}}$. Overall
we obtain
$$C(j_1) = C_{\kappa-1} + (n_1-1)(\beta_1+\varepsilon_1) + n_2\beta_2 - \sum_{q=1}^{r_1} h_{j_1}(u_{i_q}) - \sum_{q=r_1+1}^{\kappa-1} h_{i_q}(u_{j_1}). \qquad (17)$$
Similarly, let $C(j_\ell)$ be the value of $C_\kappa$ if in step κ we select the node $u_{j_\ell}$, $\ell \ge 2$. Following the same argument as
in (17), we obtain
$$C(j_\ell) = C_{\kappa-1} + (n_1-1)(\beta_1+\varepsilon_1) + n_2\beta_2 - \sum_{q=1}^{r_\ell} h_{j_\ell}(u_{i_q}) - \sum_{q=r_\ell+1}^{\kappa-1} h_{i_q}(u_{j_\ell}).$$
Since $h_{j_\ell}(u_i) = h_{j_1}(u_i)$ and $h_i(u_{j_1}) = h_i(u_{j_\ell})$ for all $i\in[n]$, we have
$$C(j_1) - C(j_\ell) = \sum_{q=r_1+1}^{r_\ell}\bigl(h_{j_\ell}(u_{i_q}) - h_{i_q}(u_{j_1})\bigr). \qquad (18)$$
Next we show that choosing $u_{j_1}$ yields a smaller cut than choosing any unselected node $u_{j_\ell}\in L$. We consider two cases:
1) Assume that $j_\ell < j_1$. Our goal is to show that the corresponding difference, formed as in (18), is nonpositive. Let $1 \le q \le r_\ell$. For $u_{i_q}\in U$ we have
$$h_{j_\ell}(u_{i_q}) - h_{j_1}(u_{i_q}) = \Bigl(\beta_1 - \frac{n_1-1}{n_2}\varepsilon_1\Bigr) - (\beta_1+\varepsilon_1) \le 0$$
and for $u_{i_q}\in L$ we have
$$h_{j_\ell}(u_{i_q}) - h_{j_1}(u_{i_q}) = \beta_2 - \beta_2 = 0.$$
Now let $r_1+1 \le q \le \kappa-1$. For $u_{i_q}\in U$ we have
$$h_{i_q}(u_{j_\ell}) - h_{i_q}(u_{j_1}) = \beta_2 - (\beta_1+\varepsilon_1)$$
and for $u_{i_q}\in L$ we have
$$h_{i_q}(u_{j_\ell}) - h_{i_q}(u_{j_1}) = \beta_2 - \Bigl(\beta_1 - \frac{n_1-1}{n_2}\varepsilon_1\Bigr),$$
both of which are non-positive by assumption (14).
The remaining terms contribute $\sum_{q=r_\ell+1}^{r_1}\bigl(h_{i_q}(u_{j_\ell}) - h_{j_1}(u_{i_q})\bigr)$ to the value of the cut. As before, for
$u_{i_q}\in U$ we have
$$h_{i_q}(u_{j_\ell}) - h_{j_1}(u_{i_q}) = \beta_2 - (\beta_1+\varepsilon_1) \le 0$$
by (14), and for $u_{i_q}\in L$ we have
$$h_{i_q}(u_{j_\ell}) - h_{j_1}(u_{i_q}) = \beta_2 - \beta_2 = 0.$$
Thus, $C(j_1) - C(j_\ell) \le 0$.
2) Assume that jℓ > j1 . This case is symmetric to the case jℓ < j1 and the analysis is similar.
By the principle of optimality in dynamic programming, which states that every optimal policy consists only
of optimal sub-policies [5, Ch. 1.3], we now conclude that the minimum cut is formed by first taking all the
nodes from U and then taking the remaining nodes from L.
Remark 2 Suppose that in forming the cut we have added all the nodes from U, and there are a more nodes (from
L) to select. To minimize the value of the cut, these nodes should be taken to be the a most recently failed
nodes from L. This is because choosing the most recently failed node $v_{\pi(n)}$ assures that as few as possible of
the previously selected nodes contain information from $v_{\pi(n)}$.
To justify this formally, consider the proof of Lemma 9. Indeed, if $u_{j_1}, u_{j_\ell}\in L$ with $j_1 < j_\ell$, then
$$C(j_1) - C(j_\ell) = \sum_{q=r_1+1}^{r_\ell}\bigl(h_{j_\ell}(u_{i_q}) - h_{i_q}(u_{j_1})\bigr)$$
Note that Lemma 7 is an immediate corollary of Lemmas 9 and 10. Indeed, taking the weight function h = h∗
implies that ε1 = 0, which satisfies assumption (14). Hence, the minimum cut is obtained when $\pi_t = \mathrm{id}$ and
$D_t \supseteq U$, and is equal to C.
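Lemma 9 also lends itself to exhaustive verification on a toy network. In the sketch below (our own code; the cut evaluation follows the bound (7), and the weight function is our reading of the definition preceding (16)), a minimizing selection always contains U.

```python
from itertools import combinations, permutations

n1, n2, k_prime = 2, 3, 3
n = n1 + n2
U = set(range(n1))                         # nodes 0..n1-1 form U
beta1, beta2, eps1 = 2.0, 1.0, 0.5         # (14): beta1 - beta2 >= (n1-1)/n2 * eps1

def h(failed, helper):
    """Symbols sent by `helper` for the repair of `failed`."""
    if failed in U:
        return beta1 + eps1 if helper in U else beta2
    return beta1 - (n1 - 1) / n2 * eps1 if helper in U else beta2

def cut(pi, D):
    """Cut value of selection D when pi lists nodes by order of last failure."""
    order = [v for v in pi if v in D]
    val = 0.0
    for i, v in enumerate(order):
        val += sum(h(v, u) for u in range(n) if u != v)   # download for v's repair
        val -= sum(h(v, u) for u in order[:i])            # edges inside the selection
    return val

for pi in permutations(range(n)):
    cuts = {D: cut(pi, set(D)) for D in combinations(range(n), k_prime)}
    best = min(cuts.values())
    best_with_U = min(v for D, v in cuts.items() if U <= set(D))
    assert abs(best - best_with_U) < 1e-9   # Lemma 9: a minimizer contains U
print("Lemma 9 verified on the toy instance")
```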
Let us prove Theorem 3.
Proof of Theorem 3: From Lemma 9 we obtain that there exists ε1 > 0 such that assumption (14) is
satisfied and such that at each time t, the selection $D_t$ that minimizes the cut contains U. Lemma 10 implies
that the minimum cut is obtained for $\pi_t = \mathrm{id}$. Taking $\pi_t = \mathrm{id}$ and $D_t = \{v_1, \dots, v_{k'}\}$, it is straightforward to
check that
$$C_t^h(D_t) = \sum_{j=1}^{n_1-1} j(\beta_1+\varepsilon_1) + n_1 n_2\beta_2 + \sum_{j=1}^{a}(n_2-j)\beta_2 = C_t^{h^*}(\pi_t) + \frac{n_1(n_1-1)}{2}\varepsilon_1 \qquad (19)$$
for every $\epsilon > 0$ there exists $t_\epsilon > t_0$ large enough such that $\bigl|\mu_\ell(\pi_t) - \frac{1}{|S_n^\ell|}\bigr| \le \epsilon$, and therefore the limit exists
almost surely.
For $t \ge t_\epsilon$ consider
$$\sum_{\pi_t\in S_n^\ell}\Pr(\pi_t \mid S_n^\ell)\, C_t(\pi_t) \ge \Bigl(\frac{1}{|S_n^\ell|}-\epsilon\Bigr)\sum_{\pi_t\in S_n^\ell} C_t(\pi_t) \ge \frac{1}{|S_n^\ell|}\sum_{\pi_t\in S_n^\ell} C_t(\pi_t) - \epsilon R,$$
where $R = \max_{\pi_t\in S_n} C_t(\pi_t)$. To bound this sum below we fix the last a entries of the permutation. Since for
h = h∗ (i.e., ε1 = 0) assumption (14) is satisfied, we can use Lemma 9, according to which $C_t(\pi_t)$ is
minimized if $n_1-\ell$ entries from U appear in the first $n_1-\ell$ positions, followed by $n_2-a+\ell$ entries from L (in
any order). Fix the first $n-a$ entries. Again according to Lemma 9, the minimum cut will be obtained when all
the ℓ nodes from U are in positions $n-a+1, n-a+2, \dots, n-a+\ell$, and according to Lemma 10 it is equal
to $C_{\min} := C + \ell^2(\beta_1-\beta_2)$. Also, the maximum cut will be obtained when all the ℓ nodes from U are located
in the last positions. This yields $C_{\max} := C + \ell a(\beta_1-\beta_2)$.
Let $\pi_t\in S_n^\ell$ be any permutation with $v_{\pi_t(i)}\in U$ for $i\in\{1, \dots, n_1-\ell\}$. We claim that
$$C_t(\pi_t) + C_t(\pi_t^c) = 2C + \ell(a+\ell)(\beta_1-\beta_2) = C_{\min} + C_{\max}. \qquad (21)$$
Indeed, assume $\pi_t = \pi$ and let D be a selection of k′ active nodes that minimizes the cut. By Lemma 9,
if there is at least one node from U in the last a places, the minimum cut will be obtained by selecting
the last a places as a part of D. Moreover, if $v_i\in U$ with $\pi^{-1}(i) = n-a+m$ for some $m\in[a]$ and
$f_\pi(v_i) = b$, then $\bigl|\{v_{\pi(1)}, \dots, v_{\pi(n-a+m)}\}\cap(D\cap L)\bigr| = b$. Together with the fact that $|D\cap L| = a$, this implies
that $\bigl|\{v_{\pi(n-a+1)}, \dots, v_{\pi(n-a+m)}\}\cap(D^c\cap L)\bigr| = b-\ell$. For $\pi^c$, we obtain that $(\pi^c)^{-1}(i) = n-m+1$ and
$\bigl|\{v_{\pi^c(n-m+1)}, \dots, v_{\pi^c(n)}\}\cap L\bigr| = b-\ell$, which means that $\bigl|\{v_{\pi^c(1)}, \dots, v_{\pi^c(n-m+1)}\}\cap(D\cap L)\bigr| = a-(b-\ell)$.
With a slight abuse of notation, for a node $v_i$ we write $\pi(v_i)$, $\pi^{-1}(v_i)$, and $(\pi^c)^{-1}(v_i)$ to denote $\pi(i)$, $\pi^{-1}(i)$,
and $(\pi^c)^{-1}(i)$, respectively. By Lemma 10 we have
$$C_t(\pi) = C + \sum_{v\in D\cap U} f_\pi(v)(\beta_1-\beta_2) \ge C + \sum_{\substack{v\in D\cap U \\ \pi^{-1}(v)\in\{n-a+1,\dots,n\}}} f_\pi(v)(\beta_1-\beta_2).$$
For $\pi^c$ we obtain
$$C_t(\pi^c) \ge C + \sum_{\substack{v\in D\cap U \\ (\pi^c)^{-1}(v)\in\{n-a+1,\dots,n\}}} f_{\pi^c}(v)(\beta_1-\beta_2) = C + \sum_{\substack{v\in D\cap U \\ (\pi^c)^{-1}(v)\in\{n-a+1,\dots,n\}}} \bigl(a-(f_\pi(v)-\ell)\bigr)(\beta_1-\beta_2).$$
Observe that $(S_n^\ell)_\ell$ partitions the set $S_n$, and we can continue as follows:
$$C_{\mathrm{avg}}^{h^*}(N) \overset{a.s.}{=} \lim_{t\to\infty}\frac{1}{t}\sum_{r=t_0}^{t}\sum_{\ell=0}^{\min\{a,n_1\}}\sum_{\pi\in S_n^\ell}\Pr(\pi_r = \pi \mid S_n^\ell)\Pr(\pi_r\in S_n^\ell)\, C_r(\pi)$$
$$= \lim_{t\to\infty}\frac{1}{t}\sum_{r=t_0}^{t}\sum_{\ell=0}^{\min\{a,n_1\}}\Pr(\pi_r\in S_n^\ell)\, E_{\mu_\ell}[C_r(N)]$$
By Lemma 11, for every $\epsilon > 0$, there is $t_\epsilon > t_0$ such that $\Bigl|\Pr(\pi_r\in S_n^\ell) - \frac{\binom{n_1}{\ell}\binom{n_2}{a-\ell}}{\binom{n}{a}}\Bigr| \le \epsilon$. Hence, for every $\epsilon > 0$,
$$C_{\mathrm{avg}}^{h^*}(N) \overset{a.s.}{\ge} \lim_{t\to\infty}\frac{1}{t}\Biggl(\sum_{r=t_\epsilon}^{t}\sum_{\ell=0}^{\min\{a,n_1\}}\frac{\binom{n_1}{\ell}\binom{n_2}{a-\ell}}{\binom{n}{a}}\, E_{\mu_\ell}[C_r(N)] - \sum_{r=t_0}^{t_\epsilon} n_1 R\Biggr),$$
By Lemma 1, the right-hand side of this inequality gives a lower bound on capacity. It can be transformed to
the expression on the right-hand side of (20) by repeated application of the Vandermonde convolution formula.
Thus, we have proved that the average minimum cut (and thus, the capacity) is almost surely bounded below
by an expression which is strictly greater than C, and accounting for the dynamics of the fixed-cost network
enables one to support storage of a larger file than in the static case of [2].
To summarize the results of this section, we have proved that
$$\mathrm{cap}(N) - C \overset{a.s.}{\ge} \max\Bigl\{\frac{n_1(n_1-1)}{2}\varepsilon_1,\ \frac{\beta_1-\beta_2}{2}\cdot\frac{an_1}{n}\Bigl(a+1+\frac{n_1-1}{n-1}(a-1)\Bigr)\Bigr\}, \qquad (23)$$
where the first of the bounds on the right is valid under assumption (14). To give numerical examples, let us
return to Example 3. Applying Theorem 4 to Example 3 yields $\mathrm{cap}(N) \ge 214\beta_2 + 3.7\beta_2$. At the same time,
Theorem 3 states that the storage capacity is bounded below by $214\beta_2 + \frac{9}{4}\beta_2$, showing that the choice of h is
not always optimal. Generally, the lower bound on capacity of Theorem 3 is $C + \frac{(n_1-1)n_2}{2n}(\beta_1-\beta_2)$, and the bound of
Theorem 4 is approximately $C + \frac{n_1 a^2}{2n}(\beta_1-\beta_2)$. Therefore, Theorem 4 provides a better bound on the storage capacity
when a is roughly above $\sqrt{n_2}$.
Since the storage capacity can be increased while the average amount of symbols each node vi transmits is
at most βi , after a long period of time (for large enough t), the total bandwidth that was used for repair in
the dynamical model is equal to the total bandwidth that was used for repair in the static model.
To conclude this section, we address the question regarding the accuracy of the derived bounds on $E\bigl[C_{\mathrm{avg}}^{h^*}(N)\bigr]$.
In the next proposition we derive an upper bound on this quantity.
Proposition 1 Let (N , β, S, h∗) be a storage network. We have (µ1-a.s.)
$$C_{\mathrm{avg}}^{h^*}(N) \le C + \frac{an_1(a+n_1)}{2n}(\beta_1-\beta_2).$$
Proof: Given $\pi\in S_n^\ell$, denote by $\bar\pi$ the permutation in which the first $n_2-a+\ell$ positions contain nodes
from L, the next $n_1-\ell$ positions contain nodes from U, and the last a positions are the same as in π. Lemma 9
and Remark 2 imply that $C_t(\bar\pi_t) \ge C_t(\pi_t)$. By (21) and by Lemma 10 we obtain
$$C_t(\bar\pi_t) + C_t(\bar\pi_t^c) = 2C + \ell(a+n_1)(\beta_1-\beta_2).$$
Hence,
$$C_t(\pi_t) + C_t(\pi_t^c) \le C_t(\bar\pi_t) + C_t(\bar\pi_t^c) = 2C + \ell(a+n_1)(\beta_1-\beta_2)$$
where the last equality follows from Lemma 11. By Vandermonde's identity we obtain that almost surely
$$C_{\mathrm{avg}}^{h^*}(N) \le C + \frac{an_1(a+n_1)}{2n}(\beta_1-\beta_2).$$
Proposition 1 and Theorem 4 jointly result in the following (a.s.) inequalities for the average cut of the
fixed-cost storage network:
$$\frac{an_1(\beta_1-\beta_2)}{2n}\Bigl(a+1+\frac{n_1-1}{n-1}(a-1)\Bigr) \le C_{\mathrm{avg}}^{h^*}(N) - C \le \frac{an_1(\beta_1-\beta_2)}{2n}(a+n_1) \qquad (24)$$
where C is given in Lemma 7. For the above example, we obtain for the gap between $C_{\mathrm{avg}}^{h^*}(N)$ and C an
upper bound of $9.75\beta_2$. Generally, the difference between the upper and lower bounds (discounting the common
multiplier) is $\frac{(n-a)(n_1-1)}{n-1}$. Of course this does not directly result in an upper bound on capacity of N , which
appears to be a difficult question (a loose upper bound was obtained in (6), which in the example gives a gap
of at most $21.5\beta_2$).
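The two-sided bound (24) is again a one-line computation; the following sketch (ours) evaluates it for Example 3 and reproduces the figures 3.7β2 and 9.75β2 mentioned above.

```python
n, n1, n2, k_prime = 20, 10, 10, 13
a = k_prime - n1
diff = 1.0                                    # beta1 - beta2 in units of beta2
common = a * n1 * diff / (2 * n)
lower = common * (a + 1 + (n1 - 1) / (n - 1) * (a - 1))
upper = common * (a + n1)
print(f"{lower:.2f} <= C_avg - C <= {upper:.2f}")   # 3.71 ... 9.75
```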
(cf. (2)). Although the memory property does not affect the storage capacity when $\beta_i = \beta_0$ for all $i\in[n]$, using
our idea of controlling the transmission policy enables us to increase the storage capacity. As a main result of
this section, we show that the capacity of the network can be increased over the non-causal model.
Recall our notation $[n] = U\cup L$, where $|U| = n_1$, $|L| = n_2$. Throughout this section we denote $\hat a \triangleq k'-n_2 > 0$.
The following lemma is a natural minimax analog of Lemma 7.
Then
$$C' = \sum_{i=1}^{\hat a}\bigl(n_1\beta_1 + n_2\beta_2 - i\beta_1\bigr) + \sum_{j=1}^{n_2}\bigl((n_1-\hat a)\beta_1 + n_2\beta_2 - j\beta_2\bigr). \qquad (27)$$
Lemma 13 can be obtained from the next lemma which is a modified version of Lemma 9, together with the
fact that every permutation appears as an associated permutation in (N , β, S) µ1 -almost surely.
Lemma 14 Let (N , β, s, h∗) be a storage network. For $t > t_0$, $C_t^{\max,h^*}(N)$ is obtained when $D_t \supseteq L$.
The proof of Lemma 14 is similar to the proof of Lemma 9 and is given in the appendix. Note that according
to Lemma 9, the selection that minimizes the cut at time t starts with the node from U that failed before the other
nodes in U.
Remark 4 Similarly to Remark 2, from the proof of Lemma 9 it follows that after choosing the nodes in L,
we should choose the remaining â nodes in the order reversed from the order of their failure, starting with the
most recently failed node.
For a network with memory (N , β, S) we denote the average (maximum) cut and the storage capacity by
$C_{\mathrm{avg}}^{\max,h}$ and $\mathrm{cap}_m(N)$, respectively. The main result of this section is stated in the following theorem.
Theorem 5 Let (N , β, S) be a (random) storage network with memory. We have (µ1-a.s.)
$$\mathrm{cap}_m(N) \ge C' + \frac{\beta_1-\beta_2}{2}\cdot\frac{n_1 n_2\hat a}{n}\Bigl(2 - \frac{\hat a-1}{n-1}\Bigr).$$
In this section we denote by Ŝℓn the set of all permutations over [n] with exactly ℓ elements from U in the last
â positions. To prove Theorem 5 we need the following lemma.
Lemma 15 Let (N , β, S, h∗) be a storage network with memory. Let $\pi_t$ be the permutation at time t and
assume that $\pi_t$ is distributed uniformly over $\hat S_n^\ell$. We have
$$E\bigl[C_t^{\max,h^*}(N)\bigr] \ge C' + \frac{1}{2}\ell(2n_2-\hat a+\ell)(\beta_1-\beta_2),$$
where C′ is given in (26).
Proof: For any permutation $\pi\in\hat S_n^\ell$, let $\bar\pi\in\hat S_n^\ell$ be a permutation in which the first $n_1-\ell$ positions contain
only nodes from U, and the last $\hat a$ positions are exactly as in π. Then by Lemma 14 and Remark 4 we have
$C_t^m(\pi_t) \ge C_t^m(\bar\pi_t)$. This implies that by fixing the last $\hat a$ positions in $\pi_t$, we can bound $C_t^{\max,h^*}(\pi_t)$ below by
$C_t^{\max,h^*}(\bar\pi_t)$. We claim that
$$C_t^{\max,h^*}(\pi_t) + C_t^{\max,h^*}(\pi_t^c) \ge 2C' + \ell(2n_2-\hat a+\ell)(\beta_1-\beta_2).$$
Note that if $\pi_t\in\hat S_n^\ell$ then $\pi_t^c\in\hat S_n^\ell$ as well. Hence, $C_t^{\max,h^*}(\pi_t) + C_t^{\max,h^*}(\pi_t^c) \ge C_t^{\max,h^*}(\bar\pi_t) + C_t^{\max,h^*}(\bar\pi_t^c)$.
By Lemma 10 we obtain
$$C_t^{\max,h^*}(\bar\pi_t) = C' + \sum_{v\in D_t\cap U} f_{\bar\pi_t}(v)(\beta_1-\beta_2) = C' + \sum_{\substack{v\in D_t\cap U \\ (\bar\pi_t)^{-1}(v)\in\{n-\hat a+1,\dots,n\}}} f_{\bar\pi_t}(v)(\beta_1-\beta_2)$$
Using the definition of $\bar\pi_t\in\hat S_n^\ell$ and Lemma 14, we now observe that $f_{\bar\pi_t}(v) = n_2-(\hat a-\ell)+b$. For $\bar\pi_t^c$ we have
$f_{\bar\pi_t^c}(v) = n_2-(\hat a-\ell)+(\hat a-\ell)-b = n_2-b$.
Overall we obtain
$$C_t^{\max,h^*}(\bar\pi_t) + C_t^{\max,h^*}(\bar\pi_t^c) = 2C' + \sum_{v\in D_t\cap U}\bigl(f_{\bar\pi_t}(v) + f_{\bar\pi_t^c}(v)\bigr)(\beta_1-\beta_2) = 2C' + \ell(2n_2-\hat a+\ell)(\beta_1-\beta_2)$$
which in turn implies that
$$C_t^{\max,h^*}(\pi_t) + C_t^{\max,h^*}(\pi_t^c) \ge 2C' + \ell(2n_2-\hat a+\ell)(\beta_1-\beta_2).$$
We conclude the proof by noticing that
$$E\bigl[C_t^{\max,h^*}(N)\bigr] = \sum_{\pi_t\in\hat S_n^\ell}\Pr(\pi_t)\, C_t^{\max,h^*}(\pi_t) = \frac{1}{|\hat S_n^\ell|}\sum_{\pi_t\in\hat S_n^\ell}\frac{1}{2}\Bigl(C_t^{\max,h^*}(\pi_t) + C_t^{\max,h^*}(\pi_t^c)\Bigr) \ge C' + \frac{1}{2}\ell(2n_2-\hat a+\ell)(\beta_1-\beta_2).$$
where the inequality follows from Lemma 12 and the last equality follows from Lemma 11 (with $\hat a$) and since
the stationary distribution of $\pi_t$ is the uniform distribution. The final expression is obtained by repeated use of
the Vandermonde convolution formula. The average cut bounds the storage capacity below since we can follow
the same arguments as in Lemma 1 with $C_t^{\max,h^*}$ instead of $C_t^{h^*}$.
For a numerical example we return to Example 3. If at each time t, $DC_t$ chooses the k′ nodes which yield
the maximum cut, then by Theorem 5 the storage capacity is $\mathrm{cap}_m(N) \ge C' + 13\frac{1}{3}\beta_2$, where $C' = 269\beta_2$. This is
much greater than the lower bound computed earlier for the non-causal case, and in fact it even breaks above
the static-case upper bound of (6).
As seen from Theorem 5, if β1 = β2 the bound below is equal to the storage capacity of the static model.
This comes as no surprise since the network is invariant under permutations of the storage nodes.
and
$$h_L(v_i) = \begin{cases} \beta_1 - \frac{q(n_1-1)}{pn_2}\varepsilon_1 & v_i\in U \\ \beta_2 & v_i\in L\setminus s_j \\ 0 & v_i = s_j \end{cases}$$
and 0 6 ε1 6 β1 . By a calculation similar to Lemma 8 it is straightforward to check that the constraints given
by β are satisfied. Moreover, the proof of Theorem 3 does not use the fact that the stationary distribution of the
associated permutations is uniform. Thus, from Lemma 9 and Lemma 10 we obtain the following statement.
Theorem 6 Let (N , β, S, h) be a fixed-cost storage network with weight function h as defined above. Assume
that the failure probability of a node from U is q > 0 and of a node from L is p > 0. Fix ε1 > 0 such that
$\beta_1-\beta_2 \ge \frac{qn(n_1-1)}{pn_2}\varepsilon_1$. The storage capacity is bounded below by
$$\mathrm{cap}(N) \overset{a.s.}{\ge} C + \frac{n_1(n_1-1)}{2}\varepsilon_1, \qquad (28)$$
where C is given in Lemma 7.
For a numerical example consider Example 3 with $q = \frac{1}{40}$ and $p = \frac{3}{40}$. Let us choose $\varepsilon_1 = \frac{pn_2}{qn(n_1-1)}\beta_2 = \frac{1}{6}\beta_2$.
From (28) we now obtain
$$\mathrm{cap}(N) \ge (214 + 7.5)\beta_2,$$
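The arithmetic behind this figure, as a short sketch (ours, exact fractions):

```python
from fractions import Fraction as F

n1 = n2 = 10
n = n1 + n2
q, p = F(1, 40), F(3, 40)                   # n1*q + n2*p = 1
eps1 = (p * n2) / (q * n * (n1 - 1))        # the choice in the text, units of beta2
print(eps1)                                  # 1/6
print(F(n1 * (n1 - 1), 2) * eps1)            # 15/2, i.e., cap >= C + 7.5*beta2 by (28)
```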
where as above, $C = 214\beta_2$ is the value of the min-cut in the static case. As elsewhere in this paper, the assumption
on ε1 introduced in the theorem limits the increase of the network capacity. Lifting the assumption suggests
following the path taken in Theorem 4 of Sec. III-B. To implement this idea, we need to find the stationary
distribution of the Markov random walk on $S_n$ that arises under our assumption. This is however not an easy
task, and the classic (asymptotic) results such as in [9] seem not to be of help here. We have succeeded in
performing the analysis in the simple case of $n = n_2+1$, i.e., of the "upper" set formed of a single node $U = \{u\}$,
and we present this result in the remainder of this section.
Suppose that the failed nodes in the sequence S are chosen independently and that Pr(S i = v ) = p if v ∈ L
and Pr(S i = v ) = q if v ∈ U. Assuming that p, q 6= 0, almost surely there exists a finite time t0 such that all
the nodes have failed at least once by t0 . Choosing the next failed node gives rise to a permutation on Sn , and
the conditional probabilities Pr(πt |πt−1 ) between the permutations are well defined and can be found explicitly.
The probabilities Pr(πt |πt−1 ) define an ergodic Markov chain with a unique stationary distribution ν.
Define a partition of Sn into n blocks Pi , i ∈ [n]. Let π ∈ Pi if and only if π −1 (u) = i . The partition (Pi )
defines an obvious equivalence relation on Sn , and |Pi | = (n − 1)! for all i .
It turns out that the stationary probabilities of equivalent permutations are the same, i.e., ν(π) depends only
on the block Pi ∋ π. The distribution ν is given in the next lemma.
For any real number r and natural number k we define $\binom{r}{k} = \frac{r(r-1)\cdots(r-k+1)}{k!}$, and put $\binom{r}{0} = 1$.
Lemma 16 Let (N , β, S) be a dynamical storage network with $n = n_1+n_2$ nodes, where $n_1 = 1$. Let $0 < q \le p$
and suppose that $S_i$, $i = 1, 2, \dots$ are independent random variables with $\Pr(S_i = v) = p$ if $v\in L$ and
$\Pr(S_i = v) = q$ if $v\in U$. Let $\pi\in P_i$ and define the distribution
$$\nu(\pi) = \frac{1-q}{(n-1)!}\binom{\frac{1}{p}-1}{n-2}^{-1}\binom{\frac{1}{p}-n-1+i}{i-1}.$$
Then ν is the stationary distribution of the Markov chain with state space $S_n$.
Proof: 1. We first note that for any t, $\Pr(\pi_{t+1}\mid\pi_t) = q$ if $\pi_{t+1}\in P_n$ and $\Pr(\pi_{t+1}\mid\pi_t) = p$ otherwise. This
implies that for a fixed $\pi_t$,
$$\sum_{i=1}^{n}\Pr(\pi_{t+1}\in P_i \mid \pi_t) = (n-1)p + q = 1.$$
Hence, $\frac{1}{p} \ge n-1$, which implies that all the binomial coefficients in ν(π) are positive. Moreover, since $(n-1)p =
1-q$, we obtain that if $\pi\in P_n$ then the expression for ν(π) simplifies as follows:
$$\nu(\pi) = \frac{1-q}{(n-1)!}\binom{\frac{1}{p}-1}{n-2}^{-1}\binom{\frac{1}{p}-1}{n-1} = \frac{1-q}{(n-1)!}\cdot\frac{\frac{1}{p}-n+1}{n-1} = \frac{q}{(n-1)!}. \qquad (29)$$
2. Let us check that ν is a probability vector. As already remarked, $\nu(\pi) > 0$ for all $\pi\in S_n$. Obviously, if
$\pi, \sigma\in P_i$ for some i, then $\nu(\pi) = \nu(\sigma) = \frac{1}{(n-1)!}\nu(\{P_i\})$.
By the definition of ν we have that $\nu(\{P_{i+1}\}) = \nu(\{P_i\})\bigl(1 + \frac{1-pn}{pi}\bigr)$ for all $i \le n-1$. Therefore,
$$\sum_{\pi\in S_n}\nu(\pi) = \sum_{i=1}^{n}\nu(\{P_i\}) = \sum_{i=1}^{n-1}\nu(\{P_i\}) + q = \nu(\{P_1\})\Bigl(1 + \sum_{j=1}^{n-2}\prod_{i=1}^{j}\Bigl(1 + \frac{1-pn}{pi}\Bigr)\Bigr) + q,$$
where we used (29) for the last block. Note that
$$\prod_{i=1}^{j}\Bigl(1 + \frac{1-pn}{pi}\Bigr) = \binom{j + \frac{1-pn}{p}}{j}.$$
Since for $\pi\in P_1$, $\nu(\{P_1\}) = (n-1)!\,\nu(\pi)$ and $\nu(\pi) = \frac{1-q}{(n-1)!}\binom{\frac{1}{p}-1}{n-2}^{-1}$, we have
$$\sum_{\pi\in S_n}\nu(\pi) = (n-1)!\binom{\frac{1}{p}-1}{n-2}\nu(\pi) + q = (1-q) + q = 1.$$
Finally let us show that ν is a stationary vector of the transition matrix. Fix t and consider the sum
$\sum_{\pi\in S_n}\nu(\pi)\Pr(\pi_{t+1} = \sigma \mid \pi_t = \pi)$. For $\sigma\in P_i$, this sum has exactly n non-zero terms, of which i are for
$\pi\in P_{i+1}$ and $n-i$ for $\pi\in P_i$. Therefore, if $\sigma\in P_i$, we obtain
$$\sum_{\pi\in S_n}\nu(\pi)\Pr(\pi_{t+1} = \sigma \mid \pi_t = \pi) = \frac{pi}{(n-1)!}\nu(\{P_{i+1}\}) + \frac{p(n-i)}{(n-1)!}\nu(\{P_i\}).$$
Since $\nu(\{P_{i+1}\}) = \nu(\{P_i\})\bigl(1 + \frac{1-pn}{pi}\bigr)$ for $i \le n-1$, we have
$$\sum_{\pi\in S_n}\nu(\pi)\Pr(\pi_{t+1} = \sigma \mid \pi_t = \pi) = \frac{p}{(n-1)!}\nu(\{P_i\})\Bigl(i\Bigl(1 + \frac{1-pn}{pi}\Bigr) + (n-i)\Bigr) = \frac{1}{(n-1)!}\nu(\{P_i\}) = \nu(\sigma).$$
Now assume that $\sigma\in P_n$. We obtain
$$\sum_{\pi\in S_n}\nu(\pi)\Pr(\pi_{t+1} = \sigma \mid \pi_t = \pi) = \frac{1}{(n-1)!}\sum_{i=1}^{n} q\,\nu(\{P_i\}) = \frac{q}{(n-1)!}\Bigl(\sum_{i=1}^{n-1}\nu(\{P_i\}) + \nu(\{P_n\})\Bigr). \qquad (30)$$
Using the fact that $\sum_{i=1}^{n-1}\nu(\{P_i\}) = 1-q$ jointly with (30), we conclude that
$$\sum_{\pi\in S_n}\nu(\pi)\Pr(\pi_{t+1} = \sigma \mid \pi_t = \pi) = (1-q+q)\frac{q}{(n-1)!} = \nu(\sigma),$$
which completes the proof.
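Lemma 16 can also be confirmed numerically. The sketch below (our own code; gbinom is a hypothetical helper implementing the generalized binomial coefficient defined above) checks normalization and the balance equations for a small instance.

```python
from math import factorial

def gbinom(r, k):
    """Generalized binomial coefficient r over k (real r, integer k >= 0)."""
    out = 1.0
    for i in range(k):
        out *= (r - i) / (i + 1)
    return out

n, p = 6, 0.15
q = 1 - (n - 1) * p                # = 0.25, so 1/p >= n - 1 as required

def nu(i):
    """Per-permutation stationary probability on block P_i (Lemma 16)."""
    return (1 - q) / factorial(n - 1) / gbinom(1/p - 1, n - 2) \
        * gbinom(1/p - n - 1 + i, i - 1)

# normalization over the n blocks of (n-1)! permutations each
assert abs(sum(factorial(n - 1) * nu(i) for i in range(1, n + 1)) - 1) < 1e-9
# balance at sigma in P_i, i < n: i preimages in P_{i+1}, n - i in P_i, each w.p. p
for i in range(1, n):
    assert abs(p * i * nu(i + 1) + p * (n - i) * nu(i) - nu(i)) < 1e-9
# balance at sigma in P_n: total inflow q/(n-1)! equals nu(n)
assert abs(q / factorial(n - 1) - nu(n)) < 1e-9
print("nu is stationary for n =", n)
```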
where C is given in (12). To argue that this expression can be used in the lower bound on cap(N ) similar to the
bound in Theorem 4 (or in (22)), we can repeat the arguments used in the proof of Lemma 12. Then a modified
version of (22) together with the above expression for ν gives a lower bound on the capacity.
To give a numerical example, assume that we have n = 20 with $n_2 = 19$, $p = \frac{4}{95}$ and $q = \frac{1}{5}$. Assume also
that $\beta_1 = 2\beta_2$ and k′ = 13 (which implies that a = 12). According to Lemma 7, the capacity in the static
model is $C = 150\beta_2$. Using the results in this section, we obtain that a.s.
$$\mathrm{cap}(N) \ge (0.022\cdot 150 + 155.4)\beta_2 = 158.7\beta_2.$$
Lemma 3 implies that $\mathrm{cap}(N) \le 177.45\beta_2$, and thus in the above example we have obtained a capacity
increase of more than 30% of the gap between the bounds.
Appendix
A. Proof of Lemma 14
Assume that πt is a fixed permutation and consider the information flow graph Xt . We consider a k ′ -step
procedure which in each step selects one node from At . Let t ′ 6 t and assume the node vitt ∈ At was selected.
The cost it entails is defined as the added weight values of the in-edges of CUt that are not out-edges of
previously selected nodes. our goal is to select k ′ nodes that maximizes the cut for πt .
In order to simplify notation, we write πt = (u1 , u2 , . . . , un ), i.e., ul = vπt (l ) is the storage node that appears
in the l th position in πt . Moreover, with a slight abuse of notation, if uj failed at time t ′ we will write hj (ui )
instead of ht ′ (ui ). For κ 6 k ′ , consider the sub-problem at step κ − 1, where the DCt has already chosen κ − 1
nodes (ui1 , . . . , uiκ−1 ) and we are to choose the last node. Assume that the chosen nodes are ordered according
to their appearance in the permutation, i.e., i1 6 i2 6 . . . 6 iκ−1 . Let uj1 , . . . , ujm ∈ L be nodes that were not
selected up to step κ − 1, i.e.,
{uj1 , . . . , ujm } ∩ {ui1 , . . . , uiκ−1 } = ∅,
and assume also that j1 6 j2 6 . . . 6 jm . We show that choosing uj1 accounts for the maximum cut.
First, we show that choosing uj1 maximizes the cut over all other nodes from L. Denote by Cκ−1 the total
cost (or the cut) in step κ − 1. Fix 2 6 ℓ ∈ [m] and note that since j1 6 jℓ we may write
$$i_1 \le \dots \le i_{r_1} \le j_1 \le i_{r_1+1} \le \dots \le i_{r_\ell} \le j_\ell \le i_{r_\ell+1} \le \dots,$$
where $j_1$ could also be 1. Let $C(j_1)$ be the cut value if $DC_t$ chooses $u_{j_1}$ in the κth step. The change
from $C_{\kappa-1}$ is formed of the following components. First, we add the values of all the edges from U to $u_{j_1}$ and
from $L\setminus\{u_{j_1}\}$ to $u_{j_1}$, accounting for $n_1\beta_1 + (n_2-1)\beta_2$ symbols. Next, for each node $u_{i_q}$ with $r_1 < q \le \kappa-1$,
we subtract $h^*_{i_q}(u_{j_1})$ from the cut value. Overall we obtain
$$C(j_1) = C_{\kappa-1} + n_1\beta_1 + (n_2-1)\beta_2 - \sum_{q=1}^{r_1} h^*_{j_1}(u_{i_q}) - \sum_{q=r_1+1}^{\kappa-1} h^*_{i_q}(u_{j_1}). \qquad (31)$$
Since $u_{j_1}, u_{j_\ell}\in L$, we obtain that $h^*_{j_\ell}(u_i) = h^*_{j_1}(u_i)$ and $h^*_i(u_{j_1}) = h^*_i(u_{j_\ell})$ for all $i\in[n]$, and thus
$$C(j_1) - C(j_\ell) = \sum_{q=r_1+1}^{r_\ell}\bigl(h^*_{j_\ell}(u_{i_q}) - h^*_{i_q}(u_{j_1})\bigr).$$
For $u_{i_q}\in U$, we obtain $h^*_{j_\ell}(u_{i_q}) - h^*_{i_q}(u_{j_1}) = \beta_1-\beta_2 \ge 0$, and for $u_{i_q}\in L$, we obtain $h^*_{j_\ell}(u_{i_q}) - h^*_{i_q}(u_{j_1}) = \beta_2-\beta_2 = 0$,
and so
$$C(j_1) - C(j_\ell) \ge 0.$$
Now we show that uj1 maximizes the cut over the selection of any node ujℓ from U. We divide the argument
into 2 cases:
1) Assume that $j_\ell < j_1$. Denote by $(i_1, \dots, i_{r_\ell}, j_\ell, i_{r_\ell+1}, \dots, i_{r_1}, j_1, \dots)$ the selected nodes and let $C(j_\ell), C(j_1)$
be the cut values if we choose $u_{j_\ell}$, $u_{j_1}$, respectively. We have
$$C(j_\ell) = C_{\kappa-1} + (n_1-1)\beta_1 + n_2\beta_2 - \sum_{q=1}^{r_\ell} h^*_{j_\ell}(u_{i_q}) - \sum_{q=r_\ell+1}^{\kappa-1} h^*_{i_q}(u_{j_\ell})$$
For $u_{i_q}\in U$ we have $h^*_{i_q}(u_{j_\ell}) - h^*_{j_1}(u_{i_q}) = 0$, and for $u_{i_q}\in L$ we have $h^*_{i_q}(u_{j_\ell}) - h^*_{j_1}(u_{i_q}) \ge 0$; we conclude
that $C(j_1) - C(j_\ell) \ge 0$.
2) Now assume jℓ > j1 . This case is symmetric to the case jℓ < j1 and relies on the same analysis. We omit
the details.
According to the principle of optimality [5, Ch. 1.3], every optimal policy consists only of optimal sub-policies,
and therefore we first need to choose all the nodes from L and then choose nodes from U. This completes the
proof.
References
[1] R. Ahlswede, N. Cai, S. Y. R. Li, and R. W. Yeung, “Network information flow,” IEEE Trans. Inform. Theory, vol. 46, no. 4,
pp. 1204–1216, Jul. 2000.
[2] S. Akhlaghi, A. Kiani, and M. R. Ghanavati, "Cost-bandwidth tradeoff in distributed storage systems," Computer Communications, vol. 33, no. 17, pp. 2105–2115, 2010.
[3] D. Aldous and P. Diaconis, “Shuffling cards and stopping times,” The American Mathematical Monthly, vol. 93, no. 5, pp.
333–348, 1986.
[4] A. Badita, P. Parag, and J.-F. Chamberland, “Latency analysis for distributed storage systems,” IEEE Trans. Inf. Theory,
vol. 65, no. 6, pp. 4683–4698, 2019.
[5] D. P. Bertsekas, Dynamic programming and optimal control. Belmont, MA: Athena Scientific, 2005.
[6] V. R. Cadambe, S. A. Jafar, H. Maleki, K. Ramchandran, and C. Suh, “Asymptotic interference alignment for optimal repair
of MDS codes in distributed storage.” IEEE Trans. Inf. Theory, vol. 59, no. 5, pp. 2974–2987, 2013.
[7] B. Calder, J. Wang, A. Ogus, N. Nilakantan, A. Skjolsvold, S. McKelvie, Y. Xu, S. Srivastav, J. Wu, H. Simitci et al., “Windows
Azure Storage: a highly available cloud storage service with strong consistency,” in Proceedings of the Twenty-Third ACM
Symposium on Operating Systems Principles. ACM, 2011, pp. 143–157.
[8] A. Dimakis, P. B. Godfrey, Y. Wu, M. J. Wainwright, and K. Ramchandran, “Network coding for distributed storage systems,”
IEEE Trans. Inf. Theory, vol. 56, no. 9, pp. 4539–4551, 2010.
[9] L. Flatto, A. Odlyzko, and D. Wales, “Random shuffles and group representations,” The Annals of Probability, vol. 13, no. 1,
pp. 154–178, 1985.
[10] R. G. Gallager, Stochastic processes: theory for applications. Cambridge University Press, 2013.
[11] G. Grimmett and D. Stirzaker, Probability and Random Processes, 3rd ed. Oxford Univ. Press, 2001.
[12] G. Joshi, Y. Liu, and E. Soljanin, “Coding for fast content download,” in Proc. 50th Annual Allerton Conf. Commun. Control
Comput., 2012, pp. 326–333.
[13] A. M. Kermarrec, N. L. Scouarnec, and G. Straub, “Repairing multiple failures with coordinated and adaptive regenerating
codes,” in Int. Symp. on Network Coding (NetCod). IEEE, 2011, pp. 1–6.
[14] O. Khan, R. C. Burns, J. S. Plank, W. Pierce, and C. Huang, “Rethinking erasure codes for cloud file systems: Minimizing I/O
for recovery and degraded reads.” in Proc. 2012 USENIX Conf. on File and Storage Technology (FAST), 2012, 14pp.
[15] M. Luby, R. Padovani, T. Richardson, L. Minder, and P. Aggarwal, “Liquid cloud storage,” arXiv:1705.07983, 2017.
[16] M. Silberstein, L. Ganesh, Y. Wang, L. Alvisi, and M. Dahlin, "Lazy means smart: Reducing repair bandwidth costs in erasure-coded distributed storage," in Proceedings of International Conference on Systems and Storage. ACM, 2014, pp. 1–7.
[17] J. Y. Sohn, B. Choi, S. W. Yoon, and J. Moon, “Capacity of clustered distributed storage,” IEEE Trans. Inf. Theory, vol. 65,
no. 1, pp. 81–107, 2019.