Codes For Distributed Storage: Foundations and Trends in Communications and Information Theory
Codes for Distributed Storage
Suggested Citation: Vinayak Ramkumar, S. B. Balaji, Birenjith Sasidharan, Myna
Vajha, M. Nikhil Krishnan and P. Vijay Kumar (2022), “Codes for Distributed Storage”,
Foundations and Trends® in Communications and Information Theory: Vol. 19, No. 4,
pp 547–813. DOI: 10.1561/0100000115.
Vinayak Ramkumar
Indian Institute of Science, Bengaluru
[email protected]
S. B. Balaji
Qualcomm, Bengaluru
[email protected]
Birenjith Sasidharan
Govt. Engineering College, Barton Hill, Trivandrum
[email protected]
Myna Vajha
Qualcomm, Bengaluru
[email protected]
M. Nikhil Krishnan
International Institute of Information Technology Bangalore
[email protected]
P. Vijay Kumar
Indian Institute of Science, Bengaluru
[email protected]
This article may be used only for the purpose of research, teaching,
and/or private study. Commercial use or systematic downloading (by
robots or other automatic processes) is prohibited without explicit
Publisher approval.
Boston — Delft
Contents
1 Introduction
1.1 Conventional Repair of an MDS Code
1.2 Regenerating Codes and Locally Recoverable Codes
1.3 Overview of the Monograph
Acknowledgements
References
Codes for Distributed Storage
Vinayak Ramkumar¹, S. B. Balaji², Birenjith Sasidharan³, Myna Vajha⁴, M. Nikhil Krishnan⁵ and P. Vijay Kumar⁶
¹ Indian Institute of Science, Bengaluru, India; [email protected]
² Qualcomm, Bengaluru, India; [email protected]
³ Govt. Engineering College, Barton Hill, Trivandrum, India; [email protected]
⁴ Qualcomm, Bengaluru, India; [email protected]
⁵ International Institute of Information Technology Bangalore, India; [email protected]
⁶ Indian Institute of Science, Bengaluru, India; [email protected]
ABSTRACT
In distributed data storage, information pertaining to a given
data file is stored across multiple storage units or nodes in
redundant fashion to protect against the principal concern,
namely, the possibility of data loss arising from the failure
of individual nodes. The simplest form of such protection
is replication. The explosive growth in the amount of data generated on a daily basis has brought up a second major concern, namely, minimization of the overhead associated with such redundant storage. This concern led to the adoption by the storage industry of erasure-recovery codes such as Reed-Solomon (RS) codes and, more generally, maximum distance separable (MDS) codes, as these codes offer the lowest possible storage overhead for a given level of reliability.
In the setting of a large data center, where the amount of
stored data can run into several exabytes, a third concern
is associated to a codeword
$$h = (h_1\ h_2\ \cdots\ h_n),$$
yielding the parity-check equation
$$\sum_{i=1}^{n} h_i c_i = 0. \tag{1.1}$$
For the operation of a data center, equation (1.1) has two implications.
Firstly, the replacement of the failed node must necessarily contact k “helper nodes”, i.e., nodes that store the code symbols $\{c_i \mid h_i \neq 0\}$. Secondly, equation (1.1) implies that each helper node must transfer its entire contents (represented by $c_i$) for repair of the failed node. The
number of helper nodes contacted (at least k in the case of an MDS
code) is called the repair degree of the code. The total amount of data
downloaded for repair of the failed node is termed the repair bandwidth.
In the case of an MDS code, it is clear that the repair bandwidth is at
least k times the amount of data stored in the failed node.
This is illustrated below in the case of a [14, 10] MDS code. Assume a data file of size equal to 1 GB. The data file is partitioned into 10 fragments, each of size 100 MB, and each data fragment is stored in a different node. Four parity nodes are then created, corresponding to the four parity symbols of the MDS code. The contents of the 14 nodes can be regarded as the layering of $10^8$ codewords, each belonging to the [14, 10] MDS code over $\mathbb{F}_{2^8}$. Fig. 1.1 shows repair of a failed node. As
Figure 1.1: Illustrating the repair degree and repair bandwidth involved in the conventional repair of a failed node in a [14, 10] MDS code. (Ten data nodes and four parity nodes, each storing 100 MB; the replacement node downloads 100 MB from each of 10 helper nodes.)
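The arithmetic of conventional repair can be sketched in code. In the toy below, the prime field GF(257) stands in for the $\mathbb{F}_{2^8}$ of the example above, and the message symbols are made up; both are assumptions of this sketch. It shows that repairing one erased symbol of an [n = 14, k = 10] Reed-Solomon codeword requires downloading k = 10 whole symbols, i.e., k times the lost data.

```python
# Toy sketch of conventional repair for an [n = 14, k = 10] RS code over
# GF(257) (assumed field; the text's example uses F_{2^8}).
P = 257

def rs_encode(msg, xs):
    # codeword = evaluations of the degree-(k-1) message polynomial
    return [sum(m * pow(x, i, P) for i, m in enumerate(msg)) % P for x in xs]

def lagrange_recover(points, x0):
    # interpolate through k (locator, symbol) pairs and re-evaluate at x0
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * ((x0 - xj) % P) % P
                den = den * ((xi - xj) % P) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total

n, k = 14, 10
msg = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]                  # k message symbols
xs = list(range(1, n + 1))                            # n distinct locators
code = rs_encode(msg, xs)

failed = 0                                            # node 1 fails
helpers = [(xs[i], code[i]) for i in range(1, k + 1)] # any k helpers suffice
assert lagrange_recover(helpers, xs[failed]) == code[failed]
print("downloaded", len(helpers), "symbols to repair 1 symbol")
```

The repair bandwidth here is k symbols per lost symbol, matching the k × 100 MB = 1 GB download in the example.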
Figure 1.2: An overview of the coverage of codes for distributed storage in this monograph: minimizing repair bandwidth, minimizing repair degree, minimizing both repair bandwidth and repair degree, and improved repair of RS codes.
Regenerating Codes The next seven sections deal with RGCs. The
definition of an RGC along with a fundamental upper bound on file
size is presented in Section 3. The bound reveals that there is a tradeoff
between the storage overhead and the repair bandwidth. Sections 4
and 5 present constructions for the two main classes of RGCs, namely
minimum bandwidth regenerating (MBR) codes and minimum storage
regenerating (MSR) codes, that lie at the two ends of the storage-repair-bandwidth tradeoff.
Section 11, represent one such example. This class of codes has the
additional feature that in the case of a single erased node, there are
multiple, node-disjoint means of recovering from the node failure. This
can be a very useful feature to have in practice, particularly as a means
of handling cases when there are multiple simultaneous demands for
the data contained within a particular node.
Sequential-recovery LRCs place the least stringent conditions on an LRC for local recovery from multiple erasures and consequently have the smallest possible storage overhead. These are discussed in Section 12. If an LRC has large block length and a small repair degree r, and a particular local code is overwhelmed by erasures, the only option is to fall back on the properties of the full-length block code to recover from the erasure pattern, leading to a sharp increase in the repair degree. Codes with hierarchical locality, discussed in Section 13, are designed to address this situation: they provide layers of local codes having increasing block length as well as increasing erasure-recovery capability, and permit a more graceful degradation in repair degree with an increasing number of erasures.
Maximally recoverable codes (MRCs), discussed in Section 14, may
be regarded as the subclass of LRCs that are as MDS as possible in the
sense that every set of k columns of the generator matrix of an MRC is
a linearly independent set, unless the locality constraints imposed make
it impossible for this to happen. An MRC is maximal in the sense that
if an MRC is not able to recover from an erasure pattern, then no other
code satisfying the same locality constraints can possibly recover from
the same erasure pattern.
Codes in Practice The final section, Section 17, discusses the impact
that the theoretical developments discussed in this monograph have
had in practice.
2
Maximum Distance Separable Codes
Let $\mathbb{F}_q$ be a finite field of q elements. Let $\mathbb{F}_q[x]$ denote the set of all polynomials in x over $\mathbb{F}_q$:
$$\mathbb{F}_q[x] = \left\{ \sum_{i=0}^{d} u_i x^i \;\middle|\; u_i \in \mathbb{F}_q,\ d \in \{0, 1, 2, \cdots\} \right\}.$$
The codeword obtained by evaluating $f(x) = \prod_{j=1}^{k-1}(x - \theta_j)$ has exactly (k − 1) zeros, and it follows from this that $d_{\min} = w_{\min} = (n - k + 1)$ in the case of an [n, k] RS code.
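The weight argument above can be checked numerically; here is a minimal sketch over the toy field GF(11), with n, k and the evaluation points chosen purely for illustration.

```python
# Evaluating f(x) = prod_{j=1}^{k-1} (x - theta_j), a polynomial of degree
# k - 1, at the n code locators yields a codeword of weight exactly n - k + 1.
P = 11
n, k = 8, 4
theta = [1, 2, 3, 4, 5, 6, 7, 8]   # assumed distinct locators in GF(11)

def f(x):
    r = 1
    for t in theta[:k - 1]:
        r = r * ((x - t) % P) % P
    return r

codeword = [f(t) for t in theta]
# f vanishes at theta_1..theta_{k-1} and nowhere else among the locators
assert sum(1 for c in codeword if c != 0) == n - k + 1
```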
We use [n, k, dmin ] to denote an [n, k] linear code having minimum
distance dmin . Analogously, we will use (n, M, dmin ) to denote an (n, M )
code having minimum distance dmin . It follows that the RS code CRS is
$$M \le |A|^{\,n - d_{\min} + 1}.$$
It follows that
$$\sum_{i=1}^{n} u_i h(\theta_i) \;=\; \sum_{i=1}^{n} u_i \left( \sum_{j=0}^{n-2} h_j \theta_i^{\,j} \right) \;=\; \sum_{j=0}^{n-2} h_j \left( \sum_{i=1}^{n} u_i \theta_i^{\,j} \right) \;=\; 0,$$
$$\implies \sum_{i=1}^{n} u_i f(\theta_i)\, g(\theta_i) = 0.$$
It follows from this that the dual of an RS code having generator matrix of the form
$$G = \begin{bmatrix} 1 & \cdots & 1 \\ \theta_1 & \cdots & \theta_n \\ \vdots & & \vdots \\ \theta_1^{k-1} & \cdots & \theta_n^{k-1} \end{bmatrix}$$
is the block code having generator matrix of the form
$$H = \begin{bmatrix} 1 & \cdots & 1 \\ \theta_1 & \cdots & \theta_n \\ \vdots & & \vdots \\ \theta_1^{n-k-1} & \cdots & \theta_n^{n-k-1} \end{bmatrix} \begin{bmatrix} u_1 & & & \\ & u_2 & & \\ & & \ddots & \\ & & & u_n \end{bmatrix}$$
with all $u_i \neq 0$. We will refer to any code having generator matrix G of the form
$$G = \begin{bmatrix} 1 & \cdots & 1 \\ \theta_1 & \cdots & \theta_n \\ \vdots & & \vdots \\ \theta_1^{k-1} & \cdots & \theta_n^{k-1} \end{bmatrix} \begin{bmatrix} u_1 & & & \\ & u_2 & & \\ & & \ddots & \\ & & & u_n \end{bmatrix}, \quad \text{with all } u_i \neq 0,$$
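The duality above can be verified numerically. The sketch below assumes the standard choice of column multipliers $u_i = 1/\prod_{j \neq i}(\theta_i - \theta_j)$ (one valid all-nonzero choice; the text leaves the $u_i$ unspecified) and checks over the toy field GF(13) that every row of G is orthogonal to every row of H.

```python
# Verify G H^T = 0 for the Vandermonde generator G (powers 0..k-1) and
# H = Vandermonde(powers 0..n-k-1) * diag(u_1,...,u_n) over GF(13).
P = 13
n, k = 6, 3
theta = [1, 2, 3, 4, 5, 6]

def inv(x):
    return pow(x % P, P - 2, P)

# assumed multipliers: u_i = 1 / prod_{j != i} (theta_i - theta_j)
u = []
for i in range(n):
    prod = 1
    for j in range(n):
        if j != i:
            prod = prod * ((theta[i] - theta[j]) % P) % P
    u.append(inv(prod))

G = [[pow(theta[i], r, P) for i in range(n)] for r in range(k)]
H = [[pow(theta[i], r, P) * u[i] % P for i in range(n)] for r in range(n - k)]

assert all(sum(g * h for g, h in zip(grow, hrow)) % P == 0
           for grow in G for hrow in H)
print("every row of G is orthogonal to every row of H")
```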
$$G = [I_k \mid P],$$
The formal derivative A′(x) of A(x) is given from the product formula by:
$$A'(x) = \sum_{i=1}^{m} \prod_{j=1, j \neq i}^{m} (x - a_j),$$
so that
$$A'(a_\ell) = \prod_{j=1, j \neq \ell}^{m} (a_\ell - a_j).$$
Define:
$$A_i(x) = \frac{A(x)}{(x - a_i)\, A'(a_i)} = \prod_{j=1, j \neq i}^{m} \frac{(x - a_j)}{(a_i - a_j)}. \tag{2.3}$$
Then
$$A_i(x) = \begin{cases} 1, & x = a_i \\ 0, & x = a_\ell,\ \ell \neq i. \end{cases}$$
It is evident from the above that the (m × m) matrix H whose (u, i)-th entry is given by
$$H_{ui} = \frac{-B_u(a_i)\, A(b_u)}{A'(a_i)}, \quad 1 \le u, i \le m, \tag{2.4}$$
is the inverse of the Cauchy matrix. An alternate, more symmetric expression,
$$H_{ui} = (a_i - b_u)\, B_u(a_i)\, A_i(b_u), \quad 1 \le i, u \le m,$$
can be obtained by noting in (2.4) that
$$A_i(x) = \frac{A(x)}{(x - a_i)\, A'(a_i)} \implies \frac{A(b_u)}{A'(a_i)} = A_i(b_u)\,(b_u - a_i).$$
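A quick numerical check of the inverse formula (2.4) is given below, over the toy field GF(31) with assumed evaluation points $a_i$ and $b_u$.

```python
# With C_{iu} = 1/(a_i - b_u), verify that the matrix H with entries
# H_{ui} = -B_u(a_i) A(b_u) / A'(a_i) satisfies C H = I over GF(31).
P = 31
a = [1, 2, 3]   # the a_i (assumed), all distinct
b = [4, 5, 6]   # the b_u (assumed), disjoint from the a_i
m = len(a)

def inv(x):
    return pow(x % P, P - 2, P)

def A(x):                     # A(x) = prod_j (x - a_j)
    r = 1
    for aj in a:
        r = r * ((x - aj) % P) % P
    return r

def A_prime(al):              # A'(a_l) = prod_{j != l} (a_l - a_j)
    r = 1
    for aj in a:
        if aj != al:
            r = r * ((al - aj) % P) % P
    return r

def Bu(u, x):                 # Lagrange basis polynomial on the b-points
    r = 1
    for v in range(m):
        if v != u:
            r = r * ((x - b[v]) % P) % P * inv(b[u] - b[v]) % P
    return r

C = [[inv(a[i] - b[u]) for u in range(m)] for i in range(m)]
H = [[-Bu(u, a[i]) * A(b[u]) % P * inv(A_prime(a[i])) % P
      for i in range(m)] for u in range(m)]

prod = [[sum(C[i][u] * H[u][j] for u in range(m)) % P for j in range(m)]
        for i in range(m)]
assert prod == [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print("C H = I verified over GF(31)")
```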
Notes
1. MDS codes of block length q + 1, q + 2: Let {θ₁, θ₂, . . . , θ_q} denote the q elements of $\mathbb{F}_q$. It is straightforward to verify that the code having generator matrix
$$G = \begin{bmatrix} 1 & \cdots & 1 & 0 \\ \theta_1 & \cdots & \theta_q & 0 \\ \vdots & & \vdots & \vdots \\ \theta_1^{k-1} & \cdots & \theta_q^{k-1} & 1 \end{bmatrix}$$
is an [n = q + 1, k] MDS code [156]. If q is even, then the generator matrix
$$G = \begin{bmatrix} 1 & \cdots & 1 & 0 & 0 \\ \theta_1 & \cdots & \theta_q & 0 & 1 \\ \theta_1^2 & \cdots & \theta_q^2 & 1 & 0 \end{bmatrix}$$
3
Regenerating Codes
where the role of the various parameters is explained below. The aim of the RGC is to store, in an efficient and reliable fashion, data pertaining to a data file comprising B symbols, termed the message symbols, belonging to an underlying finite field $\mathbb{F}_q$. The B message symbols are first mapped onto a set of nα symbols over $\mathbb{F}_q$, and the nα symbols are then distributed evenly across a set of n storage units called nodes, so that each node stores exactly α symbols. The creation of the nα
[Figure 3.1: A data collector connects to any k of the n nodes to recover the data file; the replacement node 1′ of a failed node connects to any d helper nodes.]
code symbols and their distribution across the n nodes should be such that the two key properties, described below and illustrated in Fig. 3.1, are satisfied:
2. $M = q^B$,
3. $d_{\min} \ge (n - k + 1)$,
4. and where, for every index i ∈ [n] and every subset S ⊆ [n] \ {i} of size |S| = d, there are functions
$$h_{i,j,S} : \mathbb{F}_q^\alpha \to \mathbb{F}_q^\beta, \quad \forall j \in S,$$
as well as functions
$$f_{i,S} : \mathbb{F}_q^{d\beta} \to \mathbb{F}_q^\alpha$$
¹This is in analogy with the corresponding term first introduced in the context of a locally recoverable code. Locally recoverable codes are introduced in Section 10.
such that
1. $\mathcal{A} = \mathbb{F}_q^\alpha$,
2. $M = q^B$,
4. and where, for every index ℓ ∈ [L], i ∈ [n], and every subset S ⊆ [n] \ {i} of size |S| = d, there are functions
$$h^{(\ell)}_{i,j,S} : \mathbb{F}_q^\alpha \to \mathbb{F}_q^\beta, \quad \forall j \in S,$$
as well as functions
$$f^{(\ell)}_{i,S} : \mathbb{F}_q^{d\beta} \to \mathbb{F}_q^\alpha,$$
we have
$$\big\{ (c_1, \cdots, c_{i-1}, \hat{c}_i, c_{i+1}, \cdots, c_n) \;\big|\; (c_1, \cdots, c_n) \in \mathcal{C}_\ell \big\} = \mathcal{C}_{\ell'}$$
for some ℓ′, 1 ≤ ℓ′ ≤ L. An FR RGC is then simply a code that is an element of such a collection of FR RGCs and shares the same parameter set as does the collection of FR RGCs.
Remark 2. The naive approach would be to define an FR RGC as a code that, apart from the data-collection property, has the property that following node repair, one arrives at a second code that is also an FR RGC. The approach above aims to avoid such a circular definition. The presence of a collection of FR RGCs, as appearing in the definition above, can be seen in the construction of an FR RGC appearing in [212]. The case of ER may be regarded as corresponding to the special case of FR in which there is just a single code in the collection, i.e., when L = 1.
(b) The mapping used to recover the B message symbols from the kα symbols contained in a specific set of k nodes,
(c) The mapping $\mathbb{F}_q^\alpha \to \mathbb{F}_q^\beta$ used by node j to determine the β symbols to be forwarded to the replacement of failed node i, given knowledge of the remaining (d − 1) helper nodes,
(d) The mapping used by the replacement node to extract the α symbols to be stored from the dβ symbols supplied to it by a specific set of d helper nodes.
We will say that an RGC is linear if all four mappings above are linear.
All the RGCs that will be encountered in this monograph will be linear.
$$H(W_i) \le \alpha, \tag{3.3}$$
where we have taken the unit of entropy to be $\log_2(|\mathbb{F}_q|) = \log_2(q)$ bits. In all of our discussion here, entropy will always be measured in units of $\log_2(q)$ bits. For A ⊆ [n], we use the notation $W_A$ to denote the set $W_A = \{W_i \mid i \in A\}$.
|D ∪ {x}| = d,
²The original proof of the file-size upper bound for RGCs by Dimakis et al. [50] used a network-coding approach, which we discuss in Section 3.4.
³Strictly speaking, M is a random vector, but we will use the term random variable to refer to either a random vector or a random variable. Also, random variables typically take on real values; however, this is not an essential restriction.
when the set of d helper nodes is the set D ∪ {x}. We will drop the prescript D and simply write $S^y_x$ if D is understood from the context. As an example, this can happen if n = (d + 1), in which case D = [n] \ {x, y}. Given subsets X, Y, D ⊆ [n], with
$$|D \cup X| = d, \quad D \cap (X \cup Y) = \emptyset,$$
We will focus our attention from here on, on the code $\mathcal{C}^D$ instead of the code $\mathcal{C}$, and establish the upper bound on file size B. The same bound will then continue to apply to the code $\mathcal{C}$. We will also assume, without loss of generality, that the n nodes are indexed so that D = [d + 1]. From the data-collection property of an RGC, we have
$$B = H(W_1, \ldots, W_k) = \sum_{i=1}^{k} H(W_i \mid W_{[i-1]}),$$
Lemma 1.
$$H(W_i \mid W_{[i-1]}) \le (d - i + 1)\beta.$$
Proof. We will prove this lemma in three steps.
Step 1 - We set A := [d + 1] \ [i], and note that
$$H(W_i \mid S^i_{[i-1]}, S^i_A) = 0,$$
which follows from the repair property of an RGC.
Step 2 - We will show, using Lemma 2 below, that this implies
$$H(W_i \mid S^i_{[i-1]}) \le H(S^i_A). \tag{3.8}$$
Since, from the definition of an RGC, we have
$$H(S^i_A) \le |A|\beta = (d - i + 1)\beta,$$
this in turn implies that
$$H(W_i \mid S^i_{[i-1]}) \le (d - i + 1)\beta. \tag{3.9}$$
Step 3 - The information passed on by a helper node to a replacement of the failed node i is clearly a function of the contents of the helper node. This implies that
$$H(S^i_{[i-1]} \mid W_{[i-1]}) = 0.$$
We will use this observation, coupled with Lemma 3 below, to show that
$$H(W_i \mid W_{[i-1]}) \le H(W_i \mid S^i_{[i-1]}), \tag{3.10}$$
thus completing the proof of Lemma 1.
Remark 3. While the proof given above is for the case of exact repair,
it extends in straightforward fashion to the case of functional repair. In
an RGC with functional repair, the contents of the nodes, as well as
the data transferred for node repair can change with time.
Let us assume that we are at time instant t in the functional-repair
setting. As in the case of ER, we restrict attention to a subset of (d + 1)
nodes that are numbered 1 through (d + 1). Let ti denote the last time
instant at which node i was repaired prior to time t. We assume without
loss of generality, that
With respect to the proof given above for the ER case, we now interpret $W_i$, for 1 ≤ i ≤ k, as the contents of node i at time t. We interpret $S^i_{[i-1]}$, i = 1, 2, · · · , k, as the data passed on by helper node j ∈ [i − 1] to the replacement of the i-th node at time $t_i$, i.e., the time instant at which node i failed. Similarly, $S^i_A$ denotes the helper information passed
For a given file size B, the storage overhead and normalized repair bandwidth are given, respectively, by $\frac{n\alpha}{B}$ and $\frac{d\beta}{B}$. Thus for a fixed value of file size B, block length n, and repair degree d, the parameter α is indicative of the amount of storage overhead while β determines the normalized repair bandwidth. We will say that an RGC having parameters $\{(n, k, d), (\alpha, \beta), B, \mathbb{F}_q\}$ is optimal if (a) the file-size bound in (3.7) is met with equality and if, further, (b) reducing either α or β causes the bound to be violated.⁴
⁴The latter condition is inserted since, at the extreme MSR case, one could have B = αk and β very large while satisfying (3.7), while at the same time the inequality could also be satisfied with $\beta = \frac{\alpha}{(d-k+1)}$. At the other extreme MBR end, equality could hold with $B = \left(dk - \binom{k}{2}\right)\beta$ and α very large, while α = dβ would suffice for equality to hold.
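The two extreme points in the footnote can be checked against the file-size bound. The sketch below assumes the bound referenced as (3.7) takes the form $B \le \sum_{i=1}^{k} \min(\alpha, (d - i + 1)\beta)$ — the sum of the per-node estimates $\min(\alpha, (d-i+1)\beta)$ implied by (3.3) and Lemma 1 — with illustrative parameters.

```python
# Evaluate B <= sum_{i=1}^{k} min(alpha, (d - i + 1) * beta) at the two
# extreme points: MSR (alpha = (d-k+1) beta, B = k alpha) and
# MBR (alpha = d beta, B = (kd - C(k,2)) beta).
def file_size_bound(k, d, alpha, beta):
    return sum(min(alpha, (d - i + 1) * beta) for i in range(1, k + 1))

k, d, beta = 3, 4, 1                       # toy parameters

alpha_msr = (d - k + 1) * beta             # MSR point
assert file_size_bound(k, d, alpha_msr, beta) == k * alpha_msr

alpha_mbr = d * beta                       # MBR point
assert file_size_bound(k, d, alpha_mbr, beta) == (k * d - k * (k - 1) // 2) * beta
```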
[Figure 3.2: The piecewise-linear storage-repair-bandwidth tradeoff for fixed (k, d), with the MSR and MBR points at its two ends.]
$$\sum_{i=1}^{k} \min\big(\bar{\alpha},\ (d - i + 1)\bar{\beta}\big) \;\ge\; 1, \tag{3.13}$$
where $\bar{\alpha} = \alpha/B$ and $\bar{\beta} = \beta/B$.
For fixed (k, d), the locus of all pairs (ᾱ, dβ̄) that satisfy (3.13) with equality will be shown in Section 6 to be a piecewise-linear curve, as can be seen in Fig. 3.2. For a fixed value of block length n, this curve represents a tradeoff between storage overhead nᾱ on the one hand and normalized repair bandwidth dβ̄ on the other. The network-coding approach to deriving the fundamental bound on file size (see Section 3.4) tells us that for every set of parameters {(n, k, d), (α, β)} there exists an RGC having file size B satisfying (3.7). However, network coding only guarantees the existence of an RGC that is repaired using functional repair. For this reason, the plot of the pairs (ᾱ, dβ̄) that satisfy (3.13) with equality is referred to as the FR tradeoff.
The corresponding tradeoff under exact repair, called the ER tradeoff, is harder to characterize and is discussed further in Sections 6 and 7.
time, nodes will undergo failures and every failed node will be replaced by a replacement node. Let us assume, to begin with, that we are only interested in the behavior of the RGC over a finite-but-large number N ≫ n of node repairs. Moreover, we will assume that nodes are repaired using functional repair. For simplicity, we assume that repair is carried out instantaneously. Then, at any given time instant t, there are n functioning nodes whose collective contents constitute an RGC, and a data collector should be able to connect to any subset of k nodes, download all of the contents of these k nodes, and use these to recover the B message symbols $\{u_i \in \mathbb{F}_q\}_{i=1}^{B}$. Clearly, in all, there are at most $N\binom{n}{k}$ distinct data collectors, each corresponding to a distinct choice of
Figure 3.3: The directed, capacitated graph associated to an RGC that, over time, undergoes a finite number N of node repairs. The label on each edge indicates the capacity of that edge. Here DC denotes the data collector.
and an ∞ capacity with all other edges. Each node can only store α symbols over $\mathbb{F}_q$. We incorporate this constraint by using a standard graph-theoretic construct, in which a node is replaced by two nodes separated by a directed edge (leading towards a data collector) of capacity α. We have, in this way, arrived at a graph (see Fig. 3.3) in which there is a single source S and at most $N\binom{n}{k}$ sinks $\{T_i\}$.
3.4.2 Achievability
Network coding also employs the Combinatorial Nullstellensatz [5] to show that when only a finite number of node failures and corresponding regenerations take place, this bound is achievable, and moreover, achievable using linear network coding, i.e., using only linear operations at each node in the network, for a sufficiently large size q of the finite field $\mathbb{F}_q$. In a subsequent result [251], Wu used the specific structure of the graph to show that even in the case when the number of sinks is infinite, the upper bound in (3.7) continues to be achievable using linear network coding.
In this way, one can draw upon principles of network coding to char-
acterize the maximum file size of an RGC given parameters {k, d, α, β}
for the case of functional repair.
Figure 3.4: Topics related to RGCs that are covered elsewhere in the monograph: the storage-repair-bandwidth tradeoff, MBR codes, MSR codes (including sub-packetization lower bounds and near-optimal MDS codes), interior points, and variants (fractional repetition codes, cooperative RGCs, secure RGCs, rack-aware RGCs).
Table 3.1: Constructions for MBR codes, MSR codes and interior-point ER RGCs that are presented in the monograph. All of the constructions appearing in the table are explicit. (We provide only brief descriptions, however, of the Small-d MSR, Cascade and Moulin code constructions.)
Notes
4
MBR Codes
edges. The B = 9 symbols of the data file are encoded using a [10, 9, 2]
MDS code to produce ten code symbols. Each code symbol is assigned
to a distinct edge. Each node of the pentagon MBR code stores the
code symbols assigned to the edges incident on that node (see Fig. 4.1).
We will now verify that the example construction indeed satisfies both the data-collection and repair-by-transfer (RBT) properties.
Data Collection: Any collection of k = 3 nodes contains nine distinct
code symbols of the [10, 9, 2] MDS code. This is sufficient to recover all
10 code symbols and in this way, the 9 message symbols that make up
the data file.
Figure 4.1: An example RBT MBR code construction for the parameter set (n = 5, k = 3, d = 4), with file size B = 9. The ten code symbols {1, . . . , 9, P} are placed on the ten edges (sides and diagonals) of the pentagon; the five nodes store the symbol sets {2, 5, 6, P}, {1, 3, 6, 7}, {2, 4, 7, 8}, {3, 5, 8, 9} and {1, 4, 9, P} on their incident edges.
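The pentagon construction lends itself to a short simulation. The sketch below assumes the toy field GF(11) and made-up message symbols, and checks both the data-collection and repair-by-transfer properties.

```python
# Pentagon MBR code (n=5, k=3, d=4, B=9): the 10 edges of the complete
# graph K5 each carry one symbol of a [10, 9] single-parity-check MDS code;
# each node stores the 4 symbols on its incident edges.
from itertools import combinations

P = 11
msg = [1, 2, 3, 4, 5, 6, 7, 8, 9]           # B = 9 message symbols (assumed)
code = msg + [(-sum(msg)) % P]              # [10, 9] single parity check

edges = list(combinations(range(5), 2))     # the 10 edges of K5
symbol_on = dict(zip(edges, code))
node_content = {v: {e: symbol_on[e] for e in edges if v in e}
                for v in range(5)}

# Data collection: any k = 3 nodes together see 9 distinct edge symbols,
# and the tenth symbol follows from the parity check.
for trio in combinations(range(5), 3):
    seen = set().union(*(node_content[v].keys() for v in trio))
    assert len(seen) == 9
    missing = (set(edges) - seen).pop()
    assert (-sum(symbol_on[e] for e in seen)) % P == symbol_on[missing]

# Repair-by-transfer: each of the d = 4 surviving nodes hands the failed
# node the single symbol on their shared edge (beta = 1).
failed = 0
repaired = {e: node_content[v][e] for v in range(1, 5)
            for e in node_content[v] if failed in e}
assert repaired == node_content[failed]
print("data collection and RBT repair verified")
```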
distinct code symbols from $\mathcal{C}_{sc}$. The MDS property of $\mathcal{C}_{sc}$ now allows all the B message symbols associated to the data file to be recovered. Note that the requirement of an MDS code of block length $N = \binom{n}{2}$ places a
where:
• Every (k × k) sub-matrix of $G_1$ is non-singular.
$$\underbrace{C}_{n \times d} \;\triangleq\; \underbrace{G}_{n \times d}\; \underbrace{M}_{d \times d}.$$
If $c_i^T$ denotes the i-th row of the code matrix C, i ∈ {1, . . . , n}, the contents of the i-th node are then precisely the components of $c_i$.
Data Collection: Consider any collection of k nodes indexed by the subset K ⊆ {1, 2, . . . , n} of size |K| = k. Let $[G_{K,1}\ G_{K,2}]$ denote the (k × d) sub-matrix of $G = [G_1\ G_2]$ obtained by selecting the rows indexed by K, where $G_{K,1}$ and $G_{K,2}$ are the corresponding sub-matrices of $G_1$ and $G_2$ respectively. Let $C_K$ denote the corresponding (k × d) sub-matrix of C. Then we can write:
$$C_K = [G_{K,1}\ G_{K,2}] \begin{bmatrix} S & V \\ V^T & 0 \end{bmatrix} := [C_{K,1}\ C_{K,2}],$$
where S is (k × k), V is (k × (d − k)), $C_{K,1}$ is (k × k) and $C_{K,2}$ is (k × (d − k)), so that
$$C_{K,1} = G_{K,1} S + G_{K,2} V^T, \qquad C_{K,2} = G_{K,1} V.$$
During data recovery, both $C_{K,1}$ and $C_{K,2}$ are accessible. As any (k × k) sub-matrix of $G_1$ is non-singular by design, in particular the sub-matrix $G_{K,1}$ is non-singular. This allows us to recover the matrix V from:
$$C_{K,2} = G_{K,1} V.$$
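The data-collection steps above can be sketched as follows. The field GF(31), the Vandermonde choice of G, and the message values are assumptions of this illustration (any G whose first k columns have all (k × k) sub-matrices non-singular would do).

```python
# Product-matrix MBR data collection: C = G M with M = [[S, V], [V^T, 0]];
# recover V from C_{K,2} = G_{K,1} V, then S from C_{K,1} - G_{K,2} V^T.
P = 31
n, k, d = 6, 3, 4

def inv(x):
    return pow(x % P, P - 2, P)

def mat_mul(A, B):
    return [[sum(x * y for x, y in zip(row, col)) % P
             for col in zip(*B)] for row in A]

def mat_solve(A, B):
    # solve A X = B over GF(P) by Gauss-Jordan elimination (A invertible)
    m = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(m)]
    for c in range(m):
        piv = next(r for r in range(c, m) if M[r][c])
        M[c], M[piv] = M[piv], M[c]
        ic = inv(M[c][c])
        M[c] = [x * ic % P for x in M[c]]
        for r in range(m):
            if r != c and M[r][c]:
                f = M[r][c]
                M[r] = [(x - f * y) % P for x, y in zip(M[r], M[c])]
    return [row[m:] for row in M]

S = [[1, 2, 3], [2, 4, 5], [3, 5, 6]]       # symmetric k x k block (assumed)
V = [[7], [8], [9]]                         # k x (d - k) block (assumed)
Msg = [S[i] + V[i] for i in range(k)] + [[V[0][0], V[1][0], V[2][0], 0]]

theta = [1, 2, 3, 4, 5, 6]
G = [[pow(t, j, P) for j in range(d)] for t in theta]   # n x d Vandermonde
C = mat_mul(G, Msg)                         # node i stores row i of C

K = [0, 2, 4]                               # any k = 3 nodes
GK1 = [G[i][:k] for i in K]
GK2 = [G[i][k:] for i in K]
CK1 = [C[i][:k] for i in K]
CK2 = [C[i][k:] for i in K]

V_rec = mat_solve(GK1, CK2)                 # from C_{K,2} = G_{K,1} V
Vt = [list(col) for col in zip(*V_rec)]
G2Vt = mat_mul(GK2, Vt)
rhs = [[(CK1[i][j] - G2Vt[i][j]) % P for j in range(k)] for i in range(k)]
S_rec = mat_solve(GK1, rhs)                 # from G_{K,1} S = C_{K,1} - G_{K,2} V^T
assert V_rec == V and S_rec == S
```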
Notes
1. Fractional repetition codes: Fractional repetition codes [59] may
be regarded as generalizing the polygonal MBR construction. In
a fractional repetition code, the underlying scalar code symbols
are obtained by replicating an MDS code ρ ≥ 2 times. However,
unlike in the case of an MBR code, for the repair of each node,
only a specific set of d ≤ n − 1 helper nodes is guaranteed to be
able to help in node repair. For this reason, fractional repetition
codes are said to have table-based repair. Fractional repetition
codes are discussed in greater detail in Section 9.2.
2. Binary MBR codes: There exist MBR codes over the binary field $\mathbb{F}_2$ with β = 1 if the parameters {n, k, d} satisfy any of the following conditions: (i) k = d − 1 = n − 2, (ii) k = d = n − 2, or (iii) k = d − 1 = n − 3. Details can be found in [132], [179].
5
MSR Codes

Among the class of RGCs, MSR codes have received the greatest attention, for reasons that include the fact that MSR codes are MDS codes, have storage overhead that can be made as small as desired, and have been challenging to construct. An MSR code with parameters (n, k, d, α) has file size B = kα and repair bandwidth $\beta = \frac{\alpha}{d - k + 1}$. MSR codes can also be viewed as vector MDS codes that incur the least-possible repair bandwidth for the repair of a failed node.
While only β symbols are passed on to the replacement of a failed node by each of the d helper nodes, the number of symbols accessed by a helper node in order to generate these β symbols could be significantly larger than β. There is interest in practice in the subclass of MSR codes having the property that the number of scalar symbols accessed at each helper node is also equal to the number β of symbols passed on for node repair. Such MSR codes are termed optimal-access MSR codes.
An early construction of an MSR code with parameters (n, k, d) and
(α, β) satisfying d = (n − 1) ≥ 2k − 1, β = 1, can be found in [227] and is
briefly discussed in the notes subsection. A detailed description of three
constructions of an MSR code is presented in the present section, along
Table 5.1: Explicit MSR code constructions described in this section. Here r = (n − k) and s = (d − k + 1). The ∗ in the last row indicates that Small-d MSR codes have the lowest possible sub-packetization level under the assumption of helper-set-independent repair; see Section 5.4.
$$\underbrace{J}_{n \times d} = \big[\, \underbrace{G}_{n \times \alpha} \;\; \underbrace{\Lambda G}_{n \times \alpha} \,\big],$$
where
$$G = \begin{bmatrix} 1 & \theta_1 & \cdots & \theta_1^{\alpha-1} \\ 1 & \theta_2 & \cdots & \theta_2^{\alpha-1} \\ \vdots & \vdots & & \vdots \\ 1 & \theta_n & \cdots & \theta_n^{\alpha-1} \end{bmatrix} \quad \text{and} \quad \Lambda = \begin{bmatrix} \theta_1^{\alpha} & & & \\ & \theta_2^{\alpha} & & \\ & & \ddots & \\ & & & \theta_n^{\alpha} \end{bmatrix}.$$
As in the case of the PM-MBR code, the i-th node stores the α symbols contained in the i-th row $c_i$ of C. Let the i-th row of G be denoted by $g_i^T$ and let $\lambda_i$ be the i-th diagonal element of Λ.
Node Repair: Suppose node f has failed. The f-th node stores the f-th row of C, given by
$$c_f^T = \big[\, g_f^T \;\; \lambda_f g_f^T \,\big] \begin{bmatrix} S_1 \\ S_2 \end{bmatrix} = g_f^T S_1 + \lambda_f\, g_f^T S_2.$$
Our goal in node repair is to recreate this vector using helper data. Let D ⊆ {1, 2, . . . , n} \ {f}, with |D| = d, be the indices of the d helper nodes. Let $J_D$ be the sub-matrix of J obtained by selecting the d = 2α rows of J whose indices lie in D. Let $C_D$ be the sub-matrix of C containing rows with indices lying in D. Then
$$\underbrace{C_D}_{d \times \alpha} = \underbrace{J_D}_{d \times d} \begin{bmatrix} S_1 \\ S_2 \end{bmatrix},$$
and the symbols of $C_D$ are precisely the contents of the d helper nodes.
$$e^T P = e^T G_K S_1 G_K^T = 0^T.$$
In each of the k equations here, there is only one unknown, namely the diagonal element $p_{ii}$. In this way, the diagonal elements of P can be recovered. The diagonal entries of Q can be recovered in identical fashion. Given P and Q, the matrices $S_1$, $S_2$ can be recovered in straightforward fashion from $P = G_K S_1 G_K^T$ and $Q = G_K S_2 G_K^T$. This completes the data-collection process.
and the first k code symbols in any codeword of C are message symbols.
Let C1 ⊆ C be the subcode of C corresponding to code symbol c1 = 0,
code that is linear as an RGC (see Section 3). Thus, the nα symbols
stored across the n storage nodes are linear functions of the B message
symbols.
The size of the data file equals kα, which is precisely the number of $\mathbb{F}_q$ symbols contained in any set of k nodes. Clearly, by making an appropriate linear transformation of code symbols, we may assume that the contents of the first k nodes $\{c_i\}_{i=1}^{k}$ are precisely the B message symbols. Consider the subcode $\mathcal{C}_1$ of $\mathcal{C}$ that corresponds to the contents of the first s, 1 ≤ s ≤ (k − 1), nodes being equal to zero. It can be verified that if one deletes or removes these nodes, one will be left with an MSR code having parameters:
$$\big\{(n - s,\ k - s,\ d - s),\ (\alpha, \beta),\ B = \alpha(k - s)\big\}.$$
Shortening with respect to s nodes then converts this into an MSR code having parameters
$$\big\{(n,\ k,\ d = 2(k - 1) + s),\ (\alpha = k + s - 1,\ \beta = 1),\ B = \alpha k\big\}.$$
$$A_i = \sum_{a \in \mathbb{Z}_s^n} \lambda_{i, a_i}\, e_a e_a^T,$$
and the vectors $e_a \in \mathbb{F}_q^\alpha$ are unit vectors such that the a-th element of $e_a$ is 1 and all other elements are zero. Thus, the matrix $e_a e_a^T$ is an (α × α) diagonal matrix having a 1 in the a-th row and a-th column and zeros everywhere else. The elements $\{\lambda_{i,u} \mid i \in [n],\ u \in [0, s-1]\}$ are chosen to be distinct and hence form a subset of $\mathbb{F}_q$ of size ns (requiring q ≥ ns).
Thus the i-th matrix $A_i$ is an (α × α) diagonal matrix whose diagonal elements are indexed by the variable a, where a takes on values in the set $\mathbb{Z}_s^n$, of size $s^n = \alpha$. The a-th diagonal element equals $\lambda_{i, a_i}$, and thus is a function of i and the i-th component $a_i$ of a.
Let $c = (c_1^T, \cdots, c_n^T)^T$ be a codeword in the Diagonal MSR code, where $c_i = (c_i(1), \cdots, c_i(\alpha))^T \in \mathbb{F}_q^\alpha$ is stored in node i ∈ [n]. Then,
$$Hc = 0 \;\Leftrightarrow\; \sum_{i=1}^{n} A_i^{\,j} c_i = 0 \quad \text{for all } j \in [0, r-1],$$
$$\Leftrightarrow\; \sum_{i=1}^{n} \sum_{a \in \mathbb{Z}_s^n} \lambda_{i,a_i}^{\,j}\, e_a e_a^T c_i = 0 \quad \text{for all } j \in [0, r-1],$$
$$\Leftrightarrow\; \sum_{i=1}^{n} \lambda_{i,a_i}^{\,j}\, c_i(a) = 0 \quad \text{for all } j \in [0, r-1],\ a \in \mathbb{Z}_s^n. \tag{5.2}$$
Summing (5.2) over the indices a(i₀, u), u ∈ [0, s − 1], we obtain
$$\sum_{u=0}^{s-1} \lambda_{i_0,u}^{\,j}\, c_{i_0}(a(i_0, u)) \;=\; -\sum_{i \in [n] \setminus \{i_0\}} \lambda_{i,a_i}^{\,j}\, h_{i,i_0}(a), \quad \text{for all } j \in [0, r-1],$$
where $h_{i,i_0}(a) := \sum_{u=0}^{s-1} c_i(a(i_0, u))$ is the helper data passed on by node i.
Spelling out these p-c equations for all j ∈ [0, r − 1] in matrix form, we obtain:
$$\underbrace{\begin{bmatrix} 1 & 1 & \cdots & 1 \\ \lambda_{i_0,0} & \lambda_{i_0,1} & \cdots & \lambda_{i_0,s-1} \\ \vdots & \vdots & & \vdots \\ \lambda_{i_0,0}^{r-1} & \lambda_{i_0,1}^{r-1} & \cdots & \lambda_{i_0,s-1}^{r-1} \end{bmatrix}}_{V_{i_0}} \begin{bmatrix} c_{i_0}(a(i_0,0)) \\ c_{i_0}(a(i_0,1)) \\ \vdots \\ c_{i_0}(a(i_0,s-1)) \end{bmatrix} = - \underbrace{\begin{bmatrix} 1 & \cdots & 1 & 1 & \cdots & 1 \\ \lambda_{1,a_1} & \cdots & \lambda_{i_0-1,a_{i_0-1}} & \lambda_{i_0+1,a_{i_0+1}} & \cdots & \lambda_{n,a_n} \\ \vdots & & \vdots & \vdots & & \vdots \\ \lambda_{1,a_1}^{r-1} & \cdots & \lambda_{i_0-1,a_{i_0-1}}^{r-1} & \lambda_{i_0+1,a_{i_0+1}}^{r-1} & \cdots & \lambda_{n,a_n}^{r-1} \end{bmatrix}}_{L_{i_0}} \begin{bmatrix} h_{1,i_0}(a) \\ \vdots \\ h_{i_0-1,i_0}(a) \\ h_{i_0+1,i_0}(a) \\ \vdots \\ h_{n,i_0}(a) \end{bmatrix} \tag{5.3}$$
where $f(x) = \sum_{i=0}^{s} f_i x^i = \prod_{u=0}^{s-1} (x - \lambda_{i_0,u})$. Then we have:
$$N_{i_0} L_{i_0} = \begin{bmatrix} f(\lambda_{1,a_1}) & \cdots & f(\lambda_{i_0-1,a_{i_0-1}}) & f(\lambda_{i_0+1,a_{i_0+1}}) & \cdots & f(\lambda_{n,a_n}) \\ \lambda_{1,a_1} f(\lambda_{1,a_1}) & \cdots & \lambda_{i_0-1,a_{i_0-1}} f(\lambda_{i_0-1,a_{i_0-1}}) & \lambda_{i_0+1,a_{i_0+1}} f(\lambda_{i_0+1,a_{i_0+1}}) & \cdots & \lambda_{n,a_n} f(\lambda_{n,a_n}) \\ \vdots & & \vdots & \vdots & & \vdots \\ \lambda_{1,a_1}^{r-s-1} f(\lambda_{1,a_1}) & \cdots & \lambda_{i_0-1,a_{i_0-1}}^{r-s-1} f(\lambda_{i_0-1,a_{i_0-1}}) & \lambda_{i_0+1,a_{i_0+1}}^{r-s-1} f(\lambda_{i_0+1,a_{i_0+1}}) & \cdots & \lambda_{n,a_n}^{r-s-1} f(\lambda_{n,a_n}) \end{bmatrix}$$
$$= \begin{bmatrix} 1 & \cdots & 1 & 1 & \cdots & 1 \\ \lambda_{1,a_1} & \cdots & \lambda_{i_0-1,a_{i_0-1}} & \lambda_{i_0+1,a_{i_0+1}} & \cdots & \lambda_{n,a_n} \\ \vdots & & \vdots & \vdots & & \vdots \\ \lambda_{1,a_1}^{r-s-1} & \cdots & \lambda_{i_0-1,a_{i_0-1}}^{r-s-1} & \lambda_{i_0+1,a_{i_0+1}}^{r-s-1} & \cdots & \lambda_{n,a_n}^{r-s-1} \end{bmatrix} \operatorname{diag}\big( f(\lambda_{1,a_1}), \ldots, f(\lambda_{i_0-1,a_{i_0-1}}), f(\lambda_{i_0+1,a_{i_0+1}}), \ldots, f(\lambda_{n,a_n}) \big),$$
and
$$N_{i_0} L_{i_0} \begin{bmatrix} h_{1,i_0}(a) \\ \vdots \\ h_{i_0-1,i_0}(a) \\ h_{i_0+1,i_0}(a) \\ \vdots \\ h_{n,i_0}(a) \end{bmatrix} = 0.$$
By the GRS property, we can recover all the (n − 1) symbols in $\{h_{i,i_0}(a) \mid i \in [n] \setminus \{i_0\}\}$ from any d-symbol subset. This implies that the symbols on the RHS of equation (5.3) are known. Therefore, by the invertibility of the sub-matrix of $V_{i_0}$ comprising the first s rows of $V_{i_0}$, we can recover the failed-node symbols. By varying $a \in \mathbb{Z}_s^n$ such that $a_{i_0} = 0$, we can recover all the failed-node symbols:
$$\{c_{i_0}(a(i_0, u)) \mid u \in [0, s-1],\ a \in \mathbb{Z}_s^n,\ a_{i_0} = 0\} = \{c_{i_0}(a) \mid a \in \mathbb{Z}_s^n\}.$$
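The repair procedure above can be exercised numerically for toy parameters. The sketch below assumes (n = 4, k = 2, d = n − 1 = 3), so r = s = 2 and α = s⁴ = 16, the field GF(101), and an arbitrary distinct choice of the λ values; since d = n − 1, every surviving node is a helper and the GRS decoding step is not needed.

```python
# Diagonal-matrix MSR repair for assumed parameters n=4, k=2, d=3 over GF(101).
from itertools import product
import random

P = 101
n, r, s = 4, 2, 2
tuples = list(product(range(s), repeat=n))            # a = (a_1,...,a_n)
lam = [[2 * i + u + 1 for u in range(s)] for i in range(n)]  # distinct lambdas

def inv(x):
    return pow(x % P, P - 2, P)

def solve2(M, y):
    # solve a 2x2 system M v = y over GF(P) by Cramer's rule
    det = (M[0][0] * M[1][1] - M[0][1] * M[1][0]) % P
    di = inv(det)
    return [(M[1][1] * y[0] - M[0][1] * y[1]) * di % P,
            (M[0][0] * y[1] - M[1][0] * y[0]) * di % P]

random.seed(1)
c = [dict() for _ in range(n)]
for a in tuples:
    # impose the parity checks (5.2): sum_i lam[i][a_i]^j c_i(a) = 0, j = 0, 1
    c[0][a], c[1][a] = random.randrange(P), random.randrange(P)
    M = [[pow(lam[i][a[i]], j, P) for i in (2, 3)] for j in range(r)]
    y = [(-sum(pow(lam[i][a[i]], j, P) * c[i][a] for i in (0, 1))) % P
         for j in range(r)]
    c[2][a], c[3][a] = solve2(M, y)

i0 = 0                                                # the failed node
recovered = dict()
for a in (t for t in tuples if t[i0] == 0):
    subs = [tuple(u if m == i0 else t for m, t in enumerate(a))
            for u in range(s)]
    # helper i sends h_{i,i0}(a) = sum_u c_i(a(i0,u))
    h = {i: sum(c[i][b] for b in subs) % P for i in range(n) if i != i0}
    V = [[pow(lam[i0][u], j, P) for u in range(s)] for j in range(s)]
    y = [(-sum(pow(lam[i][a[i]], j, P) * h[i] for i in h)) % P
         for j in range(s)]
    for u, val in enumerate(solve2(V, y)):
        recovered[subs[u]] = val

assert recovered == c[i0]
print("repaired node 0 with", s ** (n - 1), "symbols from each of", n - 1,
      "helpers; alpha =", s ** n)
```

Each helper transfers $s^{n-1} = \alpha/s$ symbols, matching $\beta = \alpha/(d - k + 1)$.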
Remark 4. Diagonal MSR codes turn out to also satisfy the optimal-
update property where, to update a single symbol out of the α symbols
in a systematic node, one is required to update only (n − k) parity
symbols.
An extension of the Diagonal MSR code that has the (h, d) optimal-repair property for any h ∈ [2, n − k], d ∈ [k, n − h] appears in [255]. By the (h, d) optimal-repair property is meant the recovery of h erasures by downloading
$$\frac{\alpha h}{d - k + h}$$
symbols each from d helper nodes, which is the minimal repair bandwidth possible for MDS codes [31]. The sub-packetization level of these extended codes is of the form $s^n$ where s = lcm(2, 3, · · · , n − k). The h-node repair discussed here assumes a centralized repair setting, whereas an alternate, cooperative repair approach is discussed in Section 9.3.
(a) An [n = rt, k = r(t − 1)] scalar MDS code CMDS is first selected,
(e) The symbols within the uncoupled data cube are transformed
using a simple, linear pairwise-symbol transformation that replaces
selected pairs of symbols over Fq contained within the uncoupled
data cube, by their transformed versions. The data cube obtained
via this transformation is called the coupled data cube.
Let
Figure 5.1: An example uncoupled data cube for the case (r = 2, t = 3). As can be
seen, the location of the red dots within a plane, provides a pictorial representation
of the index z associated to the plane.
column. The symbols in the uncoupled data cube then satisfy the equations:
$$\sum_{(x,y) \in \mathbb{Z}_r \times \mathbb{Z}_t} \theta_{\ell,(x,y)}\, B(x, y, z) = 0, \tag{5.5}$$
for every parity-check index ℓ and every plane index z.
Figure 5.2: Paired symbols within either the uncoupled or coupled data cube are depicted using yellow rectangles connected by dotted lines. The pairwise forward transform (PFT) and pairwise reverse transform (PRT) are used to transform symbol-pairs between the two data cubes.
Next, the symbols in the uncoupled data cube B(·) are paired. The symbol B(x, y, z) with $z_y \neq x$ is paired with the symbol $B(z_y, y, z(y, x))$, where we use the notation z(y, x) to denote the vector in which the y-th component of z is replaced by x:
Let {A(x, y, z) | x ∈ Z_r, y ∈ Z_t, z ∈ Z_r^t} denote the nα symbols of a second data cube, termed the coupled data
cube. The contents of the coupled data cube will shortly be related to
the contents of the uncoupled data cube, as depicted in Fig. 5.2. There is
5.3. Coupled-Layer MSR Code 607
an analogous pairing of symbols within the coupled data cube. Thus the symbol A(x, y, z) with $z_y \neq x$ is paired with the symbol $A(z_y, y, z(y, x))$, and the symbols A(x, y, z) with $z_y = x$ are paired with themselves.
Let u be a nonzero element in the finite field Fq, satisfying $u^2 \neq 1$. The symbols of the coupled data cube are derived from those of the uncoupled data cube via the following transformation:
$$\begin{bmatrix} A(x, y, z) \\ A(z_y, y, z(y, x)) \end{bmatrix} = \begin{bmatrix} 1 & u \\ u & 1 \end{bmatrix}^{-1} \begin{bmatrix} B(x, y, z) \\ B(z_y, y, z(y, x)) \end{bmatrix}, \quad \text{for } z_y \neq x, \tag{5.6}$$
$$A(x, y, z) = B(x, y, z), \quad \text{for } z_y = x.$$
Any symbol in the 4-symbol set
$$\{A(x, y, z),\ A(z_y, y, z(y, x)),\ B(x, y, z),\ B(z_y, y, z(y, x))\}$$
can be computed from knowledge of any two symbols from the 4-symbol set, i.e., the four symbols form a [4, 2] MDS code.
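The pairwise transforms and the [4, 2] MDS property can be checked directly. The sketch below (our own code) works over F₇ with u = 2, so that u ≠ 0 and u² ≠ 1; it applies the pairwise forward transform (PFT), its inverse (PRT), and verifies that every nonzero 4-symbol codeword has Hamming weight at least 3, i.e., any two of the four symbols determine the remaining two:

```python
# PFT/PRT over F7 with u = 2 (u^2 = 4 != 1), following transformation (5.6).
q, u = 7, 2
assert u % q != 0 and (u * u - 1) % q != 0

def pft(b1, b2):
    # [A1, A2]^T = [[1, u], [u, 1]]^{-1} [B1, B2]^T over F_q
    det_inv = pow(1 - u * u, -1, q)
    return det_inv * (b1 - u * b2) % q, det_inv * (-u * b1 + b2) % q

def prt(a1, a2):
    # pairwise reverse transform: recompute the B-pair from the A-pair
    return (a1 + u * a2) % q, (u * a1 + a2) % q

# (A1, A2, B1, B2) ranges over a [4, 2] code; check minimum distance 3 (MDS).
weights = set()
for a1 in range(q):
    for a2 in range(q):
        b1, b2 = prt(a1, a2)
        if (a1, a2) != (0, 0):
            weights.add(sum(s != 0 for s in (a1, a2, b1, b2)))
assert min(weights) == 3        # any two of the four symbols suffice
assert prt(*pft(3, 5)) == (3, 5)
```

The condition u² ≠ 1 is exactly what makes the 2 × 2 transform matrix invertible and the resulting [4, 2] code MDS.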
608 MSR Codes
$$\{A(x, y, z) \mid z \in \mathbb{Z}_r^t\},$$
$$\{A(x, y, z) \mid z \in \mathcal{P}(x_0, y_0)\}, \quad x \neq x_0,\ x \in \mathbb{Z}_r,$$
$$= \{A(x_0, y_0, z) \mid z \in \mathbb{Z}_r^t\},$$
Figure 5.3: Shown above is the plane with index z = (2, 1, 0, 1, 2) for the case (r = 4, t = 5), where the four black circles indicate the four erasures. The intersection score of this plane is 2, as it has two serious erasures, corresponding to the circles enclosing red dots.
$$\sum_{(x,y)\in E} \theta_{\ell,(x,y)}\, B(x, y, z) = \kappa^*,$$
Now for the case when (x, y) ∉ E and (z_y, y) ∈ E, the plane z(y, x) has intersection score i − 1. Therefore we would have recovered the symbol $B(z_y, y, z(y, x))$ in round i − 1. We know the symbol A(x, y, z) as it is unerased. Using the symbols A(x, y, z) and $B(z_y, y, z(y, x))$ and the 4-symbol MDS property noted above, the symbol B(x, y, z) can be computed. In this way, we know B(x, y, z) for any (x, y) ∉ E. As a result, equation (5.5) can be reduced to the form
$$\sum_{(x,y)\in E} \theta_{\ell,(x,y)}\, B(x, y, z) = \kappa^*,$$
5.4. Small-d MSR Codes 611
$$n = st,\quad k,\quad d \in \{k + 1, k + 2, k + 3\},\quad \alpha = s^t,\quad s = (d - k + 1),$$
with s ∈ {2, 3, 4} and t ≥ 2, and field size q linear in block length, i.e.,
q = O(n).
These codes have two additional attributes. Consider a setting in a
linear RGC where node f has failed and we are interested in the data
transferred by helper node h to the failed node f , and where the indices
of the remaining (d − 1) helper nodes are specified by a set D ⊂ [n]
of size (d − 1). Since the RGC is linear, the data transferred can be
represented in the form
$$S_{hf}^{(D)}\, c_h,$$
where $S_{hf}^{(D)}$ is a (β × α) matrix and $c_h$ represents the (α × 1) vector corresponding to the data stored in node h. It turns out, in the case of the Small-d MSR code construction, that the matrix $S_{hf}^{(D)}$ appearing above is a function of the failed node f alone, and so we can simply write $S_f$ in place of $S_{hf}^{(D)}$. This property is termed the constant-repair-matrix property. We note as an aside that since the code is an optimal-access MSR code, the entries of each matrix $S_f$ are either 0 or 1, with each row of $S_f$ containing a single 1.
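The following toy sketch (our own illustration, not the actual Small-d construction) shows what such a 0/1 repair matrix does: each row is a standard basis vector, so the helper data $S_f c_h$ amounts to reading β of the α symbols stored at the helper, with no arithmetic required.

```python
# Illustrative optimal-access repair matrix: 0/1 entries, one 1 per row,
# so S_f c_h simply selects beta out of the alpha stored symbols.
alpha, beta = 4, 2
S_f = [[1, 0, 0, 0],
       [0, 0, 1, 0]]              # hypothetical repair matrix for failed node f
c_h = [10, 11, 12, 13]            # contents (alpha symbols) of helper node h
helper_data = [sum(s * c for s, c in zip(row, c_h)) for row in S_f]
assert helper_data == [10, 12]    # pure symbol access: symbols 0 and 2 are read
```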
It turns out that not only do Small-d MSR codes possess the constant-repair-matrix property, they also have the smallest sub-packetization level α possible of any linear, optimal-access MSR code having the property that the repair matrix $S_{hf}^{(D)}$ is independent of the remaining helper nodes in D, so that we can write $S_{hf}^{(D)} = S_{hf}$. We term this latter property, with respect to repair matrices, the helper-set-independence property. Clearly, the constant-repair-matrix property implies the helper-set-independence property.
By shortening a Small-d MSR code, one can construct additional
optimal-access (n, k) MSR codes that also have constant repair matrices.
These also have the minimum possible sub-packetization level of a linear, optimal-access MSR code having the helper-set-independence property, provided $n \not\equiv 1 \pmod{s}$, where s = d − k + 1. Details can be found in
[239].
Open Problem 2. Construct an optimal-access MSR code having the least-possible sub-packetization level for the case when d = (n − 1) and $n \equiv 1 \pmod{r}$.
Notes
3. Systematic MSR codes: Vector MDS codes for which the optimal
repair property holds only for the systematic nodes are referred to
as systematic MSR codes. An early construction of a systematic
MSR code with β = 1 for the case d = (n − 1) ≥ (2k − 1) can
be found in [212]. In a subsequent paper [227] that builds upon
[212], the authors provide a construction for MDS codes that can
repair both systematic as well as parity nodes under the restriction
d ≥ 2k − 1, n ≥ 2k, under the assumption that all the un-erased
systematic nodes participate in node repair. Thus for the case
d = (n − 1) ≥ 2k − 1, the construction in [227] yields an MSR
code. Other early constructions of systematic MSR codes with
d = n − 1 can be found in [32], [229]. A general construction,
valid for all (n, k, d) parameter sets, first appeared in [76]. A lower bound $\alpha \geq r^{\frac{k-1}{r}}$, where r = n − k, on the sub-packetization level of linear, systematic MSR codes with d = n − 1 having the optimal-access property is derived in [233]. It is shown in [11] that this can be extended to the slightly tighter bound $\alpha \geq r^{\lceil \frac{k-1}{r} \rceil}$. In [2], [33], [249], non-explicit, optimal-access, linear systematic MSR code constructions with d = n − 1 having α matching the lower bound $\alpha \geq r^{\lceil \frac{k-1}{r} \rceil}$ for $k \not\equiv 1 \pmod{r}$ are presented. Explicit constructions of optimal-access, linear systematic MSR codes with d = (n − 1) and $\alpha = r^{\lceil \frac{k-1}{r} \rceil}$ for $k \not\equiv 1 \pmod{r}$ are provided for (n − k) = 2, 3, in [186]. Optimal-access, linear systematic MSR codes with d = n − 1 having optimal sub-packetization level $r^{\lceil \frac{k-1}{r} \rceil}$ for $k \not\equiv 1 \pmod{r}$ can be constructed over a field of size q ≥ n using the transformation presented in [137].
$$\sum_{i=0}^{k-1} \min\{\bar{\alpha},\, (d-i)\bar{\beta}\} = 1. \tag{6.1}$$
The aim here is to show that the locus of the set of pairs (ᾱ, dβ̄) with
ᾱ ≥ 0, dβ̄ ≥ 0, satisfying (6.1), is a piecewise-linear curve, with k corner
points. We begin by partitioning the first quadrant in the (x = ᾱ, y = dβ̄)
plane into the (k + 1) pairwise disjoint regions {Rℓ | ℓ = 0, 1, · · · , k}
identified in Table 6.1.
6.1. Piecewise Linear Nature of FR Tradeoff 617
Table 6.1: Partitioning the first quadrant in the (x = ᾱ, y = dβ̄) plane into the
(k + 1) pairwise disjoint regions {Rℓ | ℓ = 0, 1, · · · , k}. The storage-repair-bandwidth
tradeoff under functional repair is a piecewise-linear curve, represented by a straight line in each of the (k + 1) regions {R_ℓ}.
Figure 6.1: Illustrating the piecewise-linear nature (in red) of the normalized FR tradeoff for (k = 4, d = 4). The {P_i} denote the k = 4 corner points, with P_0, P_3 representing the MBR and MSR points respectively.
In the region where $\bar{\alpha} \leq (d-k+1)\bar{\beta}$, (6.1) reduces to
$$k\bar{\alpha} = 1.$$
In the region where $\bar{\alpha} \geq d\bar{\beta}$, (6.1) reduces to $\sum_{i=0}^{k-1}(d-i)\bar{\beta} = 1$, i.e.,
$$d\bar{\beta} = \frac{d}{dk - \binom{k}{2}}, \qquad \bar{\alpha} = d\bar{\beta}.$$
This corner point $P_0 = \left(\frac{d}{dk - \binom{k}{2}},\ \frac{d}{dk - \binom{k}{2}}\right)$ corresponds to the normalized values (ᾱ, dβ̄) of an MBR code.
More generally, in the region R_ℓ, (6.1) reduces to
$$\ell\bar{\alpha} + \sum_{i=\ell}^{k-1} (d-i)\bar{\beta} = 1,$$
with the associated corner point occurring at
$$\bar{\alpha} = (d-\ell)\bar{\beta}.$$
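These per-region reductions can be checked numerically. The sketch below (our own code) computes the corner points P_ℓ for (k = 4, d = 4) by intersecting the line of region R_ℓ with ᾱ = (d − ℓ)β̄, recovering the MBR and MSR points shown in Figure 6.1:

```python
# Corner points P_l of the normalized FR tradeoff: solve
# l*alpha_bar + sum_{i=l}^{k-1} (d-i)*beta_bar = 1 with alpha_bar = (d-l)*beta_bar.
from fractions import Fraction

def corner_points(k, d):
    pts = []
    for l in range(k):
        coeff = l * (d - l) + sum(d - i for i in range(l, k))
        beta_bar = Fraction(1, coeff)
        pts.append(((d - l) * beta_bar, d * beta_bar))   # (alpha_bar, d*beta_bar)
    return pts

pts = corner_points(4, 4)
assert pts[0] == (Fraction(2, 5), Fraction(2, 5))        # P0: the MBR point
assert pts[-1] == (Fraction(1, 4), Fraction(1, 1))       # P3: the MSR point
```

The endpoints agree with the closed-form MBR value ᾱ = dβ̄ = d/(dk − C(k, 2)) = 2/5 and the MSR value ᾱ = 1/k = 1/4.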
6.2 ER Tradeoff
Our aim here is to characterize the normalized pairs (ᾱ, dβ̄) for which
it is possible to construct an ER RGC having parameters (n, k, d) over
some finite field Fq . We begin by noting that for any parameter set
(n, k, d), there exist constructions of ER MSR and ER MBR codes.
In the case of MBR codes this is apparent from the product-matrix
construction of an MBR code. In the case of an MSR code, this is clear
from the Diagonal MSR construction appearing in Section 5.2. Thus
at the points on the FR tradeoff corresponding to the MSR and MBR
points, there exist ER RGCs having the same normalized parameters
as FR RGCs. Since the FR tradeoff represents an outer bound to the
ER tradeoff1 , this tells us that the ER and FR tradeoffs share the MSR
and MBR points in common.
With this in mind, we define the ER tradeoff for fixed (n, k, d)
as the locus of all normalized pairs (ᾱ, dβ̄) that meet the following
requirements:
¹Meaning that for the same parameter set {(n, k, d), (α, β)}, the file size B under ER is no larger than the file size under FR.
620 Storage-Repair-Bandwidth Tradeoff
• $\frac{1}{k} \leq \bar{\alpha} \leq \frac{d}{dk - \binom{k}{2}}$,
on the FR tradeoff do not exist, except possibly for a small region in the
(ᾱ, dβ̄) plane corresponding to the range given below for the parameter
ᾱ:
$$(d-k+1)\bar{\beta} < \bar{\alpha} \leq \left((d-k+2) - \frac{d-k+1}{d-k+2}\right)\bar{\beta}.$$
with equality for α of the form α = (d − p)β for some integer p, with
1 ≤ p ≤ (k − 2). We refer the reader to [211] for the complete proof.
Proof: We follow the derivation in [211]. Let C be an ER RGC
having parameter set {(n, k, d), (α, β), B, Fq } with α = (d − p)β, 1 ≤
p ≤ (k − 2), that satisfies the cut-set bound in (6.2) with equality. We
will show that this leads to a contradiction. We restrict attention in
the proof to a subset D of (d + 1) nodes that, by themselves, form a regenerating code $C^D$ having parameter set {(d + 1, k, d), (α, β), B} that clearly also achieves the cut-set bound with equality. We continue to
adopt the notation $W_A$, $S_A^B$, etc., that was introduced in Section 3.2. Let A be any subset of D of size k. Then
$$\begin{aligned}
B = H(W_A) &\leq H(W_D) \\
&= I\left(W_D\, ;\, \left\{S_{\ell}^{D\setminus\{\ell\}}\right\}_{\ell\in D}\right) \\
&\leq H\left(\left\{S_{\ell}^{D\setminus\{\ell\}}\right\}_{\ell\in D}\right) \\
&= H\left(\left\{S_{m}^{D\setminus\{m\}}\right\}_{m\in D}\right) \\
&\leq \sum_{m\in D} H\left(S_{m}^{D\setminus\{m\}}\right) \\
&= \sum_{m\in D} \beta \;=\; (d + 1)\beta,
\end{aligned}$$
where the final equality uses the fact that $H(S_m^{D\setminus\{m\}}) = \beta$, and this appears as Property 3 in [211]. For any three distinct nodes {ℓ_1, ℓ_2, m} ⊆ D, we have:
$$H\left(S_m^{\ell_1} \mid S_m^{\ell_2}\right) = 0,$$
and this appears as part of Property 5 in [211] (after setting the parameter θ appearing in [211] to 0, since our focus here is only on corner points that lie within the interior).
On the other hand, if an ER RGC attains the cut-set upper bound in
(6.2) with α = (d−p)β, for p an integer lying in the range 1 ≤ p ≤ (k−2),
we must have
$$\begin{aligned}
B &= \sum_{i=0}^{k-1} \min\{\alpha,\, (d-i)\beta\} \\
&= \sum_{i=0}^{k-1} \min\{(d-p)\beta,\, (d-i)\beta\} \\
&= 2(d-p)\beta + \sum_{i=2}^{k-1} \min\{(d-p)\beta,\, (d-i)\beta\} \\
&\geq 2(d-p)\beta + (k-2)\beta \\
&\geq (d+2)\beta,
\end{aligned}$$
leading to a contradiction. □
The (n, d, d) Case For the case k = d, the following outer bound on the maximum file size of a linear RGC,
$$B \leq \frac{d+1}{\ell+2}\left(\ell\alpha + \frac{d}{\ell+1}\beta\right), \tag{6.3}$$
where ℓ = ⌊dβ/α⌋ ∈ {0, 1, . . . , d}, was derived in [173] by carefully analyzing the p-c matrices of ER RGCs. This bound establishes a piecewise-linear outer bound on the ER tradeoff as it applies to linear RGCs that is tighter than the bound provided by the FR tradeoff. The same bound was independently derived in [60] by solving an optimization problem involving the file size of a linear ER RGC.
In [238], the outer bound in (6.3) is shown to hold for the specific
parameter set (n = 5, k = 4, d = 4) even in the case of a general ER
RGC, by adopting a computational approach to handling information-
theoretic inequalities. By a general ER RGC, we mean an ER RGC
that is not necessarily linear. A subsequent outer bound, appearing in
[56] and that also applies to a general ER RGC, coincides with that
in [60], [173] when specialized to the linear setting and to the case of
parameter sets of the form (n, d, d).
The class of Determinant codes [61], [63] discussed in Section 7.1 turns out to achieve the outer bound in (6.3). This establishes both the tightness of the bound in (6.3) as it applies to linear ER RGCs and the optimality of Determinant codes when one restricts attention to linear ER RGCs.
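The tightness claim is easy to confirm numerically. In the sketch below (our own check), we plug the Determinant-code parameters α_m = C(d, m), β_m = C(d − 1, m − 1), B_m = m·C(d + 1, m + 1) from Section 7.1 into the outer bound (6.3) and verify equality, noting that ℓ = ⌊dβ/α⌋ = m at these operating points:

```python
# Determinant-code parameters meet the linear outer bound (6.3) with equality.
from fractions import Fraction
from math import comb

for d in range(2, 8):
    for m in range(1, d + 1):
        alpha, beta = comb(d, m), comb(d - 1, m - 1)
        B = m * comb(d + 1, m + 1)
        l = (d * beta) // alpha              # floor(d*beta/alpha)
        assert l == m                        # d*beta/alpha = m exactly
        bound = Fraction(d + 1, l + 2) * (l * alpha + Fraction(d, l + 1) * beta)
        assert bound == B                    # bound (6.3) met with equality
```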
Let $(\bar{\alpha}_{\mathrm{MSR}}, d\bar{\beta}_{\mathrm{MSR}})$ and $(\bar{\alpha}_{\mathrm{MBR}}, d\bar{\beta}_{\mathrm{MBR}})$ denote the (ᾱ, dβ̄) values at the MSR and MBR points respectively, given by:
$$(\bar{\alpha}_{\mathrm{MSR}}, d\bar{\beta}_{\mathrm{MSR}}) = \left(\frac{1}{k},\ \frac{d}{k(d-k+1)}\right),$$
$$(\bar{\alpha}_{\mathrm{MBR}}, d\bar{\beta}_{\mathrm{MBR}}) = \left(\frac{d}{dk - \binom{k}{2}},\ \frac{d}{dk - \binom{k}{2}}\right).$$
7.1. Determinant Code 627
Table 7.1: A listing of some of the codes in the literature that attain, or are
conjectured to attain, some portion in the interior of the storage-repair-bandwidth
tradeoff under exact repair.
Signed Determinant code due to Elyasi and Mohajer [61], [63], [64]
for parameters (n, d, d), i.e., for the case k = d. The code is called the
Signed Determinant code because of the sign factor introduced by the
components of σ. It turns out that if one is interested solely in the case
k = d, i.e., the (n, d, d) case, one can set the vector σ = 0, i.e., σ(j) = 0,
all j ∈ [d]. As noted above, setting σ = 0 yields the Determinant code
construction appearing in [61], [63]. While both papers [61], [63] describe
the same Determinant code, the repair process described in [63] has
the advantage that the helper data supplied by a helper node does not
depend upon the identity of the remaining (d − 1) helper nodes1 . We
have retained σ in the expressions below, as the vector σ is needed when
the Signed Determinant code is used as a building block to construct
Cascade code [64]. Our description of the Signed Determinant code
below, follows the description of the code given in [64]. The repair
process of the Signed Determinant code described below is helper-set
independent.
The Signed Determinant construction is parameterized by an integer variable m, with 1 < m < d. The associated (α, β, B) parameters are then given by:
$$\alpha_m = \binom{d}{m}, \qquad \beta_m = \binom{d-1}{m-1}, \tag{7.1}$$
$$B_m = m\binom{d}{m} + m\binom{d}{m+1} = m\binom{d+1}{m+1}. \tag{7.2}$$
Let
$$V = \{v_{A,j} \mid A \subset [d],\ |A| = m,\ j \in A\},$$
$$W = \{w_{S,j} \mid S \subseteq [d],\ |S| = m + 1,\ j \in S\}$$
be two sets of symbols that take on values in a finite field F. Let
$$W' = \{w_{S,j} \mid w_{S,j} \in W,\ j \neq \max S\}$$
¹This is the helper-set-independent property described in Section 5.4. It turns out that in the case of the Determinant and Signed Determinant codes, the repair process is linear and involves constant repair matrices. These terms are defined in Section 5.4.
where $\tau_S(j)$ is the position of j, given that the elements of S are listed in ascending order. In other words, $\tau_S(j) = |\{i \in S \mid i \leq j\}|$ for any j ∈ S. The symbols in V ∪ W are used to populate two matrices V, W having respective sizes $\binom{d}{m} \times d$ and $\binom{d}{m+1} \times d$. The two matrices will respectively be referred to as the V-array and the W-array. The rows of the V-array are indexed by m-subsets of [d] and the columns by 1, 2, . . . , d. The symbol $v_{A,j} \in V$ occupies the cell in the V-array determined by row A and column j ∈ A. In similar fashion, the rows of the W-array are indexed by the (m + 1)-subsets of [d] and the columns by 1, 2, . . . , d. The symbol $w_{S,j} \in W$ occupies the cell in the W-array determined by row S and column j ∈ S. Note that each row in the V-array contains m symbols; the remaining (d − m) cells in each row are empty. Similarly, each row in the W-array contains (m + 1) symbols; the remaining (d − m − 1) cells in each row are empty.
Next, we will construct a matrix which we will refer to as the data matrix D (at times, we will also refer to D as the D-array), having $\binom{d}{m}$ rows and d columns. Again, the rows are indexed by m-subsets of [d] and the columns by 1, 2, . . . , d. The (A, j)-th entry $d_{A,j}$ of D is given by:
$$d_{A,j} = \begin{cases} (-1)^{\sigma(j)} \cdot v_{A,j}, & \text{if } j \in A, \\ (-1)^{\sigma(j)} \cdot w_{A\cup\{j\},\,j}, & \text{if } j \notin A. \end{cases}$$
C = DΦ
Figure 7.1: The V, W and D matrices used in the construction of the (n, 4, 4) Signed Determinant code with m = 2 and σ = (0, 0, 0, 0). For simplicity, in the figure, the m-subsets of {1, 2, 3, 4} have been ordered lexicographically and indexed from 1 to 6. Similarly, the (m + 1)-subsets of {1, 2, 3, 4} have also been ordered lexicographically and indexed from 1 to 4.
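The D-array of the figure can be generated programmatically. The sketch below (our own code, representing symbols as labeled tuples rather than field elements) follows the definition of $d_{A,j}$ for the case d = 4, m = 2, σ = 0, and checks that each row contains exactly m V-symbols, located in the columns indexed by A, with W-symbols elsewhere:

```python
# Build the D-array of the sigma = 0 Determinant code for d = 4, m = 2.
from itertools import combinations

d, m = 4, 2
rows = list(combinations(range(1, d + 1), m))        # m-subsets of [d]
D = {A: {j: (('v', A, j) if j in A
             else ('w', tuple(sorted(A + (j,))), j))  # w_{A u {j}, j}
         for j in range(1, d + 1)}
     for A in rows}

for A, row in D.items():
    # m V-symbols per row, sitting exactly in the columns indexed by A
    assert sum(sym[0] == 'v' for sym in row.values()) == m
    assert all(row[j][0] == 'v' for j in A)
    assert all(row[j][0] == 'w' for j in range(1, d + 1) if j not in A)
```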
Z0 = R × D × [ϕ2 ϕ3 · · · ϕd+1 ].
7.1. Determinant Code 631
Z = RD.
Our goal is to identify a matrix R such that we can recover the contents $c_1 = D\phi_1$ of the failed node given the product RD. Each row of $c_1$ is indexed by an m-subset A of [d], and we write $c_{A,1}$ to denote the entry of $c_1$ in row A.
The matrix R is of size $\binom{d}{m-1} \times \binom{d}{m}$. The entries of R are completely determined by the symbols $\{\phi_{i1} \mid 1 \leq i \leq d\}$ making up $\phi_1$. Thus R is solely a function of the index of the failed node, index 1 in the present case. The rows and columns of R are respectively indexed by (m − 1)-subsets and m-subsets of [d]. If $r_{P,A}$ denotes the entry in the P-th row and A-th column of R, then
$$r_{P,A} = \begin{cases} (-1)^{\sigma(y)+\tau_A(y)} \cdot \phi_{y1}, & \text{if } y \text{ exists such that } P \cup \{y\} = A, \\ 0, & \text{otherwise.} \end{cases}$$
We then have, writing $A_{\sim i}$ for $A \setminus \{i\}$, $A_{\sim i,y}$ for $(A \setminus \{i\}) \cup \{y\}$, and $z_{P,j}$ for the (P, j)-th entry of Z = RD:
$$\sum_{i\in A} (-1)^{\sigma(i)+\tau_A(i)}\, z_{A_{\sim i},\,i} = \sum_{i\in A} (-1)^{\sigma(i)+\tau_A(i)} \sum_{L\subset[d],\,|L|=m} r_{A_{\sim i},\,L}\; d_{L,i}$$
$$= \sum_{i\in A} \phi_{i1}\, d_{A,i} + \sum_{i\in A} (-1)^{\sigma(i)+\tau_A(i)} \sum_{y\in[d]\setminus A} r_{A_{\sim i},\,A_{\sim i,y}}\; d_{A_{\sim i,y},\,i}$$
632 Interior-Point ER Codes
$$= \sum_{i\in A} \phi_{i1}\, d_{A,i} + \sum_{i\in A} (-1)^{\sigma(i)+\tau_A(i)} \sum_{y\in[d]\setminus A} (-1)^{\sigma(y)+\tau_{A_{\sim i,y}}(y)}\; \phi_{y1}\; d_{A_{\sim i,y},\,i}$$
$$= \sum_{i\in A} \phi_{i1}\, d_{A,i} + \sum_{y\in[d]\setminus A} \phi_{y1} \sum_{i\in A} (-1)^{\sigma(i)+\sigma(y)+\tau_A(i)+\tau_{A_{\sim i,y}}(y)}\, (-1)^{\sigma(i)}\, w_{A_y,i} \tag{7.4}$$
$$= \sum_{i\in A} \phi_{i1}\, d_{A,i} + \sum_{y\in[d]\setminus A} \phi_{y1}\, (-1)^{\sigma(y)+\tau_{A_y}(y)} \left[-\sum_{i\in A} (-1)^{\tau_{A_y}(i)}\, w_{A_y,i}\right],$$
where $A_y$ denotes $A \cup \{y\}$.
In this way, we are able to recover all the contents $\{c_{A,1} \mid A \subset [d],\ |A| = m\}$ of node 1. As noted earlier, the matrix R can be shown to have rank at most $\binom{d-1}{m-1}$ [64], and thus the repair bandwidth is no larger than $\beta_m = \binom{d-1}{m-1}$.
7.2. Cascade Code 633
In [64], the authors introduce a code termed the Cascade code, which is constructed using multiple Signed Determinant codes as building blocks.
The resulting Cascade code, as well as the Moulin code described below
in Section 7.3, both have the best-known storage-repair-bandwidth
tradeoff offered by an ER RGC, operating at an interior point. The
parameters of a Cascade code for given (n, k, d) are given by:
$$\alpha(\mu) = \sum_{m=0}^{\mu} (d-k)^{\mu-m} \binom{k}{m},$$
$$\beta(\mu) = \sum_{m=0}^{\mu} (d-k)^{\mu-m} \binom{k-1}{m-1},$$
$$B(\mu) = \sum_{m=0}^{\mu} k\,(d-k)^{\mu-m} \binom{k}{m} - \binom{k}{\mu+1}, \tag{7.5}$$
The Moulin2 code is a linear ER RGC due to Duursma et al. [54] that
is described in terms of a multilinear algebra framework. In the Moulin-
code framework, each codeword is associated to a linear functional
acting on a parent vector space, i.e., a linear transformation from the
parent vector space to its underlying field of scalars. The symbols stored
in a node are obtained by evaluating the linear functional on elements of
²Name given by the authors of [54], who indicate that their choice of name was inspired by the words cascade (waterfall or moulin) and multilinear algebra.
7.3. Moulin Code 635
a subspace of the parent vector space. The repair data transferred from
a helper node are derived by evaluating the linear functional on elements
of a subspace properly contained within the subspace associated with
the helper node. We limit ourselves in this subsection to providing a brief
description of the Moulin-code construction, to convey a sense of the
multilinear algebra framework on which the construction is based, and
refer the reader to [54] for the more complete mathematical description.
Since the Moulin code and Cascade code have the same values of
(α, β, B) for a given (n, k, d), they offer the same performance. (We
identify auxiliary parameter s in the case of the Moulin code with the
parameter µ + 1 in the case of the Cascade code). Thus, both have
the same sub-packetization level. The Moulin code also has a linear
field-size requirement and even here, repair data passed on by a specific
helper node to a failed node, does not depend upon the identity of the
remaining (d − 1) helper nodes.
$$x \otimes y = \Big(\sum_i a_i x_i\Big) \otimes \Big(\sum_j b_j y_j\Big) = \sum_{i,j} a_i b_j\, (x_i \otimes y_j),$$
where the sum is a formal sum and where $w_{j_1} \otimes w_{j_2} \otimes \cdots \otimes w_{j_p}$ may be regarded as an unbreakable expression. We set $T^0 W = \mathbb{F}$, $T^1 W = W$.
where once again the sum is a formal sum and $w_{j_1} \wedge w_{j_2} \wedge \cdots \wedge w_{j_q}$ is to be regarded as an unbreakable expression. Here as well, we set $\Lambda^0 W = \mathbb{F}$, $\Lambda^1 W = W$. Clearly $\dim(T^p W) = k^p$ and $\dim(\Lambda^q W) = \binom{k}{q}$. For any vector space S over F, we define the dual space S* to be the space of all linear functionals from S to F. As is well known, in the finite-dimensional case, S is isomorphic to S*.
$$\beta = \sum_{\substack{p\geq 0,\, q\geq 0,\\ p+q+2=s}} (d-k)^p \binom{k-1}{q},$$
$$B = \sum_{\substack{p\geq 0,\, q\geq 0,\\ p+q+1=s}} d\,(d-k)^p \binom{k}{q} \;-\; \sum_{\substack{p\geq 0,\, q\geq 0,\\ p+q=s}} (d-k)^p \binom{k}{q}, \tag{7.6}$$
and where the field size |F| satisfies |F| ≥ n. In the description above, s
is an auxiliary parameter of the Moulin-code construction. Thus one
sees from (7.5) and (7.6), after setting s − 1 = µ, that the Moulin code
and Cascade codes share identical parameters.
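This identity of parameters is easy to confirm numerically. The sketch below (our own code) evaluates the Cascade expressions in (7.5) and the Moulin expressions in (7.6), (7.11), (7.12) and checks that they agree once s = µ + 1:

```python
# Verify that Cascade (7.5) and Moulin (7.6)/(7.11)/(7.12) parameters coincide.
from math import comb

def cascade(k, d, mu):
    a = sum((d - k) ** (mu - m) * comb(k, m) for m in range(mu + 1))
    b = sum((d - k) ** (mu - m) * comb(k - 1, m - 1) for m in range(1, mu + 1))
    B = sum(k * (d - k) ** (mu - m) * comb(k, m) for m in range(mu + 1)) - comb(k, mu + 1)
    return a, b, B

def moulin(k, d, s):
    a = sum((d - k) ** p * comb(k, q)
            for p in range(s) for q in range(s) if p + q + 1 == s)
    b = sum((d - k) ** p * comb(k - 1, q)
            for p in range(s) for q in range(s) if p + q + 2 == s)
    B = (sum(d * (d - k) ** p * comb(k, q)
             for p in range(s) for q in range(s) if p + q + 1 == s)
         - sum((d - k) ** p * comb(k, q)
               for p in range(s + 1) for q in range(s + 1) if p + q == s))
    return a, b, B

for k in range(2, 6):
    for d in range(k, k + 4):
        for mu in range(0, k):
            assert cascade(k, d, mu) == moulin(k, d, mu + 1)
```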
The data file or, equivalently, the codeword being stored is identified with a linear functional φ acting on the vector space M given by
$$M = \bigoplus_{\substack{p\geq 0,\, q\geq 0\\ p+q+1=s}} T^p V \otimes U \otimes \Lambda^q W.$$
$$M^* = \Big(\bigoplus_{\substack{p\geq 0,\, q\geq 0\\ p+q+1=s}} T^p V \otimes U \otimes \Lambda^q W\Big)^* = \bigoplus_{\substack{p\geq 0,\, q\geq 0\\ p+q+1=s}} (T^p V \otimes U \otimes \Lambda^q W)^*.$$
$$M_{(n,k,k)} = \bigoplus_{0\leq q\leq (s-1)} W \otimes \Lambda^q W.$$
It is explained in [54] how the Moulin-code construction for the general (n, k, d) case can be viewed as being made up of layers of (n, k, k) Moulin codes.
File Size Computation To estimate the file size B, we follow [54] and introduce the following terminology:
$$V\text{-spaces: } \{T^p V \otimes \Lambda^q W \mid p \geq 0,\ q \geq 0,\ p+q=s\},$$
$$U\text{-spaces: } \{T^p V \otimes U \otimes \Lambda^q W \mid p \geq 0,\ q \geq 0,\ p+q+1=s\},$$
$$W\text{-spaces: } \{T^p V \otimes W \otimes \Lambda^q W \mid p \geq 0,\ q \geq 0,\ p+q+1=s\}.$$
Recall that
$$M^* = \bigoplus_{\substack{p\geq 0,\, q\geq 0\\ p+q+1=s}} (T^p V \otimes U \otimes \Lambda^q W)^*.$$
$$\nabla(\nu \otimes \omega) = \nu \otimes \omega,$$
$$\nabla(\nu \otimes \omega) = \nabla(\nu \otimes \omega_1 \wedge w_1) = \nabla(\nu \otimes \omega_1) \wedge w_1 + (-1)^{q-1}\, \nu \otimes w_1 \otimes \omega_1.$$
We identify the column space of A with the vector space U, both having dimension d over F. Let $\{u_i,\ i = 1, 2, \cdots, n\}$ be the elements in U associated with the columns of A. In the Moulin-code construction, the i-th node stores
$$\Big\{\phi(v) \;\Big|\; v \in \bigoplus_{\substack{p\geq 0,\, q\geq 0\\ p+q+1=s}} (T^p V \otimes u_i \otimes \Lambda^q W)\Big\},$$
where φ is the functional associated to the data file stored. The number of symbols stored is thus given by
$$\alpha = \sum_{\substack{p\geq 0,\, q\geq 0\\ p+q+1=s}} (d-k)^p \binom{k}{q}. \tag{7.11}$$
$$\beta = \sum_{\substack{p\geq 0,\, q\geq 0,\\ p+q+2=s}} (d-k)^p \binom{k-1}{q}. \tag{7.12}$$
Notes
As has been the case in much of this monograph, we will restrict our
attention in this section to linear RGCs, where linearity is as defined in
Section 3. We will present two lower bounds1 on the sub-packetization
level α of an MSR code having parameters
$$\{(n, k, d = (n-1)),\ (\alpha = (n-k)\beta,\ \beta),\ B = \alpha k,\ \mathbb{F}_q\}.$$
The first bound [11] is applicable to MSR codes possessing the optimal-
access property and builds on the derivation of a prior bound, applicable
in the case of systematic-node repair and appearing in [233]. The second
bound [6] applies to MSR codes in general. More recently, in independent
work, the results in [6] have been improved upon in [7] and [9]. This
improved bound is briefly discussed in the notes subsection. We set
r = (n − k) throughout the section.
¹A brief overview of known bounds on sub-packetization for the case d < (n − 1) is given in the notes subsection.
642 Lower Bounds on Sub-Packetization Level of MSR Codes
Interference Alignment
In the repair of an MSR code, a phenomenon called interference align-
ment takes place. While the explanation of the phenomenon presented
below is for the case d = (n − 1) that is the focus of the present section,
the principle extends to the case d < (n − 1).
Consider the repair of a systematic node ui . The helper information
passed on by the xth parity node px is a collection of β symbols, each
of which is a linear combination of the kα message symbols $\{c_{jm} \mid 1 \leq j \leq k,\ 1 \leq m \leq \alpha\}$ making up the data file. Only the portion of this information that is a linear combination of the α symbols $\{c_{im} \mid 1 \leq m \leq \alpha\}$ contributes directly to the reconstruction of the contents of the failed node.
The helper information passed on by the j-th systematic node $u_j$, j ≠ i, plays only an indirect role in the reconstruction of node $u_i$.
This is because, the set of kα message symbols constitutes a collection
of kα independent scalar random variables. It follows that the role of
the jth systematic node uj , j ̸= i, in the repair process, is to supply
a set of β symbols, that will allow the undesired contribution from
each parity node that is a linear combination of the contents of the jth
systematic node, to be cancelled out. For this to happen, the repair
information passed on by the r parity nodes must be linearly aligned
The lemma appears in [233]. A formal proof can be found, for example, in [11].
We will now use Lemma 4 to prove a second Lemma, Lemma 5 that
appears in [11] and which deals with the intersection of repair subspaces.
Lemma 5 will be used in turn, to establish the lower bounds on the
sub-packetization level of an optimal-access MSR code. The statement
of Lemma 5 is illustrated in Fig. 8.1. Recall that 2 ≤ ℓ ≤ (k − 1),
$$U = \{u_1, u_2, \cdots, u_\ell\},\quad V = \{u_{\ell+1}, u_{\ell+2}, \cdots, u_k\},\quad P = \{p_1, p_2, \cdots, p_r\}.$$
8.1. Properties of Repair Subspaces 645
Figure 8.1: Illustrating Lemma 5: three repair subspaces $J_1, J_2, J_3$, each of dimension β; each pairwise intersection $J_i \cap J_j$ has dimension at most β/r, and the triple intersection $J_1 \cap J_2 \cap J_3$ has dimension at most β/r².
where p is an arbitrary node in P, i.e., an arbitrary parity node. Furthermore, $\dim\big(\bigcap_{u\in U} \hat{S}_{pu}\big)$ is the same for all p ∈ P. Hence for any p ∈ P,
$$\dim\Big(\bigcap_{u\in U} \hat{S}_{pu}\Big) \leq \frac{\dim\big(\bigcap_{u\in U\setminus\{u_\ell\}} \hat{S}_{pu}\big)}{r}. \tag{8.3}$$
Proof. Clearly, the nodes in U ∪ V are the systematic nodes and the nodes in P are the parity nodes. We will first prove that $\dim\big(\bigcap_{u\in U} \hat{S}_{pu}\big)$ is the same for all p ∈ P, i.e., that the dimension of the intersection of the ℓ subspaces $\{\hat{S}_{pu}\}_{u\in U}$ obtained by varying the failed node u ∈ U is the same, regardless of the parity node p ∈ P from which the helper data originates.
By Lemma 4, ∀p, p′ ∈ P and uj ∈ U ,
Since Aij are invertible for all i, j, equation (8.4) implies ∀p, p′ ∈
P:
$$\bigcap_{j=1}^{\ell} \hat{S}_{p u_j} A_{p u_{\ell+1}} = \bigcap_{j=1}^{\ell} \hat{S}_{p' u_j} A_{p' u_{\ell+1}}. \tag{8.5}$$
that
$$W_j \cap \sum_{1\leq i\leq r,\ i\neq j} W_i = \{0\}. \tag{8.12}$$
Hence, since the $A_{ij}$'s are non-singular, from equations (8.8) and (8.12) we can conclude that:
$$\dim\Big(\sum_{i=1}^{r} \bigcap_{u\in U} \hat{S}_{p_i u}\Big) \leq \dim\Big(\bigcap_{u\in U\setminus\{u_\ell\}} \hat{S}_{pu}\Big), \tag{8.13}$$
Figure 8.2: The bipartite graph appearing in the counting argument used to prove Theorem 4. Each node on the left corresponds to an element of the standard basis $\{e_1, \ldots, e_\alpha\}$. The nodes on the right are associated with the repair matrices $S_{n,1}, \ldots, S_{n,n-1}$. An edge connecting the vector $e_i$ to node $S_{n,j}$ is drawn if $e_i$ is a row vector of the repair matrix $S_{n,j}$.
8.2. Lower Bound for Optimal-Access MSR Codes 649
implies that
$$|F| \leq \lfloor \log_r(\alpha) \rfloor. \tag{8.17}$$
8.3. Lower Bound for General MSR Codes 651
Thus any given non-zero vector can belong to at most $\lfloor \log_r(\alpha) \rfloor$ repair subspaces among the repair subspaces $\{\hat{S}_{n,1}, \ldots, \hat{S}_{n,n-1}\}$.
$S'_u = S'_{pu}$, ∀u ∈ [k − 1], p ∈ [n − 1] \ {u}, i.e., repair matrices of the new
Lemma 8. [6] For any {(n, k, d = n − 1), (α, β)} MSR code with repair matrices $\{S_{pu} : u \in [k],\ p \in [n] \setminus \{u\}\}$ satisfying $S_u = S_{pu}$ for all u ∈ [k] and p ∈ [n] \ {u}, i.e., the repair matrix $S_{pu}$ does not depend on p,
$$I(\hat{S}_1, \ldots, \hat{S}_t) \leq \frac{2r-1}{2r}\, I(\hat{S}_1, \ldots, \hat{S}_{t-1}),$$
where 1 ≤ t ≤ k.
$$L_{A_{pt}^{-1}}: \mathcal{F}(\hat{S}_1, \hat{S}_2, \ldots, \hat{S}_{t-1}, \hat{S}_t) \to \mathcal{F}(\hat{S}_1, \hat{S}_2, \ldots, \hat{S}_{t-1}, \hat{S}_t A_{pt} \to \hat{S}_t),$$
and for i = t,
Hence,
$$I(\hat{S}_1, \hat{S}_2, \ldots, \hat{S}_{t-1}, \hat{S}_t) = I(\hat{S}_1, \hat{S}_2, \ldots, \hat{S}_{t-1}, \hat{S}_t A_{pt} \to \hat{S}_t),$$
$$I(\hat{S}_1, \hat{S}_2, \ldots, \hat{S}_{t-1}, \hat{S}_t) = I(\hat{S}_1, \hat{S}_2, \ldots, \hat{S}_{t-1}, \hat{S}_t A_{pt} \to \hat{S}_t A_{(k+1)t}),$$
$$I(\hat{S}_1, \hat{S}_2, \ldots, \hat{S}_{t-1}, \hat{S}_t) = I(\hat{S}_1, \hat{S}_2, \ldots, \hat{S}_{t-1}, \hat{S}_t A_{pt} \to \hat{S}_t A_{(k+2)t}).$$
Let $V_{pt}$ and $W_{pt}$ be the subspace families defined in [6]. This implies,
$$2r \cdot I(\hat{S}_1, \hat{S}_2, \ldots, \hat{S}_{t-1}, \hat{S}_t) = \sum_{p=k+1}^{n} \dim(V_{pt}) + \sum_{p=k+1}^{n} \dim(W_{pt}), \tag{8.18}$$
with
$$\bigcap_{p=k+1}^{n} W_{pt} = \mathcal{F}(\hat{S}_1, \hat{S}_2, \ldots, \hat{S}_{t-1}, \mathbb{F}_q^{\alpha} \to \hat{S}_t A_{(k+2)t}).$$
$$\begin{aligned}
2r \cdot I(\hat{S}_1, \hat{S}_2, \ldots, \hat{S}_{t-1}, \hat{S}_t) &= \sum_{p=k+1}^{n} \dim(V_{pt}) + \sum_{p=k+1}^{n} \dim(W_{pt}) \\
&\leq (2r-1) \cdot \dim\Big(\sum_{p=k+1}^{n} V_{pt} + \sum_{p=k+1}^{n} W_{pt}\Big) \\
&\leq (2r-1) \cdot \dim \mathcal{F}(\hat{S}_1, \hat{S}_2, \ldots, \hat{S}_{t-1}).
\end{aligned}$$
Theorem 5. [6] For any {(n, k, d = (n − 1)), (α, β)} MSR code,
$$\alpha \geq e^{\frac{k-1}{4r}}.$$
Proof. From Lemma 6, we can construct an {(n − 1, k − 1, d = (n − 2)), (α, β)} MSR code with repair matrices not depending on p. By repeated application of Lemma 8 to this new derived MSR code,
$$I(\hat{S}_1, \ldots, \hat{S}_{k-1}) \leq \alpha^2 \left(\frac{2r-1}{2r}\right)^{k-1}.$$
Since we have the identity matrix, which keeps all the subspaces $\hat{S}_i$ invariant, we have:
$$I(\hat{S}_1, \ldots, \hat{S}_{k-1}) \geq 1.$$
Hence we have
$$\alpha^2 \left(\frac{2r-1}{2r}\right)^{k-1} \geq 1.$$
By manipulation of the above inequality using $\log(1+x) \geq \frac{x}{1+x}$ for x ≥ 0, we get the bound mentioned in the theorem.
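The final manipulation can be sanity-checked numerically (our own code): with x = 1/(2r − 1), the inequality log(1 + x) ≥ x/(1 + x) gives log(2r/(2r − 1)) ≥ 1/(2r), so 2 log α ≥ (k − 1)/(2r), i.e., α ≥ e^{(k−1)/(4r)}:

```python
# Check log(1+x) >= x/(1+x) at x = 1/(2r-1), where x/(1+x) = 1/(2r).
import math

for r in range(1, 100):
    x = 1 / (2 * r - 1)
    assert math.log(1 + x) >= x / (1 + x)
    assert abs(x / (1 + x) - 1 / (2 * r)) < 1e-12

# Any alpha with alpha^2 >= (2r/(2r-1))^{k-1} then satisfies the theorem bound.
r, k = 3, 20
alpha = (2 * r / (2 * r - 1)) ** ((k - 1) / 2)
assert alpha >= math.exp((k - 1) / (4 * r))
```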
Notes
3. The case d < n − 1: Lower bounds for the d < n − 1 case can be
derived by replacing r = n − k with s = d − k + 1 and replacing n
with d + 1 in the bounds presented in this section. This is because
we can puncture the MSR code, retain only d + 1 nodes, and then
apply the bounds in this section. For optimal-access repair, when
the choice of repair matrices does not depend on the identity of
the set of remaining d − 1 helper nodes used for repair, we have
the following lower bound presented in [11],
$$\alpha \geq \min\left\{s^{\lceil \frac{n-1}{s} \rceil},\ s^{k-1}\right\}.$$
658 Variants of Regenerating Codes
Figure 9.1: In this example, two codewords of a systematic [4, 2] MDS code over the finite field F₅ are piggybacked and appear as columns in the upper-left table. Each row represents the contents of one of the 4 nodes. The piggyback modification results in the code shown on the upper right. The tables in the bottom row correspond to failure of the first and second systematic nodes. The first systematic node can be repaired by reading {b₂, (b₁ + b₂), (b₁ + 2b₂ + a₁)} (shown in blue), the second by reading {b₁, (b₁ + b₂), (2a₂ − 2b₂ − b₁)}.
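A minimal working sketch of this example over F₅ (our own code, following the final code of the figure, in which node 4 stores 2a₂ − 2b₂ − b₁ and b₁ + 2b₂ + a₁):

```python
# Piggybacked [4,2] MDS code over F5: each systematic node is repairable by
# reading only 3 of the 6 remaining symbols.
p = 5  # arithmetic over the finite field F5

def encode(a1, a2, b1, b2):
    # nodes 1-2 systematic; node 3 = (a1+a2, b1+b2); node 4 piggybacked
    return [(a1, b1), (a2, b2),
            ((a1 + a2) % p, (b1 + b2) % p),
            ((2 * a2 - 2 * b2 - b1) % p, (b1 + 2 * b2 + a1) % p)]

def repair_node1(nodes):
    # read b2, b1+b2 and b1+2b2+a1  (3 symbols)
    b2 = nodes[1][1]
    b1 = (nodes[2][1] - b2) % p
    a1 = (nodes[3][1] - b1 - 2 * b2) % p
    return (a1, b1)

def repair_node2(nodes):
    # read b1, b1+b2 and 2a2-2b2-b1  (3 symbols)
    b1 = nodes[0][1]
    b2 = (nodes[2][1] - b1) % p
    a2 = (nodes[3][0] + 2 * b2 + b1) * pow(2, -1, p) % p
    return (a2, b2)

nodes = encode(1, 2, 3, 4)
assert repair_node1(nodes) == (1, 3)
assert repair_node2(nodes) == (2, 4)
```

Without the piggyback, repairing a systematic node of a [4, 2] MDS code would require reading 4 symbols; here 3 suffice.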
Let $C_{II}$ be a second (not necessarily linear) code having block length N, size M and minimum distance D = δN over an alphabet A of size |A| ≤ n that we identify with a subset A ⊆ [0, n − 1]. We associate with every codeword $c = (c_1 c_2 \cdots c_N)$ of $C_{II}$ an (rNα × Nα) matrix:
$$H_c = \begin{bmatrix} \vdots \\ u_{r,c}\,\mathrm{Diag}(A_{r-1,c_1}, \ldots, A_{r-1,c_N}) \end{bmatrix}.$$
$$q > \binom{n-1}{r-1}\,\alpha + 1.$$
$C_3$, the field size requirement is reduced to $q > rn'\lceil s/r \rceil$ with r | (q − 1), $q > r\lceil n'/r \rceil(s-1) + n'$, and $q > sr$, respectively, by identifying a specific assignment of the $\{x_{i,j}\}$ as opposed to appealing to the Combinatorial Nullstellensatz.
The fifth vector MDS code $C_5$ in [138] also has the same repair bandwidth as the four codes described above. This code is constructed directly, without using the transformation, and draws upon the form of the Diagonal MSR code [255]. The structure of $C_5$ is similar to that of $C_1$.
Open Problem 9. Characterize the tradeoff between repair bandwidth,
sub-packetization level and field size for the general class of vector MDS
codes.
(Figure: the nine code symbols C1, . . . , C9 arranged in a 3 × 3 grid; the six nodes correspond to the six lines of the grid — three rows and three columns — with each node storing the three symbols on its line.)
code symbols, from which the file can be retrieved. Next, suppose one of the nodes has failed. There are d = 3 lines intersecting the line corresponding to the failed node in a point, and these d = 3 nodes will serve as helper nodes. The failed node can be repaired by downloading just one code symbol from each of the d = 3 helper nodes.
Note that an MBR code with identical parameters, i.e., (n = 6, k = d = 3, β = 1), can only store a file of size $(dk - \binom{k}{2})\beta = 6$, whereas this fractional repetition code has file size B = 7. Thus the relaxation in code-design requirement arising from permitting a restricted choice of helper-node sets has allowed, in this case, a fractional repetition code to store a larger number of message symbols in comparison to the corresponding MBR code. An upper bound on the file size of fractional repetition codes is derived in [59], and constructions achieving this bound for some parameters are presented in [221].
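The file-size claim in this example can be verified exhaustively (our own sketch, under the assumed 3 × 3 grid layout of the nine symbols): any k = 3 of the six line-nodes together cover at least 7 of the 9 precoded symbols, so a [9, 7] MDS precode supports B = 7.

```python
# Fractional repetition code from a 3x3 grid: 6 nodes = 3 rows + 3 cols,
# alpha = 3, replication degree rho = 2, d = 3.
from itertools import combinations

rows = [{3 * i + j for j in range(3)} for i in range(3)]
cols = [{3 * j + i for j in range(3)} for i in range(3)]
nodes = rows + cols
# Any 3 nodes cover >= 7 distinct symbols, so a [9,7] MDS precode gives B = 7.
assert min(len(a | b | c) for a, b, c in combinations(nodes, 3)) == 7
# Repair of a failed row-node: each of its 3 symbols lies on one of the 3
# columns, so one symbol is downloaded from each of the d = 3 helper nodes.
```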
In [125], the authors study fractional repetition codes that have sub-
packetization α much larger than the replication degree ρ. A randomized
version of fractional repetition codes can be found in [170]. Different
generalizations of fractional repetition codes have been studied in the
literature, including those in [81], [165], [264].
9.3. Cooperative Regenerating Codes 665
In related work [4], the authors characterize the (n, k, d) parameter range over which table-based repair results in a strictly improved storage-repair-bandwidth tradeoff when compared with the corresponding tradeoff that applies to an FR RGC having the same (n, k, d) parameters.
Two approaches have been adopted in the RGC literature to handle the
case when t > 1 nodes fail simultaneously. Under centralized-repair, a
single repair center downloads helper data from a set of d helper nodes
and uses this data to determine the contents of the t replacement nodes.
In the case of an [n, k] vector MDS code with sub-packetization α, the
least amount of data download required from d helper nodes for the
simultaneous repair of t failed nodes under centralized-repair [31] is
given by

  tdα/(d − k + t),

and codes achieving this bound are described in [255]. The FR
storage repair bandwidth tradeoff under centralized-repair of multiple
node failures is explored in [97], [191], [265].
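As a quick numerical illustration (our own, with arbitrary parameter values), one can compare this centralized figure with the cost of t separate single-node repairs, each at the usual single-failure MSR bandwidth dα/(d − k + 1):

```python
from fractions import Fraction

def centralized_bw(alpha, d, k, t):
    # Minimum total download td*alpha/(d - k + t) quoted above from [31].
    return Fraction(t * d * alpha, d - k + t)

def separate_bw(alpha, d, k, t):
    # t independent single-node MSR repairs at d*alpha/(d - k + 1) each.
    return t * Fraction(d * alpha, d - k + 1)

# Joint (centralized) repair never downloads more than separate repairs.
alpha, d, k = 12, 10, 8
for t in range(1, 4):
    assert centralized_bw(alpha, d, k, t) <= separate_bw(alpha, d, k, t)
```

For t = 1 the two quantities coincide, as they should.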
An alternate method of repairing multiple failed nodes simultane-
ously is cooperative-repair, under which there is a separate repair center
for each replacement node. The repair centers are permitted to exchange
data. The potential benefit of allowing such data exchange was first
investigated by Hu et al. in [100]. As with an RGC, in a cooperative
RGC each of the n nodes stores α symbols and the contents of any k
nodes are sufficient to reconstruct the stored data file of size B.
The cooperative-repair of t node failures takes place in two phases.
In the first phase, each of the t replacement nodes selects a set of d
helper nodes and downloads β1 symbols from each of them. In the
second phase, every replacement node downloads β2 symbols from each
of the other (t − 1) replacement nodes. Hence, the repair bandwidth
per node is given by
γ = dβ1 + (t − 1)β2 .
For the case when α is unconstrained, i.e., α > (d − ℓ)β, the resultant
bandwidth-limited secrecy capacity Bs,BL is determined in [169] for
the case d = (n − 1), where a bound and matching construction are
presented. It was also shown that the resiliency capacity satisfies

  Br ≤ Σ_{i=i0}^{k} min{ (d − i + 1)β, α }.
In [255], the authors extend this model to the repair of multiple nodes
and provide MSR constructions that are resilient to t errors during the
repair process.
In [182], the authors extend the passive eavesdropper model to the
setting where out of the ℓ nodes accessed, the eavesdropper can read
the contents of ℓ1 nodes and can observe the information passed on for
the repair of ℓ2 = ℓ − ℓ1 nodes. The upper bound in (9.1) also holds
for this case. In the case of an MBR code, since the amount of data
stored equals the amount of data received for node repair, the breakup
of ℓ between ℓ1 , ℓ2 is immaterial. This is not true in the case of an
MSR code where dβ > α. In [182], the authors provide secure MBR
code constructions matching the upper bound in (9.1) for all possible
parameters. A secure, low-rate MSR code construction that achieves
the upper bound (9.1) for ℓ2 = 0 is also presented in [182]. This secure
MSR construction establishes a lower bound on the secure file size of an
MSR code: Bs ≥ (k − ℓ)(α − ℓ2 β) for ℓ2 > 0.
The upper bound on secure MSR file size Bs ≤ (k − ℓ)α given by
(9.1) is improved in [75], [105], [187], [235]. In [188], the authors establish
that the secrecy capacity of an MSR code with d = n − 1 is given by
  Bs = (k − ℓ) ( 1 − 1/(n − k) )^{ℓ2} α,
by providing a secure MSR construction matching the upper bound
on secure file size given in [75]. The authors of [177] extended this
construction to determine the secrecy capacity of MSR codes with
d < n − 1, for the ℓ1 = 0 case. In [113], secure MSR codes having smaller
field size are constructed for all parameters. In [120], [121], [216], [253]
the ER tradeoff for secure RGCs is studied.
9.5 Rack-Aware Regenerating Codes

The storage nodes in a data center are typically organized into racks
that contain an equal number of nodes. The communication between
nodes within a rack is less expensive than cross-rack communication.
With this in mind, rack-aware regenerating codes (RRGCs) [99] focus
on minimizing the number of symbols that are exchanged across racks
during node repair.
In an RRGC, the n nodes are divided into r racks, such that each
rack contains n/r nodes, where n is a multiple of r. Each node continues
to store α symbols. The data file of size B stored using an RRGC must
be retrievable from any k nodes, as in the case of an RGC. For the repair
of a failed node in an RRGC, the replacement node is given access to
the entire content of all the nodes belonging to the same rack, as well as
to an additional set of dβ symbols, obtained by downloading β symbols
from each of d other, helper racks. The β symbols downloaded from any
such helper rack can be a function of the entire content of that helper
rack. Communication between nodes lying within the same rack does
not count towards the repair bandwidth, so that the aim in node repair
in the RRGC setting, is to minimize the quantity dβ, referred to as
the cross-rack repair bandwidth. The FR storage-bandwidth tradeoff
for RRGCs was characterized in [95]. The minimum storage rack-aware
(MSRR) and minimum bandwidth rack-aware regenerating (MBRR)
points are given by

  (αMSRR , βMSRR ) = ( B/k , B/(k(d − m + 1)) ),

  αMBRR = dβMBRR = dB / ( kd − m(m − 1)/2 ).
10.1 Background
The early study of LRCs in the linear case was mostly centered on
systematic linear codes, where only the message symbols were guaranteed
to be repairable with low repair degree. These codes were accordingly
termed codes with information-symbol locality. The study was subsequently
expanded to include all-symbol locality codes, i.e., LRCs where it was
possible to repair all the code symbols with low repair degree. In this
section, we begin with information-symbol locality before moving on to
discuss all-symbol locality.
The original treatment in [72] was for the case when the local codes
have minimum distance δ = 2, corresponding to single-parity-check
Hence dmin = n − s. □
with rank(G|Ta ) < (k − 1). We will abuse notation and regard the empty
set as also having such a representation. We begin iteration (a + 1) by
picking an index ia+1 such that
this can always be done. We then set T = P̂ and stop. We set the
flag to J = 1.
Case (i): Suppose we exited the recursion at the end of the (a + 1)th
iteration with flag J = 0. In this case, T = T_{a+1} = ∪_{j=1}^{a+1} S_{ij}. The
inclusion of each set S_{ij} can increase the rank by at most r, since each
local code has dimension at most r. Therefore,

  (a + 1) ≥ ⌈(k − 1)/r⌉.
Next, we claim that each set Sij , j = 1, 2, · · · , a + 1, brings in additional
column indices associated to at least (δ − 1) redundant columns, i.e.,
indices associated to columns that do not contribute to an increase in
rank. This can be explained as follows. Let

  Vℓ = Uiℓ ∪ Tℓ−1 ;

we then claim that

  | Siℓ \ Vℓ | ≥ δ,
i.e., that while we have increased the rank by 1, we have increased the
number of column indices by a quantity ≥ δ. In this way, there are
always (δ − 1) column indices associated to redundant columns that are
added at every step.
The justification for the claim is as follows: in any [n, k, dmin ] code
A, any (k × m) submatrix of a (k × n) generator matrix GA for A, must
have rank k if m ≥ n − dmin + 1. Thus if we partition the column indices
of GA according to [n] = B1 ∪ B2 , B1 ∩ B2 = ∅, then
apply our earlier arguments about increasing the size of the column
index set by at least (δ − 1) at each of the first a steps. Since we
have replaced Sia+1 by Ŝia+1 , we cannot assert that this last step has
introduced any column indices associated to redundant columns at all.
Thus we can only assert that
|T | ≥ (k − 1) + a(δ − 1)
     ≥ (k − 1) + ( ⌈k/r⌉ − 1 )(δ − 1).
This gives us

  dmin ≤ (n − k + 1) − ( ⌈k/r⌉ − 1 )(δ − 1).   (10.4)
Claim: ⌈(k − 1)/r⌉ ≥ ⌈k/r⌉ − 1. This can be seen by verifying that

  ⌈k/r⌉ − 1 ≤ ⌈(k − 1)/r⌉,  ∀k = ar + b, 0 ≤ b ≤ (r − 1).
Thus the RHS of (10.4) is at least as large as the RHS of (10.3), and
hence (10.4) is the desired upper bound on dmin , since we can always be
sure that dmin satisfies the upper bound given by (10.4). □
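Both the bound (10.4) and the ceiling-function claim used in its derivation are easy to check numerically. The sketch below (our own choice of test values) evaluates the bound at the parameters of the [18, 14, 4] Azure LRC discussed later in this section:

```python
from math import ceil

def dmin_upper_bound(n, k, r, delta):
    # The bound of (10.4): dmin <= (n - k + 1) - (ceil(k/r) - 1)*(delta - 1).
    return (n - k + 1) - (ceil(k / r) - 1) * (delta - 1)

# The [18, 14, 4] Windows Azure LRC (r = 7, delta = 2) meets this bound.
assert dmin_upper_bound(18, 14, 7, 2) == 4

# The claim ceil((k-1)/r) >= ceil(k/r) - 1, checked exhaustively.
for k in range(1, 200):
    for r in range(1, 30):
        assert ceil((k - 1) / r) >= ceil(k / r) - 1
```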
Table 10.1: LRC constructions described in this monograph. All of the constructions
appearing in the table are explicit.
G0 = [ I5 | p1 p2 p3 ]

   = [ 1 0 0 0 0 | p11 p12 p13 ]
     [ 0 1 0 0 0 | p21 p22 p23 ]
     [ 0 0 1 0 0 | p31 p32 p33 ]
     [ 0 0 0 1 0 | p41 p42 p43 ]
     [ 0 0 0 0 1 | p51 p52 p53 ] .

The parity columns p1 , p2 are then each split into a pair of local parity
columns:

  p1 = (p11 , p21 , p31 , p41 , p51 )T =⇒ (p11 , p21 , 0, 0, 0)T , (0, 0, p31 , p41 , p51 )T ,

  p2 = (p12 , p22 , p32 , p42 , p52 )T =⇒ (p12 , p22 , 0, 0, 0)T , (0, 0, p32 , p42 , p52 )T .
Figure 10.1: The [18, 14, 4] LRC employed in Windows Azure storage. The message
symbols are split into two groups x0 , . . . , x6 and y0 , . . . , y6 . Here Px and
Py are the local parities, while Q0 , Q1 represent the two global parities.
The repair degrees of the two codes are comparable: 6 for the [9, 6] RS
code, and 7 for the Azure LRC. The primary difference between the two codes lies in
the storage overhead. While the [9, 6] RS code has a storage overhead
of 1.5, this falls to 1.29 in the case of the Azure LRC. This difference
in storage overhead has reportedly resulted in a large cost savings to
Microsoft [161].
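The overhead figures quoted above are simply the ratios n/k of the two codes; a two-line check (rounding to two decimals is our assumption about how the text arrives at 1.29):

```python
# Storage overhead n/k for the two codes being compared.
rs_overhead = 9 / 6        # [9, 6] RS code -> 1.5
azure_overhead = 18 / 14   # [18, 14, 4] Azure LRC -> ~1.2857, quoted as 1.29
assert rs_overhead == 1.5
assert round(azure_overhead, 2) == 1.29
```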
Note that f (x)|_{x∈Sj} , i.e., the polynomial f (·) restricted to the subset Sj ,
reduces to a polynomial of degree ≤ (r − 1). We regard the m subcodes

  dmin (C) ≥ (n − k + 1) − ( ⌈k/r⌉ − 1 )(δ − 1).

It follows from (10.1) that C is an optimal code with (r, δ) all-symbol
locality.
684 Locally Recoverable Codes
Example Construction
Let H be a multiplicative subgroup of F∗q and let S = ∪_{j=0}^{m−1} Sj be
the union of m distinct cosets of H in F∗q . Thus we are assuming
that m|H| ≤ (q − 1). Let mH (x) = ∏_{h∈H} (x − h) be the annihilator
polynomial of H.
Claim 1. mH (x) is constant on each multiplicative coset Sj = θH, θ ∈
F∗q of H.
Proof: Let y ∈ Sj , so that y = θh′ for some h′ ∈ H. Then

  mH (y) = ∏_{h∈H} (θh′ − h) = ∏_{h∈H} h′ [ θ − (h′)^{−1} h ] = ∏_{h∈H} (θ − h) = mH (θ),

where the last-but-one equality holds since (h′)^{|H|} = 1 and h ↦ (h′)^{−1} h
is a permutation of H.
H = {β^i | 0 ≤ i ≤ r + δ − 2},

and let

  Sj = α^j H, 0 ≤ j ≤ m − 1,

  mH (x) = x^{r+δ−1} − 1,   g(x) = x^{r+δ−1} ,

where

  f (x) ∈ F ⇐⇒ f (x) = Σ_{i=0}^{u−1} Σ_{j=0}^{r−1} aij [g(x)]^i x^j + Σ_{j=0}^{v−1} auj [g(x)]^u x^j .
This yields

  f (x) = Σ_{i=0}^{1} Σ_{j=0}^{2} aij x^{5i+j} + Σ_{j=0}^{1} a2j x^{10+j} .
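The workings of such a construction can be illustrated with a small instance of our own (not the one in the text): we take q = 11 and the order-5 subgroup H of F∗11, so that r = 3, δ = 3 and g(x) = x^5 as in the x^{5i+j} expression above. The sketch checks that g is constant on cosets and that f restricted to a coset is determined by any r = 3 of its 5 local evaluations:

```python
import random

# Toy Tamo-Barg-style instance: q = 11, H = order-5 subgroup of F_11^*,
# r = 3, delta = 3, r + delta - 1 = 5, and g(x) = x^5 constant on cosets.
q = 11
H = [pow(3, i, q) for i in range(5)]           # {1, 3, 9, 5, 4}
cosets = [H, [(2 * h) % q for h in H]]         # S = H ∪ 2H, so n = 10

for S_j in cosets:                             # Claim 1: g constant on cosets
    assert len({pow(x, 5, q) for x in S_j}) == 1

# Message polynomial f(x) = sum_{i<2, j<3} a_ij g(x)^i x^j (k = 6 symbols).
random.seed(1)
a = [[random.randrange(q) for _ in range(3)] for _ in range(2)]

def f(x):
    return sum(a[i][j] * pow(x, 5 * i + j, q)
               for i in range(2) for j in range(3)) % q

def lagrange_eval(pts, x):
    # Evaluate the unique interpolant of degree < len(pts) through pts, mod q.
    total = 0
    for xi, yi in pts:
        num = den = 1
        for xj, _ in pts:
            if xj != xi:
                num = num * (x - xj) % q
                den = den * (xi - xj) % q
        total = (total + yi * num * pow(den, q - 2, q)) % q
    return total

# On a coset, g is constant, so f reduces to degree <= r - 1 = 2 there:
# any 3 of the 5 local evaluations recover the remaining delta - 1 = 2.
for S_j in cosets:
    pts = [(x, f(x)) for x in S_j[:3]]
    for x in S_j[3:]:
        assert lagrange_eval(pts, x) == f(x)
```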
We present below the proof of the bound on rate and minimum distance
of a nonlinear LRC appearing in Theorem 6.
Proof: Let C be an (n, M, dmin ) code of size M = q k , over an
alphabet Aq of size q, having (r, δ) locality. We will establish the bound
on minimum distance appearing in Theorem 6. The bound on code
rate will then follow from Corollary 1. Recall from Definition 6, that
associated to each code symbol ci , there is a set Si ⊆ [n] of size
ni := |Si | ≤ (r + δ − 1) such that the restriction Ci := C|Si of C to Si
is a code of block length ni and minimum distance ≥ δ. Note by the
Singleton bound that

  |Ci | ≤ q^{ni − δ + 1} ≤ q^r .
i.e.,

  dmin ≤ n − k + 1 − ⌈(k − 1)/r⌉ (δ − 1),

leading to the desired bound

  dmin ≤ n − k + 1 − ( ⌈k/r⌉ − 1 )(δ − 1).
□
Table 10.2: This table provides a listing of the constructions for availability codes,
codes with sequential recovery, codes with hierarchical locality and maximally recov-
erable codes (MRCs) that appear in Sections 11, 12, 13 and 14 respectively. With the
exception of the construction of the MRC based on the Combinatorial Nullstellensatz,
all other constructions appearing in the table are explicit.
Notes
(a) For the status on constructions for the case when (r + 1)|n,
please see note above and set δ = 2.
(b) A construction with field size linear in n and minimum
distance within 1 of the upper bound on dmin can be found
in [228] for the case r ∤ k and n ̸≡ 1 mod (r + 1).
(c) A construction with minimum distance within 1 of the upper
bound on dmin for all parameters, having exponential field
size appears in [66].
(d) A construction achieving the minimum distance bound under
the assumption of disjoint repair sets that holds for all n ≤ q
and all n with n mod (r + 1) ̸= 1 appears in [123]. The
condition n mod (r + 1) ̸= 1 is removed in [176] and an
optimal code construction where optimality is under the
assumption of disjoint repair sets is provided in [176], for all
n ≤ q, where q is the field size.
(e) Upper bounds on dmin tighter than the one given in [72],
can be found in [160], [175], [242], [261]. Constructions for
codes achieving the tightened bound in [242] for the case of
n1 > n2 , where n1 = ⌈n/(r + 1)⌉ and n2 = n1 (r + 1) − n, and
having exponential field size can also be found there.
4. On the construction of LRCs with large block length for given field
size: There is interest in determining the largest possible block
length of a code with locality that achieves the upper bound in
(10.1) on minimum distance, for a given field size. This is analogous
to the problem of determining the maximum possible block length
n for which an MDS code having field size q exists. The focus of
the research effort here has been on the case δ = 2. Constructions
for LRCs with block length exceeding the field size q can be found
described in [21], [112], [142]. Bounds on the maximum possible
block length n of an LRC having minimum distance achieving
(10.1) for a given field size q can be found in [86], [90]. In [34], the
authors focus on the general case δ > 2, derive an upper bound on
the maximum length possible and provide a construction having
length that is super-linear in the size q of the underlying finite
field.
5. LRCs with all-symbol locality and small alphabet size for the case
δ = 2:
11 Codes with Availability

This section, as well as the two sections that follow, may be viewed as
providing additional, alternative approaches to handling multiple node
failure.
Codes with availability, discussed in the present section, provide
multiple, node-disjoint means of accessing the data contained within a
particular node, thereby enabling recovery from multiple-node failure.
Availability codes have an additional appeal: they are able to handle
multiple, simultaneous requests for the data contained within a partic-
ular node, a useful feature when storing popular content. The notion
of codes with availability was introduced by Wang and Zhang in [241]
in the setting of linear codes. As in the case of LRCs, we begin with a
discussion of the more general case of nonlinear codes with availability,
before specializing to the linear case.
and t associated functions { fij : Aq^{|Rij |} → Aq } such that
Theorem 8 (Upper Bound on Rate and dmin for Nonlinear Codes [230]).
Let C be an (n, M, dmin ) code over an alphabet Aq of size q with
availability having parameters (r, t) and where the repair sets {Rij | i ∈
[n], j ∈ [t]} are of constant size |Rij | = r. Let k = ⌊logq (M )⌋. The rate
and minimum distance of C then satisfy the upper bounds

  k/n ≤ 1 / ∏_{j=1}^{t} ( 1 + 1/(jr) ),   (11.1)

  dmin ≤ n − Σ_{i=0}^{t} ⌊ (k − 1)/r^i ⌋.   (11.2)
In our proof of the rate bound, we follow Tamo et al. [230] and refer
the reader to [230] for a proof of the upper bound on minimum distance.
The rate bound will follow from Lemma 10 and Lemma 11, given
below. The proofs adopt a graphical approach that involves associating
a directed graph on n nodes with the code, called the recovery graph.
The ith node is associated to code symbol ci . The edges are colored
using one of t colors which we associate with elements of the set [t].
There is a directed edge bearing color ℓ, ℓ ∈ [t], from node j to node
i iff j ∈ Riℓ . Next, a random permutation π(·) of the set [n] is chosen
and the nodes are linearly ordered from left to right with the ith node
appearing in position π(i). We then turn to a coloring of the nodes.
Node i is assigned color ℓ iff
Figure 11.1: Illustrating the recovery graph for a binary linear code, satisfying the
parity checks: c1 + c2 = c3 , c2 + c4 = c5 . The nodes are ordered in accordance with a
random permutation π, so that the node associated to ci appears in position π(i).
Here (π(1), π(2), π(3), π(4), π(5)) = (3, 4, 5, 1, 2). The edges in red are associated to
the first p-c equation and the edges in green, with the second (only the edges relevant
to node coloring are shown). Under this permutation, node 3 is colored red and node
2 is colored green. The number 3 of uncolored symbols leads to the upper bound
M ≤ q^3 on code size.
P (i ∈ U ) = P ( ∪_{j=1}^{t} Aij ),
where P (·) denotes the probability function. We can employ the inclusion-
exclusion principle to calculate the above probability if we know
P (∩j∈S Aij ) for every subset S ⊆ [t]. This latter probability is the proba-
bility of the event that node i is colored with all colors in the set S which
implies the event π(ℓ) < π(i) for all ℓ ∈ Rij , j ∈ S. It follows that in the
linear ordering determined by π, the code symbol ci must necessarily
appear to the right of all the code symbols cm , m ∈ ∪j∈S Rij . If we
restrict attention to the set of code symbols {cm | m ∈ {i} ∪j∈S Rij },
all orderings of these symbols are equally likely. Hence the probability
that code symbol ci ends up in the rightmost position within this set,
is given by:

  P ( ∩_{j∈S} Aij ) = 1 / ( | ∪_{j∈S} Rij | + 1 ) = 1 / ( Σ_{j∈S} |Rij | + 1 ) = 1 / ( |S|r + 1 ).
P (i ∈ U ) = P ( ∪_{j=1}^{t} Aij )
           = Σ_{j=1}^{t} (−1)^{j−1} \binom{t}{j} P ( Ai1 ∩ Ai2 ∩ · · · ∩ Aij )
           = Σ_{j=1}^{t} (−1)^{j−1} \binom{t}{j} · 1/(jr + 1).
The expected number of colored nodes is then given by:

  E(|U |) = Σ_i P (i ∈ U ) = n ( 1 − 1 / ∏_{j=1}^{t} ( 1 + 1/(jr) ) ).
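The equality between the inclusion-exclusion sum and the product form can be confirmed with exact rational arithmetic; the check below is ours, over a small grid of (r, t):

```python
from fractions import Fraction
from math import comb

def colored_prob(r, t):
    # P(i in U) via inclusion-exclusion, as in the derivation above.
    return sum(Fraction((-1) ** (j - 1) * comb(t, j), j * r + 1)
               for j in range(1, t + 1))

def product_form(r, t):
    # 1 - 1 / prod_{j=1}^{t} (1 + 1/(jr))
    p = Fraction(1)
    for j in range(1, t + 1):
        p *= 1 + Fraction(1, j * r)
    return 1 - 1 / p

for r in range(1, 8):
    for t in range(1, 8):
        assert colored_prob(r, t) == product_form(r, t)
```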
|Rij | = r for all {i, j}. Let π be an arbitrary permutation on [n] and
let the nodes of the associated recovery graph be colored as described
above, under the ordering of nodes associated to π(·). Let U be the set
of colored nodes. Then we have the upper bound
M ≤ q n−|U | ,
on code size.
Proof. Follows from the fact that under any ordering of code symbols,
the code symbols associated to colored nodes can be determined given
the values of the code symbols associated to uncolored nodes.
As in the case of LRCs, there is greatest interest in the linear case, and
we provide below, a formal definition of a linear availability code.
Definition 10 (Linear Availability Code). An [n, k, dmin ] code C over a
field Fq , is said to be a code with availability with parameters (r, t), if
for each code symbol ci there are t disjoint sets
{ Rij ⊆ [n] | |Rij | ≤ r, i ∉ Rij , j = 1, 2, · · · , t },

such that

  ci = Σ_{ℓ∈Rij} aijℓ cℓ ,  aijℓ ∈ Fq ,  holds for j = 1, 2, · · · , t.
The two lemmas below will show that the code C has dimension

  k = n(r, t) − \binom{r−1+t}{t−1} = \binom{r+t}{t} − \binom{r−1+t}{t−1},

and hence rate

  k/n = [ \binom{r+t}{t} − \binom{r−1+t}{t−1} ] / \binom{r+t}{t} = r/(r + t).
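A quick sanity check of these binomial identities (our own, using exact arithmetic):

```python
from fractions import Fraction
from math import comb

def n_rt(r, t):
    # Block length n(r, t) = binom(r+t, t).
    return comb(r + t, t)

def k_rt(r, t):
    # Dimension k = n(r, t) - binom(r-1+t, t-1).
    return n_rt(r, t) - comb(r - 1 + t, t - 1)

# The rate k/n simplifies to r/(r + t) for every (r, t).
for r in range(1, 10):
    for t in range(1, 10):
        assert Fraction(k_rt(r, t), n_rt(r, t)) == Fraction(r, r + t)
```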
Lemma 13. The matrix H(r, t) has the following recursive structure:

  H(r, t) = [ H(r, t − 1)   0
              I             H(r − 1, t) ],

where the first block of columns has width n(r, t − 1) and the second,
width n(r, t) − n(r, t − 1).
Proof. There are four blocks in the above recursive structure for H(r, t).
We prove the presence of these four blocks separately.
• By the lexicographic ordering, the first n(r, t − 1) subsets indexing
columns, i.e., the sets C(i), 1 ≤ i ≤ n(r, t − 1), all contain the element 1.
Similarly, the first m(r, t − 1) subsets indexing rows, i.e., the sets R(i),
1 ≤ i ≤ m(r, t − 1), all contain the element 1. Hence the element 1 is
fixed in all these sets and we can think of them as if t had been reduced
by one. It follows that the submatrix formed by the first n(r, t − 1)
columns and the first m(r, t − 1) rows of H(r, t) has the form H(r, t − 1).
Lemma 14. By Lemma 13, the matrix H(r, t) has the recursive structure:

  H(r, t) = [ H(r, t − 1)   0
              I             H(r − 1, t) ].
In this recursive structure we have that
rank(H(r, t)) = rank(H ′ ),
where
h i
H′ = I H(r − 1, t) .
It follows that the dimension k of the availability code C is given by:

  k = n(r, t) − \binom{r−1+t}{t−1}.
Proof. The row-reduction process in block-matrix form is applied to

  H(r, t) = [ H(r, t − 1)   0
              I             H(r − 1, t) ].
We claim that
H(r, t − 1)H(r − 1, t) = 0.
Establishing this claim proves the lemma. We will show
the product H(r, t − 1)H(r − 1, t) to be the zero matrix by showing that
each inner product of a row in H(r, t − 1) and a column in H(r − 1, t)
over F2 equals 0. Each row of H(r, t − 1) is associated to a subset D
of size (t − 2) drawn from a set of size (r + t − 1). Each column of
H(r − 1, t) is associated to a subset F of size t drawn from a set of
size (r + t − 1). The inner product is precisely equal to the number of
subsets E that satisfy

  D ⊆ E ⊆ F.

When D ⊆ F , the number of such (t − 1)-subsets E equals |F \ D| = 2,
and it is zero otherwise; in either case the count is even, so that over F2 ,

  H(r, t − 1)H(r − 1, t) = 0.
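Assuming the subset-incidence description suggested by the lexicographic discussion above (rows of H(r, t) indexed by (t − 1)-subsets, columns by t-subsets of an (r + t)-set, with a 1 indicating containment), the vanishing of the product over F2 can be verified directly for small parameters:

```python
from itertools import combinations

def H(r, t):
    # Rows indexed by (t-1)-subsets and columns by t-subsets of an
    # (r+t)-element ground set; entry is 1 iff row-subset ⊆ column-subset.
    ground = range(1, r + t + 1)
    rows = list(combinations(ground, t - 1))
    cols = list(combinations(ground, t))
    return [[1 if set(R) <= set(C) else 0 for C in cols] for R in rows]

def mult_mod2(A, B):
    return [[sum(A[i][m] * B[m][j] for m in range(len(B))) % 2
             for j in range(len(B[0]))] for i in range(len(A))]

# H(r, t-1) and H(r-1, t) are built over the same (r+t-1)-element ground
# set; each inner product counts the (t-1)-subsets E with D ⊆ E ⊆ F,
# which is even, so the product vanishes mod 2.
for r, t in [(2, 2), (2, 3), (3, 2), (3, 3)]:
    P = mult_mod2(H(r, t - 1), H(r - 1, t))
    assert all(v == 0 for row in P for v in row)
```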
The shortening approach and the approach via GHW are closely
connected. Knowing the GHW of the dual code makes it easier to
identify candidate sets S to be used in conjunction with the shortening
approach. However, in practice, the GHW of the dual code may not
be precisely known, while upper bounds to the GHW of the dual code,
might be more easily available. For this reason, the bound in Theorem 9
below is phrased in terms of upper bounds d⊥_i ≤ ei , i = 1, 2, · · · , b,
on the first b GHWs of the dual code, where 1 ≤ b ≤ n − k.
1
Generalized Hamming weights are at times referred to in the literature, as
minimum support weights, see for example [94].
Proof. As noted above, the {ei }, i = 1, . . . , b, play the role of known upper
bounds on the GHWs of the dual code since, in most cases, the exact
GHWs of the dual code will be unknown. Let Si be the support of a
subcode of C ⊥ of dimension i, where |Si | = d⊥_i . Let arbitrary additional
code-symbol indices be added to Si so that the augmented set satisfies
|Si | = ei .
Next, let C be shortened at the co-ordinates indexed by Si i.e.,
Cshorten = {c|[n]\Si : c ∈ C, c|Si = 0}.
It follows that Cshorten is also an availability code having parameters
(r, t) and block length (n − ei ). We claim that Cshorten has dimension
≥ n − ei − (n − k − i) = (k + i − ei ). This can be seen as follows. From
the definition of GHW, the p-c matrix of the code C can be written in
the following form:
  H = [ Hi   0     } i rows
        A    H ′ ] } (n − k − i) rows,

where the first ei columns correspond to the co-ordinates in Si and the
remaining (n − ei ) columns to the co-ordinates outside Si .
dmin ≤ n − |S|.
We show below how the very same bound can be obtained by applying
Corollary 2 with i = |S| − k + 1, and ei = |S|. The motivation for
making this connection, is that this will make it easier to compare prior
bounds in the literature, with the bound appearing in Corollary 2.
The dual of the restriction C|S of the code C to the set S is a
shortened version of the dual C ⊥ , having dimension ≥ |S| − k + 1.
The definition of the GHWs of the dual C ⊥ allows us to conclude that
d⊥_{|S|−k+1} ≤ |S|. This now allows us to apply the bound

  dmin ≤ min_{i∈T} ( n − k − i + 1 )
with i = |S| − k + 1 and ei = |S|, since the conditions

  d⊥_{|S|−k+1} ≤ ei = |S|   and   ei − i < k

are both satisfied. We end up, as mentioned earlier, with the very same
bound dmin ≤ n − |S|.
  b = ⌈n − nRmax ⌉,   (11.4)

  eb = n,   e_{j−1} = ej − 2ej /j + r + 1, ∀j ∈ [2, b],   (11.5)

  i = max { j ∈ [b] : (ej − j) < k }.
• Each row has Hamming weight r+1 and each column has Hamming
weight t,
• The supports of any two rows of Hsa intersect in at most one index,
and
With respect to the definition above, we note that for strict avail-
ability to hold, we need that (r + 1) | nt. We do not require the rows
of the matrix Hsa to be linearly independent. It is straightforward to
verify that a code with strict availability is also a code with availability.
We will now derive the upper bound on the rate of codes with strict
availability appearing in [14].
Then R(r, t) satisfies the functional equation and upper bound given
respectively by:
  R(r, t) = 1 − t/(r + 1) + ( t/(r + 1) ) R(t − 1, r + 1),

  R(r, t) ≤ 1 − t/(r + 1) + ( t/(r + 1) ) · 1 / ∏_{j=1}^{r+1} ( 1 + 1/(j(t − 1)) ).   (11.6)
It follows that R(r, t) = sup_n R(r, t, n). Next, we pick an integer n such
that R(r, t, n) > 0. Note that by Construction 2, such an n does exist.
Let C be an (n, k, r, t)sa code having rate R(r, t, n). Since our interest
is in deriving an upper bound on code rate, we can without loss of
generality, assume that C has a p-c matrix Hsa that satisfies the conditions
laid out in Definition 12. It follows that the rank of this p-c matrix satisfies
rank(Hsa ) = n − nR(r, t, n). Next, we note that Hsa^T is the p-c matrix of
Notes
1. Upper bounds on the rate of binary codes with strict availability:
In [114], [115], Kadhe and Calderbank provide the following upper
bound on rate of binary codes with strict availability for t = 3
and any r:
  R(r, 3) ≤ (r − 2)/(r + 1) + ( 3/(r + 1) ) H2( 1/(r + 2) ),
  R(r, 3) ≤ 1 − 3/(r + 1) + 3 log(2r + 4) / ( (r + 1)(2r + 3) ).   (11.7)
The bound in (11.7) can be achieved by a code that employs the
incidence matrix of a Steiner triple system as its p-c matrix, a
construction pointed out in [12] and [245] in the availability context;
the upper bound in (11.7) is thus tight.
In the same paper [115], the authors provide the following upper
bound on the rate of binary codes with strict availability for the
case r = 2 and arbitrary t:
  R(2, t) ≤ H2( 1/(t + 1) ).
(a) The cyclic code construction given in [245] has parameters

  n = 2^t − 1,  k = 2^{t−1} − 1,  r = t − 1,

and hence has rate (2^{t−1} − 1)/(2^t − 1), which is larger than
r/(r + t) = (t − 1)/(2t − 1).
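The rate comparison in (a) can be checked numerically; note that at t = 2 both rates equal 1/3, so the strict inequality is checked for t ≥ 3 (our own check):

```python
from fractions import Fraction

for t in range(3, 12):
    cyclic_rate = Fraction(2 ** (t - 1) - 1, 2 ** t - 1)   # [245] cyclic code
    r = t - 1
    strict_rate = Fraction(r, r + t)                       # = (t-1)/(2t-1)
    assert cyclic_rate > strict_rate
```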
Open Problem 12. Determine the smallest possible block length n for
which an (n, M ≥ q^k ) availability code over an alphabet of size q exists,
having availability parameters (r, t).

Open Problem 14. Derive a tight upper bound on the minimum distance
dmin of an (n, M ≥ q^k ) availability code over an alphabet of size q having
parameters (r, t).
12 LRCs with Sequential Recovery
12.1 Recovery from Two or Three Erasures
We now present an upper bound due to Prakash et al. [174] on the rate
k/n of a seq-LRC for the case t = 2. The bound takes on the form of a
The construction below, due to [174], achieves the above lower bound
on block length n.
Construction 3. [174] Let 2k = ur + b, 1 ≤ b ≤ r. Let G be a graph with
u + 1 nodes where u of the nodes have degree r and the remaining node
has degree b. Let each edge represent a message symbol. Thus there are
a total of k message symbols. Let each node represent a parity symbol
storing the binary sum of message symbols corresponding to the edges
incident on that node. Thus there are (u + 1) parity symbols. The message
and parity symbols put together yield an [n = k + u + 1, k] seq-LRC
with parameters (r, t = 2). Furthermore, since n = k + u + 1 = k + ⌈2k/r⌉,
by Theorem 11 this code has the minimum possible block length, and
hence the maximum possible rate, for the given parameter set (k, r, t = 2).
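The smallest instance of Construction 3 makes the sequential-recovery property easy to verify exhaustively. With r = 2 and k = 3 we have 2k = 2 · 2 + 2, so u = 2 and the graph G is a triangle; the sketch below (our own encoding) checks that every pattern of t = 2 erasures is repaired one symbol at a time using checks of weight ≤ r + 1 = 3:

```python
from itertools import combinations

# Triangle graph: edges carry the k = 3 message symbols, each of the
# u + 1 = 3 vertices stores the parity of its incident edges; n = 6.
edges = [(0, 1), (1, 2), (0, 2)]      # message symbol indices 0, 1, 2
n_msg = len(edges)
n = n_msg + 3
checks = []
for v in range(3):
    row = [0] * n
    for e, (a, b) in enumerate(edges):
        if v in (a, b):
            row[e] = 1                # incident message symbols ...
    row[n_msg + v] = 1                # ... plus the vertex's own parity
    checks.append(row)

def seq_recoverable(erased):
    # Repeatedly find a check containing exactly one erased coordinate.
    erased = set(erased)
    while erased:
        for row in checks:
            hit = [i for i in range(n) if row[i] and i in erased]
            if len(hit) == 1:
                erased.discard(hit[0])
                break
        else:
            return False
    return True

# Every pattern of t = 2 erasures is sequentially recoverable.
assert all(seq_recoverable(p) for p in combinations(range(n), 2))
```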
B = < h1 , . . . , hℓ >
12.2 The General Case

An upper bound for the case of general (r, t), derived in Balaji et al. [13],
is presented below in Theorem 13. Matching constructions establishing
that this bound is tight are also given in the same paper. The bound also
establishes the correctness of a conjecture due to Song et al. appearing
in [226], which is stated in the notes subsection.
  k/n ≤ r^{s+1} / ( r^{s+1} + 2 Σ_{i=0}^{s} r^i ),   for t even,

  k/n ≤ r^{s+1} / ( r^{s+1} + 2 Σ_{i=1}^{s} r^i + 1 ),   for t odd,   (12.9)

where s = ⌊(t − 1)/2⌋.
The proof is along the same lines as the proofs used to bound the
rate for the cases t = 2 and t = 3. The bound is tight, as it is possible
to construct seq-LRCs that achieve this rate bound. Details, including
code constructions, can be found in [13].
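The bound (12.9) can be evaluated directly; in particular, it reproduces the rates of the two Moore-graph codes described later in this section, with parameter sets (175, 126, 6, 4) and (52, 27, 3, 5):

```python
from fractions import Fraction

def seq_rate_bound(r, t):
    # The bound (12.9), with s = floor((t-1)/2).
    s = (t - 1) // 2
    if t % 2 == 0:
        denom = r ** (s + 1) + 2 * sum(r ** i for i in range(0, s + 1))
    else:
        denom = r ** (s + 1) + 2 * sum(r ** i for i in range(1, s + 1)) + 1
    return Fraction(r ** (s + 1), denom)

# Rate-optimal Moore-graph codes described in this section meet the bound.
assert seq_rate_bound(6, 4) == Fraction(126, 175)
assert seq_rate_bound(3, 5) == Fraction(27, 52)
```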
where

4. C is an (a0 r × a0 r^2/2) matrix with each row of weight r and each
column of weight 2.

Hence the block length is given by n = a0 (1 + r + r^2/2). Since the diagonal
entries of D0 , D1 are nonzero, it follows that the rank of H is equal to
the number of rows, and hence that the dimension k of the code equals
a0 r^2/2.
Let us form an augmented matrix H∞ by adding a row to H at the
very top; this added row is the binary sum of the rows of H. Thus H∞
is given by:

  H∞ = [ 1    0    0
         D0   A1   0
         0    D1   C ] ,   (12.11)
where
1. 1 is an (1 × a0 ) row vector with each coordinate equal to 1,
2. the matrices D0 , D1 , A1 , C remain as before.
Clearly, H∞ is also a valid p-c matrix for the code C. Each column
of H∞ has Hamming weight exactly 2. Hence this matrix H∞ can be
interpreted as the edge-vertex incidence matrix of a graph G∞ having
n edges and (1 + a0 + a0 r) nodes (the number of rows in H∞ ). Fig.
12.1 shows the graph G∞ , a Moore graph, corresponding to the values
(a0 = 7, r = 6), for a certain choice of matrices D0 , D1 , A1 , C
ensuring that the girth of G∞ is ≥ t + 1 = 5.
Each edge in G∞ represents a distinct code symbol while each vertex
represents a parity-check on the code symbols represented by edges
attached to the vertex. Thus each vertex is associated to a row in the
p-c matrix H∞ and each edge to a column of the p-c matrix. Each
column of the p-c matrix H∞ has Hamming weight 2 and the location
of the two 1s within the column indicates the vertices to which the edge
is connected. In Fig. 12.1, the edges at the very top, which are colored
in red, correspond to the first a0 = 7 columns of H∞ . The edges which
are colored in black and blue, correspond respectively, to the columns
of H∞ corresponding to the sub-matrices

  [ 0 ; A1 ; D1 ]   and   [ 0 ; 0 ; C ]

(block entries listed top to bottom).
12.2. The General Case 723
Figure 12.1: The figure shows the graphical interpretation of a binary, rate-optimal
seq-LRC C having parameter set (n, k, r, t) = (175, 126, 6, 4). Each of the 175 edges of
the graph represents a distinct code symbol and each of the 50 vertices represents a
parity-check of the code symbols represented by edges incident on it. This is a regular
graph with a total of 50 vertices, each of degree r + 1 = 7 and is an example of a
Moore graph called the Hoffman-Singleton graph. This graph has girth 5, which is a
necessity for the associated binary code to be able to recover from t = 4 erasures. The
code has redundancy 49 and not 50 since it turns out that the overall parity-check
at the very top is redundant.
  H = [ D0   A1   0
        0    D1   A2
        0    0    P ] ,   (12.12)
where
  H∞ = [ 1    0    0
         D0   A1   0
         0    D1   A2
         0    0    P ] ,   (12.13)
where
Figure 12.2: The figure shows the graphical interpretation of a binary, rate-optimal
seq-LRC C having parameter set (n, k, r, t) = (52, 27, 3, 5). Each of the 52 edges of
the graph represents a distinct code symbol and each of the 26 vertices represents
a parity-check on the code symbols represented by edges incident on it. This is a
regular graph with a total of 26 vertices, each of degree r + 1 = 4, and is an example
of a Moore graph for (r = 3, t = 5), corresponding to the projective plane of order r = 3.
This graph has girth 6, which is a necessity for the associated binary code to be able
to recover from t = 5 erasures. The code has redundancy 25 and not 26 since it turns
out that the overall parity-check at the very top is redundant.
the first a0 columns of H∞ . The edges which are colored in black and
blue, correspond respectively, to the columns of H∞ corresponding to
the sub-matrices

  [ 0 ; A1 ; D1 ; 0 ]   and   [ 0 ; 0 ; A2 ; P ]

(block entries listed top to bottom).
The sequential recovery property follows by noting that the girth
of G∞ is ≥ 6 and that all nodes in G∞ have degree exactly r + 1. It
can be seen that the rate of this code achieves the upper bound on rate
given in Theorem 13 for (r = 3, t = 5).
Notes
  k/n ≤ 1 / ( 1 + Σ_{i=1}^{m} ai /r^i ),

where m = ⌈logr (k)⌉, and the integers {ai } satisfy the conditions
ai ≥ 0 and Σ_{i=1}^{m} ai = t.
Figure 12.3: Comparison of rate bounds on codes with sequential recovery (12.9)
and codes with availability (11.1) for t = 12.
Open Problem 15. The tight upper bound on rate presented in this
section does not depend on block length n and depends only on the pair
(r, t). It can be shown that the block length n of a rate-optimal code must
necessarily satisfy n ≥ r^{(t−2)/2} , on account of the tree-like structure forced
upon the graphical representation of these codes, as discussed in [13].
The open problem in this context, is to derive an upper bound on the
dimension k of a seq-LRC for given (n, r, t) and identify constructions
achieving the upper bound.
13 Hierarchical Locality
Figure 13.1: Illustrating the [18, 14, 4] code with information-symbol locality and
locality parameters (r = 7, δ = 2) employed in the Windows Azure storage system.
The message symbols X1 , . . . , X7 of the local X-code and Y1 , . . . , Y7 of the local
Y-code are protected by the local parities Px and PY respectively, along with the
two global parities P1 , P2 .
Figure 13.2: A [24, 14, 6] code having 2-level hierarchical locality. The top-level [24, 14, 6] code is a subcode of the disjoint union of two [12, 8, 3] codes. The [12, 8, 3] codes are, in turn, subcodes of the disjoint union of three [4, 3, 2] codes.
As a result, the [24, 14, 6] code can recover from any single-node failure with repair
degree 3 by making use of the codes at the bottom level and any double-node failure
with repair degree r = 8 by making use of the middle codes.
nodes within the support of the local X-code will mean that local node repair is no longer possible, and hence the repair degree will jump from r = 7 to k = 14. In such a scenario, codes with hierarchical locality can step in to provide a more gradual degradation in repair degree.
An example of a code with hierarchical locality is presented in Fig. 13.2. The figure shows a [24, 14, 6] code having 2-level hierarchical locality. The top-level [24, 14, 6] code is a subcode of the disjoint union of two [12, 8, 3] codes. The [12, 8, 3] codes, called middle codes, are, in turn, subcodes of the disjoint union of three [4, 3, 2] codes. As a result, the [24, 14, 6] code can recover from any single-node failure with repair degree 3 by making use of the codes at the bottom level and from any double-node failure with repair degree r = 8 by making use of the middle codes. It is only with 3 or more node failures that the repair degree jumps to 14. We note that the Windows Azure code has information-symbol locality, while the [24, 14, 6] code has all-symbol locality.
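The graceful degradation described above can be captured in a few lines; the function below (purely illustrative, names ours) returns the repair degree of the example [24, 14, 6] code as a function of the number of node failures within one middle-code support.

```python
def repair_degree(num_failures):
    """Repair degree of the example [24, 14, 6] 2-level hierarchical code:
    1 failure  -> bottom-level [4, 3, 2] code, repair degree 3,
    2 failures -> middle [12, 8, 3] code, repair degree 8,
    3 or more  -> fall back to the full code, repair degree k = 14."""
    if num_failures == 1:
        return 3
    if num_failures == 2:
        return 8
    return 14

assert [repair_degree(f) for f in (1, 2, 3, 4)] == [3, 8, 14, 14]
```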
2. dmin (Ci ) ≥ δ1 ,
Remark 12. Each local code (of a middle code) is a code with (r2, δ2)-locality, meaning that it is a code of block length ≤ (r2 + δ2 − 1) and minimum distance ≥ δ2. By the Singleton bound, this means that the dimension of each local code is ≤ r2. Similarly, the dimension of each middle code Ci is ≤ r1.
Proof. The proof will identify a code CS obtained by restricting the code
C to a subset S ⊆ [n], where S has large size and dim(CS ) = (k − 1).
The bound
we will mean the rank of the matrix G|S. Let ai denote the incremental rank and si the incremental support size when the support of a local code Li is added to the existing support set S, i.e., when S is replaced by the union S ∪ Li. Then we have si ≥ ai + (δ2 − 1), 1 ≤ i ≤ iend, since the rank condition (i.e., Line 3 in Algorithm 1) ensures that ai > 0 in every iteration.
Let Vi denote the column space of the matrix G|Li. Let ij denote the index of the last local code C_{Lij} added for a fixed value j of the outer loop index. As noted above, the support Lij of the code C_{Lij} is contained in Mj. Within the jth outer iteration, if there are no more local codes having support contained in the support Mj of the current middle code and resulting in an increase in rank of the associated matrix G|S, then the support Lij of the last local code added with Lij ⊆ Mj is deleted from S, and the support set is incremented instead by taking the union with Mj. Thus, in place of replacing the existing support set S by S ∪ Lij, we replace S by S ∪ Mj. This has the effect of increasing the support size during the ij-th inner iteration by an amount
$$= (k-1) \;+\; \sum_{i=1}^{i_{\mathrm{end}}-1}(\delta_2 - 1) \;+\; \sum_{j=1}^{j_{\mathrm{end}}-1}(\delta_1 - \delta_2).$$
Constructions of codes with hierarchical locality for an arbitrary level h of hierarchy can be found in [199], [201]. Our focus here is on the case of 2-level hierarchy. Let (n1, n2) denote the block lengths of the middle and local codes respectively. Then the construction presented in [199] is optimal when n2 | n1 | n and r2 | r1 | k. The construction is shown to be optimal under certain other numerical constraints as well. In [18], the authors provide optimal constructions based on algebraic curves and elliptic curves. The constructions provided in [18] also assume numerical conditions such as, for example, δ2 = 2 or r2 | r1 | k. In [260], the authors first construct a family of generalized RS-based optimal LRCs and then use the resulting LRCs to construct optimal codes with hierarchical locality. The constructions presented in [260] are less restrictive in their choice of parameters in comparison with the constructions in [199] and [18]. We now present an illustrative example of the optimal construction in [199] for 2-level hierarchy.
As will be seen, the code will turn out to be optimal despite the fact
that r2 ∤ r1 . We choose n1 = 12 and n2 = 4 as the block lengths of the
middle and local codes respectively and note that n2 | n1 | n. In the
construction, there are two middle codes having disjoint support sets
M1 and M2 . In turn, each middle code contains three support-disjoint
local codes, i.e., Mi = Li1 ∪ Li2 ∪ Li3 , i = 1, 2 where Lij denotes the
support of a local code.
The underlying finite field Fq in the construction is selected to be the field F25, which ensures that n | (q − 1). Let G, H with H ⊆ G be subgroups of F*_q with sizes given by |G| = n1 = 12, |H| = n2 = 4. Let α be a primitive element of Fq; thus α has order 24. We set
$$P_{\alpha^j H}(x) \;=\; \prod_{\theta \in \alpha^j H} (x - \theta) \;=\; x^{4} - \alpha^{4j},$$
and
$$P_{\alpha^i G}(x) \;=\; \prod_{\theta \in \alpha^i G} (x - \theta) \;=\; x^{12} - \alpha^{12i}, \quad i = 0, 1.$$
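The closed-form coset products above can be verified mechanically. Since F25 requires extension-field arithmetic, the sketch below checks the same identity in a prime-field analog, F13: for the order-4 subgroup H of F*_13 and a primitive element α, one has Π_{θ∈α^j H}(x − θ) = x^4 − α^{4j}. All helper names are ours.

```python
P = 13       # prime-field analog of F_25 (F_25 itself needs extension-field arithmetic)
alpha = 2    # primitive element of F_13* (multiplicative order 12)
H = [pow(alpha, 3 * i, P) for i in range(4)]  # subgroup of order 4

def polymul(a, b):
    """Multiply polynomials (coefficient lists, lowest degree first) over F_P."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] = (out[i + j] + ai * bj) % P
    return out

# j = 0, 1, 2 index the cosets alpha^j H of H inside the order-12 group
for j in range(3):
    poly = [1]
    for theta in H:
        coset_elt = (pow(alpha, j, P) * theta) % P
        poly = polymul(poly, [(-coset_elt) % P, 1])   # factor (x - theta)
    # closed form: x^4 - alpha^{4j}
    assert poly == [(-pow(alpha, 4 * j, P)) % P, 0, 0, 0, 1]
```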
From (13.4), (13.5), it can be seen that the monomials in both f(·) and g(·) above have degrees belonging to the set {0, 1, 2, 4, 5, 6, 8, 9, 10}. Since the middle code is required to have minimum distance δ1 = 3, we would like to ensure that the coefficient of x^{10} in both f(x) and g(x) equals zero. This can be ensured by pre-coding the message coefficients {aij, bij | i, j ∈ {0, 1, 2}} to satisfy the conditions:
$$\frac{a_{02}}{(1-\alpha^{8})(1-\alpha^{16})} + \frac{a_{12}}{(\alpha^{8}-1)(\alpha^{8}-\alpha^{16})} + \frac{a_{22}}{(\alpha^{16}-1)(\alpha^{16}-\alpha^{8})} \;=\; 0,$$
$$\frac{b_{02}}{(\alpha^{4}-\alpha^{12})(\alpha^{4}-\alpha^{20})} + \frac{b_{12}}{(\alpha^{12}-\alpha^{4})(\alpha^{12}-\alpha^{20})} + \frac{b_{22}}{(\alpha^{20}-\alpha^{4})(\alpha^{20}-\alpha^{12})} \;=\; 0.$$
where
$$T_0(x) = \frac{x^{12} - \alpha^{12}}{1 - \alpha^{12}}, \qquad T_1(x) = \frac{x^{12} - 1}{\alpha^{12} - 1}.$$
The monomials in the polynomial m(x) have degrees belonging to the set {0, 1, 2, 4, 5, 6, 8, 9, 12, 13, 14, 16, 17, 18, 20, 21}. However, we are interested in constructing an overall block code having dimension k = 14. We have already imposed two constraints on the 18 message coefficients {aij, bij}, i, j ∈ {0, 1, 2}. Thus we are in a position to impose two further constraints. In the interest of ensuring that the minimum distance takes as large a value as possible, we restrict m(x) to have degree 18, by setting the

Notes
14
Maximally Recoverable Codes
of the coordinate set [n]. From a geometric point of view, one could say that the topology of the parity-checks (by which we mean the support sets of the parity-checks) has been identified, but not the specific parity-checks themselves. The rest of the definition remains unchanged, and an [n, k] MRC is defined in this more general topological setting as any [n, k] subcode of C_L that is capable of recovering from every erasure pattern that some [n, k] subcode of C_L can recover from.
uT G = cT ,
Hc = 0,
in two different ways, from the perspective of the generator and p-c
matrices of the code C L .
Theorem 16 (Excluded Erasure Patterns). Let C_L be a linear [n, kL] code over a finite field Fq, having generator matrix GL of size (kL × n) and p-c matrix HL of size ((n − kL) × n). Let C be a subcode of C_L of dimension k < kL. Then it is not possible for C to recover from an erasure pattern E of size |E| ≤ (n − k) if either of the following two equivalent conditions is satisfied:
(a) rank(GL |S ) < k, where S = [n] \ E,
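Condition (a) is straightforward to test mechanically. Below is a small GF(2) sketch (helper names ours): an erasure pattern E is recoverable by a code with full-rank generator matrix G precisely when the columns outside E still have rank k.

```python
def rank_gf2(rows):
    """Rank of a binary matrix whose rows are given as integer bitmasks."""
    rows = [r for r in rows if r]
    rank = 0
    for bit in reversed(range(max(rows, default=0).bit_length())):
        pivot = next((i for i in range(rank, len(rows)) if (rows[i] >> bit) & 1), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for i in range(len(rows)):
            if i != rank and (rows[i] >> bit) & 1:
                rows[i] ^= rows[rank]
        rank += 1
    return rank

def recoverable(G, n, E):
    """True iff rank(G|_S) = k for S = [n] \\ E; G is given as rows of
    bitmasks, with bit j of a row holding the entry in column j."""
    S = [j for j in range(n) if j not in E]
    restricted = [sum(((row >> j) & 1) << t for t, j in enumerate(S)) for row in G]
    return rank_gf2(restricted) == len(G)

# [4, 2] binary code with G = [1 0 1 0; 0 1 0 1]
G = [0b0101, 0b1010]
assert recoverable(G, 4, {0, 1})      # surviving columns 2, 3 have rank 2
assert not recoverable(G, 4, {1, 3})  # surviving columns 0, 2 have rank 1
```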
The corollary below presents a test for an MRC that follows from
Theorem 15, Corollary 3, Definition 16 and Remark 14.
(a) rank(HL |E ) = (n − kL ),
$$H_L = \begin{bmatrix}
1&1&1&1&1&0&0&0&0&0&0&0&0&0&0\\
0&0&0&0&0&1&1&1&1&1&0&0&0&0&0\\
0&0&0&0&0&0&0&0&0&0&1&1&1&1&1
\end{bmatrix}.$$
Am = {θ5m+i | i = 1, 2, 3, 4, 5}, m = 0, 1, 2.
θi + θj ̸= θk + θl ,
$$H = \begin{bmatrix}
0&0&0&0&0&1&1&1&1&1&0&0&0&0&0\\
0&0&0&0&0&0&0&0&0&0&1&1&1&1&1\\
\theta_1&\theta_2&\theta_3&\theta_4&\theta_5&\theta_6&\theta_7&\theta_8&\theta_9&\theta_{10}&\theta_{11}&\theta_{12}&\theta_{13}&\theta_{14}&\theta_{15}\\
\theta_1^2&\theta_2^2&\theta_3^2&\theta_4^2&\theta_5^2&\theta_6^2&\theta_7^2&\theta_8^2&\theta_9^2&\theta_{10}^2&\theta_{11}^2&\theta_{12}^2&\theta_{13}^2&\theta_{14}^2&\theta_{15}^2
\end{bmatrix}$$
Theorem 17. Let C_L be an [n, kL] code over Fq. Let the field size q satisfy $q > \binom{n-1}{n-k-1}$. Then there exists an [n, k] subcode C ⊆ C_L over Fq which is an MRC with respect to C_L.
Let the erasure pattern E also satisfy the property that rank(HL |E ) =
(n−kL ). The theorem will then follow from Corollary 4 if we can establish
that rank(H|E ) = (n − k) for every such erasure pattern. For any given
$$H_{\mathrm{Glob}} = L_p(\alpha_1, \ldots, \alpha_n, k_L - k) \;\triangleq\;
\begin{bmatrix}
\alpha_1 & \alpha_2 & \cdots & \alpha_n \\
\alpha_1^{2^{\ell}} & \alpha_2^{2^{\ell}} & \cdots & \alpha_n^{2^{\ell}} \\
\vdots & \vdots & \ddots & \vdots \\
\alpha_1^{2^{\ell(k_L-k-1)}} & \alpha_2^{2^{\ell(k_L-k-1)}} & \cdots & \alpha_n^{2^{\ell(k_L-k-1)}}
\end{bmatrix} \quad (14.1)$$
$$H|_E = \begin{bmatrix} H_L|_E \\ H_{\mathrm{Glob}}|_E \end{bmatrix} =
\begin{bmatrix}
1 & 0 & \cdots & 0 & a_{1,1} & \cdots & a_{1,|E|-m} \\
0 & 1 & \cdots & 0 & a_{2,1} & \cdots & a_{2,|E|-m} \\
\vdots & \vdots & \ddots & \vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 1 & a_{m,1} & \cdots & a_{m,|E|-m} \\
\alpha_1 & \alpha_2 & \cdots & \alpha_m & \alpha_{m+1} & \cdots & \alpha_{|E|} \\
\alpha_1^{2^{\ell}} & \alpha_2^{2^{\ell}} & \cdots & \alpha_m^{2^{\ell}} & \alpha_{m+1}^{2^{\ell}} & \cdots & \alpha_{|E|}^{2^{\ell}} \\
\vdots & \vdots & & \vdots & \vdots & & \vdots \\
\alpha_1^{2^{\ell(k_L-k-1)}} & \alpha_2^{2^{\ell(k_L-k-1)}} & \cdots & \alpha_m^{2^{\ell(k_L-k-1)}} & \alpha_{m+1}^{2^{\ell(k_L-k-1)}} & \cdots & \alpha_{|E|}^{2^{\ell(k_L-k-1)}}
\end{bmatrix}$$
Then for n − kL + 1 ≤ i ≤ n − k,
$$f(\alpha_i) \;=\; \sum_{j=1}^{n-k_L} a_{ji} v_j \;=\; \sum_{j=1}^{n-k_L} a_{ji} f(\alpha_j).$$
Since α1, . . . , α|E| are linearly independent over $\mathbb{F}_{2^\ell}$, this implies that f(x) has at least $2^{\ell(k_L-k)}$ zeros, since any linear combination of the set $\{\alpha_i + \sum_{j=1}^{n-k_L} a_{ji}\alpha_j : n - k_L + 1 \le i \le (n-k)\}$ of (kL − k) linearly independent elements over $\mathbb{F}_{2^\ell}$ is also a zero of f(x). There are $2^{\ell(k_L-k)}$ such linear combinations. However, the degree of f(x) is $\le 2^{\ell(k_L-k-1)}$. It follows that f(x) ≡ 0, i.e., f(x) must be the all-zero polynomial. This implies that the vector v = 0, a contradiction. Hence H|E is full rank and this choice of HGlob yields an MRC.
14.5. Reduced Field-Size Construction for the Disjoint Locality Case

The above-described construction has a large field size primarily because of the choice of HGlob, which contains n elements {α1, . . . , αn} that are linearly independent over $\mathbb{F}_{2^\ell}$. It turns out that for the same choice of HGlob, it is possible to reduce the field size for a special case of HL by selecting the set {α1, . . . , αn} more intelligently. This is described below.
$$H_L = \begin{bmatrix}
\mathbf{1}_{r+1} & 0 & \cdots & 0 \\
0 & \mathbf{1}_{r+1} & \cdots & 0 \\
\vdots & & \ddots & \vdots \\
0 & 0 & \cdots & \mathbf{1}_{r+1}
\end{bmatrix}, \quad (14.2)$$
where 1_{r+1} is a row vector of length r + 1 with all components equal to 1. Note that the matrix HL is a binary matrix, so that in reference to the previous subsection, we have that the ground field here is Fb = F2.
[ig , jg ] ⊆ Sg , g ∈ [m],
i1 ≤ j1 , i2 ≤ j2 , · · · , im ≤ jm ,
E = [i1 , j1 ] ∪ · · · ∪ [im , jm ],
(j1 − i1 + 1) + · · · + (jm − im + 1) = |E| = m + kL − k.
ju − iu > 0, ∀u ∈ [ℓ],
ju − iu = 0, ∀u ∈ [m] \ [ℓ],
$$\sum_{u=1}^{\ell} (j_u - i_u) = k_L - k.$$
where
$$\begin{aligned}
H'_{\mathrm{glob},E} = L_p\big(&\alpha_{i_1}, (\alpha_{i_1} + \alpha_{i_1+1}), \ldots, (\alpha_{i_1} + \alpha_{j_1}),\\
&\alpha_{i_2}, (\alpha_{i_2} + \alpha_{i_2+1}), \ldots, (\alpha_{i_2} + \alpha_{j_2}),\\
&\ldots, \alpha_{i_\ell}, (\alpha_{i_\ell} + \alpha_{i_\ell+1}), \ldots, (\alpha_{i_\ell} + \alpha_{j_\ell}),\\
&\alpha_{i_{\ell+1}}, \alpha_{i_{\ell+2}}, \ldots, \alpha_{i_m}, \; k_L - k\big)
\end{aligned}$$
For HE′ to be of full rank, we need that J have full rank kL − k. Let
{λ1 , λ2 , . . . , λm } ⊆ Fq
{ζ1 , . . . , ζr } ⊆ F2r
Notes
There are many open problems on the topic of MRCs that one could
list. A basic open problem is listed below.
Open Problem 18. Determine the minimum field size required to construct an MRC with respect to a parent code C_L having disjoint locality, as specified by the p-c matrix HL given in (14.2).
More generally, one can raise the same question as above with the single-parity-check local codes associated to (14.2) replaced by MDS codes constructed, say, using Vandermonde matrices. One could
generalize this further by leaving the p-c matrices of the individual MDS
codes unspecified. One could also ask similar questions with respect to
other topologies.
15
Codes with Combined Locality and Regeneration
In the previous sections, we have seen that RGCs minimize the repair
bandwidth, whereas LRCs have low repair degree. A natural question
to ask is, do there exist codes that simultaneously have low repair
bandwidth as well as low repair degree? Working independently, Kamath
et al. [117], [134] and Rawat et al. [189], [220] arrived at the same class
of codes that answered this question in the affirmative. These codes
have the property that the local codes are RGCs and for this reason are termed locally regenerating codes (LRGCs). It follows that LRGCs share the same vector symbol alphabet F_q^α as RGCs.
In this section, we present a minimum distance bound that applies
to LRGCs. We also describe in brief, constructions that achieve the
bound. We follow the approach adopted in [117].
15.1. Locality of a Code with Vector Alphabet
Locality
Definition 18. Let C be a vector code of block length n over Fαq . The ith
vector code symbol ci for 0 ≤ i ≤ (n − 1), is said to have (r, δ) locality,
• rank(G|I) = K, and
• for any i ∈ I, the ith vector code symbol ci , has (r, δ) locality.
Both MSR and MBR codes belong to a class of vector codes that we
term here as uniform rank accumulation (URA) codes.
(i) a1 = α, and
The rank profile of an {(n, k, d), (α, β), K, Fq} MSR code is given by (see for example, [211]):
$$a_i = \begin{cases} \alpha & 1 \le i \le k \\ 0 & (k+1) \le i \le n. \end{cases} \qquad (15.1)$$
Note that
$$\sum_{i=1}^{n} a_i = k\alpha = K,$$
as expected.
In the case of an {(n, k, d), (α, β), K, Fq} MBR code, the rank profile is given by [211]:
$$a_i = \begin{cases} \alpha - (i-1)\beta & 1 \le i \le k \\ 0 & (k+1) \le i \le n. \end{cases} \qquad (15.2)$$
Here too,
$$\sum_{i=1}^{n} a_i = k\alpha - \binom{k}{2}\beta = K,$$
as expected.
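These rank profiles, and the file sizes they accumulate to, can be tabulated directly. The helper functions below use our own naming; the MBR check uses the pentagon-code parameters (n, k, d) = (5, 3, 4), (α, β) = (4, 1) that appear in Figure 15.1, for which K = kα − (k choose 2)β = 9.

```python
from math import comb

def msr_rank_profile(n, k, alpha):
    """Rank profile (15.1) of an MSR code: alpha for the first k symbols, then 0."""
    return [alpha if i <= k else 0 for i in range(1, n + 1)]

def mbr_rank_profile(n, k, alpha, beta):
    """Rank profile (15.2) of an MBR code: alpha - (i-1)*beta for i <= k, then 0."""
    return [alpha - (i - 1) * beta if i <= k else 0 for i in range(1, n + 1)]

# MSR: the ranks accumulate to K = k * alpha
assert sum(msr_rank_profile(14, 10, 4)) == 10 * 4

# MBR with pentagon parameters: profile 4, 3, 2, 0, 0 (the periodic sequence
# appearing later in the text) and K = k*alpha - C(k,2)*beta = 9
profile = mbr_rank_profile(5, 3, 4, 1)
assert profile == [4, 3, 2, 0, 0]
assert sum(profile) == 3 * 4 - comb(3, 2) * 1 == 9
```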
For y ≥ 1, set P (inv) (y) = x, where x is the smallest integer such that
P (x) ≥ y. The minimum distance of C is then upper bounded by the
following theorem (see Theorem 4.1 in [117]):
Theorem 18. Let C be an [[n, K, dmin , α]] code with (r, δ) information-
symbol locality, where the local codes are URA codes having identi-
cal [[nℓ = r + δ − 1, Kℓ , δ, α]] parameters and identical rank profile
{a1 , . . . , anℓ }. Then, we have:
For a code with MSR locality, one can simplify (15.4) using (15.1) to obtain ([117], [189]):
$$d_{\min} \;\le\; n - \frac{K}{\alpha} + 1 - \left( \left\lceil \frac{K}{\alpha r} \right\rceil - 1 \right) (\delta - 1).$$
Local Codes are Polygonal MBR Codes The polygonal MBR code construction described in Section 4.1.1 yields MBR codes having parameter sets of the form
$$\left\{ (n_\ell, r, d = n_\ell - 1), \; (\alpha = n_\ell - 1, \beta = 1), \; K_\ell = r\alpha - \binom{r}{2} \right\}$$
for any pair (nℓ , r) with nℓ > r ≥ 1. The construction makes use of
a scalar MDS code precoder and the resultant MBR code possesses
the RBT property. In [117], the authors present the construction of a
minimum-distance-optimal LRGC with MBR all-symbol locality where
the local MBR codes are polygonal MBR codes. Interestingly, this LRGC
construction may be regarded as replacing the scalar MDS precoder
appearing in the construction of the polygonal MBR code, with a scalar
all-symbol locality code having optimal minimum distance such as the
Tamo-Barg code (see Section 10.7). The resultant LRGC code has
parameters
where ν ≥ 2, and the local codes are polygonal MBR codes having parameters
$$\left\{ (n_\ell, r, d = n_\ell - 1), \; (\alpha = n_\ell - 1, \beta = 1), \; K_\ell = r\alpha - \binom{r}{2} \right\}.$$
Figure 15.1: The upper portion of the figure shows an example LRGC C having parameters [[n = 15, K = 20, dmin = 5, α = 4]] that is optimal with respect to (15.4). There are 3 disjoint local codes, each of which is a pentagon-MBR code having parameter set {(5, 3, 4), (4, 1), 9, F31}. The set of 30 scalar symbols shown in the bottom portion of the figure forms a [30, 20, 9] LRC with all-symbol locality (optimal with respect to (10.1)), where each local code is a [10, 9, 2] MDS code. The 30 symbols of the LRC are used to label the (3 × 10) = 30 edges of the 3 pentagons.
over F31 . Owing to the data collection property, the contents of each
pentagon should be decodable from the contents of any r = 3 nodes.
This calls for the 10 scalar symbols populating the MBR code to form a
[10, 9, 2] MDS code. By concatenating three codewords of the [10, 9, 2]
MDS code, we can populate the three pentagons and we will in this way,
have satisfied the node repair and data collection properties required to
say that each local code is an MBR code having the desired parameters
given above. The periodic sequence {bi } in this case is given by
4, 3, 2, 0, 0, 4, 3, 2, 0, 0 , 4, 3, 2, 0, 0 .
| {z } | {z } | {z }
first period second period third period
GRS Codes We recall from Section 2 that an [n, k] GRS code C over
F is a code of the form:
TrF/B (xγi ) = ai ∈ B, i = 1, 2, · · · , t.
This can be seen using the trace-dual basis. Associated to every basis (γ1, · · · , γt) for F over B, there is a second basis (γ1*, · · · , γt*) for F over B, known as the trace-dual basis (for instance, see [156, Ch. 4]), that satisfies:
$$\mathrm{Tr}_{F/B}(\gamma_i \gamma_j^*) = \begin{cases} 1, & i = j, \\ 0, & \text{else}. \end{cases}$$
Using the trace-dual basis, we can recover x from $\{a_i\}_{i=1}^{t}$ via:
$$x = \sum_{i=1}^{t} a_i \gamma_i^*.$$
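The recovery formula can be checked in the smallest nontrivial case, F = F4 over B = F2, where Tr(x) = x + x², and the basis (γ1, γ2) = (1, ω) has trace-dual basis (ω², 1). The integer encoding and helper names below are ours.

```python
# F_4 = F_2[w]/(w^2 + w + 1); elements encoded as 2-bit ints b1*w + b0:
# 0 -> 0, 1 -> 1, 2 -> w, 3 -> w + 1 = w^2
def gmul(a, b):
    r = 0
    if b & 1:
        r ^= a
    if b & 2:
        r ^= a << 1
    if r & 4:           # reduce: w^2 = w + 1
        r ^= 0b111
    return r

def trace(x):           # Tr_{F4/F2}(x) = x + x^2
    return x ^ gmul(x, x)

basis = [1, 2]          # (gamma_1, gamma_2) = (1, w)
dual = [3, 1]           # trace-dual basis (w^2, 1)

# duality: Tr(gamma_i * gamma_j^*) = 1 iff i = j
for i, g in enumerate(basis):
    for j, gd in enumerate(dual):
        assert trace(gmul(g, gd)) == (1 if i == j else 0)

# recovery: x = sum_i Tr(x * gamma_i) * gamma_i^* (coefficients are 0/1, so XOR)
for x in range(4):
    coeffs = [trace(gmul(x, g)) for g in basis]
    x_rec = 0
    for a, gd in zip(coeffs, dual):
        if a:
            x_rec ^= gd
    assert x_rec == x
```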
Then the i0-th code symbol u_{i0} f(α_{i0}), for i0 ∈ [n], can be recovered by downloading $b = \sum_{i \in [n] \setminus \{i_0\}} b_i$ symbols over the base field B.
Proof: Let the dual code of the [n, k] GRS code C associated to evaluation set E and scaling set u = (u1, · · · , un) be the [n, n − k] GRS code having scaling set vector v = (v1, · · · , vn) (and the same evaluation set E). Since each of the repair polynomials has degree < (n − k), for every j ∈ [t] we have
$$(v_1 g_j(\alpha_1), \cdots, v_n g_j(\alpha_n)) \in \mathcal{C}^{\perp}$$
$$\implies \sum_{i=1}^{n} u_i v_i f(\alpha_i) g_j(\alpha_i) = 0,$$
$$\implies u_{i_0} v_{i_0} f(\alpha_{i_0}) g_j(\alpha_{i_0}) = - \sum_{i \in [n] \setminus \{i_0\}} u_i v_i f(\alpha_i) g_j(\alpha_i). \quad (16.3)$$
Note that since
$$\dim_B \langle g_1(\alpha_{i_0}), g_2(\alpha_{i_0}), \cdots, g_t(\alpha_{i_0}) \rangle = t,$$
it follows that the elements $\{g_j(\alpha_{i_0})\}_{j=1}^{t}$ form a basis for F over B. Hence, the values of $u_{i_0} v_{i_0} f(\alpha_{i_0})$, and hence of $f(\alpha_{i_0})$, can be determined by making use of (16.3) for all j ∈ {1, 2, · · · , t}. This approach requires that the ith node, i ≠ i0, provide the t values:
$$\left\{ \mathrm{Tr}_{F/B}\big(u_i v_i f(\alpha_i) g_j(\alpha_i)\big) \;\middle|\; j = 1, 2, \cdots, t \right\}. \quad (16.4)$$
However, since the space
$$W_i := \langle g_1(\alpha_i), g_2(\alpha_i), \cdots, g_t(\alpha_i) \rangle$$
has dimension b_i over B, it follows that in place of t, b_i scalars from the base field B suffice to provide the information content contained in (16.4). More specifically, if the set $\{\theta_{i1}, \cdots, \theta_{i b_i}\}$ is a basis for W_i, it suffices for the ith node to supply the b_i symbols:
$$\left\{ \mathrm{Tr}_{F/B}\big(u_i v_i f(\alpha_i) \theta_{ij}\big) \;\middle|\; j = 1, 2, \cdots, b_i \right\}.$$
It follows that node i0 can be repaired using the repair bandwidth associated to a set of $b = \sum_{i \in [n] \setminus \{i_0\}} b_i$ symbols over the base field B. □
Remark 17. It turns out that the converse of Lemma 15 is also true.
Let C be an [n, k] GRS code and (u1 f (α1 ), · · · , un f (αn )) be a codeword
in C. Linear repair of the i0 th code symbol ui0 f (αi0 ) by downloading
bi symbols over the base field B from node i, for all i ∈ [n] \ {i0}, is possible only if there exists a set of t polynomials {g1(x), · · · , gt(x)} over F, each of degree < (n − k), satisfying (16.2). We refer the reader to [85] for a proof.
In [85], the authors identify a set {gj(x) | j ∈ [t]} of repair polynomials leading to a repair scheme for an RS code having parameters (n ≤ q^t, k ≤ n − q^{t−1}) where, remarkably, the repair bandwidth b = (n − 1), measured in number of symbols over the base field B = Fq, is as small as possible.
b_i = 1, 1 ≤ i ≤ n, i ≠ i0,
Lemma 16.
$$\dim_B \langle g_1(\alpha_i), g_2(\alpha_i), \cdots, g_t(\alpha_i) \rangle = \begin{cases} t & i = i_0 \\ 1 & i \ne i_0. \end{cases}$$
Proof: Note that $\mathrm{Tr}_{F/B}(x) = x + x^q + \cdots + x^{q^{t-1}}$. It follows that $g_j(\alpha_{i_0}) = \gamma_j$ and therefore,
$$\dim_B \langle g_1(\alpha_{i_0}), \cdots, g_t(\alpha_{i_0}) \rangle = t.$$
$$n \le q^t, \qquad k \le n - q^s,$$
for all j ∈ [t], where {γ1, γ2, · · · , γt} is a basis for F over B. Note that deg(gj(x)) = q^s − 1 < n − k, as needed. The lemma below shows how this choice of repair polynomial set permits recovery of $f(\alpha_{i_0})$ by downloading ≤ (n − 1)(t − s) symbols from B.
Lemma 17.
$$\dim_B \langle g_1(\alpha_i), \cdots, g_t(\alpha_i) \rangle \begin{cases} = t & i = i_0 \\ \le t - s & i \ne i_0. \end{cases}$$
16.5. Bounds on Repair-Bandwidth
Let C be an [n, k] GRS code over F. We claim [48], [85] that any linear repair scheme for C will necessarily incur a repair bandwidth of at least
$$b \;\ge\; (n-1) \log_q \left( \frac{q^t (n-1)}{(n-k-1)(q^t - 1) + n - 1} \right) \quad (16.7)$$
units, measured in terms of number of symbols over the subfield B.
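For the full-length case n = q^t with k = n − q^{t−1}, the right-hand side of (16.7) evaluates to exactly n − 1, matching the bandwidth of the repair scheme of [85] mentioned above. The function below (name ours) evaluates the bound.

```python
from math import log

def grs_repair_bw_lower_bound(n, k, q, t):
    """Right-hand side of (16.7): repair bandwidth in symbols of B = F_q."""
    ratio = (q ** t * (n - 1)) / ((n - k - 1) * (q ** t - 1) + (n - 1))
    return (n - 1) * log(ratio, q)

# Full-length RS code over F_16 viewed over F_2 (q = 2, t = 4):
# n = 16, k = n - q^{t-1} = 8 gives exactly n - 1 = 15
b = grs_repair_bw_lower_bound(16, 8, 2, 4)
assert abs(b - 15.0) < 1e-9
```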
Proof. From Remark 17, we know that given an evaluation point αi0 ∈ F,
there exists a set {g1 (x), g2 (x), . . . , gt (x)}, gj (x) ∈ F[x], of t repair
polynomials, each having degree < (n − k), such that:
$$\dim_B \langle g_1(\alpha_{i_0}), g_2(\alpha_{i_0}), \cdots, g_t(\alpha_{i_0}) \rangle = t.$$
Let
$$\dim_B \langle g_1(\alpha_i), g_2(\alpha_i), \cdots, g_t(\alpha_i) \rangle = d_i,$$
for all i ∈ [n] \ {i0}. It follows that the repair bandwidth needed to recover the code symbol corresponding to the evaluation point $\alpha_{i_0}$ using (16.3) is given by $\sum_{i \in [n] \setminus \{i_0\}} d_i$. For i ∈ [n] \ {i0}, let the subspace
We have dim_B(S_i) = t − d_i and hence the cardinality of the set of nonzero elements in S_i is given by $q^{t-d_i} - 1$. As the next step, we determine the average number ρ of sets {S_i, i ∈ [n] \ {i_0}} that a nonzero element in B^t belongs to:
$$\rho \;:=\; \frac{1}{q^t - 1} \sum_{s \ne 0,\, s \in B^t} \big|\{ i \in [n] \setminus \{i_0\} : s \in S_i \}\big|$$
$$=\; \frac{1}{q^t - 1} \sum_{i \in [n] \setminus \{i_0\}} \big|\{ s \in S_i,\; s \ne 0 \}\big|$$
$$=\; \frac{1}{q^t - 1} \sum_{i \in [n] \setminus \{i_0\}} \big(q^{t - d_i} - 1\big). \quad (16.8)$$
Clearly, there exists a t-tuple $s^* := (s_1^*, s_2^*, \ldots, s_t^*) \in B^t \setminus \{0\}$, such that the polynomial $g^*(x) = \sum_{j \in [t]} s_j^* g_j(x)$ vanishes on at least ρ evaluation
$$\min_{\{d_i \in [0, t]\}} \;\; \sum_{i \in [n],\, i \ne i_0} d_i$$
subject to (16.10). It turns out that the minimum occurs when the {d_i} are balanced, and this results in the following lower bound on repair bandwidth:
$$b \;\ge\; (n-1)\log_q\!\left(\frac{n-1}{\rho'}\right) \;=\; (n-1)\log_q\!\left(\frac{q^t(n-1)}{(n-k-1)(q^t-1)+n-1}\right).$$
Notes
1. An early paper: The line of work in which scalar MDS codes are vectorized by treating each code symbol belonging to a field F as a vector over a base field B, for the purpose of reducing
MDS Codes
Distributed systems such as Hadoop, Google File System and Windows
Azure have evolved to support erasure codes so as to derive the benefits
of improved storage efficiency in comparison with simple replication.
MDS codes in general, and RS codes in particular, are the most common
form of erasure coding employed here. Examples include the [9, 6] RS code employed in the Hadoop Distributed File System, the [14, 10] RS code in Facebook's f4 storage system and the [11, 8] RS code employed in Yahoo Cloud Object Storage; see [47] for additional examples.
Regenerating Codes
NCCloud: The NCCloud storage system described in [98] is one of the earliest projects that dealt with the performance evaluation of RGCs in practice. The NCCloud system employs an (n, k = n − 2, d = n − 1)
MSR code with functional repair. The performance evaluation is carried
out for an (n = 4, k = 2, d = 3) case and compared against RAID-6.
LRCs
Windows Azure Code: In [103], the authors compare performance evaluation results of an (n = 16, k = 12, r = 6, δ = 2) LRC with those of a [16, 12, 5] RS code in the Azure production cluster and demonstrate the repair savings of LRCs. Subsequently, the authors of [101] implemented an (n = 18, k = 14, r = 7, δ = 2) LRC in Microsoft's Windows Azure Storage system and showed that this code has repair degree comparable to that of a [9, 6, 4] RS code, but has storage overhead 1.29 versus 1.5 in the case of the RS code. This reduction in storage overhead has reportedly resulted in significant cost savings for Microsoft [161].
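The storage-overhead comparison quoted above is a one-line computation; the figure 1.29 is n/k = 18/14 rounded to two decimals:

```python
azure_lrc_overhead = 18 / 14   # (n = 18, k = 14) LRC: approximately 1.29
rs_overhead = 9 / 6            # [9, 6] RS code: 1.5
assert round(azure_lrc_overhead, 2) == 1.29
assert rs_overhead == 1.5
```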
References
[90] J. Hao, K. Shum, S.-T. Xia, and Y.-X. Yang, “On the maximal code length of optimal linear locally repairable codes,” in Proc. IEEE International Symposium on Information Theory, Vail, CO, USA, 2018.
[91] J. Hao, S. Xia, and B. Chen, “On the linear codes with (r, δ)-locality for distributed storage,” in Proc. IEEE International Conference on Communications, Paris, France, 2017, pp. 1–6.
[92] J. Hao, S. Xia, K. W. Shum, B. Chen, F. Fu, and Y. Yang,
“Bounds and constructions of locally repairable codes: Parity-
check matrix approach,” IEEE Trans. Inf. Theory, vol. 66, no. 12,
2020, pp. 7465–7474.
[93] K. Haymaker, B. Malmskog, and G. L. Matthews, “Locally recoverable codes with availability t ≥ 2 from fiber products of curves,” Adv. Math. Commun., vol. 12, no. 2, 2018, pp. 317–336.
[94] T. Helleseth, T. Klove, V. I. Levenshtein, and O. Ytrehus,
“Bounds on the minimum support weights,” IEEE Trans. Inf.
Theory, vol. 41, no. 2, 1995, pp. 432–440.
[95] H. Hou, P. P. C. Lee, K. W. Shum, and Y. Hu, “Rack-aware
regenerating codes for data centers,” IEEE Trans. Inf. Theory,
vol. 65, no. 8, 2019, pp. 4730–4745.
[96] G. Hu and S. Yekhanin, “New constructions of SD and MR codes over small finite fields,” in Proc. IEEE International Symposium on Information Theory, Barcelona, Spain, 2016, pp. 1591–1595.
[97] P. Hu, C. W. Sung, and T. H. Chan, “Broadcast repair for wireless distributed storage systems,” in Proc. 10th International Conference on Information, Communications and Signal Processing, Singapore, 2015, pp. 1–5.
[98] Y. Hu, H. C. H. Chen, P. P. C. Lee, and Y. Tang, “NCCloud: Applying network coding for the storage repair in a cloud-of-clouds,” in Proc. 10th USENIX Conference on File and Storage Technologies, San Jose, CA, USA, 2012, p. 21.
[216] S. Shao, T. Liu, C. Tian, and C. Shen, “On the tradeoff region
of secure exact-repair regenerating codes,” IEEE Trans. Inf.
Theory, vol. 63, no. 11, 2017, pp. 7253–7266.
[217] D. Shivakrishna, V. A. Rameshwar, V. Lalitha, and B. Sasidha-
ran, “On maximally recoverable codes for product topologies,” in
Proc. Twenty Fourth National Conference on Communications,
IEEE, pp. 1–6, 2018.
[218] K. W. Shum and Y. Hu, “Cooperative regenerating codes,” IEEE
Trans. Inf. Theory, vol. 59, no. 11, 2013, pp. 7229–7258.
[219] N. Silberstein and A. Zeh, “Optimal binary locally repairable codes via anticodes,” in Proc. IEEE International Symposium on Information Theory, Hong Kong, 2015, pp. 1247–1251.
[220] N. Silberstein, “Optimal locally repairable codes via rank-metric
codes,” talk presented at the conference on Trends in Coding
Theory, Ascona, Switzerland, Oct. 28 to Nov. 2, 2012 (joint work
with A. S. Rawat and S. Vishwanath).
[221] N. Silberstein and T. Etzion, “Optimal fractional repetition
codes based on graphs and designs,” IEEE Trans. Inf. Theory,
vol. 61, no. 8, 2015, pp. 4164–4180.
[222] N. Silberstein and A. Zeh, “Anticode-based locally repairable
codes with high availability,” Designs, Codes and Cryptography,
vol. 86, Feb. 2018.
[223] R. Singleton, “Maximum distance q-nary codes,” IEEE Trans.
Inf. Theory, vol. 10, no. 2, 1964, pp. 116–118.
[224] J.-Y. Sohn, B. Choi, S. W. Yoon, and J. Moon, “Capacity of
clustered distributed storage,” IEEE Trans. Inf. Theory, vol. 65,
no. 1, 2019, pp. 81–107.
[225] W. Song, S. H. Dau, C. Yuen, and T. J. Li, “Optimal locally
repairable linear codes,” IEEE Journal on Selected Areas in
Communications, vol. 32, no. 5, 2014, pp. 1019–1036.
[226] W. Song, K. Cai, C. Yuen, K. Cai, and G. Han, “On sequential
locally repairable codes,” IEEE Trans. Inf. Theory, vol. 64, no. 5,
2018, pp. 3513–3527.
[227] C. Suh and K. Ramchandran, “Exact-repair MDS code construc-
tion using interference alignment,” IEEE Trans. Inf. Theory,
vol. 57, no. 3, 2011, pp. 1425–1442.
[248] Z. Wang, I. Tamo, and J. Bruck, “On codes for optimal rebuilding access,” in Proc. 49th Annual Allerton Conference on Communication, Control, and Computing, 2011, pp. 1374–1381.
[249] Z. Wang, I. Tamo, and J. Bruck, “Long MDS codes for optimal repair bandwidth,” in Proc. IEEE International Symposium on Information Theory, Cambridge, MA, USA, 2012, pp. 1182–1186.
[250] V. K. Wei, “Generalized Hamming weights for linear codes,”
IEEE Trans. Inf. Theory, vol. 37, no. 5, 1991, pp. 1412–1418.
[251] Y. Wu, “Existence and construction of capacity-achieving net-
work codes for distributed storage,” IEEE Journal on Selected
Areas in Communications, vol. 28, no. 2, 2010, pp. 277–288.
[252] E. Yavari and M. Esmaeili, “Locally repairable codes: Joint
sequential–parallel repair for multiple node failures,” IEEE Trans.
Inf. Theory, vol. 66, no. 1, 2020, pp. 222–232.
[253] F. Ye, K. W. Shum, and R. W. Yeung, “The rate region for
secure distributed storage systems,” IEEE Trans. Inf. Theory,
vol. 63, no. 11, 2017, pp. 7038–7051.
[254] M. Ye and A. Barg, “Explicit constructions of MDS array codes and RS codes with optimal repair bandwidth,” in Proc. IEEE International Symposium on Information Theory, Barcelona, Spain, 2016, pp. 1202–1206.
[255] M. Ye and A. Barg, “Explicit constructions of high-rate MDS
array codes with optimal repair bandwidth,” IEEE Trans. Inf.
Theory, vol. 63, no. 4, 2017, pp. 2001–2014.
[256] M. Ye and A. Barg, “Explicit constructions of optimal-access
MDS codes with nearly optimal sub-packetization,” IEEE Trans.
Inf. Theory, vol. 63, no. 10, 2017, pp. 6307–6317.
[257] M. Ye and A. Barg, “Cooperative repair: Constructions of optimal
MDS codes for all admissible parameters,” IEEE Trans. Inf.
Theory, vol. 65, no. 3, 2018, pp. 1639–1656.
[258] R. W. Yeung, “A framework for linear information inequalities,”
IEEE Trans. Inf. Theory, vol. 43, no. 6, 1997, pp. 1924–1934.
[259] A. Zeh and E. Yaakobi, “Optimal linear and cyclic locally repairable codes over small fields,” in Proc. IEEE Information Theory Workshop, Jerusalem, Israel, 2015, pp. 1–5.
[260] G. Zhang and H. Liu, “Constructions of optimal codes with
hierarchical locality,” IEEE Trans. Inf. Theory, vol. 66, no. 12,
2020, pp. 7333–7340.
[261] J. Zhang, X. Wang, and G. Ge, “Some improvements on locally
repairable codes,” CoRR, vol. abs/1506.04822, 2015.
[262] M. Zhang and R. Li, “Two families of LRCs with availability based on iterative matrix,” in Proc. 13th International Symposium on Computational Intelligence and Design, Hangzhou, China, 2020, pp. 334–337.
[263] L. Zhou and Z. Zhang, “Explicit construction of min-
imum bandwidth rack-aware regenerating codes,” CoRR,
vol. abs/2103.01533, 2021.
[264] B. Zhu, K. W. Shum, H. Li, and H. Hou, “General fractional
repetition codes for distributed storage systems,” IEEE Commun.
Lett., vol. 18, no. 4, 2014, pp. 660–663.
[265] M. Zorgui and Z. Wang, “Centralized multi-node repair regen-
erating codes,” IEEE Trans. Inf. Theory, vol. 65, no. 7, 2019,
pp. 4180–4206.