
Foundations and Trends® in Communications and

Information Theory
Codes for Distributed Storage
Suggested Citation: Vinayak Ramkumar, S. B. Balaji, Birenjith Sasidharan, Myna
Vajha, M. Nikhil Krishnan and P. Vijay Kumar (2022), “Codes for Distributed Storage”,
Foundations and Trends® in Communications and Information Theory: Vol. 19, No. 4,
pp 547–813. DOI: 10.1561/0100000115.

Vinayak Ramkumar
Indian Institute of Science, Bengaluru
[email protected]
S. B. Balaji
Qualcomm, Bengaluru
[email protected]
Birenjith Sasidharan
Govt. Engineering College, Barton Hill, Trivandrum
[email protected]
Myna Vajha
Qualcomm, Bengaluru
[email protected]
M. Nikhil Krishnan
International Institute of Information Technology Bangalore
[email protected]
P. Vijay Kumar
Indian Institute of Science, Bengaluru
[email protected]
This article may be used only for the purpose of research, teaching,
and/or private study. Commercial use or systematic downloading (by
robots or other automatic processes) is prohibited without explicit
Publisher approval.
Boston — Delft
Contents

1 Introduction 549
1.1 Conventional Repair of an MDS Code . . . . . . . . . . . 549
1.2 Regenerating Codes and Locally Recoverable Codes . . . . 551
1.3 Overview of the Monograph . . . . . . . . . . . . . . . . . 553

2 Maximum Distance Separable Codes 557


2.1 Reed-Solomon Codes . . . . . . . . . . . . . . . . . . . . 557
2.2 Singleton Bound . . . . . . . . . . . . . . . . . . . . . . . 559
2.3 Generalized Reed-Solomon Codes . . . . . . . . . . . . . . 560
2.4 Systematic Encoding . . . . . . . . . . . . . . . . . . . . 562
2.5 Cauchy MDS Codes . . . . . . . . . . . . . . . . . . . . . 562

3 Regenerating Codes 567


3.1 Definition and Terminology . . . . . . . . . . . . . . . . . 567
3.2 Bound on File Size . . . . . . . . . . . . . . . . . . . . . 572
3.3 Storage-Repair-Bandwidth Tradeoff . . . . . . . . . . . . . 576
3.4 Network Coding Approach to the File-Size Bound . . . . . 579
3.5 Overview of RGC-Related Topics in the Monograph . . . . 582

4 MBR Codes 584


4.1 Polygonal MBR Code . . . . . . . . . . . . . . . . . . . . 585
4.2 Product-Matrix MBR Code . . . . . . . . . . . . . . . . . 587
5 MSR Codes 591
5.1 Product-Matrix MSR Code . . . . . . . . . . . . . . . . . 592
5.2 Diagonal-Matrix-Based MSR Code . . . . . . . . . . . . . 598
5.3 Coupled-Layer MSR Code . . . . . . . . . . . . . . . . . . 603
5.4 Small-d MSR Codes . . . . . . . . . . . . . . . . . . . . . 611

6 Storage-Repair-Bandwidth Tradeoff 616


6.1 Piecewise Linear Nature of FR Tradeoff . . . . . . . . . . 616
6.2 ER Tradeoff . . . . . . . . . . . . . . . . . . . . . . . . . 619
6.3 Non-existence of ER Codes Achieving FR Tradeoff . . . . . 620
6.4 Outer Bounds on the Tradeoff Under ER . . . . . . . . . . 622

7 Interior-Point ER Codes 626


7.1 Determinant Code . . . . . . . . . . . . . . . . . . . . . . 627
7.2 Cascade Code . . . . . . . . . . . . . . . . . . . . . . . . 633
7.3 Moulin Code . . . . . . . . . . . . . . . . . . . . . . . . . 634

8 Lower Bounds on Sub-Packetization Level of MSR Codes 641


8.1 Properties of Repair Subspaces . . . . . . . . . . . . . . . 642
8.2 Lower Bound for Optimal-Access MSR Codes . . . . . . . 648
8.3 Lower Bound for General MSR Codes . . . . . . . . . . . 651

9 Variants of Regenerating Codes 657


9.1 MDS Codes that Trade Repair Bandwidth for Reduced
Sub-Packetization Level . . . . . . . . . . . . . . . . . . . 657
9.2 Fractional Repetition Codes . . . . . . . . . . . . . . . . . 662
9.3 Cooperative Regenerating Codes . . . . . . . . . . . . . . 665
9.4 Secure Regenerating Codes . . . . . . . . . . . . . . . . . 666
9.5 Rack-Aware Regenerating Codes . . . . . . . . . . . . . . 668

10 Locally Recoverable Codes 670


10.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . 670
10.2 Nonlinear LRCs . . . . . . . . . . . . . . . . . . . . . . . 671
10.3 Linear LRCs . . . . . . . . . . . . . . . . . . . . . . . . . 671
10.4 Bounds on dmin and Rate for Linear LRCs . . . . . . . . . 673
10.5 Pyramid LRC . . . . . . . . . . . . . . . . . . . . . . . . 678
10.6 Azure LRC . . . . . . . . . . . . . . . . . . . . . . . . . . 680
10.7 Tamo-Barg LRC . . . . . . . . . . . . . . . . . . . . . . . 681
10.8 Bounds on dmin and Rate for Nonlinear LRCs . . . . . . . 686
10.9 Extended Notions of Locality . . . . . . . . . . . . . . . . 688

11 Codes with Availability 695


11.1 Linear Availability Codes . . . . . . . . . . . . . . . . . . 699
11.2 Constructions of Linear Availability Codes . . . . . . . . . 699
11.3 Upper Bounds on dmin of Linear Availability Codes . . . . 703
11.4 Strict Availability . . . . . . . . . . . . . . . . . . . . . . 708

12 LRCs with Sequential Recovery 714


12.1 Recovery from Two or Three Erasures . . . . . . . . . . . 715
12.2 The General Case . . . . . . . . . . . . . . . . . . . . . . 720

13 Hierarchical Locality 729


13.1 An Upper Bound on dmin . . . . . . . . . . . . . . . . . . 732
13.2 Optimal Constructions . . . . . . . . . . . . . . . . . . . . 735

14 Maximally Recoverable Codes 739


14.1 Recoverable Erasure Patterns . . . . . . . . . . . . . . . . 740
14.2 Defining Maximally Recoverable Codes . . . . . . . . . . . 743
14.3 Existence of MRCs . . . . . . . . . . . . . . . . . . . . . 745
14.4 MRCs Constructed using Linearized Polynomials . . . . . . 747
14.5 Reduced Field-Size Construction for the Disjoint Locality
Case . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 749

15 Codes with Combined Locality and Regeneration 756


15.1 Locality of a Code with Vector Alphabet . . . . . . . . . . 756
15.2 Codes with MSR/MBR Locality . . . . . . . . . . . . . . 758

16 Repair of Reed-Solomon Codes 765


16.1 Vectorization Approach . . . . . . . . . . . . . . . . . . . 765
16.2 Tools Employed . . . . . . . . . . . . . . . . . . . . . . . 767
16.3 Guruswami-Wootters Repair Scheme . . . . . . . . . . . . 770
16.4 Dau-Milenkovic Repair Scheme . . . . . . . . . . . . . . . 771
16.5 Bounds on Repair-Bandwidth . . . . . . . . . . . . . . . 773
17 Codes in Practice 778

Acknowledgements 782

References 783
Codes for Distributed Storage
Vinayak Ramkumar1 , S. B. Balaji2 , Birenjith Sasidharan3 , Myna
Vajha4 , M. Nikhil Krishnan5 and P. Vijay Kumar6
1 Indian Institute of Science, Bengaluru, India; [email protected]
2 Qualcomm, Bengaluru, India; [email protected]
3 Govt. Engineering College, Barton Hill, Trivandrum, India;

[email protected]
4 Qualcomm, Bengaluru, India; [email protected]
5 International Institute of Information Technology Bangalore, India;

[email protected]
6 Indian Institute of Science, Bengaluru, India; [email protected]

ABSTRACT
In distributed data storage, information pertaining to a given
data file is stored across multiple storage units or nodes in
redundant fashion to protect against the principal concern,
namely, the possibility of data loss arising from the failure
of individual nodes. The simplest form of such protection
is replication. The explosive growth in the amount of data
generated on a daily basis brought up a second major con-
cern, namely minimization of the overhead associated with
such redundant storage. This concern led to the adoption by
the storage industry of erasure-recovery codes such as Reed-
Solomon (RS) codes and more generally, maximum distance
separable codes, as these codes offer the lowest-possible
storage overhead for a given level of reliability.
In the setting of a large data center, where the amount of
stored data can run into several exabytes, a third concern
arises, namely the need for efficient recovery from a commonplace occurrence, the failure of a single storage unit.
One measure of efficiency in node repair is how small one
can make the amount of data download needed to repair
a failed unit, termed the repair bandwidth. This was the
subject of the seminal paper by Dimakis et al. [50] in which
an entirely new class of codes called regenerating codes was
introduced, that within a certain repair framework, had
the minimum-possible repair bandwidth. A second measure
relates to the number of helper nodes contacted for node
repair, termed the repair degree. A low repair degree is
desirable as this means that a smaller number of nodes are
impacted by the failure of a given node. The landmark paper
by Gopalan et al. [72] focuses on this second measure, lead-
ing to the development of the theory of locally recoverable
codes. The two events also led to the creation of a third
class of codes known as locally regenerating codes, where the
aim is to simultaneously achieve reduced repair bandwidth
and low repair degree. Research in a different direction led
researchers to take a fresh look at the challenge of efficient
RS-code repair, and led to the identification of improved
repair schemes for RS codes that have significantly reduced
repair bandwidth.
This monograph introduces the reader to these different
approaches towards efficient node repair and presents many
of the fundamental bounds and code constructions that have
since emerged. Several open problems are identified, and
many of the sections have a notes subsection at the end that
provides additional background.
1
Introduction

Given the failure-prone nature of a storage device, reliability against data loss has always been of paramount importance in the storage industry.
In the early days, this was achieved through simple replication of data,
for example, triple replication was a commonplace selection within the
Hadoop distributed file system (HDFS). However, the explosive growth
in the amount of data stored over the past couple of decades encouraged
the industry to look for other means of ensuring reliability and having
less storage overhead. Here, the class of maximum distance separable
(MDS) codes are a natural choice as they incur the least amount of
storage overhead for a given level of protection, measured in terms of
the maximum number of node failures that can be tolerated.

1.1 Conventional Repair of an MDS Code

Many of the schemes employed in redundant array of independent disks (RAID) technology make use of MDS codes. An [n, k] MDS code
is a block code of length n and dimension k over a suitably-defined
finite field. To store data using an [n, k] MDS code, the data file is
first partitioned into k equal-sized fragments, that are then stored on
k distinct storage units. An additional set of r = (n − k) fragments

of redundant data are then created and stored on a further set of r storage units in such a manner that the contents of any k out of the
n storage units suffice to recover the data. In this way, the contents
of a file are efficiently stored in redundant fashion, across a set of n
storage units. For example, RAID 6 makes use of a [5, 3] MDS code.
Other examples of MDS codes that appear in the erasure coded-version
HDFS-EC of HDFS are a [9, 6] MDS code as well as a [14, 10] code,
the latter employed by Facebook. Throughout the monograph, we will
alternately refer to a storage unit as a node.
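The storage scheme just described — partition the file into k fragments, add r parity fragments, and recover from any k of the n stored fragments — can be sketched with a toy polynomial-evaluation code. This is an illustrative sketch, not code from the monograph; the prime field GF(257) and the evaluation points 1, . . . , n are our own choices.

```python
# Toy sketch of [n, k] MDS storage: the k data symbols are the coefficients
# of a polynomial f of degree < k over GF(p); node i stores the evaluation
# f(i). Any k of the n stored symbols recover the file via Lagrange
# interpolation. The prime p = 257 and the points 1..n are illustrative.
p = 257
n, k = 14, 10  # parameters of the HDFS-EC example in the text

def evaluate(coeffs, x):
    # Horner's rule evaluation of f(x) mod p
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % p
    return acc

def encode(data):
    # node i (1-indexed) stores f(i)
    assert len(data) == k
    return [evaluate(data, i) for i in range(1, n + 1)]

def polymul(a, b):
    # multiply polynomials (coefficient lists, low degree first) mod p
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] = (out[i + j] + ai * bj) % p
    return out

def decode(shares):
    # shares: any k pairs (node index, stored symbol); Lagrange
    # interpolation rebuilds the coefficient vector, i.e., the file
    assert len(shares) == k
    coeffs = [0] * k
    for i, (xi, yi) in enumerate(shares):
        num, den = [1], 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = polymul(num, [(-xj) % p, 1])
                den = den * (xi - xj) % p
        scale = yi * pow(den, p - 2, p) % p  # division via Fermat inverse
        for d, c in enumerate(num):
            coeffs[d] = (coeffs[d] + c * scale) % p
    return coeffs

data = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]     # a 10-symbol "file"
stored = encode(data)                      # 14 nodes; any 10 suffice
survivors = [(i, stored[i - 1]) for i in (1, 3, 4, 6, 7, 9, 10, 12, 13, 14)]
assert decode(survivors) == data
```

The same scheme scales to byte-oriented storage by working over GF(2^8) instead of a prime field; the prime field is used here only to keep the arithmetic elementary.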
Today’s data centers store massive amounts of information, amounts
that can run into several exabytes, i.e., 10^18 bytes. While protection
against data loss and maintaining low values of storage overhead continue
to be of primary importance, a third concern has recently surfaced. This
has to do with the efficiency with which a failed storage unit can be
repaired. We will view the repair process as one in which a new storage
unit, which we will term as the replacement node, is brought in as a
substitute for a failed storage unit. The replacement node then draws
from the partial or entire contents of all or a subset of the remaining
(n − 1) nodes, and uses the data so received to replicate the contents of
the original failed node.
As is well known, an [n, k, dmin ] code C is protected against data
loss if the number of node failures does not exceed (dmin − 1). For a
given value of (dmin − 1), MDS codes in general, and Reed-Solomon (RS)
codes in particular, have the least possible value of storage overhead, given by n/k = n/(n − dmin + 1). This follows as the minimum distance dmin of an MDS code satisfies dmin = (n − k + 1), which by the Singleton bound [156] is the largest value possible.
problem of node repair is equivalent to recovery from erasure of a single
code symbol. The most obvious approach is to invoke a parity-check
(p-c) equation involving the erased code symbol. Let
c = (c1 c2 . . . cn )
be a code word and let us assume without loss of generality, erasure of
the first code symbol c1 . Any p-c equation involving c1 of the form
n
hi ci = 0, h1 ̸= 0,
X

i=1
is associated to a codeword

h = (h1 h2 . . . hn )

belonging to the [n, n − k] dual code C⊥. In the case of an MDS code C, its dual C⊥ is also an MDS code and hence has parameters [n, n − k, k + 1].
Thus any codeword h in C⊥ has Hamming weight wH(h) ≥ k + 1. Thus if a p-c equation

∑_{i=1}^{n} hi ci = 0

is used to recover the code symbol c1, then we have

c1 = ∑_{i=2}^{n} (−hi/h1) ci,        (1.1)

with at least k of the terms (−hi/h1) on the right side being nonzero.

1.2 Regenerating Codes and Locally Recoverable Codes

For the operation of a data center, equation (1.1) has two implications.
Firstly, the replacement node must necessarily contact at least k “helper nodes”, i.e., nodes that store the code symbols {ci | hi ≠ 0, i ≥ 2}.
Secondly, equation (1.1) suggests that each helper node must transfer
its entire contents (represented by ci ) for repair of the failed node. The
number of helper nodes contacted (at least k in the case of an MDS
code) is called the repair degree of the code. The total amount of data
downloaded for repair of the failed node is termed the repair bandwidth.
In the case of an MDS code, it is clear that the repair bandwidth is at
least k times the amount of data stored in the failed node.
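The conventional repair just described can be sketched numerically. This is our own illustration, not code from the text: the field GF(257), the evaluation points 1, . . . , n, and the particular dual codeword are all illustrative choices. The multiplier vector u_i anticipates the GRS duality described later in Section 2.3, and g is chosen so that the dual codeword h has minimum weight k + 1, giving repair degree exactly k.

```python
# Numerical sketch of conventional repair via parity-check equation (1.1)
# for an [n, k] RS code over GF(p), with p = 257 and points 1..n
# (illustrative choices). The dual codeword is h_i = u_i * g(theta_i) with
# deg(g) <= n - k - 1, taking g to vanish at n - k - 1 points so that h is
# supported on only k + 1 coordinates.
p = 257
n, k = 14, 10
theta = list(range(1, n + 1))

def evaluate(coeffs, x):
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % p
    return acc

def inv(a):
    return pow(a % p, p - 2, p)

# one valid choice of dual multipliers: u_i = prod_{j != i} (theta_i - theta_j)^{-1}
u = []
for i in range(n):
    prod = 1
    for j in range(n):
        if j != i:
            prod = prod * (theta[i] - theta[j]) % p
    u.append(inv(prod))

def g(x):
    # g vanishes at theta_{k+2}, ..., theta_n, so h is supported on the
    # first k + 1 coordinates only
    prod = 1
    for j in range(k + 1, n):
        prod = prod * (x - theta[j]) % p
    return prod

h = [u[i] * g(theta[i]) % p for i in range(n)]
assert sum(1 for hi in h if hi != 0) == k + 1   # minimum-weight dual codeword

data = [7, 1, 8, 2, 8, 1, 8, 2, 8, 4]           # k message symbols
c = [evaluate(data, x) for x in theta]          # the stored codeword
assert sum(hi * ci for hi, ci in zip(h, c)) % p == 0  # h is a p-c on C

# conventional repair of erased c_1 per (1.1): download one full symbol
# from each of the k helpers (nodes 2, ..., k+1)
c1_repaired = (-inv(h[0]) * sum(h[i] * c[i] for i in range(1, k + 1))) % p
assert c1_repaired == c[0]
```

Note that even with this best-case choice of parity check, the replacement node downloads k full symbols: exactly the factor-of-k repair-bandwidth penalty that motivates regenerating codes.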
This is illustrated below in the case of a [14, 10] MDS code. Assume
a data file of size equal to 1 GB. The data file is partitioned into 10
fragments, each of size 100 MB and each data fragment is stored in a
different node. Four parity nodes are then created, corresponding to
the four parity symbols of the MDS code. The contents of the 14 nodes
can be regarded as the layering of 10^8 codewords, each belonging to the [14, 10] MDS code over F_{2^8}. Fig. 1.1 shows repair of a failed node.

[Figure 1.1: Illustrating the repair degree and repair bandwidth involved in the conventional repair of a failed node in a [14, 10] MDS code. The figure shows 14 nodes of 100 MB each (10 data nodes and 4 parity nodes); node 1 has failed, and each of the helper nodes 2 through 11 transfers its 100 MB to the replacement node.]

As can be seen, there are k = 10 helper nodes corresponding to nodes 2 through 11 and each helper node passes on the 100 MB of data or parity
stored in the respective node, to the repair center. Thus in this case the
repair degree equals 10 and the repair bandwidth equals 10 × 100 MB = 1 GB.
Seminal papers by Dimakis et al. [50] and Gopalan et al. [72] heralded
the theory of two entirely new classes of erasure-recovery codes, termed
as regenerating codes (RGCs) and locally recoverable codes (LRCs), that
were designed with the express aim of lowering the repair bandwidth
and repair degree respectively. The development of the theory of RGCs
and LRCs also led to the creation of a class of codes termed as locally
regenerating codes by Kamath et al. [117] and Rawat et al. [189], where
the aim is to simultaneously achieve reduced repair bandwidth and low
repair degree. Research in a slightly different direction, pioneered by
Shanmugam et al. [215] and Guruswami and Wootters [85], led to a
re-examination of the repair bandwidth of RS codes and the design of
more efficient repair schemes that permitted node repair with reduced
repair bandwidth.
As an indication of the kind of impact that research on the topics
of RGCs and LRCs has had on the development of coding theory, we
note that papers reporting research in this area have received many
best paper awards over the years. The list includes [50], [62], [72], [103],
[137], [185], [228], [229], [237], [255].
1.3 Overview of the Monograph

This monograph presents an overview of how research on the topic of codes for distributed storage has evolved in a certain direction (see
Fig. 1.2 for an overview of topics covered here). There have been several
excellent prior surveys on the topic, including those found in [46], [51],
[136], [145]. Additionally, concise surveys by the authors of the present
monograph can be found in [10], [178].
Given the vast nature of the literature on the topic of codes for
distributed storage, we have undoubtedly missed many papers that have
made a strong contribution. We apologize in advance to the authors of
these papers for the inadvertent omission. Furthermore, as can be seen
from the listing of topics in Fig. 1.2, our focus here is only on certain
specific approaches to coding for distributed storage.

[Figure 1.2: An overview of the coverage of codes for distributed storage in this monograph. The figure organizes the topics as: minimize repair bandwidth (regenerating codes), minimize repair degree (locally recoverable codes), minimize both repair bandwidth and repair degree (locally regenerating codes), and improved repair of RS codes.]

MDS Codes Section 2 provides background on MDS and RS codes, and on a generalization of RS codes known as generalized RS (GRS) codes.

Regenerating Codes The next seven sections deal with RGCs. The
definition of an RGC along with a fundamental upper bound on file
size is presented in Section 3. The bound reveals that there is a tradeoff
between the storage overhead and the repair bandwidth. Sections 4
and 5 present constructions for the two main classes of RGCs, namely
minimum bandwidth regenerating (MBR) codes and minimum storage
regenerating (MSR) codes, that lie at the two ends of the storage-repair-bandwidth tradeoff. The tradeoff itself is explored in the following
section, Section 6. Constructions for RGCs that lie on interior points of
the tradeoff are presented in Section 7. The sub-packetization level of
an RGC may be regarded as denoting the number of symbols stored
per node. An alternate viewpoint is to regard a regenerating code as a
code over a vector symbol alphabet of the form F_q^α, with α denoting the
sub-packetization level. Lower values of sub-packetization are desirable
in practice, as a large sub-packetization level, apart from increasing the
complexity of implementation, also limits the smallest size of a file that
can be stored. Section 8 presents lower bounds on the sub-packetization
level of an MSR code.
Several variants of RGCs have been explored in the literature. Piggyback codes, ϵ-MSR codes and the codes of Li-Liu-Tang are MDS codes
that have reduced repair bandwidth and much smaller sub-packetization
level. Cooperative RGCs explore the cooperative repair of a set of t > 1
failed nodes. Secure RGCs are designed to provide security in the pres-
ence of an eavesdropper or an active adversary. Rack-aware RGCs are
designed to minimize the amount of cross-rack repair data that is trans-
ferred. An erasure-recovery code is said to possess the repair-by-transfer
(RBT) property, if it enables repair of a failed node without need for
computation at either helper or replacement node. Fractional repetition
codes form a class of erasure-recovery codes that possesses the repair-
by-transfer property and can be viewed as a generalization of a class of
RBT MBR codes. The former codes potentially offer reduced storage
overhead at the cost of reduced freedom in the selection of helper nodes.
All these variants of RGCs can be found discussed in Section 9.

Locally Recoverable Codes As noted above, the need for repair of a failed node with low degree prompted the creation of LRCs. Section 10
introduces LRCs and presents an upper bound on the rate and minimum
distance of an LRC as well as optimal code constructions.
One means of handling the simultaneous failure of several nodes
with low repair degree is to make the local codes that are at the core
of an LRC more powerful. There are other approaches however, each
with its own advantages and disadvantages. The three sections that
follow present these other approaches. Availability codes, discussed in
Section 11, represent one such example. This class of codes has the
additional feature that in the case of a single erased node, there are
multiple, node-disjoint means of recovering from the node failure. This
can be a very useful feature to have in practice, particularly as a means
of handling cases when there are multiple simultaneous demands for
the data contained within a particular node.
Sequential-recovery LRCs place the least stringent conditions on an
LRC for the local recovery from multiple erasures, and consequently,
have smallest possible storage overhead. These are discussed in Sec-
tion 12. If an LRC has large block length and small value of repair
degree r, and a particular local code is overwhelmed by erasures, the
only option is to fall back on the properties of the full-length block code
to recover from the erasure pattern, leading to a sharp increase in the
repair degree. Codes with hierarchical locality, discussed in Section 13,
are designed to address this situation, provide layers of local codes hav-
ing increasing block length as well as erasure-recovery capability, and
permit a more graceful degradation in repair degree with an increasing
number of erasures.
Maximally recoverable codes (MRCs), discussed in Section 14, may
be regarded as the subclass of LRCs that are as MDS as possible in the
sense that every set of k columns of the generator matrix of an MRC is
a linearly independent set, unless the locality constraints imposed make
it impossible for this to happen. An MRC is maximal in the sense that
if an MRC is not able to recover from an erasure pattern, then no other
code satisfying the same locality constraints can possibly recover from
the same erasure pattern.

Locally Regenerating Codes Section 15 introduces a class of codes in which the local codes are themselves regenerating codes. As a result,
these codes simultaneously offer both low repair degree as well as low
repair bandwidth.

Improved Repair Schemes for RS Codes The evolution of RGCs and LRCs spurred researchers to take a fresh look at the challenge of efficient RS-code repair and led to the identification of improved repair schemes for RS codes having significantly reduced repair bandwidth. These developments are described in Section 16.

Codes in Practice The final section, Section 17, discusses the impact
that the theoretical developments discussed in this monograph have
had in practice.
2
Maximum Distance Separable Codes

In this section, we will provide some background on maximum distance separable (MDS) codes, of which Reed-Solomon (RS) codes are the
principal example. MDS codes are widely used in the storage industry,
appearing for example in the guise of RAID codes. References to MDS
and RS codes can be found scattered throughout the manuscript, as
they are often an ingredient to a particular code construction or else
are closely related in some manner.

2.1 Reed-Solomon Codes

Let Fq be a finite field of q elements. Let Fq[x] denote the set of all polynomials in x over Fq:

Fq[x] = { ∑_{i=0}^{d} ui x^i | ui ∈ Fq, d ∈ {0, 1, 2, · · · } }.

If f(x) = ∑_{i=0}^{d} ui x^i with ud ≠ 0, then f is said to have degree d, and f is said to be monic if ud = 1. Let {θ1, θ2, . . . , θn} ⊆ Fq be a set of n distinct elements and set

CRS = { (f(θ1), f(θ2), . . . , f(θn)) | f ∈ Fq[x], deg(f) ≤ k − 1 }.

We will refer to a code having the structure of CRS as an [n, k] Reed-Solomon (RS) code [196]. Thus each codeword c in CRS is of the form

(c1, c2, · · · , cn) = (f(θ1), f(θ2), . . . , f(θn))

for some polynomial of the form f(x) = ∑_{i=0}^{k−1} ui x^i. We can therefore write:

[c1 c2 . . . cn] = [u0 u1 . . . uk−1] G

where the (k × n) generator matrix G of the RS code CRS is the Vandermonde matrix given by

G = [ 1          1          . . .   1
      θ1         θ2         . . .   θn
      ⋮          ⋮                  ⋮
      θ1^{k−1}   θ2^{k−1}   . . .   θn^{k−1} ].
From the properties of a Vandermonde matrix, it follows that every (k ×
k) sub-matrix of G is non-singular. Thus CRS is a k-dimensional subspace
of Fnq , i.e., CRS is an [n, k] linear code. We will use the terminology (n, M )
to denote a code C having block length n and of size |C| = M , that is
not necessarily linear. Thus CRS is simultaneously also an (n, q k ) code.
The minimum distance dmin of a code C is the minimum Hamming
distance between a pair of distinct codewords in C. The minimum weight
wmin of a linear code is equal to the minimum Hamming weight of a
nonzero codeword in C. It is straightforward to show that in a linear
code, we must have dmin = wmin . Since a polynomial of degree d can
have at most d zeros, it follows that in the case of an RS code CRS ,
dmin = wmin ≥ (n − k + 1). On the other hand, the polynomial

f(x) = ∏_{j=1}^{k−1} (x − θj)        (2.1)

has exactly (k − 1) zeros and it follows from this that dmin = wmin = (n − k + 1) in the case of an [n, k] RS code.
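This minimum-distance argument can be checked by brute force on a small example of our own choosing (not taken from the text): an RS code over GF(7) with n = 6 and k = 3, where all 7^3 − 1 nonzero codewords can be enumerated.

```python
# Brute-force verification that a small RS code attains
# d_min = w_min = n - k + 1. Field GF(7), n = 6 evaluation points, k = 3.
from itertools import product

p, n, k = 7, 6, 3
points = list(range(n))  # 0, 1, ..., 5 are distinct elements of GF(7)

def evaluate(coeffs, x):
    # Horner's rule evaluation mod p
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % p
    return acc

weights = []
for msg in product(range(p), repeat=k):
    if any(msg):  # skip the zero polynomial
        codeword = [evaluate(msg, x) for x in points]
        weights.append(sum(1 for v in codeword if v != 0))

# a polynomial of degree <= 2 has at most 2 zeros, so every nonzero
# codeword has weight >= 6 - 2 = 4; the polynomial x(x - 1) meets this
assert min(weights) == n - k + 1
```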
We use [n, k, dmin ] to denote an [n, k] linear code having minimum
distance dmin . Analogously, we will use (n, M, dmin ) to denote an (n, M )
code having minimum distance dmin . It follows that the RS code CRS is
an [n, k, (n − k + 1)] linear code over Fq. The Singleton bound below will establish that an RS code CRS has the largest possible size among
all codes of block length n and dmin = (n − k + 1).

2.2 Singleton Bound

Let C be an (n, M) code over an alphabet A of size |A| = q. Thus C ⊆ A^n. Let C have minimum distance dmin. We will now derive a
bound on the maximum possible size M of C.
Let A be the (M × n) matrix whose rows are precisely the M
codewords in C. Let B be the (M × (n − dmin + 1)) sub-matrix of A
obtained by restricting attention to the last (n − dmin + 1) columns of A.
Clearly all the rows of B must be distinct, else, C will have minimum
distance ≤ dmin − 1, a contradiction. It follows that M ≤ |A|^{n−dmin+1}.
This upper bound on the size M of an (n, M ) code having minimum
distance dmin is called the Singleton bound [223].

Theorem 1. (Singleton Bound) The size M of an (n, M, dmin) code C over an alphabet A must satisfy the upper bound:

M ≤ |A|^{n−dmin+1}.

Definition 1. Codes achieving the Singleton bound with equality are called maximum distance separable (MDS) codes.

Remark 1. An [n, k, dmin] RS code is an MDS code since dmin = (n − k + 1) and the code size M equals q^k = q^{n−dmin+1}.

It follows from the arguments used to establish the Singleton bound that a codeword belonging to an (n, M, dmin) code C can be uniquely
identified given access to any set of (n − dmin + 1) code symbols. In
particular, a codeword belonging to a linear [n, k, dmin ] MDS code C,
can be uniquely identified given any set of k code symbols. It follows
from this that if G is the generator matrix of an [n, k] MDS code, then
every (k × k) sub-matrix of G is non-singular. It is straightforward to
establish the converse and hence, an [n, k] code is an MDS code iff every
(k × k) sub-matrix of a generator matrix G for the code is nonsingular.
2.2.1 Recovery from Erasures


Let C be an [n, k, dmin ] code that is used for transmission over an
erasure channel, i.e., a channel in which a subset of the code symbols
transmitted over the channel are erased. Since each codeword in C
is uniquely determined from a subset of (n − dmin + 1) or more code
symbols, it follows that the transmitted codeword can be recovered
if no more than (dmin − 1) code symbols are erased, i.e., if at least
(n − dmin + 1) code symbols remain unerased.
If C is an RS code of the form:
CRS = {(f (θ1 ), f (θ2 ), . . . , f (θn )) | f ∈ Fq [x], deg(f ) ≤ k − 1}
and {θi1 , θi2 , . . . , θik } represent a subset of the unerased code symbols
of size k = (n − dmin + 1), the remaining code symbols can be explicitly
recovered using Lagrange interpolation:
f(x) = ∑_{ℓ=1}^{k} f(θ_{iℓ}) ∏_{j=1, j≠ℓ}^{k} (x − θ_{ij}) / (θ_{iℓ} − θ_{ij}).
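As a quick sanity check, this recovery formula can be exercised on a small code. The field GF(13), the points 1, . . . , 8, and the parameters n = 8, k = 4 below are illustrative choices, not from the text: the interpolant built from any k unerased evaluations is evaluated at the erased positions to restore them.

```python
# Direct transcription of the Lagrange-interpolation erasure recovery:
# recover up to d_min - 1 = n - k erased RS code symbols from the rest.
p = 13
n, k = 8, 4
theta = list(range(1, n + 1))

def evaluate(coeffs, x):
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % p
    return acc

def lagrange_at(known, x):
    # known: k pairs (theta_i, f(theta_i)); returns f(x) mod p
    total = 0
    for l, (xl, yl) in enumerate(known):
        term = yl
        for j, (xj, _) in enumerate(known):
            if j != l:
                term = term * (x - xj) % p
                term = term * pow((xl - xj) % p, p - 2, p) % p
    # accumulate the l-th Lagrange term
        total = (total + term) % p if False else total
        total = (total + term) % p if False else (total + term) % p
    return total

f_coeffs = [5, 0, 2, 11]                       # deg(f) <= k - 1 = 3
codeword = [evaluate(f_coeffs, x) for x in theta]

erased = {0, 2, 5, 7}                          # n - k = 4 erasures
known = [(theta[i], codeword[i]) for i in range(n) if i not in erased][:k]
for i in erased:
    assert lagrange_at(known, theta[i]) == codeword[i]
```

With fewer erasures any k of the unerased symbols may be used, which is exactly the erasure-channel statement above.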

2.3 Generalized Reed-Solomon Codes

We now present a generalization of RS codes under which the dual of a generalized RS (GRS) code is once again a generalized RS code [156].
Let C be an RS code as above, i.e.,
CRS = {(f (θ1 ), . . . , f (θn )) | f ∈ Fq [x], deg(f ) ≤ k − 1} ,
where {θ1 , · · · , θn } are a collection of n distinct elements belonging to
the finite field Fq .
We begin with an observation concerning the ((n − 1) × n) Vandermonde matrix:

P = [ 1          1          . . .   1
      θ1         θ2         . . .   θn
      ⋮          ⋮                  ⋮
      θ1^{n−2}   θ2^{n−2}   . . .   θn^{n−2} ].

Clearly P has rank (n − 1) and every ((n − 1) × (n − 1)) sub-matrix of P is nonsingular. It follows that the right nullspace of P contains a vector u = [u1 · · · un]^T ∈ Fq^n all of whose components are nonzero, i.e., u satisfies P u = 0 and ui ≠ 0, 1 ≤ i ≤ n.
Next, let f, g be polynomials over Fq with deg(f) ≤ k − 1 and deg(g) ≤ n − k − 1. Set h(x) = f(x)g(x). Then deg(h) ≤ n − 2 and we can write

h(x) = ∑_{j=0}^{n−2} hj x^j.

It follows that

∑_{i=1}^{n} ui h(θi) = ∑_{i=1}^{n} ui ∑_{j=0}^{n−2} hj θi^j = ∑_{j=0}^{n−2} hj ( ∑_{i=1}^{n} ui θi^j ) = 0,

and hence

∑_{i=1}^{n} ui f(θi) g(θi) = 0.

It follows from this that the dual of an RS code having generator matrix
of the form
1 1
 
...

θ1 ... θn 
G = 
 
.. .. .. 
. . .
 
 
θ1k−1 . . . θnk−1
is the block code having generator matrix of the form
1 1
  
... u1

θ1 ... θn 
u2 
H = 
  
.. .. .. 
..

. . . .
  
  
θ1n−k−1 ... θnn−k−1 un
with all ui ̸= 0. We will refer to any code having generator matrix G of
the form
    G = [ 1            . . .   1
          θ_1          . . .   θ_n
          ⋮                    ⋮
          θ_1^{k−1}    . . .   θ_n^{k−1} ] · diag(u_1, u_2, . . . , u_n),   with all u_i ≠ 0,

as a GRS code. Clearly, this code has parameters [n, k, n − k + 1] and is
hence also an MDS code. This establishes that the dual of an RS code
is a GRS code and further that the dual of a GRS code is once again, a
GRS code.
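The orthogonality argument above can be spot-checked numerically. In the sketch below (illustrative parameters over F₁₃), the multipliers u_i = 1/Π_{j≠i}(θ_i − θ_j) are one standard choice of nullspace vector for the Vandermonde matrix P; the check confirms that every row of G is orthogonal to every row of H.

```python
q = 13                                    # illustrative prime field
inv = lambda x: pow(x, q - 2, q)          # inverse by Fermat's little theorem

n, k = 6, 3
thetas = [1, 2, 3, 4, 5, 6]

# u_i = 1 / prod_{j != i}(theta_i - theta_j) satisfies P u = 0, a fact
# that follows from Lagrange interpolation of the monomials x^m, m <= n-2.
u = []
for i in range(n):
    p = 1
    for j in range(n):
        if j != i:
            p = p * (thetas[i] - thetas[j]) % q
    u.append(inv(p))

G = [[pow(t, r, q) for t in thetas] for r in range(k)]       # RS generator
H = [[pow(t, r, q) * ui % q for t, ui in zip(thetas, u)]     # GRS dual rows
     for r in range(n - k)]

# Orthogonality G H^T = 0: every RS codeword is checked by every row of H.
ok = all(sum(a * b for a, b in zip(g, h)) % q == 0 for g in G for h in H)
assert ok
```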

2.4 Systematic Encoding

Definition 2. An [n, k] linear code C is said to be systematic if it
possesses a generator matrix G of the form

    G = [I_k | P],

where P is a (k × (n − k)) matrix.

When we make reference to a systematic code, it is implicitly understood
that the code is encoded using a generator matrix having this
form. The advantage of encoding using such a matrix is that the k
message symbols are explicitly present within the set of n code symbols
and this is a very desirable property in practice. While not every linear
code is systematic, there is an ‘equivalent’ code obtained by rearranging
code symbols that is systematic and in this way, the requirement of
making a code systematic is easily met.
Let C be an [n, k] MDS code and let G0 be a (k × n) generator
matrix for C. Since any k columns of G0 are linearly independent, the
matrix G0 can be row reduced to yield a second generator matrix G for
C that is of the form G = [Ik | P ]. The (k × (n − k)) matrix P has the
interesting and useful property that any (ℓ × ℓ) square sub-matrix of P
is nonsingular, 1 ≤ ℓ ≤ min{k, n − k}. This property can be established
using elementary row reduction. The converse is also straightforward
to establish, namely that if the (k × n) generator matrix G of a linear
code C is of the form G = [Ik | P ] where every (ℓ × ℓ) sub-matrix of P
is non-singular, then C is an [n, k] MDS code.
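Both directions of this argument can be exercised numerically. The sketch below (illustrative [6, 3] parameters over F₁₃) row-reduces a Vandermonde generator matrix, which is known to generate an MDS code, to the form [I_k | P], and then confirms that every square sub-matrix of P is nonsingular.

```python
from itertools import combinations

q = 13
inv = lambda x: pow(x, q - 2, q)

n, k = 6, 3
G0 = [[pow(t, r, q) for t in range(1, n + 1)] for r in range(k)]  # MDS generator

# Row reduce G0 to [I_k | P]; for an MDS generator every k columns are
# independent, so elimination always finds a pivot.
G = [row[:] for row in G0]
for c in range(k):
    piv = next(r for r in range(c, k) if G[r][c] % q != 0)
    G[c], G[piv] = G[piv], G[c]
    s = inv(G[c][c])
    G[c] = [x * s % q for x in G[c]]
    for r in range(k):
        if r != c and G[r][c] % q != 0:
            f = G[r][c]
            G[r] = [(a - f * b) % q for a, b in zip(G[r], G[c])]

assert all(G[i][j] == (1 if i == j else 0) for i in range(k) for j in range(k))
P = [row[k:] for row in G]

def nonsingular(rows, cols):
    """Gaussian-elimination rank test on the selected sub-matrix of P."""
    M = [[P[r][c] for c in cols] for r in rows]
    m = len(M)
    for c in range(m):
        piv = next((r for r in range(c, m) if M[r][c] % q != 0), None)
        if piv is None:
            return False
        M[c], M[piv] = M[piv], M[c]
        s = inv(M[c][c])
        M[c] = [x * s % q for x in M[c]]
        for r in range(c + 1, m):
            f = M[r][c]
            M[r] = [(a - f * b) % q for a, b in zip(M[r], M[c])]
    return True

# Every l x l sub-matrix of P is nonsingular, 1 <= l <= min(k, n - k).
for l in range(1, min(k, n - k) + 1):
    for rows in combinations(range(k), l):
        for cols in combinations(range(n - k), l):
            assert nonsingular(rows, cols)
```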

2.5 Cauchy MDS Codes

Our goal here is to present an explicit construction of a square (m × m)
matrix A, called the Cauchy matrix, having the property that every
square sub-matrix of A obtained by selecting any ℓ rows and ℓ columns
of A is non-singular.
Construction 1. (Cauchy Matrix) Let {a1 , a2 , . . . , am , b1 , b2 , . . . , bm } be
a set of 2m distinct elements belonging to the finite field Fq . Let A be
an (m × m) matrix, called the Cauchy matrix, whose (i, j)th entry Aij ,
1 ≤ i, j ≤ m, is given by:
    A_{ij} = 1 / (a_i − b_j),

i.e.,

    A = [ 1/(a_1−b_1)    1/(a_1−b_2)    . . .   1/(a_1−b_m)
          1/(a_2−b_1)    1/(a_2−b_2)    . . .   1/(a_2−b_m)
          ⋮              ⋮                      ⋮
          1/(a_m−b_1)    1/(a_m−b_2)    . . .   1/(a_m−b_m) ].   (2.2)

We will show that the Cauchy matrix A is non-singular by identifying
an inverse. Since every square sub-matrix of a Cauchy matrix is itself a
Cauchy matrix (formed from subsets of the {a_i} and the {b_j}), this will
also establish that every square sub-matrix of A is non-singular.
The relevance of Cauchy matrices is that if we choose a generator
matrix G for an [n, k] code C to be of the form G = [Ik | P ] where P is
a (k × (n − k)) Cauchy matrix
    P = [ 1/(a_1−b_1)    1/(a_1−b_2)    . . .   1/(a_1−b_{n−k})
          1/(a_2−b_1)    1/(a_2−b_2)    . . .   1/(a_2−b_{n−k})
          ⋮              ⋮                      ⋮
          1/(a_k−b_1)    1/(a_k−b_2)    . . .   1/(a_k−b_{n−k}) ],

with {a_1, . . . , a_k, b_1, . . . , b_{n−k}} ⊆ F_q constituting a set of n distinct
elements, then G generates a (systematic) MDS code [156].
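The MDS property of such a generator matrix can be confirmed by brute force for small parameters. The sketch below uses an illustrative [n = 5, k = 2] code over F₁₃ and checks that the minimum distance equals n − k + 1.

```python
from itertools import product

q = 13
inv = lambda x: pow(x, q - 2, q)

a = [1, 2]                 # k = 2 row points (illustrative, distinct)
b = [3, 4, 5]              # n - k = 3 column points
k, n = len(a), len(a) + len(b)

P_cauchy = [[inv(ai - bj) for bj in b] for ai in a]
G = [[1 if j == i else 0 for j in range(k)] + P_cauchy[i] for i in range(k)]

def weight(msg):
    cw = [sum(mi * G[r][j] for r, mi in enumerate(msg)) % q for j in range(n)]
    return sum(c != 0 for c in cw)

# Minimum distance over all q^k - 1 nonzero messages; MDS iff n - k + 1.
dmin = min(weight(m) for m in product(range(q), repeat=k) if any(m))
assert dmin == n - k + 1
```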

2.5.1 Inverse of the Cauchy Matrix


We now present a proof of the invertibility of the Cauchy matrix
appearing in [208]. Define the degree-m polynomials

    A(x) = Π_{i=1}^{m} (x − a_i),    B(x) = Π_{i=1}^{m} (x − b_i).

The formal derivative A′ (x) of A(x) is given from the product formula
by:
    A′(x) = Σ_{i=1}^{m} Π_{j=1, j≠i}^{m} (x − a_j),

so that

    A′(a_ℓ) = Π_{j=1, j≠ℓ}^{m} (a_ℓ − a_j).

Define:

    A_i(x) = A(x) / ((x − a_i) A′(a_i)) = Π_{j=1, j≠i}^{m} (x − a_j) / (a_i − a_j).   (2.3)

Then

    A_i(x) = { 1,  x = a_i
               0,  x = a_ℓ, ℓ ≠ i,

and thus A_i(x) serves as an indicator function for a_i. Analogously, let


    B_u(x) = B(x) / ((x − b_u) B′(b_u)) = Π_{ℓ=1, ℓ≠u}^{m} (x − b_ℓ) / (b_u − b_ℓ),

be the indicator function for the elements {b_u}_{u=1}^{m}, so that:

    B_u(a_i) = Π_{ℓ=1, ℓ≠u}^{m} (a_i − b_ℓ) / (b_u − b_ℓ).

For any degree-(m − 1) polynomial p(x), an application of Lagrange's
interpolation formula gives us:

    p(x) = Σ_{i=1}^{m} p(a_i) Π_{j=1, j≠i}^{m} (x − a_j) / (a_i − a_j) = Σ_{i=1}^{m} p(a_i) A_i(x).

Setting p(x) = B_u(x) A(b_u), we obtain from (2.3) that

    B_u(x) A(b_u) / A(x) = Σ_{i=1}^{m} [ B_u(a_i) A(b_u) / A′(a_i) ] · 1/(x − a_i).

The LHS above satisfies

    B_u(x) A(b_u) / A(x) = { 1,  x = b_u
                             0,  x = b_ℓ, ℓ ≠ u.
It follows that

    B_u(b_ℓ) A(b_u) / A(b_ℓ) = Σ_{i=1}^{m} [ −B_u(a_i) A(b_u) / A′(a_i) ] · 1/(a_i − b_ℓ) = { 1,  ℓ = u
                                                                                             0,  else.

It is evident from the above that the (m × m) matrix H whose (u, i)th
entry is given by

    H_{ui} = −B_u(a_i) A(b_u) / A′(a_i),   1 ≤ u, i ≤ m,   (2.4)

is the inverse of the Cauchy matrix. An alternate, more symmetric
expression

    H_{ui} = (a_i − b_u) B_u(a_i) A_i(b_u),   1 ≤ i, u ≤ m,

can be obtained by noting in (2.4) that

    A_i(x) = A(x) / ((x − a_i) A′(a_i))  ⟹  A(b_u) / A′(a_i) = A_i(b_u) (b_u − a_i).
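A small numerical check of the inverse formula: the sketch below builds a 3 × 3 Cauchy matrix over F₁₃ (with illustrative points a_i, b_j), forms H entry-by-entry from (2.4), and verifies that H A = I.

```python
q = 13
inv = lambda x: pow(x, q - 2, q)

m = 3
a = [1, 2, 3]
b = [4, 5, 6]           # 2m distinct elements of F_13 (illustrative)

A = [[inv(ai - bj) for bj in b] for ai in a]        # Cauchy matrix as in (2.2)

def prod(vals):
    out = 1
    for v in vals:
        out = out * v % q
    return out

A_of  = lambda x: prod(x - ai for ai in a)                   # A(x)
Ap_of = lambda al: prod(al - aj for aj in a if aj != al)     # A'(a_l)
Bu_of = lambda u, x: prod((x - bl) * inv(b[u] - bl)          # B_u(x)
                          for bl in b if bl != b[u])

# H_{ui} = -B_u(a_i) A(b_u) / A'(a_i), as in (2.4).
H = [[(-Bu_of(u, a[i]) * A_of(b[u])) % q * inv(Ap_of(a[i])) % q
      for i in range(m)] for u in range(m)]

# Verify H A = I over F_13.
for u in range(m):
    for j in range(m):
        s = sum(H[u][i] * A[i][j] for i in range(m)) % q
        assert s == (1 if u == j else 0)
```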

Notes
1. MDS codes of block length q + 1, q + 2: Let {θ_1, θ_2, . . . , θ_q} denote
the q elements in F_q. It is straightforward to verify that the code
having generator matrix

    G = [ 1            . . .   1            0
          θ_1          . . .   θ_q          0
          ⋮                    ⋮            ⋮
          θ_1^{k−1}    . . .   θ_q^{k−1}    1 ]
is an [n = q + 1, k] MDS code [156]. If q is even, then the generator
matrix
    G = [ 1        . . .   1        0   0
          θ_1      . . .   θ_q      0   1
          θ_1^2    . . .   θ_q^2    1   0 ]

yields an [n = q + 2, k = 3] MDS code over F_q [156]. The dual of
this code is an [n = q + 2, k = q − 1] MDS code over F_q. We also
note that one can construct an [n = k + 1, k] MDS code over a
finite field F_q of any size.
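The first of these claims can be spot-checked by brute force. The sketch below uses the illustrative small case q = 5, k = 3 and confirms that the extended code has minimum distance n − k + 1 = 4, i.e., that it is MDS.

```python
from itertools import product

q, k = 5, 3
thetas = list(range(q))              # the q elements of F_5
n = q + 1

# Generator from Note 1: Vandermonde columns on all of F_5, plus the
# extra column (0, ..., 0, 1)^T.  (Python's pow(0, 0) == 1 matches the
# convention theta^0 = 1 in the first row.)
G = [[pow(t, r, q) for t in thetas] + [1 if r == k - 1 else 0]
     for r in range(k)]

def weight(msg):
    cw = [sum(mi * G[r][j] for r, mi in enumerate(msg)) % q for j in range(n)]
    return sum(c != 0 for c in cw)

# Check all q^k - 1 nonzero messages.
dmin = min(weight(m) for m in product(range(q), repeat=k) if any(m))
assert dmin == n - k + 1             # a [6, 3, 4] MDS code
```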

2. The (linear) MDS conjecture: Let N(k, q) denote the maximum
possible block length of a k-dimensional MDS linear code over F_q.
If k > q, then it is known that N(k, q) = k + 1. The MDS conjecture
[209] states that if 2 ≤ k ≤ q, then N(k, q) = q + 1, except when
q is even and k ∈ {3, q − 1}, in which case N(k, q) = q + 2. The
conjecture is shown to hold for prime q in [16] and for the case
k ≤ 2p − 2, when q is not prime and p is the characteristic of F_q,
in [17].

3. Vector MDS codes: Let C be an (n, q^{kα}, d_min) code of block length
n, size q^{kα} and minimum distance d_min over a vector alphabet
F_q^α. If d_min = (n − k + 1), then C is, by the Singleton bound, an
MDS code. Such MDS codes over a vector alphabet are frequently
referred to as MDS array codes [23], [26]. Minimum storage regenerating
codes [50], Even-Odd codes [25] and the Row-Diagonal
Parity code [45] are examples of vector MDS codes. Both Even-Odd
and Row-Diagonal Parity codes are double-erasure-recovering
vector binary MDS codes, i.e., q = 2 and d_min = n − k + 1 = 3.
3 Regenerating Codes

As noted in Section 1, the conventional repair of a Reed-Solomon code
requires the download of an amount of data equal to the size of the data
file being stored in order to repair a single failed node, despite the fact
that each node only stores a small fraction of the contents of the data
file. The amount of data downloaded to repair a single node is termed
the repair bandwidth. To address this problem, Dimakis et al. [50] came
up with a class of codes, termed regenerating codes (RGCs), whose
repair bandwidth is as small as possible. This seminal paper is all the
more remarkable because a priori, it is not clear that any reduction in
repair bandwidth is even possible. This section introduces RGCs and
establishes their basic properties.

3.1 Definition and Terminology

We begin by describing the functioning of an RGC in a data-storage
setting, before going on to provide a formal, mathematical definition.
An RGC C is associated to a parameter set

    {(n, k, d), (α, β), B, F_q},

where the role of the various parameters is explained below. The aim of
the RGC is to store, in efficient and reliable fashion, data pertaining to
a data file that is comprised of B symbols, termed the message symbols,
belonging to an underlying finite field Fq . The B message symbols are
first mapped onto a set of nα symbols over Fq and the nα symbols are
then distributed evenly, across a set of n storage units called nodes,
so that each node stores exactly α symbols. The creation of the nα

1
1
Data
2 Collector 1’
2

3
k

k+1
d+1

n n

capacity nodes capacity nodes

Figure 3.1: An illustration of the data-collection and node-repair properties of an


{(n, k, d), (α, β), B, Fq } RGC. Here, node 1′ is a replacement node for the failed
node 1.

code symbols and their distribution across n nodes should be such that
the two key properties described below, and illustrated in Fig. 3.1, are
satisfied:

1. Data Collection Property: It should be possible to recover the B
message symbols, given access to the contents of any k of the n
storage nodes.

2. Node Repair or Regeneration Property: If a particular node or
storage unit fails, then it should be possible to recover from such
a failure by having a replacement of the failed node connect to
any d of the remaining (n − 1) nodes, and download β symbols
from each of these d nodes to arrive at the α symbols stored by
the replacement node. The repair is called exact repair (ER) if
the contents of the replacement node following node repair are
identical to the contents of the original failed node. The repair is
called functional repair (FR) if, following node repair, the contents
of the set of (n − 1) surviving nodes, together with the contents
of the new replacement node, once again meet the requirements
of an RGC.
The parameter α is termed the sub-packetization level of the RGC,
motivated by viewing the collection of α Fq -symbols as a packet and
each individual Fq -symbol as a sub-packet. The d assisting nodes are
called helper nodes and the parameter d, the repair degree.¹ The total
number dβ of Fq symbols downloaded from the d helper nodes for repair
of a failed node is termed the repair bandwidth of the RGC. The rate of
the RGC is given by R = nα B
. Its reciprocal nα
B is the storage overhead.
We note that under FR, the contents of a node can change with time,
and that ER is a particular and more stringent instance of functional
repair. ER is more desirable in practice, as it simplifies logistics.

3.1.1 Formal Definition of Exact and Functional Repair


We begin with the case of exact repair.
Definition 3 (Exact Repair Regenerating Code). An (n, M, d_min) code C
over an alphabet A is said to be an ER RGC having parameter set

    {(n, k, d), (α, β), B, F_q},

where {k, d, α, β, B} are positive integers with 1 ≤ k ≤ d ≤ (n − 1), if
the following conditions are satisfied:
1. A = F_q^α,

2. M = q^B,

3. d_min ≥ (n − k + 1),

4. and where, for every index i ∈ [n], and every subset S ⊆ [n] \ {i}
of size |S| = d, there are functions

    h_{i,j,S} : F_q^α → F_q^β,   ∀j ∈ S,

as well as functions

    f_{i,S} : F_q^{dβ} → F_q^α

¹This is in analogy with the corresponding term first introduced in the context of
locally recoverable codes. Locally recoverable codes are introduced in Section 10.

such that

ci = fi,S hi,j,S (cj ), j ∈ S , (3.1)




for every codeword (c1 , c2 , · · · , cn ) ∈ C.

In any (n, M, d_min) code, a codeword is uniquely determined given
any subset of (n − d_min + 1) code symbols. Thus the condition d_min ≥
(n − k + 1), has the implication that a codeword can be uniquely
recovered from the contents of any k code symbols. This in the context
of distributed storage is the data-collection property. Clearly, equation
(3.1) corresponds to the node-repair property.
We present below the formal definition of an FR RGC by defining
this notion first for a collection of codes, rather than for a single code,
for reasons explained in Remark 2.

Definition 4 (Functional Repair Regenerating Code). A (finite) collection
{C_ℓ}_{ℓ=1}^{L} of L (n, M) codes over a common alphabet A is said to be a
collection of FR RGCs having common parameter set

    {(n, k, d), (α, β), B, F_q},

where {k, d, α, β, B} are positive integers with 1 ≤ k ≤ d ≤ (n − 1), if
the following conditions are satisfied:

1. A = F_q^α,

2. M = q^B,

3. All codes C_ℓ, ℓ = 1, 2, · · · , L have minimum distance satisfying:

    d_min(C_ℓ) ≥ (n − k + 1),

4. And where, for every index ℓ ∈ [L], i ∈ [n], and every subset
S ⊆ [n] \ {i} of size |S| = d, there are functions

    h_{i,j,S}^{(ℓ)} : F_q^α → F_q^β,   ∀j ∈ S,

as well as functions

    f_{i,S}^{(ℓ)} : F_q^{dβ} → F_q^α,

such that if we set

    ĉ_i = f_{i,S}^{(ℓ)}( { h_{i,j,S}^{(ℓ)}(c_j) | j ∈ S } ),   (3.2)

we have

    { (c_1, · · · , c_{i−1}, ĉ_i, c_{i+1}, · · · , c_n) | (c_1, · · · , c_n) ∈ C_ℓ } = C_{ℓ′}

for some ℓ′, 1 ≤ ℓ′ ≤ L. An FR RGC is then simply a code that is
an element of such a collection of FR RGCs and shares the same
parameter set as does the collection of FR RGCs.
Remark 2. The naive approach would be to define an FR RGC by
defining it as a code that, apart from the data collection property, has
the property that following node repair, one arrives at a second code that
is also an FR RGC. The above approach aims to avoid such a circular
definition. The presence of a collection of FR RGCs, as appearing in
the definition above, can be seen in the construction of an FR RGC
appearing in [212]. The case of ER may be regarded as corresponding to
the special case of FR when there is just a single code in the collection,
i.e., when L = 1.

Linear RGC. There are four classes of mappings associated with an
RGC:
(a) The mapping from B message symbols to the nα contents of the
n nodes,

(b) The mapping used to recover the B message symbols from the kα
contents of a specific set of k nodes,

(c) The mapping F_q^α → F_q^β used by node j to determine the β symbols
to be forwarded to the replacement of failed node i, given
knowledge of the remaining (d − 1) helper nodes,

(d) The mapping used by the replacement node to extract the α sym-
bols to be stored from the dβ symbols supplied to the replacement
node, by a specific set of d helper nodes.
We will say that an RGC is linear if all four mappings above are linear.
All the RGCs that will be encountered in this monograph will be linear.

3.2 Bound on File Size

We adopt the information-theoretic approach² of Shah et al. [211] to
establishing the fundamental bound on file size B for a given parameter
set {(n, k, d), (α, β), F_q}, appearing in (3.7).
The RGC is of size q^B and we can therefore, without loss of generality,
associate a unique message vector M ∈ F_q^B to each codeword. We will
assume that M as a random variable is uniformly distributed over F_q^B.³
The contents of each node as well as the repair data symbols that are
passed between nodes are thus also random variables, that are functions
of M .
Let W_i denote the random variable taking on values in F_q^α that
represents the contents of the ith node. Clearly,

    H(W_i) ≤ α,   (3.3)

where we have taken the unit of entropy as log₂(|F_q|) = log₂(q) bits. In
all of our discussion here, the entropy will always be measured in units
of log₂(q) bits. For A ⊆ [n], we use the notation W_A to denote the set

    W_A = {W_i | i ∈ A}.

The data-collection property required of an ER RGC imposes the
following additional constraints:

    H(W_A) = B,  and  H(M | W_A) = 0   (3.4)

for A ⊆ [n], |A| ≥ k. For x, y ∈ [n], and D ⊆ [n], such that x ∉ D,
y ∉ D and

    |D ∪ {x}| = d,

we use _D S_x^y to denote the random variable corresponding to the helper
data sent by the helper node x to the replacement of a failed node y,

²The original proof of the file-size upper bound for RGCs by Dimakis et al. [50]
used a network coding approach, which we discuss in Section 3.4.
³Strictly speaking, M is a random vector, but we will use the term random variable
to refer to either a random vector or a random variable. Also, random variables
typically take on real values; however, this is not an essential restriction.

when the set of d helper nodes is the set D ∪ {x}. We will drop the
prescript D and simply write S_x^y if D is understood from the context. As
an example, this can happen if n = (d + 1), in which case D = [n] \ {x, y}.
Given subsets X, Y, D ⊆ [n], with

    |D ∪ X| = d,  D ∩ {X ∪ Y} = ϕ,

we define _D S_X^Y = { _A S_x^y | x ∈ X, y ∈ Y, x ≠ y, A = D ∪ X \ {x} }. For
the case X = Y, we use the short-hand notation _D S_X to indicate _D S_X^X.
In all these cases, we will drop the prescript D if it is understood from
the context and simply write S_X^Y, S_X in place of _D S_X^Y, _D S_X respectively.
From the definition of an RGC, it follows that

    H(_D S_x^y) ≤ β.   (3.5)

For every i ∈ [n], the exact-repair condition imposes the constraint

    H(W_i | S_X^i) = 0,   |X| = d,  i ∉ X.   (3.6)

Theorem 2 (Fundamental Bound on File Size under Exact Repair). In
any {(n, k, d), (α, β), B, F_q} exact-repair RGC, we must have:

    B ≤ Σ_{i=1}^{k} min{α, (d − i + 1)β}.   (3.7)

Proof. Let C denote the RGC. Let D be an arbitrary subset of the n
nodes of size |D| = (d + 1). Let C_D denote the restriction of C to the
subset D of nodes. Clearly, the code C_D stores the same B message
symbols as does the RGC C. Furthermore, C_D is also an RGC having
parameter set

    {((d + 1), k, d), (α, β), B, F_q}.

We will focus our attention from here on, on the code C_D instead of the
code C, and establish the upper bound on file size B. The same bound
will then continue to apply to the code C. We will also assume, without
loss of generality, that the n nodes are indexed so that D = [d + 1].
From the data collection property of an RGC, we have

    B = H(W_1, . . . , W_k) = Σ_{i=1}^{k} H(W_i | W_{[i−1]}),

where [i − 1] := {1, 2, . . . , i − 1}, [0] := ϕ and where, by W_{[i−1]}, we
mean the collection {W_j | j ∈ [i − 1]}. The theorem then follows from
Lemma 1 below.

Lemma 1.

    H(W_i | W_{[i−1]}) ≤ (d − i + 1)β.

Proof. We will prove this lemma in three steps.

Step 1 - We set A := [d + 1] \ [i], and note that

    H(W_i | S_{[i−1]}^i, S_A^i) = 0,

which follows from the repair property of an RGC.

Step 2 - We will show using Lemma 2 below, that this implies

    H(W_i | S_{[i−1]}^i) ≤ H(S_A^i).   (3.8)

Since, from the definition of an RGC, we have

    H(S_A^i) ≤ |A|β = (d − i + 1)β,

this in turn implies that

    H(W_i | S_{[i−1]}^i) ≤ (d − i + 1)β.   (3.9)

Step 3 - The information passed on by a helper node to a replacement
of the failed node i, is clearly a function of the contents of the helper
node. This implies that

    H(S_{[i−1]}^i | W_{[i−1]}) = 0.

We will use this observation, coupled with Lemma 3 below, to show
that

    H(W_i | W_{[i−1]}) ≤ H(W_i | S_{[i−1]}^i),   (3.10)

thus completing the proof of Lemma 1.

It remains to state and prove Lemmas 2 and 3, appearing in the
proof above. The random variables {U, X, Y, Z} appearing in the two
lemmas should be interpreted as follows:

    X ⇔ W_i,   Y ⇔ S_{[i−1]}^i,   Z ⇔ S_A^i,   U ⇔ W_{[i−1]}.

Lemma 2. Let X, Y, Z be random vectors whose components take values
in F_q. Then,

    H(X | Y, Z) = 0  ⟹  H(X | Y) ≤ H(Z | Y) ≤ H(Z).

Proof:

    H(X, Z | Y) = H(X | Y) + H(Z | X, Y)    [H(Z | X, Y) ≥ 0]
                = H(Z | Y) + H(X | Y, Z)    [H(X | Y, Z) = 0]

It follows that

    H(X | Y) ≤ H(Z | Y) ≤ H(Z).

This completes Step 2 and we have established the inequality

    H(W_i | S_{[i−1]}^i) ≤ (d − i + 1)β.   (3.11)

Lemma 3, appearing in Step 3, will complete the proof of Lemma 1, by
showing that

    H(W_i | W_{[i−1]}) ≤ H(W_i | S_{[i−1]}^i).   (3.12)
Lemma 3.

    H(X | U) ≤ H(X | Y)  when  H(Y | U) = 0.

Proof:

    H(U, X, Y) = H(U) + H(X | U) + H(Y | X, U)
               = H(Y) + H(X | Y) + H(U | X, Y).

Since H(Y | U) = 0 ⟹ H(Y | X, U) = 0 and H(U, Y) = H(U), we
have that

    H(X | U) = H(X | Y) − [H(U) − H(Y) − H(U | X, Y)]
             = H(X | Y) − [H(U, Y) − H(Y) − H(U | X, Y)]
             = H(X | Y) − [H(U | Y) − H(U | X, Y)]
             ≤ H(X | Y).


Remark 3. While the proof given above is for the case of exact repair,
it extends in straightforward fashion to the case of functional repair. In
an RGC with functional repair, the contents of the nodes, as well as
the data transferred for node repair can change with time.
Let us assume that we are at time instant t in the functional-repair
setting. As in the case of ER, we restrict attention to a subset of (d + 1)
nodes that are numbered 1 through (d + 1). Let ti denote the last time
instant at which node i was repaired prior to time t. We assume without
loss of generality, that

t1 < t2 < · · · < tk−1 < tk < t.

With respect to the proof given above for the ER case, we now interpret
W_i, for 1 ≤ i ≤ k, as the contents of node i at time t. We interpret
S_{[i−1]}^i, i = 1, 2, · · · , k as the data passed on by helper node j ∈ [i − 1]
to the replacement of the ith node at time t_i, i.e., the time instant at
which node i failed. Similarly, S_A^i denotes the helper information passed
on by nodes in set A for repair of node i at time t_i. With this, it can


be verified by following the argument made for the ER case, that the
information-theoretic arguments remain unchanged and we therefore
arrive at the same bound.
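The right-hand side of (3.7) is easy to evaluate numerically. The sketch below (illustrative parameters) checks the two equality regimes: α = (d − k + 1)β yields B = kα, while α = dβ yields B = (dk − (k choose 2))β.

```python
from math import comb

def file_size_bound(k, d, alpha, beta):
    """Right-hand side of (3.7)."""
    return sum(min(alpha, (d - i + 1) * beta) for i in range(1, k + 1))

k, d, beta = 5, 11, 1       # illustrative parameter choices

# alpha = (d - k + 1) * beta: every term of the sum equals alpha,
# so the bound evaluates to k * alpha.
alpha = (d - k + 1) * beta
assert file_size_bound(k, d, alpha, beta) == k * alpha              # 35

# alpha = d * beta: term i equals (d - i + 1) * beta, so the bound
# evaluates to (d k - C(k, 2)) * beta.
alpha = d * beta
assert file_size_bound(k, d, alpha, beta) == (d * k - comb(k, 2)) * beta  # 45
```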

3.3 Storage-Repair-Bandwidth Tradeoff

For a given file size B, the storage overhead and normalized repair
bandwidth are given respectively by nα/B and dβ/B. Thus for a fixed value
bandwidth are given respectively by nα B and B . Thus for a fixed value
of file size B, block length n, and repair degree d, the parameter α
is indicative of the amount of storage overhead while β determines
the normalized repair bandwidth. We will say that an RGC having
parameters {(n, k, d), (α, β), B, F_q} is optimal if (a) the file-size bound
in (3.7) is met with equality and if further, (b) reducing either α or β
causes the bound to be violated.⁴
⁴The latter condition is inserted since at the extreme MSR case, one could have
B = αk and β very large while satisfying (3.7), while at the same time the inequality
could also be satisfied with β = α/(d − k + 1). At the other extreme MBR end, equality
could hold with B = (dk − (k choose 2))β and α very large, while α = dβ would
suffice for equality to hold.

Figure 3.2: The normalized storage-repair-bandwidth tradeoff under functional
repair, for the parameters (k = 5, d = 11). The MSR and MBR points mark the
two ends of the curve.

It will be convenient at this point, to introduce normalized versions
of the parameters (α, β), given by

    ᾱ := α/B,   β̄ := β/B.

Then by dividing both sides of (3.7) by B, we obtain

    1 ≤ Σ_{i=1}^{k} min{ᾱ, (d − i + 1)β̄}.   (3.13)

For fixed (k, d), the locus of all pairs (ᾱ, dβ̄) that satisfy (3.13) with
equality will be shown in Section 6 to be a piece-wise linear curve as
can be seen in Fig. 3.2. For fixed value of block length n, this curve
represents a tradeoff between storage overhead nᾱ on the one hand,
and normalized repair bandwidth dβ̄ on the other. The network coding
approach to deriving the fundamental bound on file size (see Section 3.4)
tells us that for every set of parameters {(n, k, d), (α, β)} there exists an
RGC having file size B satisfying (3.7). However, network coding only
guarantees the existence of an RGC that is repaired using functional
repair. For this reason, the plot of the pairs (ᾱ, dβ̄) that satisfy (3.13)
with equality is referred to as the FR tradeoff.
The corresponding tradeoff under exact repair, called the ER tradeoff,
is harder to characterize and is discussed further in Sections 6 and 7.
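The FR tradeoff can be traced numerically: for each β̄, a bisection finds the least ᾱ achieving equality in (3.13). A minimal sketch for (k, d) = (5, 11), the parameters of Fig. 3.2, with the two corner values of β̄ checked:

```python
k, d = 5, 11

def lhs(a, b):
    """Left-hand side sum of (3.13), with the file size normalized to 1."""
    return sum(min(a, (d - i + 1) * b) for i in range(1, k + 1))

def min_alpha(b):
    """Least normalized alpha with lhs >= 1; bisection works because
    lhs is nondecreasing in alpha."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if lhs(mid, b) >= 1.0:
            hi = mid
        else:
            lo = mid
    return hi

beta_low  = 1 / (k * (d - k + 1))           # smallest beta on the curve
beta_high = 1 / (d * k - k * (k - 1) // 2)  # largest beta on the curve

# At beta_low, the least alpha is 1/k; at beta_high, it is d * beta.
assert abs(min_alpha(beta_low) - 1 / k) < 1e-9
assert abs(min_alpha(beta_high) - d * beta_high) < 1e-9
```

Sweeping β̄ between these two values and plotting (ᾱ, dβ̄) reproduces the piece-wise linear curve of Fig. 3.2.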

3.3.1 MSR and MBR Codes


Clearly, the smallest value of ᾱ for which the equality can hold in (3.13)
is given by ᾱ = 1/k. Given ᾱ = 1/k, the smallest permissible value of dβ̄ is
given by dβ̄ = d/(k(d − k + 1)). The corresponding pair (ᾱ, dβ̄) thus represents
the point of the tradeoff corresponding to minimum possible storage
overhead and additionally, minimum possible repair bandwidth, given
that the storage overhead is as small as possible. Codes achieving (3.7)
with ᾱ = 1/k and dβ̄ = d/(k(d − k + 1)) are for this reason, known as minimum
storage regenerating (MSR) codes.

Similarly, at the other end of the tradeoff, the smallest possible
value of the normalized repair bandwidth is given by dβ̄ = d/(dk − (k choose 2)).
Given dβ̄ = d/(dk − (k choose 2)), the smallest permissible value of ᾱ is given by
ᾱ = dβ̄. The corresponding pair (ᾱ, dβ̄) represents this time, the point
of the tradeoff corresponding to minimum possible repair bandwidth
and additionally, minimum possible storage overhead, given that the
normalized repair bandwidth is as small as possible. Codes achieving
(3.7) with dβ̄ = d/(dk − (k choose 2)) and ᾱ = dβ̄ are for this reason, known as
minimum (repair) bandwidth regenerating (MBR) codes.
As noted in Definitions 3 and 4, the minimum Hamming distance d_min
of an RGC must satisfy d_min ≥ (n − k + 1). By the Singleton bound, the
largest size M of a code of block length n and minimum distance d_min
is given by M ≤ Q^{n − d_min + 1}, where Q is the size of the alphabet of the
code. Since the alphabet size Q = q^α in the case of an RGC, it follows that
the size M of an RGC must satisfy M ≤ q^{kα}, or equivalently q^B ≤ q^{kα},
i.e., B ≤ kα. But B = kα in the case of an MSR code and it follows
that every MSR code is an MDS code over the vector alphabet F_q^α. In
the literature, MDS codes over a vector alphabet are also referred to as
MDS array codes. For more on MDS array codes, see the notes section
appearing at the end of Section 2.
From a practical perspective, ER RGCs are easier to implement
as the contents of the n nodes in operation do not change with time.
Partly for this reason and partly for reasons of tractability, with few
exceptions, most constructions of RGCs belong to the class of ER RGCs.
Examples of FR RGC include the d = (k + 1) construction in [212] as
well as the construction in [98].

Early constructions of RGCs focused on the two extreme points
of the storage-repair-bandwidth tradeoff, namely the MSR and MBR
points. The storage industry places a premium on low storage overhead.
This is not too surprising, given the vast amount of data, running into
the exabytes, stored in today's data centers. In this connection, we note
that the maximum rate of an MBR code is given by:

    R_MBR = B/(nα) = (dk − (k choose 2))β / (ndβ) = (dk − (k choose 2)) / (nd),

which can be shown to satisfy the upper bound R_MBR ≤ 1/2, achieved
with equality when k = d = (n − 1). This makes MSR codes of greater
practical interest when minimization of storage overhead is a primary
objective.
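A quick numerical confirmation of this rate bound, over illustrative parameter ranges:

```python
from math import comb

def r_mbr(n, k, d):
    """Maximum rate of an MBR code: (d k - C(k, 2)) / (n d)."""
    return (d * k - comb(k, 2)) / (n * d)

# Equality R_MBR = 1/2 at k = d = n - 1:
assert all(abs(r_mbr(n, n - 1, n - 1) - 0.5) < 1e-12 for n in range(3, 10))

# And R_MBR <= 1/2 throughout the valid range 1 <= k <= d <= n - 1:
assert all(r_mbr(n, k, d) <= 0.5 + 1e-12
           for n in range(3, 8)
           for k in range(1, n)
           for d in range(k, n))
```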
Apart from the requirements of low repair bandwidth and low
value of storage overhead, from a practical perspective, there are some
additional properties desired of an RGC. These include small field size,
small sub-packetization level α and minimal disk read and computation
needed for node repair.

3.4 Network Coding Approach to the File-Size Bound

The fundamental upper bound on file size appearing in Theorem 2 was
originally established by Dimakis et al. [50] using a network coding
approach [3], [122] and by invoking the cut-set bound. For this reason,
the upper bound on file size is often referred to in the literature as the
cut-set bound. We present an overview of this proof below.

3.4.1 The Cut-Set Upper Bound


The derivation of the upper bound only requires the RGC to be an FR
RGC. Since an ER RGC is also an FR RGC, the bound clearly applies
to ER RGCs as well. Let C be an FR RGC having parameter set

    {(n, k, d), (α, β), B, F_q}.

Since the RGC is of size q^B, as before, we associate without loss of
generality, a unique message vector M ∈ F_q^B to each codeword. Over

time, nodes will undergo failures and every failed node will be replaced
by a replacement node. Let us assume to begin with, that we are only
interested in the behavior of the RGC over a finite-but-large number
N ≫ n of node repairs. Moreover, we will assume that nodes are
repaired using functional repair. For simplicity, we assume that repair
is carried out instantaneously. Then, at any given time instant t, there
are n functioning nodes whose collective contents constitute an RGC
and a data collector should be able to connect to any subset of k nodes,
download all of the contents of these k nodes and use these to recover
the B message symbols, {u_i ∈ F_q}_{i=1}^{B}. Clearly, in all, there are at most
N·(n choose k) distinct data collectors, each corresponding to a distinct choice of
k nodes to which the data collector connects.


out
DC

d in

S
d-1

d-2

d-3

cut

Figure 3.3: The directed, capacitated graph associated to an RGC that over time,
undergoes a finite number N of node repairs. The label on each edge, indicates the
capacity of that edge. Here DC denotes the data collector.

Next, we create a source node that possesses the B message symbols


i=1 , and draw edges connecting the source to the initial set of
{ui }B
n nodes among which, data pertaining to the message symbols was
distributed in coded form. We also draw edges connecting the d helper
nodes that assist a replacement node and the replacement node, as well
as edges connecting each data collector with the corresponding set of
k nodes from which the data collector downloads data. All edges are
directed in the direction of information flow. We associate a capacity
β with edges emanating from a helper node to a replacement node

and an ∞ capacity with all other edges. Each node can only store α
symbols over Fq . We incorporate this constraint by using a standard
graph-theory construct, in which a node is replaced by 2 nodes separated
by a directed edge (leading towards a data collector) of capacity α. We
have in this way, arrived at a graph (see Fig. 3.3) in which there is a
single source S and at most N·(n choose k) sinks {T_i}.


Each sink Ti would like to be able to reconstruct all the B source


symbols {ui } from the symbols it receives. This is precisely the multicast
setting of network coding. A principal result in network coding tells us
that in a multicast setting, one can transmit messages along the edges
of the graph in such a way that each sink Ti is able to reconstruct the
source data, provided that the minimum capacity of a cut separating S
from Ti is ≥ B. A cut separating S from Ti is simply a partition of the
nodes of the network into 2 sets: a subset Ai of the nodes containing S
and whose set-theoretic complement Aci , contains Ti . The capacity of
the cut is the sum of the capacities of all the edges leading from a node
in Ai to a node in Aci . A careful examination of the graph will reveal
that the minimum capacity Q of a cut separating a sink T_i from source
S is given by

    Q = Σ_{i=0}^{k−1} min{α, (d − i)β}

(Fig. 3.3 shows an example cut separating source from sink). This leads
to the upper bound (3.7) on file size:

    B ≤ Σ_{i=0}^{k−1} min{α, (d − i)β}.

3.4.2 Achievability
Network coding also employs the Combinatorial Nullstellensatz [5] to
show that when only a finite number of node failures and correspond-
ing regenerations take place, this bound is achievable, and moreover,
achievable using linear network coding, i.e., achievable using only linear
operations at each node in the network for a sufficiently large value q
of the finite field Fq . In a subsequent result [251], Wu used the specific
structure of the graph to show that even in the case when the number
of sinks is infinite, the upper bound in (3.7) continues to be achievable
using linear network coding.

In this way, one can draw upon principles of network coding to char-
acterize the maximum file size of an RGC given parameters {k, d, α, β}
for the case of functional repair.

3.5 Overview of RGC-Related Topics in the Monograph

Figure 3.4: Topics related to RGCs that are covered elsewhere in the monograph.

Table 3.1: Constructions for MBR codes, MSR codes and interior-point ER RGCs
that are presented in the monograph. All of the constructions appearing in the
table are explicit. (We only provide brief descriptions, however, of the Small-d MSR,
Cascade and Moulin code constructions.)

    Type of RGC      Code                                   Section
    MBR              Polygonal [179]                        4.1
    MBR              Product-Matrix [185]                   4.2
    MSR              Product-Matrix [185]                   5.1
    MSR              Diagonal [255]                         5.2
    MSR              Coupled-Layer [137], [205], [256]      5.3
    MSR              Small-d [239]                          5.4
    Interior point   Determinant [61], [63]                 7.1
    Interior point   Cascade [64]                           7.2
    Interior point   Moulin [54]                            7.3

Fig. 3.4 presents an overview of the RGC-related topics discussed
in this monograph. Sections 4 and 5 present several constructions of
exact-repair MBR and MSR codes respectively. The storage-repair-
bandwidth tradeoffs corresponding to FR and ER appear in Section 6.
Constructions directed towards interior points of the tradeoff in the
case of ER appear in Section 7. Bounds on the sub-packetization level
of an MSR code appear in Section 8. Several variants of RGCs, such
as fractional repetition codes and cooperative RGCs, are discussed in
Section 9. A tabular listing of the constructions of RGCs that appear
in Sections 4, 5 and 7 is provided in Table 3.1.

Notes

1. Liquid storage: In [150]–[152], a different approach to coded
distributed storage is adopted. The broad objectives remain the
same however, namely, minimizing storage overhead as well as
the amount of data download needed to carry out node repair
while ensuring reliable data storage. The liquid storage systems
described in [150]–[152] employ erasure codes having large block
length, and as a result, the number of redundant nodes is propor-
tionally larger. This permits an approach to node repair termed as
lazy repair, in which the repair center is able to wait until several
nodes have failed before proceeding with node repair. This is to
take advantage of the fact that the simultaneous repair of t failed
nodes can be carried out more efficiently in terms of the amount
of helper data that needs to be downloaded for node repair, in
comparison with separately carrying out the individual repair of
the t nodes. In [150]–[152], the authors assume that nodes fail
at a certain rate and the focus is on minimizing both the peak
and average repair rate at which data needs to be read from the
storage nodes by a centralized repair center.
4
MBR Codes

As noted in Section 3, MBR codes are the subclass of RGCs that
have the minimum possible normalized repair bandwidth and, additionally,
the smallest value of storage overhead given that the normalized repair
bandwidth is as small as possible. MBR codes thus correspond to one
of the two extreme points of the storage-repair bandwidth tradeoff. The
repair bandwidth, dβ, of an MBR code equals the amount of data to
be regenerated, α, and the file size B is given by B = kdβ − (k choose 2)β.


An MBR code is said to possess the help-by-transfer (HBT) property
if a failed node can always be regenerated without need for any form of
finite-field computation carried out at the helper nodes. If in addition,
no computation is required even at the replacement node, the code is
said to possess the repair-by-transfer (RBT) property. In other words,
the RBT property implies that node repair is accomplished by simply
copying over a subset of symbols contained in the d helper nodes to the
replacement node. It follows from this that if an MBR code possesses
the RBT property, each scalar code symbol contained in a node must
also be contained in at least one other node. As a form of converse result,
it is shown in [132] that it is not possible to construct an MBR code
in which even a single scalar symbol is repeated more than twice. This


statement is true regardless of whether or not the MBR code possesses
the RBT property.
It follows that if an MBR code possesses the RBT property, its nα
scalar code symbols must be comprised of a set of nα/2 distinct code
symbols, each replicated twice. In [213], it is shown that an MBR code
with d < n − 1 cannot possess the HBT (and hence the RBT) property.
We provide in this section two constructions (see Table 4.1) of MBR
codes. In the first construction [179], d = n − 1 and the code satisfies
the RBT property. The second construction [185] yields general MBR
codes, i.e., MBR codes for all d ≤ n − 1 (which do not, in general, possess
the RBT property).
Table 4.1: The explicit MBR code constructions described in this section.

MBR code               Parameters          Field size   Attributes
Polygonal [179]        d = n − 1, β = 1    O(n^2)       RBT
Product-Matrix [185]   all d, β = 1        O(n)         –

4.1 Polygonal MBR Code

The polygonal MBR code construction by Rashmi et al. [179], [211]
yields MBR codes satisfying the RBT property for all parameters
k ≤ d = n − 1 and β = 1. We begin with an example construction for
the case {(n = 5, k = 3, d = 4), (α = 4, β = 1), B = 9}, which we will
refer to as the pentagon MBR code.
Consider a complete graph with n = 5 vertices. It has (5 choose 2) = 10
edges. The B = 9 symbols of the data file are encoded using a [10, 9, 2]
MDS code to produce ten code symbols. Each code symbol is assigned
to a distinct edge. Each node of the pentagon MBR code stores the
code symbols assigned to the edges incident on that node (see Fig. 4.1).
We will now verify that the example construction indeed satisfies both
data collection and RBT properties.
Data Collection: Any collection of k = 3 nodes contains nine distinct
code symbols of the [10, 9, 2] MDS code. This is sufficient to recover all
10 code symbols and in this way, the 9 message symbols that make up
the data file.
[Figure 4.1: An example RBT MBR code construction for the parameter set
(n = 5, k = 3, d = 4). The file size B = 9 here. The figure shows the complete
graph on five nodes, with message symbols 1–9 and the parity symbol P of
the [10, 9, 2] MDS code placed on the ten edges; the five nodes store the
symbol sets {2,5,6,P}, {1,4,9,P}, {1,3,6,7}, {3,5,8,9} and {2,4,7,8}.]

Node Repair: To repair a failed node, each helper node simply
provides to the replacement node the code symbol associated with the
edge it shares with the failed node. Thus, node repair is accomplished
by merely transferring β = 1 code symbol from each of the (n − 1) = 4
helper nodes to the replacement node.
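The pentagon example can be checked end-to-end in a few lines. The sketch below uses the [10, 9, 2] single-parity-check code over F2 as the MDS code and one convenient (assumed) lexicographic assignment of code symbols to edges; it verifies both the repair-by-transfer and the data-collection properties:

```python
from itertools import combinations
import random

random.seed(0)

edges = list(combinations(range(5), 2))          # the 10 edges of K_5
msg = [random.randrange(2) for _ in range(9)]    # B = 9 message bits
codeword = msg + [sum(msg) % 2]                  # single parity symbol "P"
symbol = dict(zip(edges, codeword))              # one code symbol per edge

def node_contents(v):
    # node v stores the symbols on its alpha = 4 incident edges
    return {e: symbol[e] for e in edges if v in e}

# Repair-by-transfer: each helper copies over the symbol on the edge it
# shares with the failed node -- no finite-field computation anywhere.
failed = 0
repaired = {}
for helper in range(1, 5):
    shared = tuple(sorted((failed, helper)))
    repaired[shared] = node_contents(helper)[shared]
assert repaired == node_contents(failed)

# Data collection: any k = 3 nodes hold 9 of the 10 code symbols; the
# missing one is recovered from the parity check, giving back all of msg.
known = {}
for v in (1, 3, 4):
    known.update(node_contents(v))
(missing,) = set(edges) - set(known)
known[missing] = sum(known.values()) % 2         # parity completes the codeword
assert [known[e] for e in edges][:9] == msg
```

Any other choice of three nodes, or of the failed node, works identically by the symmetry of the construction.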

4.1.1 The General Polygonal MBR Construction


The general polygonal MBR code construction yields MBR codes having
parameter set of the form

((n, k, d = n − 1), (α = n − 1, β = 1), B = k(n − 1) − (k choose 2))

for any n ≥ 3, and requiring O(n^2) field size. The construction is most
easily described in terms of a complete graph G on n vertices having
N = (n choose 2) edges. Let Csc be a scalar [N, B, N − B + 1] MDS code. Each
code symbol in Csc is mapped on to an edge in G. Each node is made to
store the α = n − 1 code symbols corresponding to the edges incident
on the particular node. Node repair then proceeds exactly as in the
case of the example pentagon MBR code. The replacement node simply
downloads from each helper node the unique symbol of Csc it shares
in common with the failed node. With regard to the data collection
property, downloading all the contents of any k nodes yields

αk − (k choose 2) = dk − (k choose 2) = B

distinct code symbols from Csc. The MDS property of Csc now allows all
the B message symbols associated with the data file to be recovered. Note
that the requirement of an MDS code of block length N = (n choose 2) places a
requirement of O(n^2) on the size of the finite field needed to construct the
MBR code.
The construction of a family of RBT MBR codes with d = n − 1
and a reduced O(n) field-size requirement is presented in [144].

4.2 Product-Matrix MBR Code

A product-matrix framework was introduced by Rashmi et al. [185] that
provides constructions for both MSR and MBR codes. We describe in
this subsection the product-matrix MBR (PM-MBR) code construction.
This construction yields MBR codes for all parameter sets of the form

((n, k, d), (α = d, β = 1), B = kd − (k choose 2)), over Fq,

and requires a field size q = O(n).
Let G be an (n × d) matrix over Fq of the form

G ≜ [G1 G2],

with sub-blocks G1 of size (n × k) and G2 of size (n × (d − k)),

where:
• Every (k × k) sub-matrix of G1 is non-singular

• Every (d × d) sub-matrix of G is non-singular.


The two requirements can be met, for example, by choosing G to be a
Cauchy or Vandermonde matrix, both of which require O(n) field-size.
The matrix G plays the role of generator matrix for the PM-MBR code.
Next, we introduce a data-bearing, symmetric, (d × d) matrix M of
the form

M ≜ [S, V; V^T, 0],

in which the block S is of size (k × k), V is of size (k × (d − k)) and the
zero block is of size ((d − k) × (d − k)),

where S is a symmetric (k × k) matrix. Since S is symmetric, the matrix
M can store at most

(k+1 choose 2) + k(d − k) = kd − (k choose 2) = B

distinct elements. The matrix M is accordingly populated by the B
message symbols associated with the data file and may be regarded as
the matrix analogue of the message vector associated with a scalar block
code.
Each codeword in the PM-MBR code is then represented by an
(n × d) code matrix C that is the product of the matrices G and M:

C ≜ G M,

with C and G of size (n × d) and M of size (d × d).

If c_i^T denotes the ith row of the code matrix C, i ∈ {1, . . . , n}, the
contents of the ith node are then precisely the components of c_i.
Data Collection: Consider any collection of k nodes indexed by
the subset K ⊆ {1, 2, . . . , n} of size |K| = k. Let [G_{K,1} G_{K,2}] denote
the (k × d) sub-matrix of G = [G1 G2] obtained by selecting the rows
indexed by K, where G_{K,1} and G_{K,2} are the corresponding sub-matrices
of G1 and G2 respectively. Let C_K denote the corresponding (k × d)
sub-matrix of C. Then we can write

C_K = [G_{K,1} G_{K,2}] [S, V; V^T, 0] =: [C_{K,1} C_{K,2}],

so that

C_{K,1} = G_{K,1} S + G_{K,2} V^T  and  C_{K,2} = G_{K,1} V.

During data recovery, both C_{K,1} and C_{K,2} are accessible. As any (k × k)
sub-matrix of G1 is non-singular by design, in particular the sub-matrix
G_{K,1} is non-singular. This allows us to recover the matrix V from

V = (G_{K,1})^{−1} C_{K,2}.

Having recovered V, we can then recover S using

S = (G_{K,1})^{−1} [C_{K,1} − G_{K,2} V^T].



With this, all B message symbols have been recovered.


Node Repair: Assume that node f has failed and let the helper nodes
be indexed by the subset D ⊆ {1, 2, . . . , n} of size |D| = d. Let C_D, G_D
denote the sub-matrices of C, G respectively, obtained by selecting the
rows indexed by D. We then have

C_D = G_D [S, V; V^T, 0],

with C_D and G_D of size (d × d). Let g_f^T denote the f-th row of G. The
node repair process can be explained in three steps.

• Step 1: Each helper node i, i ∈ D, computes c_i^T g_f and transmits
the resultant symbol to the replacement node. At the end of Step
1, the vector C_D g_f is available at the replacement node.

• Step 2: Since any (d × d) sub-matrix of G is non-singular, the
replacement node can then compute

(G_D)^{−1} C_D g_f = [S, V; V^T, 0] g_f.

• Step 3: By taking the transpose, the replacement node obtains

([S, V; V^T, 0] g_f)^T = g_f^T [S, V; V^T, 0] = c_f^T,

and in this way, the contents of the failed node have been recovered.
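The three repair steps above can be exercised numerically. The sketch below instantiates the PM-MBR code over the prime field GF(101) with illustrative parameters (n, k, d) = (6, 3, 5) (these values are assumptions, not taken from the text) and a Vandermonde choice of G, and checks that the repair procedure reproduces the contents of a failed node:

```python
p = 101  # a prime, so GF(p) arithmetic is plain modular arithmetic

def mat_mul(A, B):
    # matrix product mod p
    return [[sum(a * b for a, b in zip(row, col)) % p
             for col in zip(*B)] for row in A]

def mat_inv(A):
    # Gauss-Jordan inversion mod p (A assumed square and non-singular)
    m = len(A)
    aug = [row[:] + [int(i == j) for j in range(m)] for i, row in enumerate(A)]
    for col in range(m):
        piv = next(r for r in range(col, m) if aug[r][col] % p)
        aug[col], aug[piv] = aug[piv], aug[col]
        inv = pow(aug[col][col], p - 2, p)          # Fermat inverse
        aug[col] = [x * inv % p for x in aug[col]]
        for r in range(m):
            if r != col and aug[r][col]:
                fac = aug[r][col]
                aug[r] = [(x - fac * y) % p for x, y in zip(aug[r], aug[col])]
    return [row[m:] for row in aug]

n, k, d = 6, 3, 5                 # alpha = d, beta = 1
B = k * d - k * (k - 1) // 2      # file size = 12

# Vandermonde G = [G1 G2]: every d x d sub-matrix of G and every
# k x k sub-matrix of G1 is then non-singular, as required.
G = [[pow(x, j, p) for j in range(d)] for x in range(1, n + 1)]

# Message matrix M = [S, V; V^T, 0] with S symmetric (k x k).
msg = iter(range(1, B + 1))
S = [[0] * k for _ in range(k)]
for i in range(k):
    for j in range(i, k):
        S[i][j] = S[j][i] = next(msg)
V = [[next(msg) for _ in range(d - k)] for _ in range(k)]
M = [S[i] + V[i] for i in range(k)] + \
    [[V[i][j] for i in range(k)] + [0] * (d - k) for j in range(d - k)]

C = mat_mul(G, M)                 # node i stores row i of C

# Repair of node f: each helper i in D sends the single symbol c_i . g_f.
f, D = 0, [1, 2, 3, 4, 5]
g_f = G[f]
helper_syms = [[sum(c * g for c, g in zip(C[i], g_f)) % p] for i in D]
GD_inv = mat_inv([G[i] for i in D])
Mg_f = mat_mul(GD_inv, helper_syms)      # = M g_f (Steps 1 and 2)
repaired = [x[0] for x in Mg_f]          # (M g_f)^T = g_f^T M = c_f (Step 3)
assert repaired == C[f]
```

The final transpose step is where the symmetry of M is used: since M^T = M, the column vector M g_f recovered by the replacement node is exactly the failed row c_f.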

Notes
1. Fractional repetition codes: Fractional repetition codes [59] may
be regarded as generalizing the polygonal MBR construction. In
a fractional repetition code, the underlying scalar code symbols
are obtained by replicating an MDS code ρ ≥ 2 times. However,
unlike in the case of an MBR code, for the repair of each node,
only a specific set of d ≤ n − 1 helper nodes is guaranteed to be
able to help in node repair. For this reason, fractional repetition
codes are said to have table-based repair. Fractional repetition
codes are discussed in greater detail in Section 9.2.

2. Binary MBR codes: There exist MBR codes over the binary field F2
with β = 1 if the parameters {n, k, d} satisfy any of the following
conditions (i) k = d − 1 = n − 2 (ii) k = d = n − 2 and (iii)
k = d − 1 = n − 3. Details can be found in [132], [179].

Open Problem 1. Determine the smallest possible field size q of an
MBR code having given parameters {n, k, d, β}.
5
MSR Codes

Among the class of RGCs, MSR codes have received the greatest at-
tention, for reasons that include the fact that MSR codes are MDS
codes, have storage overhead that can be made as small as desired,
and have been challenging to construct. An MSR code with parameters
(n, k, d, α) has file size B = kα and repair bandwidth β = α/(d − k + 1).
MSR codes can also be viewed as vector MDS codes that incur the
least possible repair bandwidth for the repair of a failed node.
While only β symbols are passed on to the replacement of a failed
node by each of the d helper nodes, the number of symbols accessed
by the helper node in order to generate these β symbols could be
significantly larger than β. There is interest in practice, in the subclass
of MSR codes having the property that the number of scalar symbols
accessed at each helper node is also equal to the number β of symbols
that are passed on for node repair. Such MSR codes are termed as
optimal-access MSR codes.
An early construction of an MSR code with parameters (n, k, d) and
(α, β) satisfying d = (n − 1) ≥ 2k − 1, β = 1, can be found in [227] and is
briefly discussed in the notes subsection. A detailed description of three
constructions of an MSR code is presented in the present section, along


Table 5.1: Explicit MSR code constructions described in this section. Here r = (n − k)
and s = (d − k + 1). The ∗ in the last row is to indicate that Small-d MSR codes
have lowest possible sub-packetization level under the assumption of helper-set-
independent repair, see Section 5.4.

MSR code                     Parameters                                Field size   Attributes
PM-MSR [185]                 d ≥ 2(k − 1), α = s                       nα           low-rate
Diagonal MSR [255]           all d, α = s^n                            sn           optimal-update
CL-MSR [137], [205], [256]   d = n − 1, α = r^⌈n/r⌉                    r⌈n/r⌉       optimal-access with minimum α
Small-d MSR [239]            d ∈ {k + 1, k + 2, k + 3}, α = s^⌈n/s⌉    O(n)         optimal-access with minimum∗ α

with a brief description of a fourth MSR code. The first construction
is the product-matrix MSR (PM-MSR) code [185], i.e., the MSR code
constructed using a product-matrix framework as was the case with the
PM-MBR code. PM-MSR codes, like PM-MBR codes, have smallest
level of sub-packetization possible of an RGC, corresponding to setting
parameter β = 1. This is followed by a description of the Diagonal
MSR code [255] construction, a construction which yields MSR codes
for all (n, k, d) parameter sets. The third construction presented is the
coupled-layer MSR (CL-MSR) code [137], [205], [256]. The CL-MSR
code is an optimal-access MSR code with parameter d = (n − 1) that
turns out to have least-possible sub-packetization level of an optimal-
access MSR code. Following the three detailed descriptions, we provide
a brief summary of the attributes of a fourth MSR code construction,
termed the Small-d MSR code construction [239]. The Small-d MSR
code construction yields MSR codes for small values of d that have the
optimal-access property. Table 5.1 presents an overview of the four code
constructions. Brief discussions of other constructions of MSR codes
can be found in the notes subsection.

5.1 Product-Matrix MSR Code

The product-matrix MSR (PM-MSR) construction by Rashmi et al.
[185] yields MSR codes with parameters satisfying d ≥ 2(k − 1) and
β = 1. We begin with the case d = 2(k − 1) and then show how the
general d ≥ 2(k−1) code can be constructed by appropriately shortening
the d = 2(k − 1) code. The process of code shortening is explained in
Section 5.1.3.

Given that we are operating at the MSR point with d = 2(k − 1)
and β = 1, it follows that the resultant MSR code will have parameter
set given by

((n, k, d = 2(k − 1)), (α = (k − 1), β = 1), B = kα).

Note that d = 2(k − 1) = 2α. Let M be a (2α × α) matrix having
the structure

M = [S1; S2],

where the matrices S1, S2 are symmetric, of size (α × α), and are stacked
one on top of the other. It follows that the total number of distinct
symbols that can be contained in the matrix is given by α(α + 1) = αk,
which is precisely the file size of the MSR code it is planned to construct.
In the first step of the construction, the matrix M is populated with the
B = kα message symbols.
Encoding is carried out using an (n × d) matrix J given by

J = [G  ΛG],

where G is an (n × α) matrix and Λ is an (n × n) diagonal matrix. The
matrix J may be regarded as playing the role of generator matrix in
the construction. The matrices G and Λ are required to be chosen such
that the following properties hold:

• Every (d × d) sub-matrix of J is non-singular,

• Every (α × α) sub-matrix of G is non-singular,

• The n diagonal elements of Λ are distinct.

We now present a Vandermonde matrix J of the required form that
meets all the above requirements. Let Fq be a finite field having size
q ≥ nα. Let γ be a primitive element of Fq, i.e., γ is a generator of the
multiplicative group F_q^* of Fq, and set θ_i = γ^{i−1} for all 1 ≤ i ≤ n. Then
the (n × d) Vandermonde matrix J whose ith row is

(1, θ_i, θ_i^2, . . . , θ_i^{d−1})

meets all the requirements. It is of the form J = [G ΛG], where G is the
(n × α) Vandermonde matrix with ith row (1, θ_i, . . . , θ_i^{α−1}) and

Λ = diag(θ_1^α, θ_2^α, . . . , θ_n^α).

The (n × α) code matrix C is then given by

C = J M = [G  ΛG] [S1; S2] = G S1 + ΛG S2.

As in the case of the PM-MBR code, the ith node stores the α
symbols contained in the ith row c_i^T of C. Let the ith row of G be
denoted by g_i^T and let λ_i be the ith diagonal element of Λ.
Node Repair: Suppose node f has failed. The f-th node stores the
f-th row of C, given by

c_f^T = [g_f^T  λ_f g_f^T] [S1; S2] = g_f^T S1 + λ_f g_f^T S2.

Our goal in node repair is to recreate this vector using helper data.
Let D ⊆ {1, 2, . . . , n} \ {f}, with |D| = d, be the indices of the d
helper nodes. Let J_D be the sub-matrix of J obtained by selecting the
d = 2α rows of J whose indices lie in D. Let C_D be the sub-matrix of
C containing rows with indices lying in D. Then

C_D = J_D [S1; S2],

with C_D of size (d × α) and J_D of size (d × d), and the symbols of C_D
are precisely the contents of the d helper nodes.

Step 1: The helper node i sends c_i^T g_f to the replacement node.
Aggregating repair information from all helper nodes, the replacement
node obtains C_D g_f.
Step 2: The replacement node then computes

(J_D)^{−1} C_D g_f = [S1; S2] g_f = [S1 g_f; S2 g_f],

and thus recovers S1 g_f and S2 g_f.
Step 3: Since S1 and S2 are symmetric, the replacement node can
then carry out the computation

(S1 g_f)^T + λ_f (S2 g_f)^T = g_f^T S1 + λ_f g_f^T S2 = c_f^T,

to recover the content of the failed node.
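As with the PM-MBR code, the repair steps can be verified numerically. The sketch below instantiates the PM-MSR code for the illustrative (assumed) parameters (n, k) = (6, 3), d = 2(k − 1) = 4, over GF(13), with θ_i = 2^{i−1} (2 being a primitive element of GF(13), and q = 13 ≥ nα = 12), and checks Steps 1–3:

```python
p = 13
n, k = 6, 3
d = 2 * (k - 1)          # 4
alpha = k - 1            # 2

def mat_mul(A, B):
    return [[sum(x * y for x, y in zip(row, col)) % p for col in zip(*B)]
            for row in A]

def mat_inv(A):
    # Gauss-Jordan inversion mod p
    m = len(A)
    aug = [row[:] + [int(i == j) for j in range(m)] for i, row in enumerate(A)]
    for col in range(m):
        piv = next(r for r in range(col, m) if aug[r][col] % p)
        aug[col], aug[piv] = aug[piv], aug[col]
        inv = pow(aug[col][col], p - 2, p)
        aug[col] = [x * inv % p for x in aug[col]]
        for r in range(m):
            if r != col and aug[r][col]:
                fac = aug[r][col]
                aug[r] = [(x - fac * y) % p for x, y in zip(aug[r], aug[col])]
    return [row[m:] for row in aug]

theta = [pow(2, i, p) for i in range(n)]
G = [[pow(t, j, p) for j in range(alpha)] for t in theta]   # rows g_i^T
lam = [pow(t, alpha, p) for t in theta]                     # diagonal of Lambda
J = [[pow(t, j, p) for j in range(d)] for t in theta]       # J = [G  Lambda*G]

# Message matrix M = [S1; S2], S1 and S2 symmetric (alpha x alpha):
# B = alpha*(alpha + 1) = k*alpha = 6 message symbols.
S1 = [[1, 2], [2, 3]]
S2 = [[4, 5], [5, 6]]
C = mat_mul(J, S1 + S2)           # node i stores row i of C

# Repair of failed node f from helpers D; each helper sends c_i . g_f.
f, D = 0, [1, 2, 3, 4]
g_f = G[f]
helper = [[sum(x * y for x, y in zip(C[i], g_f)) % p] for i in D]
Sg = mat_mul(mat_inv([J[i] for i in D]), helper)    # = [S1 g_f ; S2 g_f]
repaired = [(Sg[j][0] + lam[f] * Sg[alpha + j][0]) % p for j in range(alpha)]
assert repaired == C[f]
```

Step 3 is the last line: by the symmetry of S1 and S2, (S1 g_f)^T + λ_f (S2 g_f)^T equals the failed row c_f^T.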
Data Collection: Let the subset K ⊆ {1, 2, . . . , n}, with |K| = k,
represent the indices of the nodes whose contents are to be used to
recover the data file. Let J_K = [G_K (ΛG)_K] be the (k × d) sub-matrix of
J obtained by picking rows with indices in K. Similarly, let C_K denote
the (k × α) sub-matrix of C corresponding to K, given by

C_K = [G_K  (ΛG)_K] [S1; S2].

Our goal is to recover S1 and S2. Let Λ_K be the (k × k) sub-matrix
of Λ consisting of rows and columns whose index lies in K. As Λ is
a diagonal matrix, it can be easily verified that (ΛG)_K = Λ_K G_K. It
follows that

C_K = [G_K  Λ_K G_K] [S1; S2] = G_K S1 + Λ_K G_K S2.
Next, compute

C_K G_K^T = G_K S1 G_K^T + Λ_K G_K S2 G_K^T = P + Λ_K Q = A,

where P = G_K S1 G_K^T and Q = G_K S2 G_K^T. Since S1 and S2 are symmetric
matrices, it follows that P and Q are also symmetric, and of size (k × k).
Let a_ij, p_ij and q_ij be the (i, j)-th entries of A, P and Q respectively.
Then

a_ij = p_ij + λ_i q_ij  and  a_ji = p_ji + λ_j q_ji.

Since p_ij = p_ji and q_ij = q_ji, we have

(a_ij, a_ji)^T = [1, λ_i; 1, λ_j] (p_ij, q_ij)^T.
For i ≠ j, λ_i ≠ λ_j, and we can solve for p_ij and q_ij. Thus, we have access
to all the off-diagonal elements of both P and Q.
Let e be such that e^T G_K = 0^T. Such an e can be found since G_K
is of size ((α + 1) × α). Moreover, all the entries of e must be non-zero,
since any α rows of G, and hence of G_K, are required to be linearly
independent. Therefore,

e^T P = e^T G_K S1 G_K^T = 0^T.

In each of the k equations here, there is only one unknown, namely the
diagonal element p_ii. In this way, the diagonal elements of P can be
recovered. The diagonal entries of Q can be recovered in identical fashion.
Given P and Q, the matrices S1, S2 can be recovered in straightforward
fashion from P = G_K S1 G_K^T and Q = G_K S2 G_K^T. This completes the
data-collection process.

5.1.1 Extension to the Case d > 2(k − 1)


The extension we present here is by shortening of an MSR code, which
is the method adopted in [185] to provide constructions for d ≥ 2k − 1.
An alternative approach, that also makes use of the PM framework,
can be found in [143], that directly yields MSR constructions for any
d ≥ 2k − 1 without need of shortening. We begin by explaining the
concept of shortening as it applies to scalar linear codes.

5.1.2 Shortening of a Scalar Linear Code


Let C be an [n, k] systematic, linear scalar code. Then C has a generator
matrix of the form

G = [I_k  P],

where I_k is the (k × k) identity matrix and P is of size (k × (n − k)),
and the first k code symbols in any codeword of C are message symbols.
Let C1 ⊆ C be the subcode of C corresponding to code symbol c1 = 0,
5.1. Product-Matrix MSR Code 597

i.e., (c_1, . . . , c_n) ∈ C1 =⇒ c_1 = 0. Then C1 is an [n, k − 1] code. The
first code symbol in all the codewords in C1 is zero. Deleting this symbol
leads us to the code C1′ , which is an [n − 1, k − 1] code. We will refer
to the code C1′ as the code obtained by shortening the code C on or
with respect to, the first coordinate. If S ⊆ {1, . . . , k} is a subset of size
1 ≤ |S| = s ≤ (k − 1), then it is clear that through repeated shortening,
we can construct an [n − s, k − s] code CS′ by considering the subcode
of C that is obtained by setting s message symbols to zero.
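The shortening operation is easy to illustrate on a toy code. The sketch below shortens an arbitrarily chosen (illustrative, not from the text) systematic [6, 3] binary code on its first coordinate and confirms that the result is exactly the [5, 2] code generated by G with the first row and column removed:

```python
from itertools import product

# An illustrative systematic [6, 3] binary generator matrix G = [I_3 P].
G = [
    [1, 0, 0, 1, 1, 0],
    [0, 1, 0, 1, 0, 1],
    [0, 0, 1, 0, 1, 1],
]
n, k = 6, 3

def encode(msg, gen):
    # codeword = msg * gen over F2
    return tuple(sum(m * g for m, g in zip(msg, col)) % 2 for col in zip(*gen))

code = {encode(m, G) for m in product([0, 1], repeat=k)}

# Subcode with c_1 = 0 (equivalently, first message symbol zero, since the
# code is systematic), followed by deletion of that always-zero coordinate.
shortened = {c[1:] for c in code if c[0] == 0}

# The same [n-1, k-1] code is generated by G with row 1 and column 1 removed.
G_short = [row[1:] for row in G[1:]]
code_short = {encode(m, G_short) for m in product([0, 1], repeat=k - 1)}

assert shortened == code_short
assert len(shortened) == 2 ** (k - 1)
```

Repeating the same operation on s coordinates yields the [n − s, k − s] code C'_S described above.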

5.1.3 Shortening of a Linear MSR Code


Next, let C be an ((n, k, d), (α, β), B) linear MSR code over Fq, i.e., an
MSR code that is linear as an RGC (see Section 3). Thus, the nα symbols
stored across the n storage nodes are linear functions of the B message
symbols.
The size of the data file equals kα which is precisely the number
of Fq symbols contained in any set of k nodes. Clearly, by making an
appropriate linear transformation of code symbols, we may assume that
the contents of the first k nodes {c_i}_{i=1}^{k} are precisely the B message
symbols. Consider the subcode C_1 of C that corresponds to the contents
of the first s nodes, 1 ≤ s ≤ (k − 1), being equal to zero. It can be
verified that if one deletes or removes these nodes, one will be left with
an MSR code having parameters

((n − s, k − s, d − s), (α, β), B = α(k − s)).


5.1.4 Extending Parameter Set of the PM-MSR Code


Suppose now it is desired to construct an MSR code having parameters

((n, k, d = 2(k − 1) + s), (α = d − k + 1, β = 1), B = αk).

One begins with the construction of a PM-MSR code having parameters

((n + s, k + s, 2(k + s − 1)), (α = k + s − 1, β = 1), B = α(k + s)).

Shortening with respect to s nodes then converts this into an MSR code
having parameters

((n, k, d = 2(k − 1) + s), (α = k + s − 1, β = 1), B = αk).


In this way, the PM-MSR construction can be made to realize MSR
codes having parameters d ≥ 2(k − 1).

5.1.5 Rate of the PM-MSR Code


PM-MSR codes exist only for d ≥ 2(k − 1), from which it follows that
n ≥ d + 1 ≥ 2k − 1. The rate R of the code then satisfies

R = k/n ≤ k/(2k − 1) = 1/2 + 1/(2(2k − 1)),

which is just a little over half. This relatively low rate is a drawback of
the PM-MSR code.

5.2 Diagonal-Matrix-Based MSR Code

In this subsection, we describe a construction for a linear MSR code
family due to Ye and Barg [255]. The construction is explicit, employs
a field size that is linear in the block length n, and is able to generate
MSR codes for any parameter set (n, k, d). The sub-packetization level
α = s^n with s ≜ (d − k + 1) is, however, exponential in the parameter
n. We will refer to these codes here as the Diagonal MSR construction,
since the construction employs diagonal matrices.
The construction can be described in terms of an (rα × nα) p-c
matrix H of the form

H = [I, I, . . . , I;  A_1, A_2, . . . , A_n;  . . . ;  A_1^{r−1}, A_2^{r−1}, . . . , A_n^{r−1}],   (5.1)

where r = (n − k) and where each sub-matrix A_i is a diagonal (α × α)
matrix over Fq. Thus, to fully specify the respective MSR code, it suffices
to identify the matrices {A_i}.
The parameters of the Diagonal MSR code are of the form

((n, k, d), (α = s^n, β = s^{n−1}), B = αk, Fq),

with field-size requirement given by q ≥ sn. In the construction, each
sub-matrix A_i, for i ∈ [n], is a diagonal matrix taking on the form

A_i = Σ_{a∈[α]} λ_{i,a_i} e_a e_a^T,

where (a_1, . . . , a_n) ∈ Z_s^n represents the base-s expansion of (a − 1), i.e.,

(a − 1) = Σ_{i=1}^{n} a_i s^{i−1},

and the vectors e_a ∈ F_q^α are unit vectors such that the a-th element
of e_a is 1 and all other elements are zero. Thus, the matrix e_a e_a^T is an
(α × α) diagonal matrix having a 1 in the a-th row and a-th column
and zeros everywhere else. The elements {λ_{i,u} | i ∈ [n], u ∈ [0, s − 1]}
are chosen to be distinct and hence form a subset of Fq of size ns.
Thus the ith matrix A_i is an (α × α) matrix whose diagonal elements
are indexed by the variable a, where a takes on values in the set [s^n] = [α].
The a-th diagonal element equals λ_{i,a_i}, and thus is a function of i and
the ith component a_i of a.
Let c = (c_1^T, · · · , c_n^T)^T be a codeword in the Diagonal MSR code,
where c_i = (c_i(1), · · · , c_i(α))^T ∈ F_q^α is stored in node i ∈ [n]. Then

Hc = 0
⇔ Σ_{i=1}^{n} A_i^j c_i = 0, for all j ∈ [0, r − 1],
⇔ Σ_{i=1}^{n} Σ_{a∈[α]} λ_{i,a_i}^j e_a e_a^T c_i = 0, for all j ∈ [0, r − 1],
⇔ Σ_{i=1}^{n} λ_{i,a_i}^j c_i(a) = 0, for all j ∈ [0, r − 1], a ∈ [α].   (5.2)

It follows that the rα equations shown in (5.2) characterize the Diagonal
MSR code. We will refer to the p-c equation appearing in (5.2) as the
(j, a)-th parity-check.
Data Collection: The data collection property can be established by
showing that the Diagonal MSR code can recover from any r = (n − k)
erasures. Let the set of node indices corresponding to the r erasures be
denoted by E ⊆ [n], |E| = r. Then the equation (5.2) reduces to

Σ_{i∈E} λ_{i,a_i}^j c_i(a) = κ*,   j ∈ [0, r − 1], a ∈ [α],

where κ* denotes a known quantity that can be computed from the
contents of the unerased nodes. As the set of {λ_{i,a_i} | i ∈ E} are distinct
for every a ∈ [s^n], we can recover {c_i(a) | i ∈ E, a ∈ [s^n]}, thereby
recovering all the erased symbols.
Node Repair: Let i0 ∈ [n] be the index of the failed node that needs
to be repaired, and let the subset D ⊆ [n] \ {i0} of size |D| = d denote
the indices of the d helper nodes. The ith helper node, for i ∈ D, sends
the following β = s^{n−1} symbols as helper information:

{ h_{i,i0}(a) = Σ_{u=0}^{s−1} c_i(a(i0, u))  |  a ∈ [s^n], a_{i0} = 0 },

where a(i0, u) is the integer whose s-ary representation

(a_1, · · · , a_{i0−1}, u, a_{i0+1}, · · · , a_n)

is the same as that of a, except that the i0th component, a_{i0}, is replaced
by u. Equivalently, a(i0, u) = a − a_{i0} s^{i0−1} + u s^{i0−1}. Since all α symbols
contained in a helper node are accessed in order to generate the β
helper symbols, it follows that the Diagonal MSR code does not have
the optimal-access MSR property.
Focusing on the (j, a(i0, u))-th p-c equation for a ∈ [s^n] such that
a_{i0} = 0, we obtain:

λ_{i0,u}^j c_{i0}(a(i0, u)) = − Σ_{i∈[n]\{i0}} λ_{i,a_i}^j c_i(a(i0, u)),  for all u ∈ [0, s − 1].

Summing over u, we obtain:

Σ_{u=0}^{s−1} λ_{i0,u}^j c_{i0}(a(i0, u)) = − Σ_{u=0}^{s−1} Σ_{i∈[n]\{i0}} λ_{i,a_i}^j c_i(a(i0, u))
                                        = − Σ_{i∈[n]\{i0}} λ_{i,a_i}^j h_{i,i0}(a).

Spelling out these p-c equations for all j ∈ [0, r − 1] in matrix form, we
obtain:

V_{i0} (c_{i0}(a(i0, 0)), c_{i0}(a(i0, 1)), . . . , c_{i0}(a(i0, s − 1)))^T
    = − L_{i0} (h_{1,i0}(a), . . . , h_{i0−1,i0}(a), h_{i0+1,i0}(a), . . . , h_{n,i0}(a))^T,   (5.3)

where V_{i0} is the matrix with (j, u) entry λ_{i0,u}^j, for j ∈ [0, r − 1] and
u ∈ [0, s − 1], and L_{i0} is the matrix with (j, i) entry λ_{i,a_i}^j, for
j ∈ [0, r − 1] and i ∈ [n] \ {i0}.

Case d = (n − 1): For the case when d = n − 1, s = d − k + 1 = r, and V_{i0}
is an (r × r) Vandermonde matrix while L_{i0} is an (r × (n − 1)) Vandermonde
matrix. Also, the symbols on the RHS of the above equation, given by
{h_{i,i0}(a) | i ∈ [n] \ {i0}}, are known. Therefore, by the invertibility of V_{i0},
we can recover the failed-node symbols

{c_{i0}(a(i0, u)) | u ∈ [0, s − 1]}.

By varying a ∈ [s^n] such that a_{i0} = 0, we can recover all the failed-node
symbols:

{c_{i0}(a(i0, u)) | u ∈ [0, s − 1], a ∈ [s^n], a_{i0} = 0} = {c_{i0}(a) | a ∈ [s^n]}.
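The encoding via the parity checks (5.2) and the d = (n − 1) repair procedure can both be checked in code. The sketch below uses the small illustrative (assumed) parameters n = 4, k = 2, s = r = 2 over GF(11), so that α = 16 and β = 8, with an arbitrary distinct choice of the λ_{i,u}; for convenience it indexes a from 0 rather than 1. It verifies that the summing argument above recovers a failed node exactly:

```python
import random

q = 11
n, k = 4, 2
r = n - k                 # with d = n - 1, s = r
s = r
alpha = s ** n            # sub-packetization 16

# Distinct coupling coefficients lambda_{i,u}: here 1..8 in GF(11).
lam = {(i, u): 1 + i * s + u for i in range(n) for u in range(s)}

def digits(a):
    # base-s expansion (a_1, ..., a_n) of a (0-indexed here)
    return [(a // s ** i) % s for i in range(n)]

def solve2(A, b):
    # solve a 2x2 linear system over GF(q) by Cramer's rule
    det = (A[0][0] * A[1][1] - A[0][1] * A[1][0]) % q
    dinv = pow(det, q - 2, q)
    return ((A[1][1] * b[0] - A[0][1] * b[1]) * dinv % q,
            (A[0][0] * b[1] - A[1][0] * b[0]) * dinv % q)

# Encode: fill nodes 0..k-1 with random message symbols, then solve the
# r parity checks (j, a) of (5.2) for the parity nodes, for every a.
random.seed(1)
c = [[0] * alpha for _ in range(n)]
for i in range(k):
    c[i] = [random.randrange(q) for _ in range(alpha)]
for a in range(alpha):
    d_a = digits(a)
    rhs = [(-sum(pow(lam[i, d_a[i]], j, q) * c[i][a] for i in range(k))) % q
           for j in range(r)]
    A = [[pow(lam[i, d_a[i]], j, q) for i in range(k, n)] for j in range(r)]
    c[k][a], c[k + 1][a] = solve2(A, rhs)

# Repair node i0: each helper i sends h_{i,i0}(a) = sum_u c_i(a(i0,u)),
# for every a with a_{i0} = 0; the replacement solves the system (5.3).
i0 = 0
repaired = [0] * alpha
for a in range(alpha):
    d_a = digits(a)
    if d_a[i0] != 0:
        continue
    a_u = [a + u * s ** i0 for u in range(s)]            # a(i0, u)
    h = {i: sum(c[i][x] for x in a_u) % q for i in range(n) if i != i0}
    rhs = [(-sum(pow(lam[i, d_a[i]], j, q) * h[i] for i in h)) % q
           for j in range(r)]
    V = [[pow(lam[i0, u], j, q) for u in range(s)] for j in range(r)]
    repaired[a_u[0]], repaired[a_u[1]] = solve2(V, rhs)

assert repaired == c[i0]
```

Each helper reads all α of its symbols to form its β = 8 sums, which is exactly the lack of the optimal-access property noted above.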
Case d < (n − 1): In this case, V_{i0} is an (r × s) Vandermonde matrix
and L_{i0} is an (r × (n − 1)) Vandermonde matrix. We will show that all
failed symbols can be recovered by establishing that any d symbols of
{h_{i,i0}(a) | i ∈ [n] \ {i0}} are enough to recover the remaining n − 1 − d =
r − s symbols. This will be done by proving that {h_{i,i0}(a) | i ∈ [n] \ {i0}}
are code symbols of an [n − 1, d] MDS code.
We start by defining an ((r − s) × r) matrix N_{i0} with row vectors
lying in the left null space of V_{i0}. The ith row of N_{i0}, for i ∈ [0, r − s − 1],
is defined as

N_{i0}(i, :) = (0, . . . , 0, f_0, f_1, . . . , f_{s−1}, f_s, 0, . . . , 0),

with the entry f_0 appearing in column i,

where f(x) = Σ_{i=0}^{s} f_i x^i = Π_{u=0}^{s−1} (x − λ_{i0,u}). Then we have

N_{i0} V_{i0} = 0,   (5.4)

since the (i, u) entry of the ((r − s) × s) product N_{i0} V_{i0} equals
λ_{i0,u}^i f(λ_{i0,u}) = 0, and
N_{i0} L_{i0} equals the ((r − s) × (n − 1)) matrix whose (j, i) entry is
λ_{i,a_i}^j f(λ_{i,a_i}), for j ∈ [0, r − s − 1] and i ∈ [n] \ {i0}; i.e., N_{i0} L_{i0}
factors as the product of the ((r − s) × (n − 1)) Vandermonde-type matrix
with (j, i) entry λ_{i,a_i}^j and the diagonal matrix diag(f(λ_{i,a_i}) | i ∈ [n] \ {i0}).
Notice that N_{i0} L_{i0} is the p-c matrix of an [n − 1, n − 1 − (r − s) = d]
generalized Reed-Solomon (GRS) code, as f(λ_{i,a_i}) ≠ 0 for all i ≠ i0.
From equations (5.3) and (5.4) we get

N_{i0} L_{i0} (h_{1,i0}(a), . . . , h_{i0−1,i0}(a), h_{i0+1,i0}(a), . . . , h_{n,i0}(a))^T = 0.

By the GRS property, we can recover all the n − 1 symbols in {h_{i,i0}(a) |
i ∈ [n] \ {i0}} from any d-symbol subset. This implies that the symbols on
the RHS of the equation (5.3) are known. Therefore, by the invertibility
of the sub-matrix of V_{i0} comprising the first s rows of V_{i0}, we can
recover the failed-node symbols

{c_{i0}(a(i0, u)) | u ∈ [0, s − 1]}.

By varying a ∈ [s^n] such that a_{i0} = 0, we can recover all the failed-node
symbols:

{c_{i0}(a(i0, u)) | u ∈ [0, s − 1], a ∈ [s^n], a_{i0} = 0} = {c_{i0}(a) | a ∈ [s^n]}.

Remark 4. Diagonal MSR codes turn out to also satisfy the optimal-
update property, whereby to update a single symbol out of the α symbols
in a systematic node, one is required to update only (n − k) parity
symbols.

An extension of the Diagonal MSR code that has the (h, d) optimal-repair property for any h ∈ [2, n − k], d ∈ [k, n − h] appears in [255]. By the (h, d) optimal-repair property is meant the recovery of h erasures by downloading

$$\frac{\alpha h}{d-k+h}$$

symbols each from d helper nodes, which is the minimal repair bandwidth possible for MDS codes [31]. The sub-packetization level of these extended codes is of the form s^n where s = lcm(2, 3, · · · , n − k). The h-node repair discussed here assumes a centralized repair setting, whereas an alternate, cooperative repair approach is discussed in Section 9.3.
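As a quick arithmetic illustration of the (h, d) optimal-repair property, the sketch below (with arbitrarily chosen parameter values, not tied to any particular construction) computes the per-helper and total downloads, and checks that h = 1 recovers the familiar single-node MSR repair bandwidth dα/(d − k + 1).

```python
from fractions import Fraction

def per_helper_download(alpha, h, d, k):
    # Optimal repair of h erasures from d helpers in an MDS code:
    # each helper contributes alpha * h / (d - k + h) symbols.
    return Fraction(alpha * h, d - k + h)

n, k = 14, 10
alpha = 4 ** 13  # illustrative sub-packetization level, not from a construction

for h, d in [(1, 13), (2, 12), (4, 10)]:
    assert k <= d <= n - h  # allowed helper counts: d in [k, n - h]
    per = per_helper_download(alpha, h, d, k)
    print(f"h={h}, d={d}: {per} symbols per helper, {d * per} in total")

# h = 1 reduces to the single-node MSR repair bandwidth d*alpha/(d - k + 1)
assert 13 * per_helper_download(alpha, 1, 13, k) == Fraction(13 * alpha, 4)
```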

5.3 Coupled-Layer MSR Code

In [256], Ye and Barg presented an explicit construction of a high-rate, optimal-access MSR code with α = r^{⌈n/r⌉}, field size no larger than r⌈n/r⌉, and d = (n − 1), where r = n − k. Essentially the same construction was independently rediscovered by Sasidharan et al. [205] two months later, from a different coupled-layer perspective, where layers of an arbitrary MDS code are coupled by a simple pairwise-coupling transform to yield an MSR code.
Just prior to the appearance of these two papers, in an earlier


version of [137], Li et al. show how a systematic MSR code can be
converted into an MSR code by increasing the sub-packetization level
by a factor of r using a pairwise-symbol transformation. This result is
then extended in [137] to a technique that takes an MDS code, increases
sub-packetization level by a factor of r and converts it into a code in
which the optimal repair of r nodes can be carried out. By applying
this transform repeatedly ⌈n/r⌉ times, it is shown that any scalar MDS
code can be transformed into an MSR code. It turns out that the three
papers [137], [205], [256], either explicitly or implicitly, employed as
a key part of the construction, essentially the same pairwise-coupling
transform.
In this subsection, we present the optimal-access MSR code construction contained in [137], [205], [256] from the coupled-layer perspective appearing in [205]. We will refer to the resultant MSR code as the coupled-layer MSR (CL-MSR) code.¹ This code has the additional attribute of having the lowest-possible level of sub-packetization of any linear, optimal-access MSR code, provided n ≠ 1 (mod r), as it attains a lower bound on the sub-packetization level of a linear, optimal-access MSR code; see Section 8.2.
A CL-MSR code has a parameter set of the form

(n = rt, k = r(t − 1), d = n − 1), (α = r^t, β = r^{t−1}), B = αk, F_q,

and as can be seen, code parameters are a function of two integer-valued variables, namely r ≥ 1 and t ≥ 2. The field-size requirement is given by q ≥ n. The rate R of this code is hence given by

R = r(t − 1)/(rt) = (t − 1)/t,

and can be made arbitrarily close to 1 by making t large enough. The principal steps in the construction of a CL-MSR code are the following:

(a) An [n = rt, k = r(t − 1)] scalar MDS code CMDS is first selected,

(b) The n = rt code symbols of each codeword in CMDS are arranged


so as to form a two-dimensional (r × t) array,
¹ Vajha et al. [240] present an implementation and evaluation of the coupled-layer MSR code in the Ceph distributed storage system. In the paper, the Coupled-LAYer code is given the acronym Clay code. The implementation in Ceph is described in Section 17.
(c) Each codeword belonging to the CL-MSR code C is uniquely associated to a set of α = r^t codewords drawn from CMDS that are not necessarily distinct,

(d) The α codewords from CMDS are vertically stacked so as to form a


data cube, which we will refer to as the uncoupled data cube,

(e) The symbols within the uncoupled data cube are transformed
using a simple, linear pairwise-symbol transformation that replaces
selected pairs of symbols over Fq contained within the uncoupled
data cube, by their transformed versions. The data cube obtained
via this transformation is called the coupled data cube.

Let

{B(x, y, z) | (x, y) ∈ Z_r × Z_t, z ∈ Z_r^t}

denote the nα = (r × t × r^t) symbols of the uncoupled data cube (see Fig. 5.1). Then, for a fixed value z_0 of the planar (or horizontal-layer) index z, the n = rt symbols {B(x, y, z_0) | (x, y) ∈ Z_r × Z_t} constitute the n code symbols of a codeword from CMDS.

Figure 5.1: An example uncoupled data cube for the case (r = 2, t = 3). As can be
seen, the location of the red dots within a plane, provides a pictorial representation
of the index z associated to the plane.

Let H be an ((n − k) × n) p-c matrix of the scalar code CMDS . Let


θ_{ℓ,(x,y)} denote the element of H lying in the ℓ-th row and (x, y)-th column. The symbols in the uncoupled data cube then satisfy the equations:

$$\sum_{(x,y) \in \mathbb{Z}_r \times \mathbb{Z}_t} \theta_{\ell,(x,y)}\, B(x, y, z) = 0, \qquad (5.5)$$

for all ℓ ∈ [0, n − k − 1] and all z ∈ Z_r^t.

Figure 5.2: Paired symbols within either the uncoupled or coupled data cube are
depicted using yellow rectangles connected by dotted lines. The pairwise forward
transform (PFT) and pairwise reverse transform (PRT) are used to transform symbol-
pairs between the two data cubes.

Next, the symbols in the uncoupled data cube B(·) are paired. The symbol B(x, y, z) with z_y ≠ x is paired with the symbol B(z_y, y, z(y, x)), where we use the notation z(y, x) to denote the vector in which the y-th component of z is replaced by x:

z(y, x) ≜ (z_0, · · · , z_{y−1}, x, z_{y+1}, · · · , z_{t−1}).

The symbols B(x, y, z) with z_y = x remain unpaired. Equivalently, we may regard these symbols as fixed points in this pairing process, i.e., each symbol B(x, y, z) with z_y = x is paired with itself.
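The pairing just described can be checked mechanically. The short sketch below (for the small illustrative case r = 2, t = 3) enumerates the symbol indices (x, y, z), applies the companion map (x, y, z) ↦ (z_y, y, z(y, x)), and verifies that it is an involution whose fixed points are exactly the symbols with z_y = x.

```python
from itertools import product

r, t = 2, 3  # small illustrative case

def companion(x, y, z):
    # Companion of symbol index (x, y, z): (z_y, y, z(y, x)).
    z_new = z[:y] + (x,) + z[y + 1:]  # z(y, x): y-th component replaced by x
    return (z[y], y, z_new)

indices = [(x, y, z) for x, y, z in
           product(range(r), range(t), product(range(r), repeat=t))]

fixed = [idx for idx in indices if companion(*idx) == idx]
# Fixed points are exactly the symbols with z_y = x ...
assert all(z[y] == x for (x, y, z) in fixed)
# ... and the map pairs up everything else (it is an involution).
assert all(companion(*companion(*idx)) == idx for idx in indices)
# Count: t * r^t unpaired symbols out of n*alpha = (r*t) * r^t in all.
assert len(fixed) == t * r ** t
print(f"{len(indices)} symbols, {len(fixed)} self-paired, "
      f"{(len(indices) - len(fixed)) // 2} coupled pairs")
```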
Next, let

{A(x, y, z) | (x, y) ∈ Z_r × Z_t, z ∈ Z_r^t}

denote the nα symbols of a second data cube, termed the coupled data cube. The contents of the coupled data cube will shortly be related to the contents of the uncoupled data cube, as depicted in Fig. 5.2. There is
an analogous pairing of symbols within the coupled data cube. Thus the symbol A(x, y, z) with z_y ≠ x is paired with the symbol A(z_y, y, z(y, x)), and the symbols A(x, y, z) with z_y = x are paired with themselves.
Let u be a nonzero element of the finite field F_q satisfying u² ≠ 1. The symbols of the coupled data cube are derived from those of the uncoupled data cube via the following transformation:

$$\begin{bmatrix} A(x, y, z) \\ A(z_y, y, z(y, x)) \end{bmatrix} = \begin{bmatrix} 1 & u \\ u & 1 \end{bmatrix}^{-1} \begin{bmatrix} B(x, y, z) \\ B(z_y, y, z(y, x)) \end{bmatrix}, \quad \text{for } z_y \neq x, \qquad (5.6)$$

$$A(x, y, z) = B(x, y, z), \quad \text{for } z_y = x.$$

We will refer to the set of equations (5.6) as the pairwise forward transform (PFT). The mapping in the reverse direction, given by:

$$\begin{bmatrix} B(x, y, z) \\ B(z_y, y, z(y, x)) \end{bmatrix} = \begin{bmatrix} 1 & u \\ u & 1 \end{bmatrix} \begin{bmatrix} A(x, y, z) \\ A(z_y, y, z(y, x)) \end{bmatrix}, \quad \text{for } z_y \neq x, \qquad (5.7)$$

$$B(x, y, z) = A(x, y, z), \quad \text{for } z_y = x,$$

will be referred to as the pairwise reverse transform (PRT).

Remark 5. (4-symbol MDS property) Note that for the case z_y ≠ x, we have

$$\big[A(x, y, z)\ \ A(z_y, y, z(y, x))\ \ B(x, y, z)\ \ B(z_y, y, z(y, x))\big] = \big[A(x, y, z)\ \ A(z_y, y, z(y, x))\big] \begin{bmatrix} 1 & 0 & 1 & u \\ 0 & 1 & u & 1 \end{bmatrix}.$$

Since any two columns of the (2 × 4) matrix appearing on the extreme right of the equation above are linearly independent, it follows that all four symbols

{A(x, y, z), A(z_y, y, z(y, x)), B(x, y, z), B(z_y, y, z(y, x))}

can be computed from knowledge of any two symbols from the 4-symbol set, i.e., the four symbols form a [4, 2] MDS code.
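To make the 4-symbol MDS property concrete, the sketch below works over the prime field F_13 with u = 2 (so that u² ≠ 1; both values are illustrative choices), applies the PFT to a symbol pair, and verifies that every 2-subset of {A, A′, B, B′} determines the pair (A, A′), and hence all four symbols.

```python
from itertools import combinations

p, u = 13, 2                 # illustrative prime field F_13 and coupling coefficient
assert (u * u - 1) % p != 0  # the construction requires u^2 != 1

inv = pow(1 - u * u, -1, p)  # (1 - u^2)^(-1) mod p

def pft(b, b2):
    # Pairwise forward transform: (A, A') = [[1, u], [u, 1]]^(-1) (B, B')
    return ((b - u * b2) * inv % p, (b2 - u * b) * inv % p)

def prt(a, a2):
    # Pairwise reverse transform: (B, B') = [[1, u], [u, 1]] (A, A')
    return ((a + u * a2) % p, (u * a + a2) % p)

b, b2 = 7, 11                 # an arbitrary uncoupled symbol pair
a, a2 = pft(b, b2)
assert prt(a, a2) == (b, b2)  # the PRT inverts the PFT

# 4-symbol MDS property: (A, A', B, B') = (A, A') * G with G as in Remark 5
G = [[1, 0, 1, u], [0, 1, u, 1]]
word = (a, a2, b, b2)
for i, j in combinations(range(4), 2):
    det = (G[0][i] * G[1][j] - G[0][j] * G[1][i]) % p
    assert det != 0           # any two columns of G are linearly independent
    dinv = pow(det, -1, p)
    ra = (G[1][j] * word[i] - G[1][i] * word[j]) * dinv % p
    ra2 = (G[0][i] * word[j] - G[0][j] * word[i]) * dinv % p
    assert (ra, ra2) == (a, a2)  # any 2 of the 4 symbols determine the pair
print("the four symbols form a [4, 2] MDS code over F_13")
```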
In terms of physical storage, in the CL-MSR code, the n = rt nodes are indexed by the pairs (x, y) ∈ Z_r × Z_t, with node (x, y) storing the symbols

{A(x, y, z) | z ∈ Z_r^t}

of the coupled data cube.


Substituting the PRT (5.7) into the p-c equations (5.5) of the CL-MSR code associated with the uncoupled data cube B(·), we obtain the equivalent p-c equations placed on the symbols of the coupled data cube A(·):

$$\sum_{(x,y) \in \mathbb{Z}_r \times \mathbb{Z}_t} \theta_{\ell,(x,y)}\, A(x, y, z) + \sum_{y \in \mathbb{Z}_t} \sum_{x \neq z_y} u\, \theta_{\ell,(x,y)}\, A(z_y, y, z(y, x)) = 0, \qquad (5.8)$$

for all ℓ ∈ [0, n − k − 1] and z ∈ Z_r^t.


Node Repair: Let (x_0, y_0) be the failed node. Let us define a subset P(x_0, y_0) of β = r^{t−1} "helper" planes given by

P(x_0, y_0) ≜ {z ∈ Z_r^t | z_{y_0} = x_0}.

To recover the r^t erased symbols {A(x_0, y_0, z) | z ∈ Z_r^t}, each of the remaining nodes (x, y) ≠ (x_0, y_0) passes on the β = r^{t−1} code symbols

{A(x, y, z) | z ∈ P(x_0, y_0)}

that lie in these helper planes.


Consider a symbol A(x, y, z) with y = ̸ y0 , lying in a helper plane
z ∈ P(x0 , y0 ). The companion A(zy , y, z(x, y)) of A(x, y, z) also lies in
one of the helper planes since the y0 th component of z(x, y) is x0 . Thus
both symbols are passed on to the replacement node.
For each z ∈ P(x_0, y_0), we next rewrite (5.8) by placing the erased code symbols on the left and using the symbol κ* to denote linear combinations of all the known helper information on the right. This leads to

$$\theta_{\ell,(x_0,y_0)}\, A(x_0, y_0, z) + \sum_{x \neq x_0,\ x \in \mathbb{Z}_r} u\, \theta_{\ell,(x,y_0)}\, A(x_0, y_0, z(y_0, x)) = \kappa^*, \qquad (5.9)$$
for all ℓ ∈ [0, r − 1]. As a result, for each fixed z ∈ P(x_0, y_0), there are r unknowns and r equations from which the r unknowns

{A(x_0, y_0, z)} ∪ {A(x_0, y_0, z(y_0, x)) | x ≠ x_0, x ∈ Z_r}

can be recovered, since the choice of {θ_{ℓ,(x,y)}} ensures the (r × r) coefficient matrix is non-singular. Repeating this process for all z in P(x_0, y_0) allows us to recover

$$\bigcup_{z \in P(x_0,y_0)} \Big( \{A(x_0, y_0, z)\} \cup \bigcup_{x \neq x_0,\, x \in \mathbb{Z}_r} \{A(x_0, y_0, z(y_0, x))\} \Big) = \{A(x_0, y_0, z) \mid z \in \mathbb{Z}_r^t\},$$

which is precisely the set of all erased symbols.


Data Collection: To establish the data collection property, it is sufficient to show that the entire data file can be recovered in the presence of any (n − k) = r node erasures. Let E ⊆ Z_r × Z_t represent a fixed erasure pattern of size |E| = r. We describe these erasures using a nomenclature that is plane dependent. For a given plane z, erasures (x, y) ∈ E with z_y = x are termed serious erasures. The intersection score (IS) of a plane is then defined to be the number of serious erasures in the plane; an illustrative example appears in Fig. 5.3.

Figure 5.3: Shown above is the plane with index z = (2, 1, 0, 1, 2) for the case (r = 4, t = 5), where the four black circles indicate the four erasures. The intersection score of this plane is 2, as it has two serious erasures, corresponding to the circles enclosing red dots.
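The intersection score of a plane is straightforward to compute. The sketch below uses a hypothetical erasure pattern (the exact node positions of the circles in Fig. 5.3 are not reproduced here) for the same (r = 4, t = 5) case and plane index.

```python
def intersection_score(z, erasures):
    # Count the "serious" erasures: nodes (x, y) in E with z_y = x.
    return sum(1 for (x, y) in erasures if z[y] == x)

r, t = 4, 5
# Hypothetical erasure pattern E of |E| = r = 4 nodes (x, y) in Z_r x Z_t:
E = {(2, 0), (3, 1), (0, 2), (3, 3)}

z = (2, 1, 0, 1, 2)  # the plane index used in Fig. 5.3
print(intersection_score(z, E))  # here (2, 0) and (0, 2) are serious -> 2
```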
We present below a sequential decoding algorithm, described in [205], that proceeds through rounds 0, 1, 2, · · · in that order. In the sequential algorithm, erased symbols in planes having intersection score IS = i are decoded in the i-th round, making use of symbols recovered in prior rounds.
IS = 0 case: Let z be a plane with IS = 0. This implies that (z_y, y) is not an erased node, for all y ∈ Z_t. As a result, for any (x, y) ∉ E, both A(x, y, z) and A(z_y, y, z(y, x)) are known. This allows the symbols B(x, y, z), (x, y) ∉ E, to be computed using the PRT, see (5.7). Therefore equation (5.5) reduces to

$$\sum_{(x,y) \in E} \theta_{\ell,(x,y)}\, B(x, y, z) = \kappa^*,$$

where κ* is a known value. Thus we get r equations in r unknowns. The choice of {θ_{ℓ,(x,y)}} guarantees the resultant (r × r) coefficient matrix is invertible. In this way, one can recover all symbols {B(x, y, z) | (x, y) ∈ E} of the uncoupled data cube, corresponding to all planes z having IS = 0.
IS > 0 case: Next, let z be a plane with IS = i. We will first show that the symbols {B(x, y, z) | (x, y) ∉ E} can be computed using unerased code symbols as well as code symbols that were recovered from prior rounds of the sequential decoding process. For the case (x, y) ∉ E and (z_y, y) ∉ E, both the symbols A(x, y, z), A(z_y, y, z(y, x)) are known. Therefore B(x, y, z) can be computed from the PRT via

B(x, y, z) = A(x, y, z) + u A(z_y, y, z(y, x)).

Now for the case when (x, y) ∉ E and (z_y, y) ∈ E, the plane z(y, x) has IS = i − 1. Therefore we would have recovered the symbol B(z_y, y, z(y, x)) in round i − 1. We know the symbol A(x, y, z) as it is unerased. Using the symbols A(x, y, z) and B(z_y, y, z(y, x)) and the 4-symbol MDS property noted above, the symbol B(x, y, z) can be computed. In this way, we know B(x, y, z) for any (x, y) ∉ E. As a result, equation (5.5) can be reduced to the form

$$\sum_{(x,y) \in E} \theta_{\ell,(x,y)}\, B(x, y, z) = \kappa^*,$$
where κ* is a known value. Thus we end up once again with r equations in r unknowns. The choice of the {θ_{ℓ,(x,y)}} guarantees the resultant (r × r) coefficient matrix is invertible. In this way, one can recover {B(x, y, z) | (x, y) ∈ E} for all planes z having IS = i.

At the end of this decoding process we will have recovered all the uncoupled code symbols {B(x, y, z) | (x, y) ∈ Z_r × Z_t, z ∈ Z_r^t}. By applying the PFT we can compute the erased node symbols {A(x, y, z) | (x, y) ∈ E, z ∈ Z_r^t}.

5.3.1 Extension to Other Parameters


Although the construction is explained for the case when (n, k, d) are of
the form (n = rt, k = r(t − 1), d = n − 1), the construction can be used
to generate MSR codes for general parameter sets (n, k, d = n − 1) by
shortening the code as described earlier in the section on the product-
matrix MSR code.
For the case of d < n − 1, it turns out that the coupled-layer construction technique can be applied to result in an [n = st, k = d + 1 − s] vector MDS code having sub-packetization α = s^t. This coupled-layer MDS code is such that the repair of node (x_0, y_0) with optimal repair bandwidth is possible only if the d helper nodes, from each of which β = s^{t−1} symbols are downloaded, include all the (s − 1) nodes corresponding to the set {(x, y_0) | x ≠ x_0}. Thus the coupled-layer construction does not yield an MSR code for the case d < n − 1, since this represents a form of table-based repair.

5.4 Small-d MSR Codes

This subsection discusses an optimal-access MSR code construction for


small values of d. Small values of d are of interest since they correspond
to low repair degree. Additionally, many high-rate codes where the gap
between n and k is small, may also fall into the small-d category. In
[239], Vajha et al. present a construction for optimal-access MSR codes
with

(n = st, k, d ∈ {k + 1, k + 2, k + 3}), α = s^t, s = (d − k + 1),
with s ∈ {2, 3, 4} and t ≥ 2, and field size q linear in block length, i.e.,
q = O(n).
These codes have two additional attributes. Consider a setting in a linear RGC where node f has failed and we are interested in the data transferred by helper node h to the failed node f, and where the indices of the remaining (d − 1) helper nodes are specified by a set D ⊂ [n] of size (d − 1). Since the RGC is linear, the data transferred can be represented in the form

$$S_{hf}^{(D)}\, \underline{c}_h,$$

where $S_{hf}^{(D)}$ is a (β × α) matrix and $\underline{c}_h$ represents the (α × 1) vector corresponding to the data stored in node h. It turns out in the case of the Small-d MSR code construction that the matrix $S_{hf}^{(D)}$ appearing above is a function of the failed node f alone, and so we can simply write $S_f$ in place of $S_{hf}^{(D)}$. This property is termed the constant-repair-matrix property. We note as an aside that, since the code is an optimal-access MSR code, the entries of each matrix $S_f$ are either 0 or 1, with each row of $S_f$ containing a single 1.
It turns out that not only do Small-d MSR codes possess the constant-repair-matrix property, they also have the smallest sub-packetization level α possible of any linear, optimal-access MSR code having the property that the repair matrix $S_{hf}^{(D)}$ is independent of the remaining helper nodes in D, so that we can write $S_{hf}^{(D)} = S_{hf}$. We term this latter property with respect to repair matrices, the helper-set-independence property. Clearly, the constant-repair-matrix property implies the helper-set-independence property.

By shortening a Small-d MSR code, one can construct additional optimal-access (n, k) MSR codes that also have constant repair matrices. These also have the minimum sub-packetization level possible of a linear, optimal-access MSR code having the helper-set-independence property, provided n ≠ 1 (mod s), where s = d − k + 1. Details can be found in [239].
Open Problem 2. Construct an optimal-access MSR code having least-
possible sub-packetization level for the case when d = (n − 1) and n = 1
mod r.
Open Problem 3. Provide explicit constructions of optimal-access


MSR codes having least-possible sub-packetization level, for all possible
(n, k, d), with d < (n − 1).
Open Problem 4. Construct MSR codes with least-possible sub-packetization level for all (n, k, d).² (There is no optimal-access requirement here.)

Notes

1. Early constructions of high-rate MSR codes: The rate of the


PM-MSR code can be at most a little larger than 0.5, as shown
in Section 5.1.5. The construction of ER-MSR codes in the high-
rate regime remained an open problem for quite some time. A
high-rate MSR code was first provided in [167] for the parameter
set (n = k + 2, k, d = k + 1). The existence of ER-MSR codes
for all (n, k, d) as B goes to infinity was established in [31]. The
Zigzag code [229], [248] was the first non-asymptotic, high-rate
construction for any (n, k, d = n − 1). Zigzag codes have sub-packetization level α = (n − k)^{k+1} and are non-explicit in general, as they make use of the Combinatorial Nullstellensatz [5] to establish the data-collection property. These codes possess the optimal-access and optimal-update properties.

2. Non-explicit, high-rate MSR codes: A high-rate, optimal-access MSR construction for the case d = n − 1 with sub-packetization level α = r^{⌈n/r⌉}, where r = n − k, appeared in [200]. The sub-packetization level of this linear code matches the lower bound on the sub-packetization of linear, optimal-access MSR codes derived in [11], provided n ≠ 1 (mod r). This construction was extended in [190] to the case d < (n − 1), with α = s^{⌈n/s⌉}, where
s = d − k + 1. In [55], the authors generalize the PM-MSR construction to obtain an (n, k, d) MSR code with α = $\binom{(t-1)(d-k+1)}{t-1}$, where t ≥ d/(d − k + 1) is an integer. This code is based on multilinear
² The construction claim in [204] of an MSR code with d < (n − 1) and low sub-packetization level is incorrect, as pointed out by the authors of [204] in their revised posting on arXiv [206].
algebra, as is the Moulin code construction described in Section 7.3. The constructions in [55], [190], [200] are non-explicit as
the Combinatorial Nullstellensatz [5] is employed to establish the
data-collection property.

3. Systematic MSR codes: Vector MDS codes for which the optimal repair property holds only for the systematic nodes are referred to as systematic MSR codes. An early construction of a systematic MSR code with β = 1 for the case d = (n − 1) ≥ (2k − 1) can be found in [212]. In a subsequent paper [227] that builds upon [212], the authors provide a construction for MDS codes that can repair both systematic as well as parity nodes under the restriction d ≥ 2k − 1, n ≥ 2k, under the assumption that all the un-erased systematic nodes participate in node repair. Thus for the case d = (n − 1) ≥ 2k − 1, the construction in [227] yields an MSR code. Other early constructions of systematic MSR codes with d = n − 1 can be found in [32], [229]. A general construction, valid for all (n, k, d) parameter sets, first appeared in [76]. A lower bound α ≥ r^{(k−1)/r}, where r = n − k, on the sub-packetization level of linear, systematic MSR codes with d = n − 1 having the optimal-access property is derived in [233]. It is shown in [11] that this can be extended to the slightly tighter bound α ≥ r^{⌈(k−1)/r⌉}. In [2], [33], [249], non-explicit, optimal-access, linear systematic MSR code constructions with d = n − 1 having α matching the lower bound α ≥ r^{⌈(k−1)/r⌉} for k ≠ 1 (mod r) are presented. Explicit constructions of optimal-access, linear systematic MSR codes with d = (n − 1) and α = r^{⌈(k−1)/r⌉} for k ≠ 1 (mod r) are provided for (n − k) = 2, 3, in [186]. Optimal-access, linear systematic MSR codes with d = n − 1 having optimal sub-packetization level r^{⌈(k−1)/r⌉} for k ≠ 1 (mod r) can be constructed over a field of size q ≥ n using the transformation presented in [137].

4. Optimal-access MSR codes for all (n, k, d) by Ye and Barg: In


[255], apart from the Diagonal MSR code construction, the authors
present the construction of a second class of MSR codes which
we will refer to here as the Permuted-Diagonal MSR construction.
This construction yields an optimal-access MSR code for any


(n, k, d) with sub-packetization level α = s^{n−1}, where s = d − k + 1
and where the field size q satisfies q ≥ n + 1. Permuted-Diagonal
MSR codes are the only known explicit optimal-access MSR codes
for any (n, k, d) having field size O(n). As is the case with Diagonal
MSR codes, these codes can also be extended to have the (h, d)
optimal repair property for any h ∈ [2, n − k], d ∈ [k, n − h]. In
[148], a modification of the Permuted-Diagonal MSR code for the
d = n−1 case is presented, which reduces the field size requirement
to q = 3 for even r, and q ≥ r + 1 for odd r where r = n − k.
6
Storage-Repair-Bandwidth Tradeoff

This section deals with storage-repair-bandwidth tradeoffs in the case of


functional and exact repair. As seen in Section 3, in the case of FR the
tradeoff is governed by the following equation, obtained by replacing
the inequality in (3.13) with equality:
$$1 = \sum_{i=0}^{k-1} \min\{\bar{\alpha},\, (d-i)\bar{\beta}\}. \qquad (6.1)$$

In the present section, we will show that the FR tradeoff takes on


the form of a piecewise linear curve. We will provide a more formal
definition of the ER tradeoff and establish that apart from the MBR
and MSR points, and possibly, a small region adjacent to the MSR
point, the tradeoff under ER is clearly separated from the FR tradeoff.

6.1 Piecewise Linear Nature of FR Tradeoff

The aim here is to show that the locus of the set of pairs (ᾱ, dβ̄) with
ᾱ ≥ 0, dβ̄ ≥ 0, satisfying (6.1), is a piecewise-linear curve, with k corner
points. We begin by partitioning the first quadrant in the (x = ᾱ, y = dβ̄)
plane into the (k + 1) pairwise disjoint regions {Rℓ | ℓ = 0, 1, · · · , k}
identified in Table 6.1.


Table 6.1: Partitioning the first quadrant in the (x = ᾱ, y = dβ̄) plane into the
(k + 1) pairwise disjoint regions {Rℓ | ℓ = 0, 1, · · · , k}. The storage-repair-bandwidth
tradeoff under functional repair, is a piecewise-linear curve, represented by a straight
line in each of the (k + 1) regions {Rℓ }.

ℓ (x = ᾱ, y = dβ̄) ∈ Rℓ iff


0 dβ̄ ≤ ᾱ,
1 ≤ ℓ ≤ k − 1, (d − ℓ)β̄ ≤ ᾱ < (d − ℓ + 1)β̄,
k ᾱ < (d − k + 1)β̄.


Figure 6.1: Illustrating the piecewise-linear nature (in red) of the normalized FR
tradeoff for (k = 4, d = 4). The {Pi } denote the k = 4 corner points with P0 , P3
representing the MBR and MSR points respectively.

We will show that in each region Rℓ , 0 ≤ ℓ ≤ k, the locus is a


straight line (see Fig. 6.1).

1. When (x = ᾱ, y = dβ̄) ∈ R_0, (6.1) takes on the form:

$$\left(dk - \binom{k}{2}\right)\bar{\beta} = 1,$$

which represents a horizontal straight line.


2. When (x = ᾱ, y = dβ̄) ∈ R_ℓ, for 1 ≤ ℓ ≤ (k − 1), (6.1) takes on the form of the straight line

$$\ell\bar{\alpha} + \bar{\beta} \sum_{i=\ell}^{k-1} (d-i) = 1.$$

3. When (x = ᾱ, y = dβ̄) ∈ R_k, (6.1) takes on the form

$$k\bar{\alpha} = 1,$$

which represents a vertical straight line.

This establishes the piecewise-linear nature of the locus of points satis-


fying (6.1). Clearly, there are k corner points {Pℓ , ℓ = 0, 1, · · · , (k − 1)},
with corner point Pℓ corresponding to the point of intersection of the
straight lines associated with adjacent regions Rℓ , Rℓ+1 .


1. For the case ℓ = 0, the coordinates of the corner point P_ℓ are hence obtained by solving

$$\sum_{i=0}^{k-1} (d-i)\bar{\beta} = \bar{\alpha} + \sum_{i=1}^{k-1} (d-i)\bar{\beta} = 1,$$

i.e.,

$$d\bar{\beta} = \frac{d}{dk - \binom{k}{2}}, \qquad \bar{\alpha} = d\bar{\beta}.$$

This corner point $P_0 = \left(\frac{d}{dk-\binom{k}{2}},\, \frac{d}{dk-\binom{k}{2}}\right)$ corresponds to the normalized values (ᾱ, dβ̄) of an MBR code.

2. For the case 1 ≤ ℓ ≤ (k − 2), the coordinates of the corner point $P_\ell = (\bar{\alpha}, d\bar{\beta})$ are obtained by solving for ᾱ and dβ̄ from the equations below:

$$\ell\bar{\alpha} + \sum_{i=\ell}^{k-1} (d-i)\bar{\beta} = (\ell+1)\bar{\alpha} + \sum_{i=\ell+1}^{k-1} (d-i)\bar{\beta} = 1,$$
i.e.,

$$\ell\bar{\alpha} + \sum_{i=\ell}^{k-1} (d-i)\bar{\beta} = 1, \qquad \bar{\alpha} = (d-\ell)\bar{\beta}.$$

3. For the case ℓ = (k − 1), the coordinates of the corner point P_ℓ are obtained by solving

$$(k-1)\bar{\alpha} + (d-k+1)\bar{\beta} = k\bar{\alpha} = 1,$$

i.e.,

$$\bar{\alpha} = \frac{1}{k} = (d-k+1)\bar{\beta}.$$

This corner point $P_{k-1} = \left(\frac{1}{k},\, \frac{d}{k(d-k+1)}\right)$ corresponds to the normalized values (ᾱ, dβ̄) of an MSR code.
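The corner points worked out above are easy to tabulate. The sketch below solves the same pairs of equations (using exact rationals) and checks that P_0 and P_{k−1} match the MBR and MSR expressions, and that every corner point satisfies (6.1); the parameter choice (k = 4, d = 4) is the one used in Fig. 6.1.

```python
from fractions import Fraction as F

def corner_point(k, d, l):
    # Corner point P_l: alpha = (d - l) * beta together with
    # l * alpha + sum_{i=l}^{k-1} (d - i) * beta = 1.
    s = sum(d - i for i in range(l, k))
    beta = F(1, l * (d - l) + s)
    return (d - l) * beta, beta  # (alpha_bar, beta_bar)

def fr_file_size(k, d, alpha, beta):
    # Normalized file size, i.e., the right-hand side of (6.1)
    return sum(min(alpha, (d - i) * beta) for i in range(k))

k, d = 4, 4  # the example of Fig. 6.1
for l in range(k):
    a, b = corner_point(k, d, l)
    assert fr_file_size(k, d, a, b) == 1  # each P_l lies on the FR tradeoff
    print(f"P_{l}: (alpha, d*beta) = ({a}, {d * b})")

# The endpoints agree with the MBR and MSR expressions derived above
a0, b0 = corner_point(k, d, 0)
assert a0 == d * b0 == F(d, d * k - k * (k - 1) // 2)
am, bm = corner_point(k, d, k - 1)
assert am == F(1, k) and bm == F(1, k * (d - k + 1))
```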

6.2 ER Tradeoff

Our aim here is to characterize the normalized pairs (ᾱ, dβ̄) for which
it is possible to construct an ER RGC having parameters (n, k, d) over
some finite field Fq . We begin by noting that for any parameter set
(n, k, d), there exist constructions of ER MSR and ER MBR codes.
In the case of MBR codes this is apparent from the product-matrix
construction of an MBR code. In the case of an MSR code, this is clear
from the Diagonal MSR construction appearing in Section 5.2. Thus
at the points on the FR tradeoff corresponding to the MSR and MBR
points, there exist ER RGCs having the same normalized parameters
as FR RGCs. Since the FR tradeoff represents an outer bound to the
ER tradeoff1 , this tells us that the ER and FR tradeoffs share the MSR
and MBR points in common.
With this in mind, we define the ER tradeoff for fixed (n, k, d)
as the locus of all normalized pairs (ᾱ, dβ̄) that meet the following
requirements:
¹ Meaning that for the same parameter set {(n, k, d), (α, β)}, the file size B under ER is no larger than the file size under FR.
• $\tfrac{1}{k} \le \bar{\alpha} \le \tfrac{d}{dk - \binom{k}{2}}$,

• There exists an ER RGC having normalized parameters (ᾱ, dβ̄),

• There does not exist an ER RGC having normalized parameters (x, y) of the form (x < ᾱ, y = dβ̄) or (x = ᾱ, y < dβ̄).

When we speak of an interior point in the ER tradeoff, we mean a point lying on this locus having normalized value ᾱ satisfying $\tfrac{1}{k} < \bar{\alpha} < \tfrac{d}{dk - \binom{k}{2}}$.
While much effort has been expended and significant progress on the
problem has been made, complete characterization of the ER tradeoff
still remains open. Clearly, the FR tradeoff provides a trivial outer
bound to the ER tradeoff.

6.3 Non-existence of ER Codes Achieving FR Tradeoff

As in the case of the ER tradeoff, when we speak of an interior point in the FR tradeoff, we mean a point lying on the piecewise-linear tradeoff in the FR case (see Section 6.1) having normalized value ᾱ satisfying $\tfrac{1}{k} < \bar{\alpha} < \tfrac{d}{dk - \binom{k}{2}}$. It turns out that there do not exist ER RGCs whose parameters correspond to an interior point in the normalized FR tradeoff, apart possibly from a small region adjoining the MSR point. The precise statement is given below.

Theorem 3. (Theorem 7 in [211]) ER RGCs having normalized parameter set {(n, k, d), (ᾱ, β̄)} such that they correspond to an interior point on the FR tradeoff do not exist, except possibly for a small region in the (ᾱ, dβ̄) plane corresponding to the range given below for the parameter ᾱ:

$$(d-k+1)\bar{\beta} < \bar{\alpha} \le \left((d-k+2) - \frac{d-k+1}{d-k+2}\right)\bar{\beta}.$$

In this subsection, we provide a sketch of the proof of non-existence


only in the case of interior corner points having normalized coordinates
(ᾱ, β̄) satisfying ᾱ = (d − p)β̄, for 1 ≤ p ≤ (k − 2). We will do this by showing that there do not exist ER RGCs having parameter set

{(n, k, d), (α, β), B, F_q},

where the file size B satisfies the bound

$$B \le \sum_{i=0}^{k-1} \min\{\alpha,\, (d-i)\beta\}, \qquad (6.2)$$

with equality for α of the form α = (d − p)β for some integer p, with 1 ≤ p ≤ (k − 2). We refer the reader to [211] for the complete proof.
Proof: We follow the derivation in [211]. Let C be an ER RGC having parameter set {(n, k, d), (α, β), B, F_q} with α = (d − p)β, 1 ≤ p ≤ (k − 2), that satisfies the cut-set bound in (6.2) with equality. We will show that this leads to a contradiction. We restrict attention in the proof to a subset D of (d + 1) nodes that, by themselves, form a regenerating code C_D having parameter set {(d + 1, k, d), (α, β), B} that clearly also achieves the cut-set bound with equality. We continue to adopt the notation $W_A$, $S_A^B$, etc., that was introduced in Section 3.2. Let A ⊆ D denote a subset of D of size |A| = k. We have that

$$\begin{aligned}
B = H(W_A) &\le H(W_D) \\
&= I\left(W_D;\, \left\{S_{D\setminus \ell}\right\}_{\ell \in D}\right) \\
&\le H\left(\left\{S_{D\setminus \ell}\right\}_{\ell \in D}\right) \\
&= H\left(\left\{S_m^{D\setminus\{m\}}\right\}_{m \in D}\right) \\
&\le \sum_{m \in D} H\left(S_m^{D\setminus\{m\}}\right) \\
&= \sum_{m \in D} \beta = (d+1)\beta.
\end{aligned}$$

In arriving at this result, we have used two properties established in [211]. For any pair of distinct nodes {ℓ, m} ⊆ D, we have

$$H(S_m^{\ell}) = \beta,$$

and this appears as Property 3 in [211]. For any three distinct nodes {ℓ_1, ℓ_2, m} ⊆ D, we have:

$$H(S_m^{\ell_1} \mid S_m^{\ell_2}) = 0,$$
and this appears as part of Property 5 in [211] (after setting the parameter θ appearing in [211] to 0, since our focus here is only on corner points that lie within the interior).
On the other hand, if an ER RGC attains the cut-set upper bound in (6.2) with α = (d − p)β, for p an integer lying in the range 1 ≤ p ≤ (k − 2), we must have

$$\begin{aligned}
B &= \sum_{i=0}^{k-1} \min\{\alpha,\, (d-i)\beta\} \\
&= \sum_{i=0}^{k-1} \min\{(d-p)\beta,\, (d-i)\beta\} \\
&= 2(d-p)\beta + \sum_{i=2}^{k-1} \min\{(d-p)\beta,\, (d-i)\beta\} \\
&\ge 2(d-p)\beta + (k-2)\beta \\
&\ge (d+2)\beta,
\end{aligned}$$

leading to a contradiction. □
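The two sides of the contradiction are easy to check numerically. The sketch below (parameter triples chosen arbitrarily) evaluates the cut-set file size at each interior corner point α = (d − p)β and confirms that it always exceeds the (d + 1)β ceiling obtained from the entropy argument above.

```python
def cutset_file_size(k, d, alpha, beta):
    # Right-hand side of the cut-set bound (6.2)
    return sum(min(alpha, (d - i) * beta) for i in range(k))

beta = 1
for k, d in [(4, 4), (5, 7), (10, 13)]:
    for p in range(1, k - 1):  # interior corner points: 1 <= p <= k - 2
        alpha = (d - p) * beta
        B = cutset_file_size(k, d, alpha, beta)
        # Meeting (6.2) with equality would force B <= (d + 1) * beta
        # on the restricted (d + 1)-node code -- impossible:
        assert B >= (d + 2) * beta > (d + 1) * beta
print("cut-set file size exceeds (d + 1) * beta at every interior corner point")
```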

6.4 Outer Bounds on the Tradeoff Under ER

ER Codes Cannot Have Tradeoff Approaching the FR Tradeoff A first result in this direction was established by Tian [237] for the specific parameter set (n = 4, k = 3, d = 3). Tian was able to show that for these parameter values the file size B under exact repair is upper bounded by B ≤ 4α + 6β. This result was arrived at via a computer-aided proof that makes use of the linear programming approach to inequalities involving entropic expressions introduced by Yeung [110], [258]. When this result is compared to the FR tradeoff for the same code parameters, a non-vanishing gap between the ER and FR tradeoffs can be seen, as shown in Fig. 6.2.
The general version of this result, namely that the ER tradeoff is
strictly away from the FR tradeoff for every set (n, k, d) of parameters,

Figure 6.2: Normalized ER tradeoff for (n = 4, k = 3, d = 3)

was first established by Sasidharan et al. [198]. This is established for


all interior points other than those lying in the small region adjacent
to the MSR point identified in Theorem 3. This result was obtained
by exploiting the proof of the non-existence result appearing in [211],
and described in brief in Section 6.3 above, to show that there is a
non-vanishing gap between a lower bound on file size under FR and an
upper bound on file size under ER.
The proof of a non-vanishing gap in file size between the cases of
ER and FR, corresponds to providing an upper bound on file size under
ER that is tighter than in the case of FR. This upper bound on file size
under ER, was subsequently improved in [57].

The (n, d, d) Case For the case k = d, the following outer bound on the maximum file size of a linear RGC,

$$B \le \frac{d+1}{\ell+2}\left(\ell\alpha + \frac{d}{\ell+1}\beta\right), \qquad (6.3)$$

where ℓ = ⌊dβ/α⌋ ∈ {0, 1, . . . , d}, was derived in [173] by carefully analyzing the p-c matrices of ER RGCs. This bound establishes a piecewise-linear outer bound on the ER tradeoff as it applies to linear RGCs, that is tighter than the bound provided by the FR tradeoff. The same bound was independently derived in [60] by solving an optimization problem involving the file size of a linear ER RGC.
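A small sanity check on (6.3): at the two extreme points of the tradeoff of an (n, d, d) code, the bound should reduce to the known MSR and MBR file sizes, namely kα and d(d + 1)β/2 for k = d. The sketch below (arbitrary illustrative values) confirms this.

```python
from fractions import Fraction as F

def linear_er_bound(d, alpha, beta):
    # Outer bound (6.3) on the file size of a linear (n, d, d) ER RGC
    l = (d * beta) // alpha  # l = floor(d * beta / alpha), in {0, ..., d}
    return F(d + 1, l + 2) * (l * alpha + F(d, l + 1) * beta)

d = 6
beta = F(4)  # arbitrary illustrative value

# MSR point (k = d): alpha = (d - k + 1) * beta = beta; the bound gives k * alpha
assert linear_er_bound(d, beta, beta) == d * beta
# MBR point: alpha = d * beta; the bound gives d * (d + 1) * beta / 2,
# which is the MBR file size (dk - C(k, 2)) * beta for k = d
assert linear_er_bound(d, d * beta, beta) == F(d * (d + 1), 2) * beta
print("bound (6.3) meets the file size at both the MSR and MBR points")
```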
In [238], the outer bound in (6.3) is shown to hold for the specific
parameter set (n = 5, k = 4, d = 4) even in the case of a general ER
RGC, by adopting a computational approach to handling information-
theoretic inequalities. By a general ER RGC, we mean an ER RGC
that is not necessarily linear. A subsequent outer bound, appearing in
[56] and that also applies to a general ER RGC, coincides with that
in [60], [173] when specialized to the linear setting and to the case of
parameter sets of the form (n, d, d).
The class of Determinant codes [61], [63], discussed in Section 7.1,
turns out to achieve the outer bound in (6.3). This establishes both the
tightness of the bound in (6.3) as it applies to linear ER RGCs and
the optimality of Determinant codes when one restricts attention to
linear ER RGCs.
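As a numerical sanity check on this tightness claim, the short Python sketch below (function names are ours) evaluates the Determinant-code parameters of (7.1)-(7.2) against the bound, assuming (6.3) reads B ≤ ((d+1)/(ℓ+2))(ℓα + (d/(ℓ+1))β) with ℓ = ⌊dβ/α⌋, and confirms equality:

```python
from fractions import Fraction as Fr
from math import comb

def det_code_params(d, m):
    """(alpha, beta, B) of the (n, d, d) Determinant code, per (7.1)-(7.2)."""
    return comb(d, m), comb(d - 1, m - 1), m * comb(d + 1, m + 1)

def outer_bound_63(d, alpha, beta):
    """Right-hand side of the linear file-size bound (6.3), in exact arithmetic."""
    ell = (d * beta) // alpha                       # ell = floor(d*beta/alpha)
    return Fr(d + 1, ell + 2) * (ell * alpha + Fr(d, ell + 1) * beta)

for d in range(3, 8):
    for m in range(1, d + 1):
        alpha, beta, B = det_code_params(d, m)
        assert outer_bound_63(d, alpha, beta) == B  # bound met with equality
print("Determinant codes meet (6.3) with equality for d = 3, ..., 7")
```

Here ℓ evaluates exactly to the auxiliary parameter m of the Determinant code, since dβ = mα for these codes.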

The Best-Known Upper Bound on File Size Under ER In [162],
Mohajer and Tandon derived an upper bound

    B ≤ min_{0≤p≤k} (1/3) · [ (3k − 2p)α + (p(2(d − k) + p + 1)/2) β + (d − k + 1) min{α, pβ} ]    (6.4)
on file size for the general case, which was significantly tighter than
prior bounds in the literature. The bound was derived by bounding
the conditional joint entropy of certain repair-data random variables in
three different ways and adding the bounds together so as to cancel out
a few terms that were otherwise difficult to estimate. The above Mohajer-Tandon
bound was improved in [203] leading to a strictly better bound for the
case d > k. The improved Mohajer-Tandon bound derived in [203] is
given by

    B ≤ min_{0≤p≤k} (1/(3 + 4a)) · [ α(2(k − p)(1 + a) + k(1 + 2a)) + b min{α, pβ} + ((1 + 2a) p(2(d − k) + p + 1)/2) β ]    (6.5)
where d − k + 1 = a(p − 1) + b and 0 ≤ b < (p − 1). The bound in
[203] adopts the same approach as in [162]. The improvement arises
from identifying the symmetry in certain entropic terms observed by
representing repair data random variables in matrix form, and leveraging
this symmetry to avoid the need for employing certain union bounds.
The improved Mohajer-Tandon bound remains the best-known outer
bound on ER tradeoff for general (n, k, d).
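The behavior of such bounds is easy to probe numerically. The sketch below (the function name and the sampled parameter sets are ours; the exact expression used is the one written out in `mt_bound`, our transcription of (6.4)) evaluates the Mohajer-Tandon bound in exact rational arithmetic and confirms that it is tight at the two extreme points, the MSR point (α = (d − k + 1)β, B = kα) and the MBR point (α = dβ, B = (kd − k(k−1)/2)β):

```python
from fractions import Fraction as Fr

def mt_bound(k, d, alpha, beta):
    """Mohajer-Tandon upper bound (6.4) on the file size B."""
    best = None
    for p in range(k + 1):
        num = ((3 * k - 2 * p) * alpha
               + Fr(p * (2 * (d - k) + p + 1), 2) * beta
               + (d - k + 1) * min(alpha, p * beta))
        term = num / 3
        best = term if best is None else min(best, term)
    return best

for (k, d) in [(2, 3), (3, 4), (4, 6), (5, 9)]:
    beta = 1
    a_msr = (d - k + 1) * beta                     # MSR operating point
    assert mt_bound(k, d, a_msr, beta) == k * a_msr
    a_mbr = d * beta                               # MBR operating point
    assert mt_bound(k, d, a_mbr, beta) == (k * d - k * (k - 1) // 2) * beta
print("(6.4) is tight at the MSR and MBR points for the sampled (k, d)")
```

The p = 0 term recovers the MSR file size kα, while the p = k term recovers the MBR file size, consistent with the bound being an outer bound that touches both endpoints.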

Open Problem 5. Characterize the storage-repair-bandwidth tradeoff


of an (n, k, d) regenerating code under exact repair.

Remark 6. The parameters of the Cascade and Moulin codes (described


in Section 7) provide the best known inner bound to the ER tradeoff.
7
Interior-Point ER Codes

Let (ᾱMSR, dβ̄MSR) and (ᾱMBR, dβ̄MBR) denote the (ᾱ, dβ̄) values at
the MSR and MBR points respectively, given by:

    (ᾱMSR, dβ̄MSR) = ( 1/k , d/(k(d − k + 1)) ),
    (ᾱMBR, dβ̄MBR) = ( d/(dk − k(k−1)/2) , d/(dk − k(k−1)/2) ).


An ER RGC with normalized parameters {(n, k, d), (ᾱ, β̄)} will be


said to be an interior-point ER (IP-ER) RGC if ᾱMSR < ᾱ < ᾱMBR .
Given an ᾱ0 lying strictly between ᾱMSR and ᾱMBR , and the minimum
possible value β̄ = β̄ 0 attainable by an ER RGC, the code operating
at the (ᾱ0 , dβ̄ 0 ) point will be said to be an optimal IP-ER RGC. The
locus of all such points (ᾱ0 , dβ̄ 0 ) is the ER storage-repair-bandwidth
tradeoff. Clearly, (ᾱMBR , dβ̄ MBR ) and (ᾱMSR , dβ̄ MSR ) are at the two
extreme ends of the ER tradeoff.
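The relative position of the two endpoints can be checked directly. The following sketch (helper names are ours) computes both points exactly and verifies that the MBR point has equal coordinates, while the MSR point stores less and downloads more:

```python
from fractions import Fraction as Fr

def msr_point(k, d):
    """Normalized (alpha, d*beta) at the MSR point."""
    return Fr(1, k), Fr(d, k * (d - k + 1))

def mbr_point(k, d):
    """Normalized (alpha, d*beta) at the MBR point; denominator dk - k(k-1)/2."""
    denom = d * k - k * (k - 1) // 2
    return Fr(d, denom), Fr(d, denom)

for (k, d) in [(2, 3), (3, 4), (4, 6), (5, 9)]:
    a_msr, db_msr = msr_point(k, d)
    a_mbr, db_mbr = mbr_point(k, d)
    assert a_mbr == db_mbr               # storage equals repair download at MBR
    assert a_msr < a_mbr and db_msr > db_mbr
print("MSR/MBR endpoint sanity checks passed")
```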
A listing of some of the codes in the literature that attain, or are
conjectured to attain, some portion in the interior of the storage-repair-
bandwidth tradeoff under ER is given in Table 7.1.
In this section, we will first present an (n, d, d) construction, i.e., a
construction for the case k = d. The associated RGC will be referred


Table 7.1: A listing of some of the codes in the literature that attain, or are
conjectured to attain, some portion in the interior of the storage-repair-bandwidth
tradeoff under exact repair.

Code Construction          Parameter Set   Extent to which it Attains the ER Tradeoff
(4, 3, 3) Tian [237]       (4, 3, 3)       Achieves entire ER tradeoff
Canonical Layered [236]    (n, d, d)       Achieves single point for (n, n − 1, n − 1) case
Improved Layered [210]     (n, k, d)       Entire ER tradeoff for (n, k = 3, n − 1); achieves single point for (n, k = 4, n − 1) case
Determinant [61], [63]     (n, d, d)       Achieves entire ER tradeoff applicable to linear RGCs
Cascade [64], Moulin [54]  (n, k, d)       Conjectured to achieve entire ER tradeoff

to as the Signed Determinant code. This construction has an auxiliary


parameter σ ∈ Zd . Setting σ = 0 yields the Determinant code presented
in [61], [63] that attains the storage-repair-bandwidth tradeoff as it
applies to linear ER RGCs for the (n, d, d) case. A discussion on the
tradeoff in the (n, d, d) case, applicable to linear ER RGCs, can be
found in Section 6.4. Following this, we will briefly discuss two code
constructions, namely the Cascade code construction and the Moulin
code construction, due, respectively, to Elyasi and Mohajer [64] and
Duursma et al. [54], which share identical (ᾱ, β̄) parameters for a given (n, k, d).
These codes yield the best-known inner bound to the storage-bandwidth
tradeoff under ER. It is conjectured in [64] that the tradeoff achieved
by the Cascade code construction (and hence also by the Moulin code
construction) represents the storage-repair-bandwidth tradeoff under
exact repair.

7.1 Determinant Code

Signed Determinant Code Let σ ∈ Zd be a fixed d-length vector


of integers. Let σ(j) denote the jth entry of σ. We now describe the

Signed Determinant code due to Elyasi and Mohajer [61], [63], [64]
for parameters (n, d, d), i.e., for the case k = d. The code is called the
Signed Determinant code because of the sign factor introduced by the
components of σ. It turns out that if one is interested solely in the case
k = d, i.e., the (n, d, d) case, one can set the vector σ = 0, i.e., σ(j) = 0,
all j ∈ [d]. As noted above, setting σ = 0 yields the Determinant code
construction appearing in [61], [63]. While both papers [61], [63] describe
the same Determinant code, the repair process described in [63] has
the advantage that the helper data supplied by a helper node does not
depend upon the identity of the remaining (d − 1) helper nodes.¹ We
have retained σ in the expressions below, as the vector σ is needed when
the Signed Determinant code is used as a building block to construct
Cascade code [64]. Our description of the Signed Determinant code
below, follows the description of the code given in [64]. The repair
process of the Signed Determinant code described below is helper-set
independent.
The Signed Determinant construction is parameterized by an integer
variable m, with 1 < m < d. The associated (α, β, B) parameters are
then given by:

    αm = C(d, m),
    βm = C(d − 1, m − 1),    (7.1)
    Bm = m C(d, m) + m C(d, m + 1) = m C(d + 1, m + 1),    (7.2)

where C(a, b) denotes the binomial coefficient.
Let

    V = {vAj | A ⊂ [d], |A| = m, j ∈ A},
    W = {wSj | S ⊆ [d], |S| = m + 1, j ∈ S}

be two sets of symbols that take on values in a finite field F. Let

    W′ = {wSj | wSj ∈ W, j ≠ max S}

¹ This is the helper-set-independent property described in Section 5.4. It turns
out that, in the case of the Determinant and Signed Determinant codes, the
repair process is linear and involves constant repair matrices. These terms are defined
in Section 5.4.

be a subset of W. Then V ∪ W′ is of size Bm, and this is the set of
message symbols associated to the data file being stored. The symbol
wS,max S, for every S, is determined by the p-c equation

    Σ_{j∈S} (−1)^{τS(j)} wSj = 0,

where τS(j) is the position of j when the elements of S are listed
in ascending order; in other words, τS(j) = |{i ∈ S | i ≤ j}| for any
j ∈ S. The symbols in V ∪ W are used to populate two matrices V, W
of respective sizes (C(d, m) × d) and (C(d, m + 1) × d). The two matrices
will respectively be referred to as the V-array and the W-array. The
rows of the V-array are indexed by the m-subsets of [d] and the columns by
1, 2, . . . , d. The symbol vAj ∈ V occupies the cell in the V-array determined
by row A and column j ∈ A. In similar fashion, the rows of the W-array
are indexed by the (m + 1)-subsets of [d] and the columns by 1, 2, . . . , d.
The symbol wSj ∈ W occupies the cell in the W-array determined by
row S and column j ∈ S. Note that each row in the V-array contains
m symbols; the remaining (d − m) cells in each row are empty. Similarly,
each row in the W-array contains (m + 1) symbols; the remaining
(d − m − 1) cells in each row are empty.
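That |V ∪ W′| indeed equals Bm can be confirmed by brute-force enumeration of the symbol index sets; the sketch below (the function name is ours) does so for small d:

```python
from itertools import combinations
from math import comb

def message_symbol_count(d, m):
    """Count |V ∪ W'| by direct enumeration of the symbol index sets."""
    V  = {(A, j) for A in combinations(range(1, d + 1), m) for j in A}
    Wp = {(S, j) for S in combinations(range(1, d + 1), m + 1)
                 for j in S if j != max(S)}
    return len(V) + len(Wp)            # the two index sets are disjoint

for d in range(2, 8):
    for m in range(1, d):
        B_m = m * comb(d + 1, m + 1)   # file size from (7.2)
        assert message_symbol_count(d, m) == B_m
print("|V ∪ W'| = B_m verified for d = 2, ..., 7")
```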
Next, we will construct a matrix which we will refer to as the data
matrix D (at times, we will also refer to D as the D-array), having
C(d, m) rows and d columns. Again, the rows are indexed by the m-subsets of [d]
and the columns by 1, 2, . . . , d. The (A, j)th entry of D is given by:

    dAj = (−1)^{σ(j)} · vAj,          if j ∈ A,
    dAj = (−1)^{σ(j)} · wA∪{j},j,     if j ∉ A.

The population of the data matrix D is illustrated for d = 4 in Fig. 7.1.


The codeword array C of size (αm × n), formed by the contents of the n
nodes each containing αm symbols, is generated as:

    C = DΦ,

where D is the (αm × d) data matrix and Φ is a (d × n) generator
matrix. The generator matrix Φ is required to be such that every set of
d columns is of full rank.

Figure 7.1: The V, W and D matrices used in the construction of the (n, 4, 4) Signed
Determinant code with m = 2 and σ = (0, 0, 0, 0). For simplicity, in the figure, the
m-subsets of {1, 2, 3, 4} have been ordered in lexicographically ascending order and
indexed from 1 to 6. Similarly, the (m + 1)-subsets of {1, 2, 3, 4} have also been
ordered lexicographically and indexed from 1 to 4.
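The population pattern shown in Fig. 7.1 can be regenerated programmatically. The sketch below (the symbolic labels and helper names are ours; σ = 0, so all sign factors equal +1) prints, for each row A of the D-array, the symbol occupying each of the d columns:

```python
from itertools import combinations

d, m = 4, 2
rows = list(combinations(range(1, d + 1), m))  # lexicographic order, as in Fig. 7.1

def d_entry(A, j):
    """Symbolic (A, j) entry of the D-array: v_{A,j} if j in A, else w_{A∪{j},j}."""
    if j in A:
        return f"v{''.join(map(str, A))},{j}"
    S = sorted(set(A) | {j})
    return f"w{''.join(map(str, S))},{j}"

for A in rows:
    print(A, [d_entry(A, j) for j in range(1, d + 1)])
```

For example, the row indexed by A = {1, 2} carries v-symbols in columns 1 and 2 and the symbols w123,3 and w124,4 in columns 3 and 4.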

Data Collection: Let J denote the set of k = d nodes from which
data is to be recovered. By construction, the matrix Φ restricted to the
columns indexed by J is invertible. Thus the data matrix D can be
recovered.

Node Repair: Without loss of generality, suppose that the first
node has failed, and that the helper nodes are {2, 3, . . . , d + 1}. Let R
denote a matrix of size (C(d, m − 1) × αm), which we will term the repair
matrix, since we will use R to generate the repair data that is used
to repair failed node 1. It turns out that R has rank no larger than
βm = C(d − 1, m − 1), so that in practice, R can be replaced by a (βm × αm)
matrix. If ϕj, 1 ≤ j ≤ n, denotes the jth column of Φ, then the repair
data passed on by the helper nodes to the failed node is given by

    Z0 = R × D × [ϕ2 ϕ3 · · · ϕd+1].

By construction, the matrix [ϕ2 ϕ3 · · · ϕd+1 ] is invertible and thus the


replacement of the failed node has access to the product matrix

Z = RD.

Our goal is to identify a matrix R such that we can recover the contents
c1 = Dϕ1 of the failed node given the product RD. Each row of c1 is
indexed by an m-subset A of [d], and we write cA1 to denote the entry
of c1 in row A.
The matrix R is of size C(d, m − 1) × C(d, m). The entries of R are completely
determined by the symbols {ϕi1 | 1 ≤ i ≤ d} making up ϕ1. Thus
R is solely a function of the index of the failed node, index 1 in the
present case. The rows and columns of R are respectively indexed by
the (m − 1)-subsets and the m-subsets of [d]. If rPA denotes the entry in the
P-th row and A-th column of R, then

    rPA = (−1)^{σ(y)+τA(y)} · ϕy1,  if there exists y such that P ∪ {y} = A,
    rPA = 0,                        otherwise.

For an arbitrary A ⊂ [d], |A| = m, we now show that cA1 can be
recovered using the equation:

    cA1 = Σ_{i∈A} (−1)^{σ(i)+τA(i)} R_{A\{i}} Di,    (7.3)

where R_{A\{i}} is the row-vector of R associated to the row A \ {i} and Di
is the ith column-vector of D. We begin by introducing some notation:

    A∼i := A \ {i},  A∼i,y := {y} ∪ A \ {i}  and  Ay := A ∪ {y}.

We then have

Σ_{i∈A} (−1)^{σ(i)+τA(i)} R_{A\{i}} Di
  = Σ_{i∈A} (−1)^{σ(i)+τA(i)} Σ_{L⊂[d], |L|=m} r_{A∼i, L} d_{L,i}
  = Σ_{i∈A} (−1)^{σ(i)+τA(i)} r_{A∼i, A} d_{A,i}
      + Σ_{i∈A} Σ_{y∈[d]\A} (−1)^{σ(i)+τA(i)} r_{A∼i, A∼i,y} d_{A∼i,y, i}
  = Σ_{i∈A} ϕi1 d_{A,i}
      + Σ_{i∈A} Σ_{y∈[d]\A} (−1)^{σ(i)+τA(i)} (−1)^{σ(y)+τ_{A∼i,y}(y)} ϕy1 d_{A∼i,y, i}
  = Σ_{i∈A} ϕi1 d_{A,i}
      + Σ_{y∈[d]\A} ϕy1 Σ_{i∈A} (−1)^{σ(i)+σ(y)+τA(i)+τ_{A∼i,y}(y)} (−1)^{σ(i)} w_{Ay, i}.    (7.4)

To proceed further, we observe that for i ≠ y:

τA(i) + τ_{A∼i,y}(y)
  = |{u ∈ A | u ≤ i}| + |{u ∈ A∼i,y | u ≤ y}|
  = |{u ∈ Ay | u ≤ i}| − 1(y < i) + |{u ∈ Ay | u ≤ y}| − 1(i < y)
  = |{u ∈ Ay | u ≤ i}| + |{u ∈ Ay | u ≤ y}| − 1
  = τ_{Ay}(i) + τ_{Ay}(y) − 1.

Substituting back in (7.4), we have

Σ_{i∈A} (−1)^{σ(i)+τA(i)} R_{A\{i}} Di
  = Σ_{i∈A} ϕi1 d_{A,i}
      + Σ_{y∈[d]\A} (−1)^{σ(y)+τ_{Ay}(y)} ϕy1 Σ_{i∈A} (−1)^{σ(i)+τ_{Ay}(i)−1} (−1)^{σ(i)} w_{Ay, i}
  = Σ_{i∈A} ϕi1 d_{A,i}
      + Σ_{y∈[d]\A} (−1)^{σ(y)+τ_{Ay}(y)} ϕy1 [ − Σ_{i∈A} (−1)^{τ_{Ay}(i)} w_{Ay, i} ]
  = Σ_{i∈A} ϕi1 d_{A,i} + Σ_{y∈[d]\A} (−1)^{σ(y)+τ_{Ay}(y)} ϕy1 · (−1)^{τ_{Ay}(y)} w_{Ay, y}
  = Σ_{i∈A} ϕi1 d_{A,i} + Σ_{y∈[d]\A} ϕy1 d_{A,y} = Σ_{i∈[d]} d_{A,i} ϕi1 = cA1,

where the second-to-last equality makes use of the p-c equation satisfied
by the symbols {w_{Ay, j} | j ∈ Ay}, and the last equality uses
d_{A,y} = (−1)^{σ(y)} w_{Ay, y} for y ∉ A.

In this way, we are able to recover all the contents {cA1 | A ⊂ [d], |A| = m}
of node 1. As noted earlier, the matrix R can be shown to have rank
at most C(d − 1, m − 1) [64], and thus the repair bandwidth per helper node
is no larger than βm = C(d − 1, m − 1).
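The entire repair computation above can be verified end-to-end over a small prime field. The sketch below (all concrete choices, the field F101, the Vandermonde column ϕ1, and the variable names, are ours) instantiates the d = 4, m = 2 arrays with σ = 0, applies the parity completion to the w-symbols, and checks that the contents of node 1 are recovered from the product R·D exactly as in (7.3):

```python
import itertools, random

p, d, m = 101, 4, 2        # prime field size, and code parameters (k = d)

def tau(S, j):             # position of j when S is listed in ascending order
    return sum(1 for u in S if u <= j)

Am  = [frozenset(c) for c in itertools.combinations(range(1, d + 1), m)]
Am1 = [frozenset(c) for c in itertools.combinations(range(1, d + 1), m + 1)]

rng = random.Random(1)
v = {(A, j): rng.randrange(p) for A in Am for j in A}
w = {}
for S in Am1:
    mx = max(S)
    for j in S - {mx}:
        w[(S, j)] = rng.randrange(p)
    acc = sum((-1) ** tau(S, j) * w[(S, j)] for j in S - {mx})
    w[(S, mx)] = (-((-1) ** tau(S, mx)) * acc) % p      # parity completion

def D(A, j):               # data-matrix entry d_{A,j} (sigma = 0)
    return v[(A, j)] if j in A else w[(A | {j}, j)]

phi1 = [pow(2, t, p) for t in range(d)]   # column phi_1 of a Vandermonde Phi

# contents of node 1: c_{A,1} = sum_j d_{A,j} * phi1[j]
c1 = {A: sum(D(A, j) * phi1[j - 1] for j in range(1, d + 1)) % p for A in Am}

def R(P, A):               # repair-matrix entry r_{P,A}
    if P < A:              # nonzero iff P ∪ {y} = A for the unique y in A \ P
        (y,) = A - P
        return ((-1) ** tau(A, y) * phi1[y - 1]) % p
    return 0

recovered = {}
for A in Am:
    total = 0
    for i in A:
        RD_i = sum(R(A - {i}, L) * D(L, i) for L in Am)  # entry (A\{i}, i) of R.D
        total += (-1) ** tau(A, i) * RD_i
    recovered[A] = total % p

assert recovered == c1     # node 1 repaired from R.D via (7.3)
print("node-1 contents recovered via (7.3)")
```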

Determinant Code A careful reading of the derivation of the data-


collection and node-repair properties of the Signed Determinant code
shows that the arguments go through regardless of the specific value
assigned to vector σ ∈ Zd . In particular, we can set the vector σ = 0,
i.e., σ(j) = 0, all j ∈ [d]. Setting σ = 0 yields the Determinant code
construction appearing in [61], [63] under the helper-set-independent
repair process appearing in [63]. The ER tradeoff achieved by Determinant
codes coincides with the piecewise-linear outer bound given
in (6.3) if we set the parameter ℓ appearing in the bound equal to the
parameter m of the Signed Determinant code. Thus, as noted previously
in Section 6.4, the performance of Determinant codes characterizes the
storage-repair-bandwidth tradeoff as it applies to a linear ER RGC
having parameters of the form (n, d, d).

7.2 Cascade Code

In [64], the authors introduce a code termed as the Cascade code, that is
constructed using multiple Signed Determinant codes as building blocks.
The resulting Cascade code, as well as the Moulin code described below
in Section 7.3, both have the best-known storage-repair-bandwidth
tradeoff offered by an ER RGC, operating at an interior point. The
parameters of a Cascade code for given (n, k, d) are given by:
    α(µ) = Σ_{m=0}^{µ} (d − k)^{µ−m} C(k, m),
    β(µ) = Σ_{m=0}^{µ} (d − k)^{µ−m} C(k − 1, m − 1),
    B(µ) = Σ_{m=0}^{µ} k (d − k)^{µ−m} C(k, m) − C(k, µ + 1),    (7.5)

where µ, 1 ≤ µ ≤ k, is an auxiliary integer parameter. In evaluating
these expressions, we set C(ℓ1, ℓ2) = 0 if ℓ2 < 0 or ℓ2 > ℓ1. It can be verified
that setting µ = 1 yields an MBR code, and setting µ = k an MSR code.


In addition, it turns out that setting µ = k − 1 also yields a point
on the FR tradeoff, close to the MSR point. The construction of the
Cascade code is linear and the field size is Θ(n). As is the case with the

Determinant code, the repair process is helper-set-independent. Setting


d = k leads to the parameters of a Determinant code.
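The specializations just mentioned can be checked mechanically from (7.5); the sketch below (helper names are ours, with C(a, b) the binomial coefficient, set to 0 outside 0 ≤ b ≤ a) verifies the MBR, MSR and d = k reductions:

```python
from math import comb

def C(a, b):
    return comb(a, b) if 0 <= b <= a else 0   # convention: 0 outside the range

def cascade_params(k, d, mu):
    """(alpha, beta, B) of the Cascade code, from (7.5)."""
    alpha = sum((d - k) ** (mu - m) * C(k, m) for m in range(mu + 1))
    beta  = sum((d - k) ** (mu - m) * C(k - 1, m - 1) for m in range(mu + 1))
    B     = sum(k * (d - k) ** (mu - m) * C(k, m) for m in range(mu + 1)) \
            - C(k, mu + 1)
    return alpha, beta, B

for (k, d) in [(3, 5), (4, 6), (5, 9)]:
    # mu = 1 gives the MBR point, mu = k the MSR point
    assert cascade_params(k, d, 1) == (d, 1, k * d - k * (k - 1) // 2)
    ak, bk, Bk = cascade_params(k, d, k)
    assert Bk == k * ak and ak == (d - k + 1) * bk
    # d = k reduces to the Determinant-code parameters (7.1)-(7.2)
    for mu in range(1, k + 1):
        assert cascade_params(k, k, mu) == \
            (C(k, mu), C(k - 1, mu - 1), mu * C(k + 1, mu + 1))
print("Cascade parameter specializations verified")
```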

Figure 7.2: Comparing the common storage-repair-bandwidth performance of the
Cascade and Moulin codes against the corresponding FR tradeoff for the parameter
set (n = 8, k = 4, d = 6). (To identify the auxiliary parameter appearing in the
Moulin code construction, we set s = µ + 1.)

The performance achieved by the Cascade code is compared in
Fig. 7.2 with the FR tradeoff for an example parameter set (n = 8, k =
4, d = 6). It has been conjectured in [64] that the performance of the
Cascade code (and hence of the Moulin code described below as well)
characterizes the piecewise-linear tradeoff of an ER RGC.

7.3 Moulin Code

The Moulin² code is a linear ER RGC due to Duursma et al. [54] that
is described in terms of a multilinear algebra framework. In the Moulin-
code framework, each codeword is associated to a linear functional
acting on a parent vector space, i.e., a linear transformation from the
parent vector space to its underlying field of scalars. The symbols stored
in a node are obtained by evaluating the linear functional on elements of

² Name given by the authors of [54], who indicate that their choice of name was
inspired by the words cascade (waterfall or moulin) and multilinear algebra.

a subspace of the parent vector space. The repair data transferred from
a helper node are derived by evaluating the linear functional on elements
of a subspace properly contained within the subspace associated with
the helper node. We limit ourselves in this subsection to providing a brief
description of the Moulin-code construction, to convey a sense of the
multilinear algebra framework on which the construction is based, and
refer the reader to [54] for the more complete mathematical description.
Since the Moulin code and Cascade code have the same values of
(α, β, B) for a given (n, k, d), they offer the same performance. (We
identify auxiliary parameter s in the case of the Moulin code with the
parameter µ + 1 in the case of the Cascade code). Thus, both have
the same sub-packetization level. The Moulin code also has a linear
field-size requirement and even here, repair data passed on by a specific
helper node to a failed node, does not depend upon the identity of the
remaining (d − 1) helper nodes.

Multilinear Algebra Background Let X, Y be two finite-dimensional


vector spaces over a field F having ordered bases {x1 , x2 , . . . , xℓ1 } and
{y1 , y2 , . . . , yℓ2 } respectively. The tensor product X ⊗ Y of X and Y is
an ℓ1 ℓ2 -dimensional vector space consisting of all tensors

    x ⊗ y = ( Σ_i ai xi ) ⊗ ( Σ_j bj yj ) = Σ_{i,j} ai bj (xi ⊗ yj),

where (xi ⊗ yj ) may be regarded as an unbreakable expression. The


tensor product extends naturally to more than two vector spaces.
Let V and W be vector spaces over F of dimension d − k and
k respectively, so that U = V ⊕ W is isomorphic to F^d. Let BW =
{w1, w2, . . . , wk} be a basis of W. We define T^pW as the p-fold tensor
product of W with itself:

    T^pW = { Σ_{j=(j1,j2,...,jp), ji∈[k] ∀i} aj · (wj1 ⊗ wj2 ⊗ · · · ⊗ wjp) | aj ∈ F },

where the sum is a formal sum and where wj1 ⊗ wj2 ⊗ · · · ⊗ wjp may be
regarded as an unbreakable expression. We set T^0W = F, T^1W = W.

Next we define Λ^qW as the exterior product of W with itself q times:

    Λ^qW = { Σ_{j=(j1,j2,...,jq), 1≤j1<j2<···<jq≤k} aj · (wj1 ∧ wj2 ∧ · · · ∧ wjq) | aj ∈ F },

where once again the sum is a formal sum and wj1 ∧ wj2 ∧ · · · ∧ wjq
is to be regarded as an unbreakable expression. Here as well, we set
Λ^0W = F, Λ^1W = W. Clearly dim(T^pW) = k^p and dim(Λ^qW) = C(k, q).
For any vector space S over F, we define the dual space S* to be the
space of all linear functionals from S to F. As is well-known, in the finite-dimensional
case, S is isomorphic to S*.
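The two dimension counts follow from simple basis enumeration, which the sketch below (names are ours) carries out for k = 4:

```python
from itertools import combinations, product
from math import comb

k = 4
for p in range(4):
    basis_TpW = list(product(range(k), repeat=p))   # unrestricted index tuples
    assert len(basis_TpW) == k ** p                 # dim T^pW = k^p
for q in range(k + 1):
    basis_LqW = list(combinations(range(k), q))     # strictly increasing tuples
    assert len(basis_LqW) == comb(k, q)             # dim Λ^qW = C(k, q)
print("dim T^pW = k^p and dim Λ^qW = C(k, q) confirmed for k = 4")
```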

Structure of the Moulin code Given any set of integers (n, k, d, s)
satisfying:

    1 ≤ (s − 1) ≤ k ≤ d ≤ (n − 1),

there exists a Moulin ER {(n, k, d), (α, β), B, F} RGC over a finite field
F where

    α = Σ_{p≥0, q≥0, p+q+1=s} (d − k)^p C(k, q),
    β = Σ_{p≥0, q≥0, p+q+2=s} (d − k)^p C(k − 1, q),
    B = Σ_{p≥0, q≥0, p+q+1=s} d (d − k)^p C(k, q) − Σ_{p≥0, q≥0, p+q=s} (d − k)^p C(k, q),    (7.6)

and where the field size |F| satisfies |F| ≥ n. In the description above, s
is an auxiliary parameter of the Moulin-code construction. Thus one
sees from (7.5) and (7.6), after setting s − 1 = µ, that the Moulin
and Cascade codes share identical parameters.
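This coincidence of parameters is again easy to confirm mechanically; the sketch below (helper names are ours, with C(a, b) the binomial coefficient set to 0 outside 0 ≤ b ≤ a) evaluates (7.6) and (7.5) and checks that they agree once s = µ + 1:

```python
from math import comb

def C(a, b):
    return comb(a, b) if 0 <= b <= a else 0

def moulin_params(k, d, s):
    """(alpha, beta, B) of the Moulin code, from (7.6)."""
    alpha = sum((d - k) ** p * C(k, q)
                for p in range(s) for q in range(s) if p + q + 1 == s)
    beta  = sum((d - k) ** p * C(k - 1, q)
                for p in range(s) for q in range(s) if p + q + 2 == s)
    B     = (sum(d * (d - k) ** p * C(k, q)
                 for p in range(s) for q in range(s) if p + q + 1 == s)
             - sum((d - k) ** p * C(k, q)
                   for p in range(s + 1) for q in range(s + 1) if p + q == s))
    return alpha, beta, B

def cascade_params(k, d, mu):
    """(alpha, beta, B) of the Cascade code, from (7.5)."""
    alpha = sum((d - k) ** (mu - m) * C(k, m) for m in range(mu + 1))
    beta  = sum((d - k) ** (mu - m) * C(k - 1, m - 1) for m in range(mu + 1))
    B     = sum(k * (d - k) ** (mu - m) * C(k, m) for m in range(mu + 1)) \
            - C(k, mu + 1)
    return alpha, beta, B

for (k, d) in [(3, 5), (4, 6), (5, 9)]:
    for mu in range(1, k + 1):
        assert moulin_params(k, d, mu + 1) == cascade_params(k, d, mu)
print("Moulin (s = mu + 1) and Cascade parameters coincide")
```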
The data file, or equivalently, the codeword being stored, is identified
with a linear functional ϕ acting on the vector space M given by

    M = ⊕_{p≥0, q≥0, p+q+1=s} T^pV ⊗ U ⊗ Λ^qW.

Thus ϕ is an element of the dual space

    M* = ( ⊕_{p≥0, q≥0, p+q+1=s} T^pV ⊗ U ⊗ Λ^qW )* = ⊕_{p≥0, q≥0, p+q+1=s} (T^pV ⊗ U ⊗ Λ^qW)*.

When d = k, the summand T^pV ⊗ U ⊗ Λ^qW reduces to W ⊗ Λ^qW,
because V vanishes and U degenerates to W. The codewords in the
resultant (n, k, k) code correspond to functionals acting on the direct
sum

    M(n,k,k) = ⊕_{0≤q≤(s−1)} W ⊗ Λ^qW.

It is explained in [54] how the Moulin code construction for the general
(n, k, d) case can be viewed as being made up of layers of (n, k, k) Moulin
codes.

File Size Computation To estimate the file size B, we follow [54] and
introduce the following terminology:

V -spaces: {T p V ⊗ Λq W | p ≥ 0, q ≥ 0, p + q = s},
U -spaces: {T p V ⊗ U ⊗ Λq W | p ≥ 0, q ≥ 0, p + q + 1 = s},
W -spaces: {T p V ⊗ W ⊗ Λq W | p ≥ 0, q ≥ 0, p + q + 1 = s}.

As mentioned above, each codeword is associated to a functional ϕ
belonging to the dual space

    M* = ⊕_{p≥0, q≥0, p+q+1=s} (T^pV ⊗ U ⊗ Λ^qW)*.

However, we do not pick every functional within M*, but only those
that satisfy certain parity-checks.
For any p ≥ 0, q ≥ 1, we define a transformation, termed as cowedge
multiplication, by ∇ : T p V ⊗ Λq W → T p V ⊗ W ⊗ Λq−1 W in a recursive
manner. Notice that the domain and range of ∇ are elements belonging

to V-spaces and W-spaces respectively. Let ν ∈ T^pV, ω ∈ Λ^qW. For
q = 1,

    ∇(ν ⊗ ω) = ν ⊗ ω,

and for q = 2, with ω = w1 ∧ w2 for some w1, w2 ∈ W,

    ∇(ν ⊗ ω) = ∇(ν ⊗ (w1 ∧ w2)) = ν ⊗ w1 ⊗ w2 − ν ⊗ w2 ⊗ w1.

For q > 2, with ω = ω1 ∧ w1 for some ω1 ∈ Λ^{q−1}W, w1 ∈ W,

    ∇(ν ⊗ ω) = ∇(ν ⊗ ω1 ∧ w1) = ∇(ν ⊗ ω1) ∧ w1 + (−1)^{q−1} ν ⊗ w1 ⊗ ω1.

The parity-check constraints are then given by the following equations.
For every ν ∈ T^pV and ω ∈ Λ^qW, for all possible p ≥ 1, q ≥ 1 with
p + q = s, the following constraints need to be satisfied:

    ϕ(ν ⊗ ω) = ϕ(∇(ν ⊗ ω)).    (7.7)

We have the additional constraints

    ϕ(∇(ω)) = 0, for ω ∈ Λ^sW,    (7.8)
    ϕ(ν) = 0, for ν ∈ T^sV.    (7.9)

Thus a data file is represented by a functional in M∗ that satisfies


(7.7), (7.8) and (7.9). The file-size can be obtained by subtracting from
dim(M∗ ) the number of linearly-independent parity-check constraints.
The quantity dim(M∗ ) = dim(M) is the sum of dimensions of vector
spaces that belong to the class of U -spaces. The p-c constraints are in
one-one correspondence with the elements of the V -spaces. Thus by
subtracting the dimensionality of the sum of the V -spaces from the
dimensionality of M, we obtain the lower bound on file size B given
below
    B ≥ Σ_{p≥0, q≥0, p+q+1=s} d (d − k)^p C(k, q) − Σ_{p≥0, q≥0, p+q=s} (d − k)^p C(k, q).    (7.10)

Node Contents Let A be a (d × n) matrix over F having the property


that

• Any d columns of A are linearly independent and

• If B is the (k × n) sub-matrix of A obtained by selecting the first


k rows of A, then any k columns of B are linearly independent.

We identify the column space of A with the vector space U, both having
dimension d over F. Let {ui, i = 1, 2, · · · , n} be the elements of U
associated with the columns of A. In the Moulin-code construction, the
ith node stores

    {ϕ(v) | v ∈ ⊕_{p≥0, q≥0, p+q+1=s} T^pV ⊗ ui ⊗ Λ^qW},

where ϕ is the functional associated to the data file stored. The number
of symbols stored is thus given by

    α = Σ_{p≥0, q≥0, p+q+1=s} (d − k)^p C(k, q).    (7.11)

Repair Download In the Moulin-code construction, the β repair symbols
sent by helper node h to replace node f are given by:

    {ϕ(v) | v ∈ ∂^U_{uf}(T^pV ⊗ uh ⊗ Λ^qW), such that p + q + 2 = s},

where ∂^U_{uf} is a certain co-boundary operator (see [54] for details). It is
shown in [54] that the number of symbols transferred by a helper
node is given by

    β = Σ_{p≥0, q≥0, p+q+2=s} (d − k)^p C(k − 1, q).    (7.12)

Notes

1. Early interior-point constructions: The first IP-ER RGC having
parameters (n, d, d) was constructed in [202], [236] by the process
of carefully layering codewords belonging to an MDS code having
parameters [w + n − d, w], where w ≥ 2 is an auxiliary parameter.
This code was shown to achieve a point on the FR tradeoff in the
near-MSR region. The construction makes use of block designs and
can be extended to build (n, k, d) IP-ER RGCs by precoding the
symbols using an MDS code whose code symbols correspond to the
evaluation of linearized polynomials. This linearized-polynomial
approach, however, results in a significantly larger field size. In [210],
a rule of thumb for transforming an (n, d, d) code into an (n, k, d) code
without field-size increase is presented, at the cost of an increase in the
sub-packetization level α. The resultant code construction,
referred to as the Improved Layered Code construction, turns
out to yield codes achieving the entire ER tradeoff for the case
(n, k = 3, d = n − 1). The construction also achieves a single
interior point on the ER tradeoff associated with the parameter
set (n, k = 4, d = n − 1).
8
Lower Bounds on Sub-Packetization Level of MSR Codes

As has been the case in much of this monograph, we will restrict our
attention in this section to linear RGCs, where linearity is as defined in
Section 3. We will present two lower bounds¹ on the sub-packetization
level α of an MSR code having parameters

    ( (n, k, d = (n − 1)), (α = (n − k)β, β), B = αk, Fq ).

The first bound [11] is applicable to MSR codes possessing the optimal-
access property and builds on the derivation of a prior bound, applicable
in the case of systematic-node repair and appearing in [233]. The second
bound [6] applies to MSR codes in general. More recently, in independent
work, the results in [6] have been improved upon in [7] and [9]. This
improved bound is briefly discussed in the notes subsection. We set
r = (n − k) throughout the section.

¹ A brief overview of known bounds on sub-packetization for the case d < (n − 1)
is given in the notes subsection.


8.1 Properties of Repair Subspaces

We begin with some helpful notation. We assume throughout this section


that the linear MSR code is encoded in systematic form, that the n nodes
are ordered, and that the first k nodes, denoted by {u1 , u2 , · · · , uk }
store the message symbols, while the remaining r nodes denoted by
{p1 , p2 , · · · , pr } store parity. Let ℓ, 2 ≤ ℓ ≤ (k − 1), be an integer and
set
U = {u1 , u2 , · · · , uℓ },
V = {uℓ+1 , uℓ+2 , · · · , uk },
P = {p1 , p2 , · · · , pr }.
While the sets U, V are clearly functions of ℓ, for much of this section,
ℓ can be regarded as a fixed integer and for this reason, we use the
simplified notation above. Note that our choice of ℓ ensures that neither
U nor V is empty.

Generator Matrix of MSR Code


Since the MSR code C is systematic, its generator matrix G can be
expressed in the form:

          [ Iα      0       ...   0     ]
          [ 0       Iα      ...   0     ]
          [ ⋮       ⋮             ⋮     ]
          [ 0       0       ...   Iα    ]
    G  =  [ Ap1u1   Ap1u2   ...   Ap1uk ] ,    (8.1)
          [ Ap2u1   Ap2u2   ...   Ap2uk ]
          [ ⋮       ⋮             ⋮     ]
          [ Apru1   Apru2   ...   Apruk ]

where each Aij is an (α × α) sub-matrix. The data-collection property of
an MSR code forces every (k × k) block sub-matrix of G to be invertible.
From this and elementary (block) row reduction, it follows that each
sub-matrix Aij is necessarily invertible.
Rows (i − 1)α + 1 through iα of G are associated to the α contents of the
ith node in the list of nodes {u1, . . . , uk, p1, . . . , pr}. Each codeword in
C can then be expressed in the form:

    Gm = [cTu1 . . . cTuk cTp1 . . . cTpr]T,

where m ∈ Fq^{kα} is the underlying vector of kα message symbols and cui
is the code symbol associated to node ui. Let cTui = [ci1, ci2, . . . , ciα].

Repair Matrices and Subspaces


Since the RGC is assumed to be linear, the β symbols transferred from
node p to node u for the repair of node u are of the form Spu cp, where
Spu is a (β × α) matrix. For any matrix A, we use the notation Â to
refer to the subspace spanned by the rows of the matrix A. We will refer
to any matrix of the form Spu as a repair matrix and to the associated
subspace Ŝpu as a repair subspace.

Interference Alignment
In the repair of an MSR code, a phenomenon called interference align-
ment takes place. While the explanation of the phenomenon presented
below is for the case d = (n − 1) that is the focus of the present section,
the principle extends to the case d < (n − 1).
Consider the repair of a systematic node ui. The helper information
passed on by the xth parity node px is a collection of β symbols, each
of which is a linear combination of the kα message symbols {cjm |
1 ≤ j ≤ k, 1 ≤ m ≤ α} making up the data file. Of this, only the
portion that is a linear combination of the α symbols
{cim | 1 ≤ m ≤ α} contributes directly to the reconstruction of the
contents of the failed node.

The helper information passed on by the jth systematic node
uj, j ≠ i, plays only an indirect role in the reconstruction of node ui.
This is because the set of kα message symbols constitutes a collection
of kα independent scalar random variables. It follows that the role of
the jth systematic node uj, j ≠ i, in the repair process is to supply
a set of β symbols that will allow the undesired contribution from
each parity node, namely the part that is a linear combination of the
contents of the jth systematic node, to be cancelled out. For this to
happen, the repair information passed on by the r parity nodes must
be linearly aligned

so as to permit such cancellation by a collection of just β symbols


from the systematic node uj . This requirement is one of two conditions
referred to as interference alignment. The other requirement is that the
component of helper information passed on by the r parity nodes that
is a linear combination of the contents {cim | 1 ≤ m ≤ α} of the ith
node, suffices for reconstruction of the failed node.
The lemma below formally phrases in matrix form these interference
alignment conditions. Given a subspace S and a matrix A, we define
the vector space SA := {v T A | v ∈ S}.

Lemma 4 (Interference Alignment). With notation as introduced above,
for every 1 ≤ i, j ≤ k, j ≠ i, we must have

(a) (interference cancellation condition)

    Ŝuj ui = Ŝp1ui Ap1uj = · · · = Ŝprui Apruj .

(b) (full-rank condition)

    rank [ Sp1ui Ap1ui
           ⋮
           Sprui Aprui ] = α.

The lemma appears in [233]. A formal proof can be found, for example,
in [11].
We will now use Lemma 4 to prove a second lemma, Lemma 5, that
appears in [11] and which deals with the intersection of repair subspaces.
Lemma 5 will be used in turn to establish the lower bounds on the
sub-packetization level of an optimal-access MSR code. The statement
of Lemma 5 is illustrated in Fig. 8.1. Recall that 2 ≤ ℓ ≤ (k − 1),

U = {u1 , u2 , · · · , uℓ },
V = {uℓ+1 , uℓ+2 , · · · , uk },
P = {p1 , p2 , · · · , pr }.

Figure 8.1: The figure illustrates the inequality appearing in Lemma 5, equation
(8.3). Each of the three subspaces Ji := Ŝpui, i = 1, 2, 3, has dimension β. Each
pairwise intersection Ji ∩ Jj has dimension ≤ β/r, i.e., smaller by at least a factor
of r. The intersection J1 ∩ J2 ∩ J3 of all three subspaces has dimension ≤ β/r²,
thus smaller by at least a factor of r².

Repair Subspace Intersection

Lemma 5 (Repair Subspace Intersection).

    Σ_{i=1}^{r} dim( ∩_{u∈U} Ŝpiu ) ≤ dim( ∩_{u∈U\{uℓ}} Ŝpu ),    (8.2)

where p is an arbitrary node in P, i.e., an arbitrary parity node. Furthermore,
dim( ∩_{u∈U} Ŝpu ) is the same for all p ∈ P. Hence, for any
p ∈ P,

    dim( ∩_{u∈U} Ŝpu ) ≤ (1/r) · dim( ∩_{u∈U\{uℓ}} Ŝpu ).    (8.3)

Proof.

(a) Invariance of Dimension of ℓ-fold Intersection of Repair Subspaces


Contributed by a Parity Node
646 Lower Bounds on Sub-Packetization Level of MSR Codes

Clearly, the nodes in U ∪V are the systematic nodes and Tthe nodes
in P are the parity nodes. We will first prove that dim u∈U Sbpu
is the same for all p ∈ P , i.e., that the dimension of intersection
of the ℓ subspaces {Sbpu }u∈U obtained by varying the failed node
u ∈ U is the same, regardless of the parity node p ∈ P from which
the helper data originates.
By Lemma 4, ∀p, p′ ∈ P and uj ∈ U ,

Sbpuj Apuℓ+1 = Sbp′ uj Ap′ uℓ+1 . (8.4)

Since Aij are invertible for all i, j, equation (8.4) implies ∀p, p′ ∈
P:
   
ℓ ℓ
Sbpuj  Apuℓ+1 =  (8.5)
\ \
 Sbp′ uj  Ap′ uℓ+1 .
j=1 j=1

It follows then from non-singularity of the matrices Aij and equa-


tion (8.5) that dim( u∈U Sbpu ) is same for all p ∈ P . Now it
T

remains to prove the main inequality (8.2).

(b) (ℓ − 1)-fold Intersection of Repair Subspaces

We proceed similarly in the case of an (ℓ − 1)-fold intersection, replacing ℓ by ℓ − 1 in (8.5). We then obtain, ∀ p, p′ ∈ P:

( ∩_{j=1}^{ℓ−1} Ŝ_{p u_j} ) A_{p u_ℓ} = ( ∩_{j=1}^{ℓ−1} Ŝ_{p′ u_j} ) A_{p′ u_ℓ}.    (8.6)

(c) Relating ℓ-fold and (ℓ − 1)-fold Intersections

Next consider the repair of the node u_ℓ. It follows from (8.6) that for any p′, p ∈ P:

( ∩_{j=1}^{ℓ} Ŝ_{p u_j} ) A_{p u_ℓ} = ( ( ∩_{j=1}^{ℓ−1} Ŝ_{p u_j} ) A_{p u_ℓ} ) ∩ ( Ŝ_{p u_ℓ} A_{p u_ℓ} ) ⊆ ( ∩_{j=1}^{ℓ−1} Ŝ_{p′ u_j} ) A_{p′ u_ℓ}.    (8.7)

As a consequence of (8.7), it follows that for any p ∈ P:

⊕_{i=1}^{r} ( ∩_{u∈U} Ŝ_{p_i u} ) A_{p_i u_ℓ} ⊆ ( ∩_{j=1}^{ℓ−1} Ŝ_{p u_j} ) A_{p u_ℓ}.    (8.8)

By Lemma 4, we must have that

rank [ Ŝ_{p_1 u_ℓ} A_{p_1 u_ℓ}
       Ŝ_{p_2 u_ℓ} A_{p_2 u_ℓ}
       Ŝ_{p_3 u_ℓ} A_{p_3 u_ℓ}
              ⋮
       Ŝ_{p_r u_ℓ} A_{p_r u_ℓ} ] = α.    (8.9)

It follows as a consequence that

⊕_{i=1}^{r} Ŝ_{p_i u_ℓ} A_{p_i u_ℓ} = F_q^α,    (8.10)

and hence for every p ∈ P, we must have that

( Ŝ_{p u_ℓ} A_{p u_ℓ} ) ∩ ( ⊕_{p′∈P, p′≠p} Ŝ_{p′ u_ℓ} A_{p′ u_ℓ} ) = {0}.    (8.11)

It follows from (8.11) that if we set

W_i := ( ∩_{u∈U} Ŝ_{p_i u} ) A_{p_i u_ℓ},   1 ≤ i ≤ r,

then

W_j ∩ ( ∑_{1≤i≤r, i≠j} W_i ) = {0},  for all 1 ≤ j ≤ r.    (8.12)

Hence, since the A_{ij}'s are non-singular, from equations (8.8) and (8.12) we can conclude that:

∑_{i=1}^{r} dim( ∩_{u∈U} Ŝ_{p_i u} ) ≤ dim( ∩_{u∈U \ {u_ℓ}} Ŝ_{pu} ),    (8.13)

for any p ∈ P, which is precisely the desired equation (8.2).



8.2 Lower Bound for Optimal-Access MSR Codes

In this subsection, we present the lower bound on the sub-packetization level of optimal-access MSR codes due to Balaji and Kumar [11], which makes use of Lemma 5. In [233], Tamo et al. showed that α ≥ r^{(k−1)/r} must hold in any systematic vector MDS code that is able to repair the k systematic nodes in help-by-transfer (optimal-access) fashion. The lower bound from [11] that we derive below can be viewed as an extension of this lower bound to the case of all-node, optimal-access repair.
In an optimal-access MSR code, the rows of the repair matrices are picked from among the standard basis vectors {e_1, . . . , e_α}. Lemma 5 above presented an upper bound on the dimension of the intersection of repair subspaces. This places an upper bound on the number of repair subspaces that contain a fixed standard basis vector e_i. There are only α standard basis elements in all, and each of the (n − 1) repair matrices {S_{n1}, . . . , S_{n(n−1)}} contains β of them.
This leads to a lower bound on α, obtained by counting in two ways the edges of a bipartite graph that captures the inclusion relation between standard basis vectors and repair matrices, in light of the fact that the number of repair matrices containing a fixed standard basis element e_i is upper bounded. The argument is illustrated in Fig. 8.2.

Figure 8.2: The bipartite graph appearing in the counting argument used to prove Theorem 4 is shown. Each node on the left corresponds to an element of the standard basis {e_1, ..., e_α}. The nodes on the right are associated with the repair matrices S_{n1}, ..., S_{n(n−1)}. An edge connecting the vector e_i to node S_{nj} is drawn if e_i is a row vector of the repair matrix S_{nj}.

Theorem 4 (Bound on sub-packetization level of an optimal-access MSR code [11]). Let C be a linear optimal-access MSR code having parameter set

{(n, k, d = (n − 1)), (α, β), B, F_q}.

Then we must have:

α ≥ min{ r^{⌈(n−1)/r⌉}, r^{k−1} }.

Proof.

(a) Invariance of Repair Matrices to Choice of Generator Matrix
We first observe that the repair matrices can be kept constant, even
if the generator matrix of the code changes. This is because the
repair matrices only depend upon relationships that hold among
code symbols of any codeword in the code and are independent
of the particular generator matrix used in encoding. In particular,
the repair matrices are insensitive to the characterization of a
particular node as being either a systematic or a parity node.

(b) Implications for the Dimension of the Repair Subspace


From Lemma 5, we have that

dim( ∩_{u∈U} Ŝ_{pu} ) ≤ dim( ∩_{u∈U \ {u_ℓ}} Ŝ_{pu} ) / r
                      ≤ dim( ∩_{u∈U \ {u_ℓ, u_{ℓ−1}}} Ŝ_{pu} ) / r²
                        ⋮
                      ≤ dim( Ŝ_{p u_1} ) / r^{ℓ−1}
                      = α / r^{ℓ} = α / r^{|U|}.    (8.14)
Lemma 5 and its proof hold true for any set U ⊆ [n] of size 2 ≤ |U| ≤ (k − 1). As a result, equation (8.14) also holds for any set U ⊆ [n] of size 2 ≤ |U| ≤ (k − 1). We would like to extend the above inequality to hold even in the case when U is replaced by a set F of size k ≤ |F| ≤ (n − 1). Since the repair matrices and their associated subspaces are invariant to the choice of generator matrix, from here onwards we drop the distinction between systematic and parity nodes. In place of the labels {u_1, u_2, · · · , u_k}, {p_1, p_2, · · · , p_r}, we will simply use the integers from 1 to n to denote the n nodes. It will be convenient in the argument to assume that F is a collection of nodes having size k ≤ |F| ≤ (n − 1) that does not contain the nth node.
Let us suppose that α < r^{k−1}. We will show that this assumption leads to α ≥ r^{⌈(n−1)/r⌉}, thereby proving the theorem. If α < r^{k−1} and F is of size (k − 1), we get:

dim( ∩_{u∈F} Ŝ_{nu} ) ≤ α / r^{k−1} < 1,    (8.15)

which is possible iff

dim( ∩_{u∈F} Ŝ_{nu} ) = 0.

But this would imply that

dim( ∩_{u∈F} Ŝ_{nu} ) = 0

for any subset F of nodes having size (k − 1) ≤ |F| ≤ (n − 1).


We are therefore justified in extending the inequality in (8.14) to the case when U is replaced by a subset F whose size now ranges from 2 to (n − 1), i.e.,

dim( ∩_{u∈F} Ŝ_{nu} ) ≤ α / r^{|F|}    (8.16)

for any F ⊆ [n − 1] with 2 ≤ |F| ≤ (n − 1). A consequence of the inequality (8.16) is that

dim( ∩_{u∈F} Ŝ_{nu} ) ≥ 1

implies that

|F| ≤ ⌊log_r(α)⌋.    (8.17)

Thus any given non-zero vector can belong to at most ⌊log_r(α)⌋ of the repair subspaces {Ŝ_{n1}, . . . , Ŝ_{n(n−1)}}.

(c) Counting in a Bipartite Graph


The remainder of the proof then follows the steps outlined in [233]. We form a bipartite graph with the standard basis vectors {e_1, ..., e_α} as the set of left nodes and {S_{n1}, ..., S_{n(n−1)}} as the set of right nodes, as shown in Fig. 8.2. We place an edge (e_i, S_{nj}) in the edge set of this bipartite graph iff e_i ∈ Ŝ_{nj}. Now since the MSR code is an optimal-access code, the rows of each repair matrix S_{nj} must all be drawn from the set {e_1, ..., e_α}.
Counting the number of edges of this bipartite graph in terms of node degrees on the left and the right, we obtain from (8.17):

α·⌊log_r(α)⌋ ≥ (n − 1)·(α/r),
log_r(α) ≥ ⌊log_r(α)⌋ ≥ (n − 1)/r,
log_r(α) ≥ ⌈(n − 1)/r⌉  (since ⌊log_r(α)⌋ is an integer that is ≥ (n − 1)/r),
α ≥ r^{⌈(n−1)/r⌉}.
Thus we have shown that if α < r^{k−1}, we must have α ≥ r^{⌈(n−1)/r⌉}. It follows that

α ≥ min{ r^{⌈(n−1)/r⌉}, r^{k−1} }.
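The bound of Theorem 4 is easy to evaluate numerically. The sketch below (the function name is ours) computes both terms of the minimum; for high-rate parameters the first, counting-argument term is the active one.

```python
# Evaluate the Theorem 4 lower bound on the sub-packetization level,
# alpha >= min{ r^ceil((n-1)/r), r^(k-1) }, of an optimal-access MSR code.
from math import ceil

def oa_msr_alpha_bound(n: int, k: int) -> int:
    """Theorem 4 bound for an (n, k, d = n - 1) optimal-access MSR code."""
    r = n - k
    return min(r ** ceil((n - 1) / r), r ** (k - 1))

# r = 2: min(2^7, 2^11) = 128; the first term is the smaller one.
print(oa_msr_alpha_bound(n=14, k=12))
# r = 3: min(3^2, 3^2) = 9; here the two terms coincide.
print(oa_msr_alpha_bound(n=6, k=3))
```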

8.3 Lower Bound for General MSR Codes

We present in this subsection a lower bound on the sub-packetization level that is exponential in the code dimension, given by Alrabiah and Guruswami [6]. Throughout this subsection, we set U ∪ V = [1, k] and P = [k + 1, n], and we further assume that the MSR codes discussed are defined over the finite field F_q. We begin with a lemma from [233].
Lemma 6. [233] If there exists an {(n, k, d = (n − 1)), (α, β)} MSR code with repair matrices {S_{pu} : u ∈ [k], p ∈ [n] \ {u}}, then it is possible to construct an {(n − 1, k − 1, d = (n − 2)), (α, β)} MSR code with new repair matrices {S′_{pu} : u ∈ [k − 1], p ∈ [n − 1] \ {u}} satisfying S′_{pu} = S′_u for all u ∈ [k − 1], p ∈ [n − 1] \ {u}, i.e., the repair matrices of the new MSR code do not depend on p.


A proof of the lemma can be found in [233]. We now present some
definitions.
Definition 5. Let {V_1, . . . , V_t} be a set of t γ_1-dimensional subspaces of F_q^α and let {W_1, . . . , W_t} be a set of t γ_2-dimensional subspaces of F_q^α, with γ_2 ≤ γ_1 ≤ α. We define:

F(V_1 → W_1, V_2 → W_2, . . . , V_t → W_t) = {ψ : ψ is an (α × α) matrix over F_q and V_i ψ ⊆ W_i, ∀ 1 ≤ i ≤ t},
I(V_1 → W_1, V_2 → W_2, . . . , V_t → W_t) = dim(F(V_1 → W_1, V_2 → W_2, . . . , V_t → W_t)),
F(V_1, V_2, . . . , V_t) = F(V_1 → V_1, V_2 → V_2, . . . , V_t → V_t),
I(V_1, . . . , V_t) = I(V_1 → V_1, V_2 → V_2, . . . , V_t → V_t).
Lemma 7. [6] Let U_1, U_2, . . . , U_s be s γ-dimensional subspaces of F_q^α such that ∩_{i=1}^{s} U_i = {0}. Then

∑_{i=1}^{s} dim(U_i) ≤ (s − 1)·dim( ∑_{i=1}^{s} U_i ).

Proof. We prove the result by induction. For the s = 2 case, using U_1 ∩ U_2 = {0},

dim(U_1) + dim(U_2) = dim(U_1 + U_2) + dim(U_1 ∩ U_2) = (2 − 1) × dim(U_1 + U_2).

We now assume the inequality holds for s = ℓ and prove that it also holds for s = ℓ + 1:

dim(U_1) + dim(U_2) + dim(U_3) + · · · + dim(U_{ℓ+1})
= dim(U_1 + U_2) + dim(U_1 ∩ U_2) + dim(U_3) + · · · + dim(U_{ℓ+1})
≤ dim(U_1 + U_2) + (ℓ − 1) × dim(U_1 ∩ U_2 + U_3 + · · · + U_{ℓ+1})
≤ ℓ × dim(U_1 + U_2 + U_3 + · · · + U_{ℓ+1}),

where the first inequality applies the induction hypothesis to the ℓ subspaces U_1 ∩ U_2, U_3, . . . , U_{ℓ+1}, whose common intersection is {0}.
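Lemma 7 can be sanity-checked computationally. The sketch below (our own illustration, not from the text) verifies the inequality over F_2 for three 2-dimensional subspaces of F_2^4 with trivial common intersection, computing ranks by Gaussian elimination on integer bitmasks.

```python
# Numeric check of Lemma 7 over F_2: if subspaces U_1, ..., U_s of
# F_2^alpha satisfy U_1 ∩ ... ∩ U_s = {0}, then
#     sum_i dim(U_i) <= (s - 1) * dim(U_1 + ... + U_s).
# Vectors are integer bitmasks; rank is Gaussian elimination over GF(2).

def gf2_rank(vectors):
    pivots = {}                      # leading-bit position -> basis vector
    for v in vectors:
        while v:
            msb = v.bit_length() - 1
            if msb in pivots:
                v ^= pivots[msb]     # eliminate the current leading bit
            else:
                pivots[msb] = v
                break
    return len(pivots)

# Subspaces of F_2^4: U1 = <e1, e2>, U2 = <e2, e3>, U3 = <e3, e4>.
# U1 ∩ U2 = <e2> and e2 does not lie in U3, so the triple intersection is {0}.
e1, e2, e3, e4 = 1, 2, 4, 8
U1, U2, U3 = [e1, e2], [e2, e3], [e3, e4]

lhs = sum(gf2_rank(U) for U in (U1, U2, U3))   # = 2 + 2 + 2 = 6
rhs = (3 - 1) * gf2_rank(U1 + U2 + U3)         # = 2 * 4 = 8
assert lhs <= rhs
print(lhs, rhs)
```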

Lemma 8. [6] For any {(n, k, d = n − 1), (α, β)} MSR code with repair matrices {S_{pu} : u ∈ [k], p ∈ [n] \ {u}} satisfying S_{pu} = S_u for all u ∈ [k] and p ∈ [n] \ {u}, i.e., with repair matrix S_{pu} not depending on p, we have

I(Ŝ_1, . . . , Ŝ_t) ≤ ( (2r − 1)/(2r) )·I(Ŝ_1, . . . , Ŝ_{t−1}),

where 1 ≤ t ≤ k.

Proof. Let p ∈ [k + 1, n]. From Lemma 4 and the invertibility of A_{pt}, the following left multiplication is a vector-space isomorphism:

L_{A_{pt}^{−1}} : F(Ŝ_1, Ŝ_2, . . . , Ŝ_{t−1}, Ŝ_t) → F(Ŝ_1, Ŝ_2, . . . , Ŝ_{t−1}, Ŝ_t A_{pt} → Ŝ_t),

where L_{A_{pt}^{−1}}(ψ) = A_{pt}^{−1} ψ. This can be argued as follows. For i < t, by Lemma 4,

Ŝ_i ( A_{pt}^{−1} ψ ) = Ŝ_i ψ ⊆ Ŝ_i,

and for i = t,

( Ŝ_t A_{pt} ) ( A_{pt}^{−1} ψ ) = Ŝ_t ψ ⊆ Ŝ_t.

Hence,

I(Ŝ_1, Ŝ_2, . . . , Ŝ_{t−1}, Ŝ_t) = I(Ŝ_1, Ŝ_2, . . . , Ŝ_{t−1}, Ŝ_t A_{pt} → Ŝ_t).

Similarly, left multiplication by the inverse of A_{pt} coupled with right multiplication by A_{(k+i)t} is also an isomorphism. Hence we have,

I(Ŝ_1, Ŝ_2, . . . , Ŝ_{t−1}, Ŝ_t) = I(Ŝ_1, Ŝ_2, . . . , Ŝ_{t−1}, Ŝ_t A_{pt} → Ŝ_t A_{(k+1)t}),
I(Ŝ_1, Ŝ_2, . . . , Ŝ_{t−1}, Ŝ_t) = I(Ŝ_1, Ŝ_2, . . . , Ŝ_{t−1}, Ŝ_t A_{pt} → Ŝ_t A_{(k+2)t}).

Let

V_{pt} = F(Ŝ_1, Ŝ_2, . . . , Ŝ_{t−1}, Ŝ_t A_{pt} → Ŝ_t A_{(k+1)t}),
W_{pt} = F(Ŝ_1, Ŝ_2, . . . , Ŝ_{t−1}, Ŝ_t A_{pt} → Ŝ_t A_{(k+2)t}).

This implies,

2r × I(Ŝ_1, Ŝ_2, . . . , Ŝ_{t−1}, Ŝ_t) = ∑_{p=k+1}^{n} dim(V_{pt}) + ∑_{p=k+1}^{n} dim(W_{pt}).    (8.18)

From Lemma 4 (full-rank condition), we also have that

∩_{p=k+1}^{n} V_{pt} = F(Ŝ_1, Ŝ_2, . . . , Ŝ_{t−1}, F_q^α → Ŝ_t A_{(k+1)t}),
∩_{p=k+1}^{n} W_{pt} = F(Ŝ_1, Ŝ_2, . . . , Ŝ_{t−1}, F_q^α → Ŝ_t A_{(k+2)t}).

Again from Lemma 4 (full-rank condition), we have that Ŝ_t A_{(k+1)t} ∩ Ŝ_t A_{(k+2)t} = {0}. Hence,

( ∩_{p=k+1}^{n} V_{pt} ) ∩ ( ∩_{p=k+1}^{n} W_{pt} ) = {0}.    (8.19)

Now applying Lemma 7 to equation (8.18), using the condition in (8.19):

2r × I(Ŝ_1, Ŝ_2, . . . , Ŝ_{t−1}, Ŝ_t) = ∑_{p=k+1}^{n} dim(V_{pt}) + ∑_{p=k+1}^{n} dim(W_{pt})
≤ (2r − 1) × dim( ∑_{p=k+1}^{n} V_{pt} + ∑_{p=k+1}^{n} W_{pt} )
≤ (2r − 1) × dim( F(Ŝ_1, Ŝ_2, . . . , Ŝ_{t−1}) )
= (2r − 1)·I(Ŝ_1, . . . , Ŝ_{t−1}).

Theorem 5. [6] For any {(n, k, d = (n − 1)), (α, β)} MSR code,

α ≥ e^{(k−1)/(4r)}.

Proof. From Lemma 6, we can construct an {(n − 1, k − 1, d = (n − 2)), (α, β)} MSR code with repair matrices not depending on p. By repeated application of Lemma 8 to this new derived MSR code,

I(Ŝ_1, . . . , Ŝ_{k−1}) ≤ ( (2r − 1)/(2r) )^{(k−1)} · α².

Since we have the identity matrix, which keeps all the subspaces Ŝ_i invariant, we have:

I(Ŝ_1, . . . , Ŝ_{k−1}) ≥ 1.

Hence we have

( (2r − 1)/(2r) )^{k−1} · α² ≥ 1.

By manipulation of the above inequality using log(1 + x) ≥ x/(1 + x) for x ≥ 0, we get the bound stated in the theorem.
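For completeness, the manipulation referred to above can be written out as follows (our expansion, with x = 1/(2r − 1) in the quoted inequality):

```latex
\alpha^{2} \;\ge\; \left(\frac{2r}{2r-1}\right)^{k-1}
\;\Longrightarrow\;
2\ln\alpha \;\ge\; (k-1)\,\ln\!\left(1+\frac{1}{2r-1}\right)
\;\ge\; (k-1)\cdot\frac{\tfrac{1}{2r-1}}{1+\tfrac{1}{2r-1}}
\;=\; \frac{k-1}{2r},
```

so that ln α ≥ (k − 1)/(4r), i.e., α ≥ e^{(k−1)/(4r)}.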

Notes

1. Lower bounds on the sub-packetization level of a general MSR code with d = n − 1: The lower bound on the sub-packetization level of a general MSR code with d = (n − 1) given by

α ≥ e^{Ω( √( (k−1)·log_e( r/(r−1) ) ) )}

appeared in [77]. This was improved in [107] to

α ≥ e^{Ω( √( (k−1)·log_e( r/(r−1) )·log_e(r) ) )},

and then improved again to the result appearing in Theorem 5. Recently, the lower bound given in Theorem 5 was further improved in two independent works [7], [9] to the following lower bound:

α ≥ e^{ (k−1)(r−1) / (2r²) }.

2. Optimal sub-packetization level of optimal-access MSR codes with d = n − 1: Note that ⌈(n − 1)/r⌉ = ⌈n/r⌉ except when n ≡ 1 (mod r). Hence for n ≢ 1 (mod r), the sub-packetization level of the code in [200] and of the CL-MSR code described in Section 5.3 matches the lower bound given in Theorem 4, thereby determining the optimal sub-packetization level of optimal-access MSR codes for the case d = n − 1. The case n ≡ 1 (mod r) remains open, however.

Open Problem 6. Determine the least possible sub-packetization level of an optimal-access (n, k, d) MSR code for the case when d = (n − 1) and n ≡ 1 (mod r).

3. The case d < n − 1: Lower bounds for the d < n − 1 case can be
derived by replacing r = n − k with s = d − k + 1 and replacing n
with d + 1 in the bounds presented in this section. This is because
we can puncture the MSR code, retain only d + 1 nodes, and then
apply the bounds in this section. For optimal-access repair, when
the choice of repair matrices does not depend on the identity of
the set of remaining d − 1 helper nodes used for repair, we have
the following lower bound presented in [11]:

α ≥ min{ s^{⌈(n−1)/s⌉}, s^{k−1} }.

Despite this assumption concerning the repair matrices, there are


constructions in the literature [190], [239] of MSR codes which
satisfy this assumption and have sub-packetization level achieving
this lower bound.

Since the above tight bound on sub-packetization level of an optimal-


access MSR code for d < (n − 1) is under the assumption of repair
matrices that satisfy the helper-set-independent property, the following
problem is still open.

Open Problem 7. Determine the minimum possible sub-packetization


level of an optimal-access (n, k, d) MSR code for the case d < n − 1.

Open Problem 8. Determine the minimum possible sub-packetization


level of a general (n, k, d) MSR code.

Remark 7. Open Problems 6-8 are closely related to Open Problems


2-4. The difference is that in the earlier section, the focus was on code
construction. Here it is on determining the smallest possible value of
sub-packetization.
9
Variants of Regenerating Codes

9.1 MDS Codes that Trade Repair Bandwidth for Reduced Sub-Packetization Level

In this subsection, we discuss vector MDS codes that do not have


minimum possible repair bandwidth and hence do not qualify to be
an MSR code. These codes nevertheless offer some savings in repair
bandwidth in comparison to the conventional repair of RS codes, while
keeping the sub-packetization level α to a small level. The piggybacking
framework introduced in [184] was one of the first such efforts. In [84],
the authors introduced a family of codes that offer a choice of sub-packetization levels α = r^p for 1 ≤ p ≤ ⌈n/r⌉, over a field of size at least n^{(r−1)α+1}, where r = n − k. The corresponding repair download from each helper node is given by β = (1 + 1/p)·r^{p−1}. When p = ⌈n/r⌉, these codes coincide with the MSR code construction presented in [200].


In [194], a framework termed the ϵ-MSR framework was introduced, which enables the construction of MDS codes that, in exchange for a small increase in repair bandwidth by a multiplicative factor (1 + ϵ), offer in return a sub-packetization α that is, impressively, logarithmic in n. In [138], a generic transformation for deriving MDS codes having low sub-packetization and near-optimal repair bandwidth, starting


from an MSR code, is presented. More recently, in [44], the Diagonal MSR code [255] was modified to obtain a vector MDS code having sub-packetization level α = u^{m+n−1}, where (n − k) = r = u^m for an integer m. The repair bandwidth of this code is shown to be asymptotically optimal for fixed r as n → ∞. We discuss some of these developments below.
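As a quick numerical illustration (our own, based on the parameter formulas quoted above for the codes of [84]), the sub-packetization α = r^p and per-helper download β = (1 + 1/p)·r^{p−1} can be tabulated against the MSR per-helper download α/r:

```python
# Tabulate the alpha / beta tradeoff of the codes of [84]:
# alpha = r^p and beta = (1 + 1/p) * r^(p-1), i.e., a factor (1 + 1/p)
# above the MSR per-helper download alpha/r.
from fractions import Fraction as F

def tradeoff(r, p):
    alpha = r ** p
    beta = (1 + F(1, p)) * r ** (p - 1)
    return alpha, beta, beta / F(alpha, r)    # third entry: inflation factor

for p in (1, 2, 4):
    alpha, beta, factor = tradeoff(r=2, p=p)
    print(f"p={p}: alpha={alpha}, beta={beta}, factor={factor}")
```

Larger p thus buys a download factor closer to 1 at the price of a larger sub-packetization level.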

9.1.1 The Piggybacking Framework


The piggybacking framework introduced by Rashmi et al. [184] begins
with a collection of α codewords drawn from an MDS code and proceeds
to modify the code symbols as described below. Let C be an MDS
code. Each individual code symbol can be regarded as a function of
the message and let (f1 (u), f2 (u), . . . , fn (u)) represent the codeword
corresponding to message u. Next, consider codewords of C correspond-
ing to α distinct messages, {u1 , . . . , uα }. The α code symbols {fj (ui ),
i = 1, 2, · · · , α} thus correspond to node j. The code is first modified
by adding a function gij (u1 , . . . , ui−1 ) to the jth symbol of the ith
codeword fj (ui ), for all i ∈ {2, . . . , α}, j ∈ {1, . . . , n}. The values so
added are termed as piggybacks. Clearly, this modification does not
affect our ability to decode the code, if the codewords are decoded in
sequence starting with i = 1. Applying an invertible linear transform
Tj to the α code symbols contained in the jth node, similarly does not
affect our ability to decode the α codewords, nor a node’s ability to
serve as a helper node. By carefully choosing the piggybacking functions
and the invertible linear transformations, it turns out that it is possible
to reduce the repair bandwidth for the collective repair of the α MDS
codewords, in comparison with the repair bandwidth needed for the
conventional repair of α MDS codewords.
Three families of piggybacking-based MDS codes with reduced repair bandwidth and disk read are constructed in [184]. The piggybacking framework typically provides bandwidth savings of between 25% and 50% over the conventional decoding of MDS codes.
We present in Fig. 9.1 an example of a code that illustrates the
piggybacking principle. This code is the modification of a systematic
[4, 2] MDS code with sub-packetization level α set equal to 2 in such a
[Figure 9.1 tables: The top row shows, from left to right, (i) two codewords of the systematic [4, 2] MDS code stored as columns, with node contents (a1, b1), (a2, b2), (a1 + a2, b1 + b2), (a1 + 2a2, b1 + 2b2); (ii) the code after piggybacking, in which a1 is added to the second symbol of node 4, giving (a1 + 2a2, b1 + 2b2 + a1); and (iii) the code after an invertible transformation of node 4, whose contents become (2a2 − 2b2 − b1, b1 + 2b2 + a1). The bottom row repeats the final code, with the contents of the failed first (respectively, second) systematic node crossed out.]

Figure 9.1: In this example, two codewords of a systematic [4,2] MDS code over the
finite field F5 , are piggybacked and appear as columns in the upper-left table. Each
row represents the contents of one of the 4 nodes. The piggyback modification results
in the code shown on the upper right. The tables in the bottom row correspond to
failure of the first and second systematic nodes. The first systematic node can be
repaired by reading {b2 , (b1 + b2 ), (b1 + 2b2 + a1 )}, (shown in blue), the second by
reading {b1 , (b1 + b2 ), (2a2 − 2b2 − b1 )}.

way that the systematic nodes can be repaired by reading 3 symbols


(instead of the 4 symbols customarily required for MDS decoding),
resulting in a 25% savings in repair bandwidth and disk reads.
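The repair reads described in the caption can be verified exhaustively over F_5 (a check of ours; the node contents used below are those of the final, transformed code in the figure):

```python
# Check the repair scheme of the piggybacked [4, 2] code of Fig. 9.1 over F_5.
# Node contents (after piggybacking and the node-4 transformation):
#   node 1: (a1, b1)                node 3: (a1+a2, b1+b2)
#   node 2: (a2, b2)                node 4: (2*a2-2*b2-b1, b1+2*b2+a1)
from itertools import product

P = 5
INV2 = 3  # multiplicative inverse of 2 modulo 5

def repair_systematic(a1, a2, b1, b2):
    # Node 1 is repaired from the 3 symbols {b2, b1+b2, b1+2b2+a1}.
    s1, s2, s3 = b2 % P, (b1 + b2) % P, (b1 + 2 * b2 + a1) % P
    b1_hat = (s2 - s1) % P
    a1_hat = (s3 - b1_hat - 2 * s1) % P
    # Node 2 is repaired from the 3 symbols {b1, b1+b2, 2a2-2b2-b1}.
    t1, t2, t3 = b1 % P, (b1 + b2) % P, (2 * a2 - 2 * b2 - b1) % P
    b2_hat = (t2 - t1) % P
    a2_hat = ((t3 + 2 * b2_hat + t1) * INV2) % P
    return (a1_hat, b1_hat), (a2_hat, b2_hat)

# Exhaustively verify both repairs for all 5^4 message vectors.
for a1, a2, b1, b2 in product(range(P), repeat=4):
    assert repair_systematic(a1, a2, b1, b2) == ((a1, b1), (a2, b2))
print("both systematic nodes repaired from 3 symbols each, over F_5")
```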

9.1.2 The ϵ-MSR Framework


An (n, k, α)_F ϵ-MSR code is an [n, k] vector MDS code over F^α having the additional property that any failed node i ∈ [n] can be repaired by downloading ≤ (1 + ϵ)·α/(n − k) symbols from each of the remaining (n − 1) nodes. Thus the number of helper nodes equals (n − 1) in this construction. It is shown by Rawat et al. in [194], [195] that it is possible to construct an ϵ-MSR code with α = O(log n) for all ϵ > 0. The ϵ-MSR code construction technique involves combining a short block-length MSR code with a code having large minimum distance.
Let C_I be an (n = k + r, k, d = n − 1) MSR code with sub-packetization level α over a finite field F, having p-c matrix

H = [ A_{0,0}     A_{0,1}     · · ·   A_{0,n−1}
      A_{1,0}     A_{1,1}     · · ·   A_{1,n−1}
        ⋮           ⋮           ⋱        ⋮
      A_{r−1,0}   A_{r−1,1}   · · ·   A_{r−1,n−1} ],

where the sub-matrices A_{i,j} are of size (α × α).
660 Variants of Regenerating Codes

Let C_II be a second (not necessarily linear) code having block length N, size M and minimum distance D = δN, over an alphabet A of size |A| ≤ n that we identify with a subset A ⊆ [0, n − 1]. We associate with every codeword c = (c_1 c_2 · · · c_N) of C_II an (rNα × Nα) matrix:

H_c = [ u_{1,c} Diag(A_{0,c_1}, . . . , A_{0,c_N})
                        ⋮
        u_{r,c} Diag(A_{r−1,c_1}, . . . , A_{r−1,c_N}) ].

Here the {u_{i,c}} are codeword-dependent, non-zero coefficients drawn from F. Next, we form an (rNα × MNα) matrix H by horizontally stacking the M matrices H_c corresponding to the M codewords in C_II. It can be shown that the code having H as its p-c matrix is an (M, M − r, Nα)_F ϵ-MSR code, where ϵ = (r − 1)(1 − δ). Ensuring this requires judicious selection of the base MSR code C_I as well as of the non-zero scalars {u_{i,c}}. An additional requirement is that for a given ϵ > 0, the code C_II should be chosen such that the parameter δ satisfies δ ≥ 1 − ϵ/(r − 1). The ϵ-MSR codes constructed using this approach can be made to have a sub-packetization level scaling logarithmically in the block length.
In [194], an ϵ-MSR code construction is provided, in which the
Diagonal MSR code constructed in [255], is used as the building block.
An example construction is described below.
Let C_I be chosen to be an (n = 3, k = 1, d = 2, α = 2³ = 8) Diagonal MSR code. Let C_II be chosen to be a code with (N = 20, M = 27, D = 13) over F_3. Using these two codes, one can construct an (M = 27, M − r = 25, Nα = 160) ϵ-MSR code with ϵ = 0.35. Note that the (n = 27, k = 25, d = 26) Diagonal MSR code requires a sub-packetization level of 2^{27}, whereas this ϵ-MSR code has a sub-packetization level of 160 (≪ 2^{27}) and repair bandwidth that is no more than 1.35 times that of the Diagonal MSR code.
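The parameter bookkeeping of this example can be checked directly (our sketch, using exact rational arithmetic):

```python
# Parameters of the example epsilon-MSR construction described above:
# C_I : (n = 3, k = 1, d = 2) Diagonal MSR code with alpha = 2^3 = 8 (r = 2),
# C_II: (N = 20, M = 27, D = 13) code over F_3, so delta = D / N.
from fractions import Fraction

r = 2
N, M, D, alpha = 20, 27, 13, 8
delta = Fraction(D, N)               # = 13/20
eps = (r - 1) * (1 - delta)          # epsilon = (r - 1)(1 - delta) = 7/20
assert eps == Fraction(7, 20)        # i.e., eps = 0.35
assert N * alpha == 160              # sub-packetization of the new code
assert delta >= 1 - eps / (r - 1)    # requirement delta >= 1 - eps/(r-1)
print(f"(M, M-r, N*alpha) = ({M}, {M - r}, {N * alpha}), eps = {float(eps)}")
```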

9.1.3 Li-Liu-Tang Transformation


In [138], a generic transformation that makes use of MSR codes to build
vector MDS codes having near-optimal repair bandwidth and small
sub-packetization level is presented. Four different vector MDS codes
are obtained by applying this transformation to various MSR codes
known in the literature. A fifth MDS code construction is also presented,


that does not make use of the generic transformation.
The idea behind the generic transformation can be traced back to [229]. The p-c matrix H′ of an (n′, k′ = n′ − r, d = n′ − 1) MSR code having sub-packetization level α can be expressed in the following block-matrix form:

H′ = [ A′_{0,0}     A′_{0,1}     · · ·   A′_{0,n′−1}
       A′_{1,0}     A′_{1,1}     · · ·   A′_{1,n′−1}
         ⋮            ⋮            ⋱         ⋮
       A′_{r−1,0}   A′_{r−1,1}   · · ·   A′_{r−1,n′−1} ],

where each A′_{i,j} is an (α × α) matrix. For the generic transformation to work, each A′_{i,j} is required to be non-singular. There are MSR code constructions in the literature that satisfy this requirement, for example, the Diagonal MSR code presented in [255].
Under the generic transformation, one passes to a code having larger block length n = sn′, while maintaining the same sub-packetization level α. The p-c matrix H of the code having block length n = sn′ takes on the form:

H = [ A_{0,0}     A_{0,1}     · · ·   A_{0,n−1}
      A_{1,0}     A_{1,1}     · · ·   A_{1,n−1}
        ⋮           ⋮           ⋱        ⋮
      A_{r−1,0}   A_{r−1,1}   · · ·   A_{r−1,n−1} ],

with

A_{i,j} = x_{i,j} A′_{i, (j mod n′)},   ∀ i ∈ [0, r − 1], j ∈ [0, n − 1],

where the {x_{i,j}} are indeterminates. It can be shown using the Combinatorial Nullstellensatz [5] that over a sufficiently large finite field, there exists an assignment of values to the {x_{i,j}} under which this code is an [n, k = n − r] vector MDS code. Under this argument, the requirement placed on the field size q is

q > α·\binom{n−1}{r−1} + 1.
662 Variants of Regenerating Codes

For the repair of node i, the (s − 1) nodes with indices j ≠ i such that j ≡ i (mod n′) send α symbols each, whereas the remaining (n − s) nodes send β = α/r symbols each, resulting in a repair bandwidth of

(n − 1)·(α/r) + (s − 1)·(α − α/r) = (n − 1)·(α/r)·( 1 + (s − 1)(r − 1)/(n − 1) ).

The term ( 1 + (s − 1)(r − 1)/(n − 1) ) represents the factor by which the new code has larger bandwidth in comparison with an MSR code. Thus this method can substantially reduce the sub-packetization level while keeping the increase in repair bandwidth to a manageable level, since the factor (s − 1)(r − 1)/(n − 1) is < 1.
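The bandwidth identity above is elementary to verify; the following check (ours) confirms it in exact rational arithmetic for a few parameter choices, along with the fact that the inflation factor stays below 2 because (s − 1)(r − 1) < n − 1:

```python
# Verify the repair-bandwidth identity of the generic transformation:
#   (n-1)*alpha/r + (s-1)*(alpha - alpha/r)
#       = (n-1)*(alpha/r)*(1 + (s-1)*(r-1)/(n-1)).
from fractions import Fraction as F

def lhs(n, s, r, alpha):
    return (n - 1) * F(alpha, r) + (s - 1) * (alpha - F(alpha, r))

def rhs(n, s, r, alpha):
    return (n - 1) * F(alpha, r) * (1 + F((s - 1) * (r - 1), n - 1))

for n_prime, s, r in [(6, 2, 2), (8, 3, 3), (10, 4, 4)]:
    n, alpha = s * n_prime, r ** 3          # any alpha works; identity is exact
    assert lhs(n, s, r, alpha) == rhs(n, s, r, alpha)
    assert F((s - 1) * (r - 1), n - 1) < 1  # so the inflation factor is < 2
print("bandwidth identity verified")
```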
In [138], the authors apply this simple generic transformation to four
MSR codes: the Diagonal MSR code [255], the Permuted-Diagonal MSR
code [255], an optimal-access MSR code in [148] and the CL-MSR code
[137], [205], [256]. This yields four vector MDS codes {C1 , C2 , C3 , C4 }
respectively, that offer significantly reduced sub-packetization level for
a modest increase in repair bandwidth. The drawback here is the large
field-size requirement given by q > α n−1 r−1 + 1. For codes C1 , C2 and


C3 , the field size requirement is reduced to q > rn′ ⌈ rs ⌉ with r|(q − 1),

q > r⌈ nr ⌉(s − 1) + n′ and q > sr respectively, by identifying a specific
assignment of the {xi,j } as opposed to appealing to the Combinatorial
Nullstellensatz.
The fifth vector MDS code C5 in [138] also has same repair bandwidth
as the four codes described above. This code is constructed directly
without using the transformation and draws upon the form of the
Diagonal MSR code [255]. The structure of C5 is similar to C1 .
Open Problem 9. Characterize the tradeoff between repair bandwidth,
sub-packetization level and field size for the general class of vector MDS
codes.

9.2 Fractional Repetition Codes

Fractional repetition codes, introduced by El Rouayheb and Ramchandran in [59], may be regarded as a generalization, in a certain direction, of the polygonal MBR code having the RBT property presented in Section 4.1.

A fractional repetition code is associated with a parameter set


{n, k, d, ρ, B}. Let B be the size of file to be stored using the fractional
repetition code. The B message symbols are first encoded using a scalar
[N, B]q MDS code C, called the outer code, to obtain a codeword c ∈ C.
Each of the N code symbols in c is then replicated ρ ≥ 2 times. The
ρN symbols thus obtained, are stored across n nodes in such a way that

• Each node stores d code symbols of c

• Each code symbol of c is stored in exactly ρ nodes.

Clearly, for this to be possible, we need Nρ = nd. Also, the sub-packetization level α = d. The assignment of code symbols to nodes in fractional repetition codes, in accordance with the above requirements, can be accomplished with the aid of combinatorial designs such as Steiner systems. In [67], the author identifies necessary and sufficient conditions for the existence of fractional repetition codes. The parameter k is the smallest integer such that the B message symbols can be retrieved from any set of k nodes.
A key difference between an MBR code and a fractional repetition
code is that in an MBR code, any collection of d nodes can be selected
as helper nodes for the repair of a failed node. In contrast, a fractional
repetition code only guarantees the existence of at least one set of d
helper nodes that enable RBT of a failed node. This is also referred to in
the literature as table-based repair. Given the ρ-wise replication of code
symbols from the scalar code, it follows that in a fractional repetition
code, RBT of up to ρ − 1 simultaneous node failures is possible.
We now present an example fractional repetition code construction [59] with parameters {n = 6, k = 3, d = 3, ρ = 2, B = 7}. The outer code here is an [N = 9, B = 7] MDS code C and hence there are 9 code symbols {c_1, c_2, . . . , c_9} in each codeword of C. Each of the n = 6 lines in Fig. 9.2 indicates a node. The α = 3 points lying on a line denote the code symbols stored in the corresponding node. For example, the symbols {c_1, c_4, c_7} are stored in the node corresponding to the straight line connecting these three points. Each code symbol, or point, lies at the intersection of two lines, resulting in ρ = 2. It can be easily verified that any collection of k = 3 nodes contains at least B = 7 distinct

C1 C4 C7

C2 C5 C8

C3 C6 C9

Figure 9.2: An example of a fractional repetition code having parameters {n =


6, k = 3, d = 3, ρ = 2, B = 7}. The code symbols (ci , i = 1, 2, · · · , 9) form a codeword
in a scalar [9, 7] MDS code. Each straight line represents a node and the points
(encircled code symbols) represent the contents of the node. Thus α = 3 as three
points lie on each line. We have n = 6 as there are 6 lines in all and ρ = 2 since each
point lies at the intersection of two lines.

code symbols, from which the file can be retrieved. Next, suppose one of the nodes has failed. There are d = 3 lines intersecting the line corresponding to the failed node in a point, and these d = 3 nodes will serve as helper nodes. The failed node can be repaired by downloading just one code symbol from each of the d = 3 helper nodes.
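The claims made for this example can be verified exhaustively; the short sketch below (ours) encodes the six lines of Fig. 9.2 as sets of symbol indices:

```python
# Verify the {n=6, k=3, d=3, rho=2, B=7} fractional repetition code of
# Fig. 9.2: nodes are the 3 rows and 3 columns of a 3x3 grid of symbols.
from itertools import combinations

grid = [[1, 4, 7],
        [2, 5, 8],
        [3, 6, 9]]
nodes = [set(row) for row in grid] + [set(col) for col in zip(*grid)]

# Every symbol is stored on exactly rho = 2 nodes (one row, one column).
assert all(sum(s in node for node in nodes) == 2 for s in range(1, 10))

# Any k = 3 nodes together hold at least B = 7 distinct symbols, enough
# to decode the [9, 7] MDS outer code.
min_cover = min(len(set().union(*triple)) for triple in combinations(nodes, 3))
assert min_cover == 7

# Each node meets exactly d = 3 other lines, in one symbol each, which
# enables repair-by-transfer with a single symbol per helper.
for node in nodes:
    meets = [other for other in nodes if other is not node and node & other]
    assert len(meets) == 3 and all(len(node & other) == 1 for other in meets)
print("all checks passed")
```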
Note that an MBR code with identical parameters, i.e., (n = 6, k = d = 3, β = 1), can only store a file of size (dk − \binom{k}{2})β = 6, whereas this fractional repetition code has file size B = 7. Thus the relaxation in code-design requirement arising from permitting a restricted choice of helper-node sets has allowed, in this case, a fractional repetition code to store a larger number of message symbols in comparison to the corresponding MBR code. An upper bound on the file size of fractional repetition codes is derived in [59], and constructions achieving this bound for some parameters are presented in [221].
In [125], the authors study fractional repetition codes that have sub-
packetization α much larger than the replication degree ρ. A randomized
version of fractional repetition codes can be found in [170]. Different
generalizations of fractional repetition codes have been studied in the
literature, including those in [81], [165], [264].

In related work [4], a characterization is given of the (n, k, d) parameter range over which table-based repair results in a strictly-improved storage-repair-bandwidth tradeoff, when compared with the corresponding tradeoff that applies to an FR RGC having the same (n, k, d) parameters.

9.3 Cooperative Regenerating Codes

Two approaches have been adopted in the RGC literature to handle the case when t > 1 nodes fail simultaneously. Under centralized-repair, a single repair center downloads helper data from a set of d helper nodes and uses this data to determine the contents of the t replacement nodes. In the case of an [n, k] vector MDS code with sub-packetization α, the least amount of data download required from d helper nodes for the simultaneous repair of t failed nodes under centralized-repair [31] is given by

αtd / (d − k + t),

and codes achieving this can be found described in [255]. The FR storage-repair-bandwidth tradeoff under centralized-repair of multiple node failures is explored in [97], [191], [265].
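As a small worked check (ours), the centralized-repair download formula reduces, at t = 1, to the familiar MSR cut-set value αd/(d − k + 1):

```python
# Centralized-repair download alpha*t*d/(d - k + t) for t simultaneous
# failures; t = 1 recovers the single-node MSR repair bandwidth.
from fractions import Fraction as F

def centralized_download(alpha, t, d, k):
    return F(alpha * t * d, d - k + t)

assert centralized_download(alpha=4, t=1, d=5, k=3) == F(20, 3)  # alpha*d/(d-k+1)
assert centralized_download(alpha=4, t=2, d=5, k=3) == 10
print(centralized_download(alpha=4, t=3, d=5, k=3))
```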
An alternate method of repairing multiple failed nodes simultaneously is cooperative-repair, under which there is a separate repair center for each replacement node. The repair centers are permitted to exchange data. The potential benefit of allowing such data exchange was first investigated by Hu et al. in [100]. As with an RGC, in a cooperative RGC, each of the n nodes stores α symbols and the contents of any k nodes are sufficient to reconstruct the stored data file of size B.
The cooperative-repair of t node failures takes place in two phases.
In the first phase, each of the t replacement nodes selects a set of d
helper nodes and downloads β1 symbols from each of them. In the
second phase, every replacement node downloads β2 symbols from each
of the other (t − 1) replacement nodes. Hence, the repair bandwidth
per node is given by

γ = dβ1 + (t − 1)β2 .

The minimum storage cooperative regenerating (MSCR) point and the minimum bandwidth cooperative regenerating (MBCR) point are determined in [118] and [218], and are given by:

(α_MSCR, γ_MSCR) = ( B/k, B(d + t − 1)/(k(d + t − k)) ),

α_MBCR = γ_MBCR = B(2d + t − 1)/(k(2d + t − k)).
Note that when t = 1, these reduce to the corresponding points for
single node repair. A cooperative RGC operating at the MSCR point is
once again an MDS code.
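The reduction at t = 1 noted above is immediate to check numerically (our sketch; rational arithmetic avoids any rounding):

```python
# MSCR / MBCR operating points; at t = 1 they reduce to the MSR / MBR
# single-node-repair points.
from fractions import Fraction as F

def mscr_point(B, k, d, t):
    return F(B, k), F(B * (d + t - 1), k * (d + t - k))

def mbcr_point(B, k, d, t):
    return F(B * (2 * d + t - 1), k * (2 * d + t - k))

B, k, d = 12, 3, 4
assert mscr_point(B, k, d, 1) == (F(B, k), F(B * d, k * (d - k + 1)))  # MSR point
assert mbcr_point(B, k, d, 1) == F(2 * B * d, k * (2 * d - k + 1))     # MBR point
print(mscr_point(B, k, d, 2), mbcr_point(B, k, d, 2))
```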
The entire storage versus repair bandwidth per node tradeoff curve
under FR is derived in [218]. In the case of exact repair, explicit con-
structions of cooperative RGCs for all parameters at the MBCR point
are presented in [244] and at the MSCR point in [257]. In [146], the
cooperative-repair model was extended to a partial collaboration model
under which, during the second phase of node repair, a replacement
node exchanges β2 symbols with (t − s) other replacement nodes, where
1 ≤ s ≤ t. The security of cooperative RGCs is investigated in [106],
[126].
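A quick way to sanity-check the MSCR and MBCR expressions above is to evaluate them exactly and confirm that t = 1 recovers the single-failure MSR and MBR points; the parameter values below are arbitrary assumptions.

```python
from fractions import Fraction

def mscr_point(B, k, d, t):
    """(alpha, gamma) at the minimum-storage cooperative regenerating point."""
    return Fraction(B, k), Fraction(B * (d + t - 1), k * (d + t - k))

def mbcr_point(B, k, d, t):
    """alpha = gamma at the minimum-bandwidth cooperative regenerating point."""
    v = Fraction(B * (2 * d + t - 1), k * (2 * d + t - k))
    return v, v

B, k, d = 1200, 6, 10
# t = 1 reduces to the MSR point (B/k, B*d/(k*(d-k+1))) ...
assert mscr_point(B, k, d, 1) == (Fraction(B, k), Fraction(B * d, k * (d - k + 1)))
# ... and to the MBR point alpha = gamma = 2*B*d/(k*(2*d-k+1))
assert mbcr_point(B, k, d, 1)[0] == Fraction(2 * B * d, k * (2 * d - k + 1))
print(mscr_point(B, k, d, 3), mbcr_point(B, k, d, 3))
```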

9.4 Secure Regenerating Codes

Three secrecy models in the context of an RGC were introduced by
Pawar et al. in [169]:

• A passive eavesdropper model, where the eavesdropper can read
the content and repair data of any ℓ < k nodes, but cannot modify
the content of these nodes,

• An active omniscient adversary model, where the adversary knows
the data stored in all the nodes, and can modify the content of b
nodes where 2b < k,

• An active limited-knowledge adversary model, where the adversary
can read content and repair data of ℓ < k nodes and can modify
the content of b ≤ ℓ nodes among them.

Both passive eavesdropper and active adversary settings are associated
to notions of capacity as described below. In the passive eavesdropper
setting, the secrecy capacity Bs is defined to be the maximum amount of
information that can be stored without any information being revealed
to the eavesdropper. In the active adversary setting, the resiliency
capacity Br is defined to be the maximum amount of information that
can be stored in such a manner that it can be reliably made available
to a legitimate data collector, despite tampering by the adversary, of
the data contained in b nodes.
The following upper bound on secrecy capacity of the passive eaves-
dropper model was derived in [169]:
    Bs ≤ Σ_{i=ℓ+1}^{k} min{(d − i + 1)β, α}.        (9.1)

For the case when α is unconstrained, i.e., α > (d − ℓ)β, the resultant
bandwidth-limited secrecy capacity Bs,BL is determined in [169] for
the case d = (n − 1), where a bound and matching construction are
presented. It was also shown that the resiliency capacity satisfies
    Br ≤ Σ_{i=i0}^{k} min{(d − i + 1)β, α},

where the lower limit i0 is equal to 2b + 1 in the omniscient case and
to b + 1 in the limited-knowledge case.
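The secrecy and resiliency bounds above share the same summation form and are easy to evaluate numerically; the toy parameters below (α = 4, β = 1, d = 9, k = 6, ℓ = 2, b = 1) are assumptions chosen for illustration only.

```python
def capacity_bound(alpha, beta, d, k, i0):
    """Evaluate sum_{i=i0}^{k} min{(d - i + 1)*beta, alpha} from [169]."""
    return sum(min((d - i + 1) * beta, alpha) for i in range(i0, k + 1))

alpha, beta, d, k = 4, 1, 9, 6
ell, b = 2, 1
print("secrecy bound:   ", capacity_bound(alpha, beta, d, k, ell + 1))
print("resiliency bound:", capacity_bound(alpha, beta, d, k, 2 * b + 1),
      "(omniscient),", capacity_bound(alpha, beta, d, k, b + 1), "(limited)")
```

Only the lower summation limit changes between the three models, which the single helper function makes explicit.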
In an alternate setting, Rashmi et al. in [181] assume a noisy channel
for the transmission of data during repair and reconstruction, and
introduce the notion of an (s, t)-resilient RGC that can correct up to t
errors and s erasures during both repair and data collection. The model
is aligned with the active adversary model where the adversary can
tamper with the contents of b nodes. The capacity or file size B of an
(s, t)-resilient RGC is shown to satisfy
    B ≤ Σ_{i=1}^{κ} min{(∆ − i + 1)β, α},

where ∆ := (d − 2t − s) and κ := (k − 2t − s). Constructions of MSR
and MBR codes that are (s, t)-resilient are also provided in [181]. In
[255], the authors extend this model to the repair of multiple nodes
and provide MSR constructions that are resilient to t errors during the
repair process.
In [182], the authors extend the passive eavesdropper model to the
setting where out of the ℓ nodes accessed, the eavesdropper can read
the contents of ℓ1 nodes and can observe the information passed on for
the repair of ℓ2 = ℓ − ℓ1 nodes. The upper bound in (9.1) also holds
for this case. In the case of an MBR code, since the amount of data
stored equals the amount of data received for node repair, the breakup
of ℓ between ℓ1 , ℓ2 is immaterial. This is not true in the case of an
MSR code where dβ > α. In [182], the authors provide secure MBR
code constructions matching the upper bound in (9.1) for all possible
parameters. A secure, low-rate MSR code construction that achieves
the upper bound (9.1) for ℓ2 = 0 is also presented in [182]. This secure
MSR construction establishes a lower bound to the secure file size of an
MSR code: Bs ≥ (k − ℓ)(α − ℓ2 β) for ℓ2 > 0.
The upper bound on secure MSR file size Bs ≤ (k − ℓ)α given by
(9.1) is improved in [75], [105], [187], [235]. In [188], the authors establish
that the secrecy capacity of an MSR code with d = n − 1 is given by
    Bs = (k − ℓ) (1 − 1/(n − k))^{ℓ2} α
by providing a secure MSR construction matching the upper bound
on secure file size given in [75]. The authors of [177] extended this
construction to determine the secrecy capacity of MSR codes with
d < n − 1, for the ℓ1 = 0 case. In [113], secure MSR codes having smaller
field size are constructed for all parameters. In [120], [121], [216], [253]
the ER tradeoff for secure RGCs is studied.

9.5 Rack-Aware Regenerating Codes

The storage nodes in a data center are typically organized into racks
that contain an equal number of nodes. The communication between
nodes within a rack is less expensive than cross-rack communication.
With this in mind, rack-aware regenerating codes (RRGCs) [99] focus
on minimizing the number of symbols that are exchanged across racks
during node repair.

In an RRGC, the n nodes are divided into r racks, such that each
rack contains n/r nodes, where n is a multiple of r. Each node continues
to store α symbols. The data file of size B stored using an RRGC must
be retrievable from any k nodes, as in the case of an RGC. For the repair
of a failed node in an RRGC, the replacement node is given access to
the entire content of all the nodes belonging to the same rack, as well as
to an additional set of dβ symbols, obtained by downloading β symbols
from each of d other, helper racks. The β symbols downloaded from any
such helper rack can be a function of the entire content of that helper
rack. Communication between nodes lying within the same rack does
not count towards the repair bandwidth, so that the aim in node repair
in the RRGC setting, is to minimize the quantity dβ, referred to as
the cross-rack repair bandwidth. The FR storage-bandwidth tradeoff
for RRGCs was characterized in [95]. The minimum storage rack-aware
(MSRR) and minimum bandwidth rack-aware regenerating (MBRR)
points are given by,
    (αMSRR , βMSRR ) = ( B/k , B/(k(d − m + 1)) ),

    αMBRR = dβMBRR = dB/(kd − m(m − 1)/2),

where m = ⌊kr/n⌋. Explicit ER constructions of MSRR codes for all
parameters can be found in [41] and MBRR codes in [263]. There
are other rack-aware models that have been studied in the literature,
including those in [171], [224].
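The MSRR and MBRR points can be evaluated exactly; the following sketch uses assumed parameters (B = 600, n = 12 nodes in r = 3 racks, k = 6, d = 2 helper racks) purely for illustration.

```python
from fractions import Fraction

def rack_aware_points(B, n, k, d, r):
    """MSRR and MBRR operating points for n nodes in r racks (n/r per rack)."""
    assert n % r == 0
    m = (k * r) // n                     # m = floor(k * r / n)
    msrr = (Fraction(B, k), Fraction(B, k * (d - m + 1)))
    # alpha_MBRR = d*B / (k*d - m*(m-1)/2); scale by 2 to keep integers
    alpha_mbrr = Fraction(2 * d * B, 2 * k * d - m * (m - 1))
    return msrr, (alpha_mbrr, alpha_mbrr / d)  # MBRR: alpha = d * beta

msrr, mbrr = rack_aware_points(B=600, n=12, k=6, d=2, r=3)
print("MSRR (alpha, beta):", msrr)
print("MBRR (alpha, beta):", mbrr)
```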
10 Locally Recoverable Codes

10.1 Background

While an RGC aims at minimizing the repair bandwidth, the principal
aim in the case of a locally recoverable code (LRC) is to keep small the
number of helper nodes contacted for repairing a failed node, termed
the repair degree.
Several papers have appeared in the literature introducing the con-
cept of locality in an error-correcting code from slightly different per-
spectives. These include the paper on subline coding by Han and Lastras-
Montaño [87], the paper on pyramid codes by Huang et al. [102], the
paper by Oggier and Datta [164] on self-repairing homomorphic codes
and the paper by Gopalan et al. [72] presenting a comprehensive treat-
ment of codes with locality. With the exception of [87], the focus in all
the above papers is on linear codes. Apart from [87], the list of early
papers containing a treatment of nonlinear LRCs include the papers by
Papailopoulos and Dimakis [168], Forbes and Yekhanin [68] and Tamo
and Barg [228].
We begin this section with a brief discussion on nonlinear LRCs
before going on to treat the case of linear codes in greater detail.


10.2 Nonlinear LRCs

Definition 6 (Nonlinear LRC with All-Symbol Locality). A code C of block
length n and size M over an alphabet Aq of size q is said to be an
(n, M ) code with (all-symbol) locality (r, δ) if associated to every code
symbol ci , i = 1, 2, · · · , n, of a codeword c = (c1 , · · · , cn ) ∈ C, there is
a set Si ⊆ [n] of size ni := |Si | ≤ (r + δ − 1) such that the restriction
Ci := C|Si of C to Si is a code of block length ni and minimum distance
≥ δ. The code Ci := C|Si , is called the local code associated with code
symbol ci .
It is typically assumed that the size M is of the form M = q k , so
that the LRC can be viewed as encoding a set of k message symbols
over the alphabet Aq .
Theorem 6. Let C be an (n, M ) LRC having (r, δ) locality and of size
M = q k over an alphabet Aq of size q. Then the rate and minimum
distance of the LRC are respectively upper bounded by

    k/n ≤ r/(r + δ − 1),

    dmin ≤ (n − k + 1) − (⌈k/r⌉ − 1)(δ − 1).
A proof of the above theorem is given in Section 10.8. Alternative
proofs for the case δ = 2 can be found in [68], [168], [228].

10.3 Linear LRCs

The early study of LRCs in the linear case was mostly centered on sys-
tematic linear codes, where only the message symbols were guaranteed
to be repaired with low degree. These codes were accordingly termed
as codes with information-symbol locality. The study was subsequently
expanded to include all-symbol locality codes, i.e., LRCs where it was
possible to repair all the code symbols with low repair degree. In this
section, we begin with information-symbol locality before moving on to
discuss all-symbol locality.
The original treatment in [72] was for the case when the local codes
have minimum distance δ = 2, corresponding to single-parity-check

codes. The practical usage of LRCs in the Azure code, described in


Section 10.6, also involves local codes having minimum distance δ = 2.
However, we state and prove the bounds on dmin and code rate k/n here
for the more general case δ ≥ 2 appearing in [172], as the proof technique
remains the same.

Definition 7 (Linear LRC with Information-Symbol Locality). An [n, k]
systematic, linear code C is said to be an (r, δ) LRC with information-symbol
locality if associated to every message symbol ui , 1 ≤ i ≤ k, there is a
locality if associated to every message symbol ui , 1 ≤ i ≤ k, there is a
set of ℓ other code symbols (ci1 , ci2 , · · · , ciℓ ) with ℓ ≤ r + δ − 2 such
that the set of ℓ + 1 code symbols (ui , ci1 , ci2 , · · · , ciℓ ) forms a code C i
of block length = ℓ + 1 and minimum distance ≥ δ. We will refer to C i
as a local code associated to message symbol ui .

The reason for regarding an (r, δ) LRC as having locality r can be
seen from the following. If (δ − 1) symbols from a local codeword are
erased, one is left with (ℓ + 2 − δ) unerased symbols. On the other hand,
in any [n, k, dmin ] linear code, all message symbols can be recovered from
any collection of (n − dmin + 1) code symbols. In the case of a local code
of block length (ℓ + 1) and minimum distance ≥ δ, this works out to
≤ (ℓ + 2 − δ). Further, since ℓ + 1 ≤ (r + δ − 1), we have (ℓ + 2 − δ) ≤ r.
Thus even in the presence of (δ − 1) erasures, each local code is always
guaranteed to be able to recover the local codeword from any r or less
of the remaining code symbols.

Remark 8. We make the following additional observations.

1. The minimum distance dmin of the LRC is ≥ δ. This follows from
noting that the minimum Hamming weight of codewords in a local
code C i is ≥ δ, hence the same is true of a codeword in the LRC.

2. It is possible for a given local code to be associated to more than
one message symbol and conversely, a given message symbol can
be associated to more than one local code.

3. By the Singleton bound, the minimum distance of a local code
cannot be larger than 1 plus its redundancy. It follows that (i) the
redundancy of a local code C i of minimum distance ≥ δ must be
≥ (δ − 1) and consequently that (ii) its dimension can be no larger
than r + δ − 1 − (δ − 1) = r.

4. If the local code has minimum distance δ, it can have redundancy
equal to (δ − 1) iff the code is MDS. If the local code has minimum
distance δ and is an MDS code, then the dimension can equal r
iff the length (ℓ + 1) of the local code is equal to (r + δ − 1).

10.4 Bounds on dmin and Rate for Linear LRCs

We now present an upper bound on the minimum distance of a linear
(r, δ) LRC. This bound was derived for the case δ = 2 of primary interest
by Gopalan et al. in [72]. The extension to the case of general δ appears
in [172]. We begin with a useful Lemma.
Lemma 9. Let C be an [n, k, dmin ] linear code over a finite field Fq . Let
    G = [ g1 g2 . . . gn ]
be a generator matrix for C. Let s be the largest possible integer such
that there exists a subset S ⊂ [n] of size |S| = s such that the (k × s)
sub-matrix of G associated to the columns whose indices lie in S has
rank = k − 1. Then dmin = n − s.
Proof: Given a subset S = {i1 , i2 , . . . , iℓ } ⊆ [n], we will mean by
G|S , the sub-matrix of G given by
    G|S = [ gi1 gi2 . . . giℓ ],

and refer to G|S as the restriction of G to S.
Let S ⊂ [n] be of largest possible size s such that rank(G|S ) = k − 1.
Then there exists u ∈ Fkq , u ̸= 0, such that ut G|S = 0t . Let ct = ut G.
Clearly, ct ̸= 0t and wH (ct ) ≤ n − |S|. It follows that dmin ≤ n − |S| =
n − s.
Next, let c ∈ C have minimum Hamming weight, i.e., wH (c) = dmin .
Let T ⊆ [n] be the support of c and set S = [n] \ T . Then S has size
|S| = n − dmin . Let u ∈ Fkq be the message vector associated to c. Then
clearly, ut ̸= 0t and ut G|S = 0t . It follows that
s ≥ n − dmin =⇒ dmin ≥ n − s.

Hence dmin = n − s. □
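Lemma 9 can be checked by brute force on a small example; the sketch below (an added illustration, not from the text) uses the [7, 4, 3] binary Hamming code and verifies that n − s equals the minimum distance.

```python
import itertools
import numpy as np

def gf2_rank(mat):
    """Rank over GF(2) via Gaussian elimination."""
    m = mat.copy() % 2
    rank = 0
    rows, cols = m.shape
    for c in range(cols):
        pivot = next((r for r in range(rank, rows) if m[r, c]), None)
        if pivot is None:
            continue
        m[[rank, pivot]] = m[[pivot, rank]]
        for r in range(rows):
            if r != rank and m[r, c]:
                m[r] = (m[r] + m[rank]) % 2
        rank += 1
    return rank

# generator matrix of the [7, 4, 3] binary Hamming code (assumed example)
G = np.array([[1, 0, 0, 0, 1, 1, 0],
              [0, 1, 0, 0, 1, 0, 1],
              [0, 0, 1, 0, 0, 1, 1],
              [0, 0, 0, 1, 1, 1, 1]])
k, n = G.shape

# minimum distance by exhausting all nonzero messages
dmin = min(int((np.dot(u, G) % 2).sum())
           for u in itertools.product([0, 1], repeat=k) if any(u))

# s = size of the largest column subset S with rank(G|S) = k - 1
s = max(len(S)
        for size in range(n + 1)
        for S in itertools.combinations(range(n), size)
        if gf2_rank(G[:, list(S)]) == k - 1)

assert dmin == n - s  # the statement of Lemma 9
```

For this code the search returns dmin = 3 and s = 4, consistent with the lemma.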

Theorem 7. The minimum distance dmin of an [n, k, dmin ] (r, δ) LRC
with information-symbol locality, is upper bounded by:

    dmin ≤ (n − k + 1) − (⌈k/r⌉ − 1)(δ − 1).        (10.1)
Proof: Let {u1 , u2 , · · · , uk } denote the k message symbols. Let G be
a (k × n) generator matrix for the LRC. Let {Si ⊆ [n] | 1 ≤ i ≤ k}, be
subsets of indices such that for each i, (cj , j ∈ Si ) is a local code of
length ≤ r + δ − 1 and minimum distance ≥ δ that contains the message
symbol ui . The subsets are not necessarily distinct. The first step in
our proof is achieving the following goal.
Goal: Use the subsets Si to construct a set T ⊆ [n] of large size
such that rank(G|T ) = k − 1. By Lemma 9, this will establish that
dmin ≤ n − |T |. Thus in the proof below, we will regard each set Si as a
set of column indices associated to the generator matrix G of the LRC.
We will construct the set T recursively and begin with T0 = ϕ.
Assuming that we have not stopped at the end of iteration a, we will
have obtained at the end of iteration a, a set Ta of the form Ta = ∪_{j=1}^{a} Sij
with rank(G|Ta ) < (k − 1). We will abuse notation and regard the empty
set as also having such a representation. We begin iteration (a + 1) by
picking an index ia+1 such that

rank(G|P ) > rank(G|Ta ) where P = Ta ∪ Sia+1 . (10.2)

Having selected such an index ia+1 and having set P = Ta ∪ Sia+1 , we
proceed as follows:

1. If rank(G|P ) < (k − 1), we set Ta+1 = Ta ∪ Sia+1 and continue the
recursion,

2. If rank(G|P ) = (k − 1), we stop, set T = Ta+1 = P and the flag
to J = 0.

3. If rank(G|P ) = k, we delete some elements from Sia+1 to obtain a
set Ŝia+1 with rank(G|P̂ ) = (k − 1) where P̂ = Ta ∪ Ŝia+1 . Clearly,
this can always be done. We then set T = P̂ and stop. We set the
flag to J = 1.

Case (i) Suppose we exited the recursion at the end of the (a + 1)th
iteration with flag J = 0. In this case, T = Ta+1 = ∪_{j=1}^{a+1} Sij . The
inclusion of each set Sij can increase the rank by at most r since each
local code has dimension at most r. Therefore,

    (a + 1) ≥ ⌈(k − 1)/r⌉.
Next, we claim that each set Sij , j = 1, 2, · · · , a + 1, brings in additional
column indices associated to at least (δ − 1) redundant columns, i.e.,
indices associated to columns that do not contribute to an increase in
rank. This can be explained as follows. Let

    ∆ℓ = rank(G|Tℓ ) − rank(G|Tℓ−1 ),  ℓ = 1, 2, · · · , a + 1.

Choose a subset Uiℓ ⊆ Siℓ of size |Uiℓ | = ∆ℓ − 1 such that if

    Vℓ = Uiℓ ∪ Tℓ−1 ,

then

    rank(G|Vℓ ) = rank(G|Tℓ ) − 1.

This is clearly possible. (If ∆ℓ = 1, Uiℓ can be chosen to be the empty
set ϕ). Clearly, we can also write Tℓ = Vℓ ∪ Siℓ . Since

    rank(G|Tℓ ) > rank(G|Vℓ ),

we claim that

    | Siℓ \ Vℓ | ≥ δ,

i.e., that while we have increased the rank by 1, we have increased the
number of column indices by a quantity ≥ δ. In this way, there are
always (δ − 1) column indices associated to redundant columns that are
added at every step.
The justification for the claim is as follows: in any [n, k, dmin ] code
A, any (k × m) submatrix of a (k × n) generator matrix GA for A, must
have rank k if m ≥ n − dmin + 1. Thus if we partition the column indices
of GA according to [n] = B1 ∪ B2 , B1 ∩ B2 = ϕ, then

    rank(GA ) > rank(GA |B1 )

is possible iff |B2 | ≥ dmin .


It follows that

    |T | ≥ (k − 1) + (a + 1)(δ − 1)
         ≥ (k − 1) + ⌈(k − 1)/r⌉ (δ − 1).

We thus have

    dmin ≤ n − |T | ≤ n − {(k − 1) + ⌈(k − 1)/r⌉ (δ − 1)}

    ∴ dmin ≤ (n − k + 1) − ⌈(k − 1)/r⌉ (δ − 1).        (10.3)
Case (ii) Suppose we exited the recursion at the end of the (a + 1)th
iteration and flag J = 1. We then have,
    (a + 1) ≥ ⌈k/r⌉  and  rank(G|Ta ) < k − 1.

We also have T = Ta ∪ Ŝia+1 , with rank(G|T ) = (k − 1). We can now

apply our earlier arguments about increasing the size of the column
index set by at least (δ − 1) at each of the first a steps. Since we
have replaced Sia+1 by Ŝia+1 , we cannot assert that this last step has
introduced any column indices associated to redundant columns at all.
Thus we can only assert that
    |T | ≥ (k − 1) + a(δ − 1)
         ≥ (k − 1) + (⌈k/r⌉ − 1)(δ − 1).

This gives us

    ∴ dmin ≤ (n − k + 1) − (⌈k/r⌉ − 1)(δ − 1).        (10.4)
Claim: ⌈(k − 1)/r⌉ ≥ ⌈k/r⌉ − 1. This can be seen by verifying that

    ⌈k/r⌉ − 1 ≤ ⌈(k − 1)/r⌉,  ∀k = ar + b, 0 ≤ b ≤ (r − 1).

Thus the RHS of (10.4) is at least as large as the RHS of (10.3). Thus (10.4)
is the desired upper bound on dmin since we can always be sure that
dmin satisfies the upper bound given by (10.4). □

Remark 9. Thus in comparison with an MDS code having the same
block length and dimension, we see that the penalty to be paid for
requiring locality is

    (⌈k/r⌉ − 1)(δ − 1).

We note that equation (10.1) reduces in the case when δ = 2 to

    dmin ≤ (n − k + 1) − (⌈k/r⌉ − 1).
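The bound and the penalty are easy to tabulate. The sketch below evaluates (10.1) for the parameter sets that appear as examples in this chapter (the pyramid, Tamo-Barg, and Azure codes), in each case reproducing the value dmin = 4.

```python
from math import ceil

def dmin_bound(n, k, r, delta):
    """Upper bound (10.1): (n - k + 1) - (ceil(k/r) - 1)*(delta - 1)."""
    return (n - k + 1) - (ceil(k / r) - 1) * (delta - 1)

def locality_penalty(k, r, delta):
    """Loss in dmin relative to an [n, k] MDS code."""
    return (ceil(k / r) - 1) * (delta - 1)

for (n, k, r, delta) in [(12, 5, 2, 3), (15, 8, 3, 3), (18, 14, 7, 2)]:
    print((n, k, r, delta), "bound:", dmin_bound(n, k, r, delta),
          "penalty:", locality_penalty(k, r, delta))
```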
As shown below, the minimum distance upper bound in Theorem 7
can be turned around to yield an upper bound on the code rate of an
LRC.

Corollary 1. The rate k/n of an [n, k, dmin ] code having (r, δ) information-
symbol locality is upper bounded by

    k/n ≤ r/(r + δ − 1).

Proof: Clearly, as noted in Remark 8, we must have dmin ≥ δ. This
along with the upper bound on dmin in Theorem 7 gives us

    δ ≤ (n − k + 1) − (⌈k/r⌉ − 1)(δ − 1).

It follows that

    n ≥ k + ⌈k/r⌉ (δ − 1),

leading to the rate bound

    k/n ≤ r/(r + δ − 1).

Table 10.1 provides a listing of the constructions of LRCs presented
in this section. These constructions appear in the subsections that
follow.

Table 10.1: LRC constructions described in this monograph. All of the constructions
appearing in the table are explicit.

Type of LRC Code Section


Information-Symbol Locality Pyramid LRC [102] 10.5
Information-Symbol Locality Azure LRC [103] 10.6
All-Symbol Locality Tamo-Barg LRC [228] 10.7

10.5 Pyramid LRC

The pyramid-code construction technique by Huang et al. [102], [104]
yields LRCs with information-symbol locality that are optimal with
respect to the minimum distance bound in (10.1).
Let us assume that it is desired to construct an [n, k] code C with
(r, δ) information-symbol locality and minimum distance dLRC attaining
the bound in (10.1). Set s = ⌈k/r⌉ and let n0 be such that n0 = n −
(s − 1)(δ − 1). Note that

    n0 = n − (s − 1)(δ − 1) = dLRC + k − 1 ≥ k + (δ − 1),

since as noted earlier, dLRC ≥ δ. The starting point of pyramid code
construction is the systematic generator matrix G0 = [Ik P ] of an [n0 , k]
construction is the systematic generator matrix G0 = [Ik P ] of an [n0 , k]
MDS code C0 . Since n0 ≥ k + δ − 1, it follows that the matrix P has
at least (δ − 1) columns. The pyramid code construction proceeds to
replace the first (δ − 1) columns of P by s(δ − 1) columns that are
derived by splitting each column of P into s columns. The resultant
matrix is then the generator matrix G of the desired [n, k] code C.
We explain the manner of column splitting through an illustrative
example with (n = 12, k = 5, r = 2, δ = 3). Here s = ⌈k/r⌉ = 3 and
n0 = n − (s − 1)(δ − 1) = 8. We begin with the systematic generator
matrix:

    G0 = [ I5 | p1 p2 p3 ]

         [ 1 0 0 0 0 | p11 p12 p13 ]
         [ 0 1 0 0 0 | p21 p22 p23 ]
       = [ 0 0 1 0 0 | p31 p32 p33 ]
         [ 0 0 0 1 0 | p41 p42 p43 ]
         [ 0 0 0 0 1 | p51 p52 p53 ]

of an [n0 = 8, k = 5] MDS code C0 . We will now proceed to split the
first (δ − 1) = 2 columns of P into s = 3 columns each. Equivalently,
we will replace each of the first (δ − 1) = 2 columns p1 , p2 of P by a
(k × s) sub-matrix as shown below:

p11 0 0 p12 0 0
       
p11 p12
p21 p21 0 0 p22 p22 0 0
       
       
 =⇒  0 p31 0  =⇒  0 p32 0
       
 p31 ,  p32 .
       

 p41 


 0 p41 0 


 p42 


 0 p42 0 

p51 0 0 p51 p52 0 0 p52

In general, if k = ar, we split each of the first (δ − 1) columns of P
into a columns, each containing r nonzero elements. If k = ar + b, with
0 < b ≤ (r−1), we split each column into s columns with (s−1) columns
having r elements each and the last column containing b elements. In
the present case r = 2 and b = 1. This yields the generator matrix
        [ 1 0 0 0 0 | p11  0   0  | p12  0   0  | p13 ]
        [ 0 1 0 0 0 | p21  0   0  | p22  0   0  | p23 ]
    G = [ 0 0 1 0 0 |  0  p31  0  |  0  p32  0  | p33 ]
        [ 0 0 0 1 0 |  0  p41  0  |  0  p42  0  | p43 ]
        [ 0 0 0 0 1 |  0   0  p51 |  0   0  p52 | p53 ]
The next step is to rearrange the columns of the matrix G as shown
below:

    [ 1 0 p11 p12 | 0 0  0   0  | 0  0   0  | p13 ]
    [ 0 1 p21 p22 | 0 0  0   0  | 0  0   0  | p23 ]
    [ 0 0  0   0  | 1 0 p31 p32 | 0  0   0  | p33 ]        (10.5)
    [ 0 0  0   0  | 0 1 p41 p42 | 0  0   0  | p43 ]
    [ 0 0  0   0  | 0 0  0   0  | 1 p51 p52 | p53 ]

Within each of the three column groups, the split columns containing
the entries pij are local parities; the last column is a global parity.

The rearrangement makes it easy to recognize that the resultant [12, 5]
code C generated by G has (r = 2, δ = 3) locality. The minimum
distance of the [8, 5] MDS code is 4. Hence the minimum Hamming
weight of a codeword of C0 is also 4. A little thought will show that
the expansion in the columns in the splitting manner just carried out
will not decrease the minimum Hamming weight. Hence C has dmin ≥ 4.
But by the dmin bound, we have
    dmin ≤ (n − k + 1) − (⌈k/r⌉ − 1)(δ − 1) = 4.
Hence C has dmin = 4 which is the best dmin possible.
In the general case, the MDS code C0 is an [n0 , k, n0 − k + 1] code,
and we have chosen n0 such that the minimum distance of the MDS
code (n0 − k + 1) = dLRC . The column-splitting process results in a
code C with parameters [n, k, dmin ≥ dLRC ], by the same argument
used in the example. But by the dmin bound in (10.1), the minimum
distance can be no larger than dLRC . It follows that C is an LRC with
information-symbol locality that attains the dmin bound in (10.1).
We will refer to code symbols in the pyramid code corresponding to
columns that have been split as local parity symbols. The remaining
parity symbols will be termed as global parity symbols. Thus in the
example pyramid code there are a total of 6 local parities, with 2 local
parities associated to each of the three local codes and a single global
parity-check as noted in equation (10.5).
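The column-splitting step can be sketched structurally as follows; the entries of p1 below are placeholder integers standing in for field elements (an assumption for illustration), and the disjoint supports mean the split columns re-sum to the original column.

```python
def split_column(p, r):
    """Split column p (length k) into ceil(k/r) columns with disjoint supports:
    the first columns hold r consecutive entries each, the last holds the rest."""
    k = len(p)
    s = -(-k // r)                      # ceil(k / r)
    cols = [[0] * k for _ in range(s)]
    for j in range(s):
        for i in range(j * r, min((j + 1) * r, k)):
            cols[j][i] = p[i]
    return cols

p1 = [11, 21, 31, 41, 51]          # placeholders for (p11, ..., p51)
cols = split_column(p1, r=2)       # k = 5, r = 2 gives s = 3 columns
assert [sum(c) for c in zip(*cols)] == p1          # supports partition the rows
assert all(sum(v != 0 for v in c) <= 2 for c in cols)
print(cols)
```

The two assertions capture exactly why splitting cannot decrease the minimum Hamming weight: every nonzero entry of the original column survives in exactly one of the split columns.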

10.6 Azure LRC

The Windows Azure storage system employs an [n = 18, k = 14] LRC
with (r = 7, δ = 2) information-symbol locality [101], [103]. This code
has a structure similar to the pyramid code. Fig. 10.1 illustrates the
structure of this LRC. The dotted boxes identify the code symbols
of each of the two local codes, each having a single parity symbol.
In addition, there are two global parities. This code has minimum
distance dmin = 4 and can tolerate erasure of any 3 code symbols. We
now compare the Azure code against an [9, 6] RS code that also has
dmin = 4, and thus is also tolerant to 3 erasures. The repair degree of

    x0 x1 x2 x3 x4 x5 x6 | Px
                                      Q0  Q1
    y0 y1 y2 y3 y4 y5 y6 | Py

Figure 10.1: The [18, 14, 4] LRC employed in Windows Azure storage. Here Px and
Py are the local parities, while Q0 , Q1 represent the two global parities.

the two codes are comparable, at 6 for the [9, 6] RS code, and 7 for
the Azure LRC. The primary difference between the two codes lies in
the storage overhead. While the [9, 6] RS code has a storage overhead
of 1.5, this falls to 1.29 in the case of the Azure LRC. This difference
in storage overhead has reportedly resulted in a large cost savings to
Microsoft [161].
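The overhead comparison above amounts to the following arithmetic:

```python
# (n, k, repair degree) for the two codes compared in the text
codes = {"[9, 6] RS": (9, 6, 6), "[18, 14] Azure LRC": (18, 14, 7)}
for name, (n, k, repair_degree) in codes.items():
    print(f"{name}: overhead {n / k:.2f}, repair degree {repair_degree}, "
          f"erasure tolerance 3")
# overheads: 1.50 for the RS code versus about 1.29 for the Azure LRC
```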

10.7 Tamo-Barg LRC

Analogous to the definition of a linear LRC with information-symbol
locality, we have the definition below of a linear code having all-symbol
locality.

Definition 8 (Linear LRC with All-Symbol Locality). An [n, k] linear
code C is said to be an (r, δ) LRC with all-symbol locality if associated to
every code symbol cj , 1 ≤ j ≤ n, there is a set of ℓ other code symbols
every code symbol cj , 1 ≤ j ≤ n, there is a set of ℓ other code symbols
(ci1 , ci2 , · · · , ciℓ ) with ℓ ≤ r + δ − 2 such that the set of ℓ + 1 code
symbols (cj , ci1 , ci2 , · · · , ciℓ ) forms a code C j of block length = ℓ + 1 and
dmin ≥ δ. We will refer to C j as a local code associated to code symbol
cj .

Let C be an [n, k, dmin ] linear code with (r, δ) all-symbol locality.
Clearly, even under a permutation of code symbols, the code will re-
main an [n, k, dmin ] code with (r, δ) all-symbol locality. One can always
construct a systematic generator matrix Gsys = [Ik P ] either for the
code C or else, a code obtained by permuting code symbols in C. Clearly,
when encoded by Gsys , one obtains a code that has information-symbol

locality. It follows that the minimum distance bound for an information-
symbol locality code given in (10.1) also holds for an all-symbol code.
This raises the question as to whether there exist codes with all-symbol
locality that achieve the minimum distance bound in (10.1).
The answer is in the affirmative for (r+δ−1) | n and the construction
described below for an optimal, all-symbol locality LRC due to Tamo and
Barg appears in [228]. Let us assume that it is desired to construct an
[n, k] code with all-symbol locality over a finite field Fq where each code
symbol is a code symbol of a local code of block length ℓ ≤ r + δ − 1 and
minimum distance ≥ δ. We describe the construction as it applies to the
case (r + δ − 1) | n. The construction can however, be generalized [123],
[176] to obtain LRCs with all-symbol locality having minimum distance
at most one less than the bound in (10.1) for the case δ = 2 that avoids
this restriction, and this is described in the notes subsection.
Let S = {θi }_{i=1}^{n} be a set of n distinct elements lying in Fq . Set
m = n/(r + δ − 1) and let {Sj }_{j=0}^{m−1} be a collection of m pairwise
disjoint subsets of size (r + δ − 1) that partition S, i.e.,

    S = ∪_{j=0}^{m−1} Sj .

Let us assume that it is possible to identify a polynomial g(x) that we
will refer to as a “good” polynomial satisfying:

(i) deg(g) = (r + δ − 1) and

(ii) g(x) = bj for all x ∈ Sj , i.e., g(·) is constant on each subset Sj .

For simplicity, we describe the construction for the case r | k. We will
then show how this can be extended to the general case. Let F ⊆ Fq [x]
be the set of polynomials {f (·)} over Fq that can be expressed in the
form:

    f (x) = Σ_{i=0}^{k/r−1} Σ_{j=0}^{r−1} aij [g(x)]^i x^j .

Let C be the linear code over Fq given by:

C = {(f (θi ), i = 1, 2, . . . , n) | f ∈ F}.



Note that f (x)|x∈Sj , i.e., the polynomial f (·) restricted to the subset Sj ,
reduces to a polynomial of degree ≤ (r − 1). We regard the m subcodes

{(f (x), x ∈ Sj ) | f (·) ∈ F}, 0 ≤ j ≤ (m − 1),

as the local codes. We see that each local code is an [r + δ − 1, r, δ] MDS
code.
Next, we claim that dim(C) = k. To see this, we first note that the
polynomials [g(x)]^i x^j for different pairs (i, j), 0 ≤ i ≤ k/r − 1, 0 ≤ j ≤ r − 1
have different degrees and are hence linearly independent. Hence the
subspace of Fq [x] spanned by the polynomials in F has dimension = k.
The maximum degree of a polynomial in F equals
k
 
(r + δ − 1)
− 1 + (r − 1)
r
k
 
= k−1+ − 1 (δ − 1).
r
Note that n = (r+δ −1)m. We showed in Corollary 1 that the maximum
rate of an LRC is upper bounded by k/n ≤ r/(r + δ − 1). It follows
from this that k ≤ rm. Therefore,

    k − 1 + (k/r − 1)(δ − 1) ≤ rm − 1 + (m − 1)(δ − 1)
                             = (r + δ − 1)m − δ
                             < (r + δ − 1)m − 1 = n − 1,

since δ ≥ 2. It follows that the mapping f (x) ∈ F ↦ (f (θ), θ ∈ S)
is an injection. This establishes that the code C has dimension k.
Since the maximum degree of a polynomial in F equals k − 1 +
(k/r − 1)(δ − 1), it follows that

    dmin (C) ≥ (n − k + 1) − (k/r − 1)(δ − 1).
It follows from (10.1) that C is an optimal code with (r, δ) all-symbol
locality.

For the case when r ∤ k, let k = ur + v, 0 < v ≤ (r − 1). Then simply
replace the set F ⊆ Fq [x] as follows:

    f (x) ∈ F ⇐⇒ f (x) = Σ_{i=0}^{u−1} Σ_{j=0}^{r−1} aij [g(x)]^i x^j + Σ_{j=0}^{v−1} auj [g(x)]^u x^j .

The resultant code can be shown to be an optimal (r, δ) all-symbol
locality code having minimum distance

    dmin (C) = (n − k + 1) − (⌈k/r⌉ − 1)(δ − 1),

by arguing exactly as for the case r | k.

Example Construction
Let H be a multiplicative subgroup of F∗q and let S = ∪_{j=0}^{m−1} Sj be
the union of m distinct cosets of H in F∗q . Thus we are assuming
that m|H| ≤ (q − 1). Let mH (x) = Π_{h∈H} (x − h) be the annihilator
polynomial of H.
Claim 1. mH (x) is constant on each multiplicative coset Sj = θH, θ ∈
F∗q of H.
Proof: Let y ∈ Sj =⇒ y = θh′ , h′ ∈ H. Then
    mH (y) = Π_{h∈H} (θh′ − h) = Π_{h∈H} h′ [θ − (h′)^{−1} h]

           = Π_{h∈H} (θ − h) = mH (θ),

since (h′)^{|H|} = 1 and (h′)^{−1} h runs over H as h does; mH is hence
constant on the coset. □


We can use this fact to construct a good polynomial and hence a
code with all-symbol locality by proceeding as follows. Let the field size
q and n, (r, δ) be such that:
(r + δ − 1) | n | (q − 1).
Let α be an element of order n in F∗q and set

    β = α^m ,  m = n/(r + δ − 1).

Let H be the multiplicative subgroup of F∗q given by

H = {β i | 0 ≤ i ≤ r + δ − 2},

and let

Sj = αj H, 0 ≤ j ≤ m − 1,

be the m distinct cosets of H in F∗q and set


    S = ∪_{j=0}^{m−1} Sj .

The annihilator mH (x) is given in this case by:

    mH (x) = x^{r+δ−1} − 1

and is hence constant on the cosets {Sj }_{j=0}^{m−1} of H. Also, mH (x) has
j=0 of H. Also, mH (x) has
degree = (r + δ − 1) and hence mH (x) is a good polynomial. We can
simplify the choice of good polynomial in this case by noting that the
polynomial

g(x) = xr+δ−1

is a good polynomial as well.


Example: Let q = 16, n = 15, r = 3, δ = 3, (r + δ − 1) = 5. Set
g(x) = x5 . Suppose it is desired to construct an optimal code having
dimension k = 8. Note that the expansion k = ur + v gives us u = v = 2.
We set

C = {(f (θ), θ ∈ F∗q ) | f (·) ∈ F},

where
    f (x) ∈ F ⇐⇒ f (x) = Σ_{i=0}^{u−1} Σ_{j=0}^{r−1} aij [g(x)]^i x^j + Σ_{j=0}^{v−1} auj [g(x)]^u x^j .

This yields
    f (x) = Σ_{i=0}^{1} Σ_{j=0}^{2} aij x^{5i+j} + Σ_{j=0}^{1} a2j x^{10+j} .

This code has dmin = 4, which matches with

    dmin ≤ (n − k + 1) − (⌈k/r⌉ − 1)(δ − 1)
         = (15 − 8 + 1) − (⌈8/3⌉ − 1)(3 − 1) = 4.
An analogous proof shows that the annihilator mH (x) of an additive
subgroup H of Fq is also constant on each coset of H in Fq and may
also be used to construct good polynomials and consequently codes
with all-symbol locality as well.
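The worked example over F16 can be verified end to end. The sketch below is an added illustration under stated assumptions (primitive polynomial x^4 + x + 1 for GF(16), random message coefficients): it checks that g(x) = x^5 is constant on each coset, and that each local codeword, being the evaluation of a degree-≤ 2 polynomial on its coset, can be repaired from any r = 3 of its 5 symbols by Lagrange interpolation.

```python
import random

# GF(16) via log/antilog tables, primitive polynomial x^4 + x + 1 (assumed)
EXP, LOG = [0] * 30, [0] * 16
x = 1
for i in range(15):
    EXP[i], LOG[x] = x, i
    x <<= 1
    if x & 16:
        x ^= 0b10011
for i in range(15, 30):
    EXP[i] = EXP[i - 15]

def gmul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]

def gpow(a, e):
    return 0 if a == 0 else EXP[(LOG[a] * e) % 15]

def ginv(a):
    return EXP[(15 - LOG[a]) % 15]

def lagrange_eval(points, x0):
    """Evaluate at x0 the polynomial interpolating the given (xi, yi) pairs."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num = den = 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = gmul(num, x0 ^ xj)   # subtraction is XOR in GF(2^4)
                den = gmul(den, xi ^ xj)
        total ^= gmul(yi, gmul(num, ginv(den)))
    return total

random.seed(1)
coeffs = [random.randrange(16) for _ in range(8)]   # k = 8 message symbols
EXPS = [0, 1, 2, 5, 6, 7, 10, 11]                   # monomials of f in the example

def f(t):
    v = 0
    for a, e in zip(coeffs, EXPS):
        v ^= gmul(a, gpow(t, e))
    return v

# the three cosets S_j = alpha^j H, with H generated by beta = alpha^3
cosets = [[EXP[(j + 3 * i) % 15] for i in range(5)] for j in range(3)]
for coset in cosets:
    assert len({gpow(t, 5) for t in coset}) == 1    # g(x) = x^5 constant on coset
    ys = [f(t) for t in coset]
    known = list(zip(coset[:3], ys[:3]))            # any r = 3 symbols suffice
    for t, y in zip(coset[3:], ys[3:]):
        assert lagrange_eval(known, t) == y         # repair the two erased symbols
print("local repair from r = 3 symbols verified on all cosets")
```

On each coset x^5 is constant, so f collapses to a degree-≤ 2 polynomial there, which is exactly why three symbols determine the remaining two.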

10.8 Bounds on dmin and Rate for Nonlinear LRCs

We present below the proof of the bound on rate and minimum distance
of a nonlinear LRC appearing in Theorem 6.
Proof: Let C be an (n, M, dmin ) code of size M = q k , over an
alphabet Aq of size q, having (r, δ) locality. We will establish the bound
on minimum distance appearing in Theorem 6. The bound on code
rate will then follow from Corollary 1. Recall from Definition 6, that
associated to each code symbol ci , there is a set Si ⊆ [n] of size
ni := |Si | ≤ (r + δ − 1) such that the restriction Ci := C|Si of C to Si
is a code of block length ni and minimum distance ≥ δ. Note by the
Singleton bound that

    | Ci | ≤ q^{ni−δ+1} ≤ q^r .

The minimum distance of the code C can be expressed in the form

dmin = n − max { |J| : J ⊆ [n], |CJ | < q^k },    (10.6)

where CJ is the restriction of C to the coordinates in J. This follows since
the minimum Hamming distance between a pair of distinct codewords
is equal to n minus the maximum number of coordinates in which two
distinct codewords can agree. We next select a set of m = ⌊(k − 1)/r⌋ local
codes, which without loss of generality, we may assume to be the local
codes Cj , j ∈ [m], in such a way that if Fi = ∪_{j=1}^{i} Sj , i ∈ [m], then

| C|Fi | < | C|Fi+1 |, i = 1, 2, · · · , (m − 1). (10.7)



This is possible since

|C|Fm | ≤ ∏_{i=1}^{m} |Ci | ≤ q^{rm} < q^k .    (10.8)

For 1 ≤ i ≤ (m − 1), let Pi = Fi ∩ Si+1 . Note that by (10.7), Pi is
a strict subset of Si+1 , i.e., Pi ⊊ Si+1 (for if Pi = Si+1 , we would have
Fi+1 = Fi , contradicting (10.7)). Let x ∈ Ci+1 |Pi . Consider the
following subcode of Ci+1 :

C′i+1 = {c ∈ Ci+1 : c|Pi = x}.

Among all the possibilities for x, we choose x such that |C′i+1 | is of
maximum size. It is possible that Pi = ∅, in which case we will have
C′i+1 = Ci+1 . Clearly, we have dmin (C′i+1 ) ≥ δ, as every local code is
assumed to have minimum distance ≥ δ. Let us next puncture the
code C′i+1 on Pi , i.e., let us pass on to the restriction C′i+1 |Si+1 \ Pi .
The restriction will then be a code of the same size, of block length
|Fi+1 | − |Fi | and minimum distance ≥ δ. The Singleton bound then
gives us:

|C′i+1 | ≤ q^{|Fi+1 |−|Fi |−δ+1} .
From the maximal manner in which x was selected, we have:

|C|Fi+1 | / |C|Fi | ≤ |C′i+1 |,

leading to:

logq ( |C|Fi+1 | / |C|Fi | ) ≤ |Fi+1 | − |Fi | − δ + 1,

i.e.,

logq ( |C|Fi+1 | / |C|Fi | ) + δ − 1 ≤ |Fi+1 | − |Fi |.

Summing both sides over the range i = 1, 2, · · · , m − 1, we obtain

logq ( |C|Fm | / |C|F1 | ) + (m − 1)(δ − 1) ≤ |Fm | − |F1 |.

Also, by the Singleton bound, we have:

logq |C|F1 | + δ − 1 ≤ |F1 |.

Adding the two equations above, we get

logq |C|Fm | + m(δ − 1) ≤ |Fm |.

Let the integer j > 0 be such that q^{k−j+1} > |C|Fm | ≥ q^{k−j} . We can then
write

k − j + m(δ − 1) ≤ |Fm |.

Since q^{k−j+1} > |C|Fm |, and since appending a coordinate increases the
size of a restriction by a factor of at most q, it is possible to identify a
subset Q ⊂ [n] \ Fm of size (j − 1) such that if F′m = Fm ∪ Q, we have
|C|F′m | < q^k . This gives us:

k − j + m(δ − 1) + j − 1 ≤ |Fm | + |Q| = |F′m |,

i.e., k − 1 + m(δ − 1) ≤ |F′m |.

The equation above, combined with (10.6), gives us

dmin ≤ n − |F′m | ≤ n − k + 1 − m(δ − 1),

i.e.,

dmin ≤ n − k + 1 − ⌊(k − 1)/r⌋ (δ − 1),

which, using the identity ⌊(k − 1)/r⌋ = ⌈k/r⌉ − 1, leads to the desired bound

dmin ≤ n − k + 1 − (⌈k/r⌉ − 1)(δ − 1).
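Expression (10.6) can be sanity-checked on a toy example. The sketch below uses a purely illustrative binary [5, 2] code (not one of the LRCs discussed here), computes dmin both directly and via (10.6), and confirms that the two agree.

```python
from itertools import combinations, product

# A toy binary [5, 2] code: all F_2-linear combinations of two generator rows.
G = [(1, 0, 1, 1, 0), (0, 1, 0, 1, 1)]
C = [tuple((a * u + b * v) % 2 for u, v in zip(*G))
     for a, b in product((0, 1), repeat=2)]
n, M = 5, len(C)

# Direct computation of the minimum distance.
d_direct = min(sum(x != y for x, y in zip(c1, c2))
               for c1, c2 in combinations(C, 2))

# Computation via (10.6): d_min = n - max{ |J| : |C_J| < M }.
best = max(len(J)
           for size in range(n + 1)
           for J in combinations(range(n), size)
           if len({tuple(c[j] for j in J) for c in C}) < M)
d_via_106 = n - best

assert d_direct == d_via_106
```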

10.9 Extended Notions of Locality

The theory of LRCs has been extended in various directions and an


overview of these different extensions is provided in Fig. 10.2. Availability
codes, codes with sequential recovery, codes with hierarchical locality
and maximally recoverable codes are discussed in Sections 11, 12, 13 and
14 respectively. A brief discussion of LRCs with cooperative recovery
appears in the present section.
A listing of the code constructions appearing in Sections 11, 12, 13
and 14, is provided in Table 10.2.

[Figure: a tree with root "Locally recoverable codes" and five branches labeled
Availability, Sequential recovery, Hierarchical locality, Cooperative recovery,
and Maximally recoverable.]

Figure 10.2: Extended notions of locality.

Table 10.2: This table provides a listing of the constructions for availability codes,
codes with sequential recovery, codes with hierarchical locality and maximally recov-
erable codes (MRCs) that appear in Sections 11, 12, 13 and 14 respectively. With the
exception of the construction of the MRC based on the Combinatorial Nullstellensatz,
all other constructions appearing in the table are explicit.

Type of extended LRC             | Code                                                          | Section
Codes with Availability          | Product Code                                                  | 11.2.1
Codes with Availability          | Wang et al. Code [246]                                        | 11.2.2
Codes with Sequential Recovery   | Near-Regular Graph Code [174]                                 | 12.1
Codes with Sequential Recovery   | 2-Dimensional Product Code [226]                              | 12.1
Codes with Sequential Recovery   | Graph-based Construction [13] (Example)                       | 12.2.2
Codes with Hierarchical Locality | Chinese Remainder Theorem based Construction [199] (Example)  | 13.2
Maximally Recoverable Codes      | Combinatorial Nullstellensatz based MRC                       | 14.3
Maximally Recoverable Codes      | Linearized Polynomials based MRC                              | 14.4
Maximally Recoverable Codes      | Linearized Polynomials based MRC with Reduced Field Size [71] | 14.5

10.9.1 Cooperative Local Recovery


An [n, k] code with (r, t)-cooperative locality is a code such that if a
subset {ci1 , ci2 , · · · , cit } of code symbols is erased, then there exists a
second subset {cj1 , cj2 , · · · , cjr } of r other code symbols (i.e., ia ̸= jb
for any pair (a, b)) such that for all a ∈ [t]:

cia = Σ_{b=1}^{r} θa,b cjb ,   θa,b ∈ Fq .

In [192], the authors provide the following bound on the minimum
distance of codes with cooperative locality:

dmin (n, k, r, t) ≤ n − k + 1 − t ⌊(k − t)/r⌋ .

The same paper also contains the following alphabet-size-dependent
bound on dimension:

k ≤ min_{γ ≤ min( ⌊n/(r+t)⌋, ⌊(k−1)/r⌋ )} ( rγ + logq (Aq (n − γ(r + t), d)) ),

where Aq (n, d) is the maximum size of a q-ary code of block length


n and minimum distance d. Constructions of codes with cooperative
locality are also provided in [192].

Notes

Unless otherwise specified, when we speak of an LRC in this notes


subsection, we will mean a linear LRC.

1. As mentioned in the introduction, the concept of locality can be


found discussed in the early papers [87], [102], [164] as well as in
the subsequent paper [72], that explored the topic in greater depth.
The treatment in [72] focuses on the linear case, includes upper
bounds on the minimum distance, the identification of optimal
codes as well as generalizations. The class of optimal linear codes
having information-symbol locality includes the class of pyramid
codes, that first appeared in [102]. The extension to the case where
the codes are linear and where the local codes have minimum
distance ≥ 2 appears in [172]. Generalization of LRCs to the
non-linear case can be found in [68], [168], [228] as well as in the
early paper [87]. The paper [168] also considers the case of LRCs
over a vector code-symbol alphabet.

2. Codes achieving the minimum distance bound for general δ:

(a) The pyramid code [102] discussed in Section 10.5 is an ex-


ample of a construction for an information-symbol locality
code, achieving the upper bound on minimum distance, and
having field size that is linear in the block length n.
(b) Constructions of all-symbol locality codes achieving the upper
bound on minimum distance with field size linear in the block
length n for the case (r + δ − 1)|n appear in [38], [172], [228].

A detailed investigation of codes achieving the dmin upper


bound can be found in [225].

3. Codes with all-symbol locality for the case δ = 2:

(a) For the status on constructions for the case when (r + 1)|n,
please see note above and set δ = 2.
(b) A construction with field size linear in n and minimum
distance within 1 of the upper bound on dmin can be found
in [228] for the case r ∤ k and n ̸= 1 mod (r + 1).
(c) A construction with minimum distance within 1 of the upper
bound on dmin for all parameters, having exponential field
size appears in [66].
(d) A construction achieving the minimum distance bound under
the assumption of disjoint repair sets that holds for all n ≤ q
and all n with n mod (r + 1) ̸= 1 appears in [123]. The
condition n mod (r + 1) ̸= 1 is removed in [176] and an
optimal code construction where optimality is under the
assumption of disjoint repair sets is provided in [176], for all
n ≤ q, where q is the field size.
(e) Upper bounds on dmin tighter than the one given in [72],
can be found in [160], [175], [242], [261]. Constructions for
codes achieving the tightened bound in [242] for the case of
n1 > n2 , where n1 = ⌈n/(r + 1)⌉ and n2 = n1 (r + 1) − n, and having
exponential field size can also be found there.

In summary, for the case δ = 2, the problem of constructing


optimal LRCs with field size linear in block length is completely
solved for the cases (a) (r + 1)|n and (b) for any n ≤ q under the
assumption of disjoint repair sets.

Open Problem 10. Construct an optimal LRC for the case (δ = 2)


with field size that is linear in the block length n for the general
case when (r + 1) ∤ n and where the repair sets are not necessarily
disjoint.

Open Problem 11. Let n1 = ⌈n/(r + 1)⌉ and n2 = n1 (r + 1) − n.
Construct LRCs for the case (δ = 2) with best possible minimum
distance for the case when n1 ≤ n2 and where the repair sets are
not necessarily disjoint. (There is no constraint on field size here).

4. On the construction of LRCs with large block length for given field
size: There is interest in determining the largest possible block
length of a code with locality that achieves the upper bound in
(10.1) on minimum distance, for a given field size. This is analogous
to the problem of determining the maximum possible block length
n for which an MDS code having field size q exists. The focus of
the research effort here has been on the case δ = 2. Constructions
for LRCs with block length exceeding the field size q can be found
described in [21], [112], [142]. Bounds on the maximum possible
block length n of an LRC having minimum distance achieving
(10.1) for a given field size q can be found in [86], [90]. In [34], the
authors focus on the general case δ > 2, derive an upper bound on
the maximum length possible and provide a construction having
length that is super-linear in the size q of the underlying finite
field.

5. LRCs with all-symbol locality and small alphabet size for the case
δ = 2:

(a) Upper bounds on dmin for given (n, k, r) or on dimension k


for given (n, dmin , r): For fixed alphabet size q, upper bounds
can be found in [14], [30], [109]. In the binary (q = 2) case, a
Hamming-like upper bound on dimension appears in [243].
Upper bounds assuming disjoint repair sets can be found
in [1], [74], [155], [243], [259]. Upper bounds assuming that
the code is cyclic appear in [231]. Asymptotic upper bounds
i.e., upper bounds on code rate as a function of fractional
minimum distance dmin /n, can be found in [1].
(b) Constructions: Constructions for cyclic LRCs were introduced
in [74] and these codes are for specific values of (dmin , r) and
are optimal w.r.t. upper bounds derived in the same paper.

Construction of cyclic LRCs achieving the upper bound on


minimum distance given in [72] with field size linear in block
length n can be found in [231]. The authors also study cyclic
LRCs with smaller field size obtained by looking at sub-field
subcodes and trace codes. The locality property of classical
binary cyclic codes and codes that can be obtained from
them by operations such as shortening can be found in [108].
Construction of cyclic LRCs with local codes that are not
MDS codes, can be found in [259] and these codes are optimal
w.r.t. upper bounds derived in the same paper. Construction
of optimal cyclic LRCs for specific values of (dmin , r) can be
found in [119], [154]. Other optimal constructions that are
not necessarily cyclic and for specific values of (dmin , r) can
be found in [88], [89], [92], [142], [155], [163], [214], [219],
[243]. Constructions having good performance with respect to
code rate for a fixed alphabet size q and fractional minimum
distance can be found in [30], [230].
(c) Constructions of LRCs based on algebraic geometry (AG)
codes can be found in [19], [21], [112], [141], [142], [197].
Part of the motivation for exploring the use of AG codes,
comes from the fact that for a fixed field size, AG codes
can have larger block length in comparison with an MDS
code. The authors of [19] were also led to study AG codes
as a means of generalizing the construction of LRCs from
RS codes in [228], which could be viewed as arising from a
simple covering map of the projective line. The idea here is
to replace the simple covering map by a covering map from
one curve to another, for example from the Hermitian curve
to the projective line. In [142], the authors construct optimal
LRCs of length larger than the size q of the underlying finite
field using elliptic curves. In [141], the authors extend the
construction in [19] and make use of the automorphism group
of a tower of function fields to derive asymptotically-good
LRCs. The construction in [112] is based on automorphism
groups of rational function fields.

6. Codes with all-symbol locality for general δ and having small


alphabet size: Upper bounds on dimension for a given alphabet
size q can be found in [1]. Constructions yielding asymptotic lower
bounds on rate for a fixed alphabet size q as a function of fractional
minimum distance can be found in [19]. In [91], the authors prove
that for δ > 2 there are only two classes of binary codes which
achieve the upper bounds given in [172].

7. In a different direction, in [131] the authors explore the use of


locality for reducing the decoding complexity of a cyclic code.
11
Codes with Availability

This section, as well as the two sections that follow, may be viewed as
providing additional, alternative approaches to handling multiple node
failure.
Codes with availability, discussed in the present section, provide
multiple, node-disjoint means of accessing the data contained within a
particular node, thereby enabling recovery from multiple-node failure.
Availability codes have an additional appeal: they are able to handle
multiple, simultaneous requests for the data contained within a partic-
ular node, a useful feature when storing popular content. The notion
of codes with availability was introduced by Wang and Zhang in [241]
in the setting of linear codes. As in the case of LRCs, we begin with a
discussion of the more general case of nonlinear codes with availability,
before specializing to the linear case.

Definition 9 (Availability Code). An (n, M, dmin ) code C over an alphabet


Aq of size q is said to be a code with availability with parameters (r, t),
if for each code symbol ci , i ∈ [n], there are t disjoint repair sets

{ Rij ⊆ [n] | |Rij | ≤ r, i ∉ Rij , j = 1, 2, · · · , t } ,

and t associated functions {fij : Aq^{|Rij |} → Aq } such that

ci = fij (cℓ , ℓ ∈ Rij ) ,  for all j = 1, 2, · · · , t.




Theorem 8 (Upper Bound on Rate and dmin for Nonlinear Codes [230]).
Let C be an (n, M, dmin ) code over an alphabet Aq of size q with
availability having parameters (r, t) and where the repair sets {Rij | i ∈
[n], j ∈ [t]} are of constant size |Rij | = r. Let k = ⌊logq (M )⌋. The rate
and minimum distance of C then satisfy the upper bounds
k/n ≤ 1 / ∏_{j=1}^{t} (1 + 1/(jr)) ,    (11.1)

dmin ≤ n − Σ_{i=0}^{t} ⌊(k − 1)/r^i⌋ .    (11.2)

In our proof of the rate bound, we follow Tamo et al. [230] and refer
the reader to [230] for a proof of the upper bound on minimum distance.
The rate bound will follow from Lemma 10 and Lemma 11, given
below. The proofs adopt a graphical approach that involves associating
a directed graph on n nodes with the code, called the recovery graph.
The ith node is associated to code symbol ci . The edges are colored
using one of t colors which we associate with elements of the set [t].
There is a directed edge bearing color ℓ, ℓ ∈ [t], from node j to node
i iff j ∈ Riℓ . Next, a random permutation π(·) of the set [n] is chosen
and the nodes are linearly ordered from left to right with the ith node
appearing in position π(i). We then turn to a coloring of the nodes.
Node i is assigned color ℓ iff

π(j) < π(i), ∀ j ∈ Riℓ .

It is possible for a node to be assigned up to a maximum of t colors.


It is also possible that a node is not assigned any color, i.e., is left
uncolored. Clearly for a fixed permutation π(·), if the values of the
code symbols associated with the uncolored nodes are known, then all
the remaining symbols can be determined. It follows that if ku is the
number of uncolored nodes under a given permutation π(·), that the
file size of the availability code is bounded above by M ≤ q^{ku} . This is
illustrated in Fig. 11.1 for an example linear code of block length n = 5.

Figure 11.1: Illustrating the recovery graph for a binary linear code, satisfying the
parity checks: c1 + c2 = c3 , c2 + c4 = c5 . The nodes are ordered in accordance with a
random permutation π, so that the node associated to ci appears in position π(i).
Here (π(1), π(2), π(3), π(4), π(5)) = (3, 4, 5, 1, 2). The edges in red are associated to
the first p-c equation and the edges in green, with the second (only the edges relevant
to node coloring are shown). Under this permutation, node 3 is colored red and node
2 is colored green. The number 3 of uncolored symbols leads to the upper bound
M ≤ q^3 on code size.
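The coloring described in the caption can be reproduced mechanically. In the small sketch below, the repair sets are read off from the two parity checks (an assumption about the figure's edge sets, which shows only the edges relevant to coloring), and the permutation is the one given in the caption.

```python
# Recovery-graph coloring of Fig. 11.1: repair sets are taken from the two
# parity checks c1 + c2 = c3 and c2 + c4 = c5. Node i receives a color iff
# every member of one of its repair sets lies strictly to its left in the
# ordering induced by the permutation.
checks = [{1, 2, 3}, {2, 4, 5}]
repair = {i: [chk - {i} for chk in checks if i in chk] for i in range(1, 6)}

pos = {1: 3, 2: 4, 3: 5, 4: 1, 5: 2}        # the permutation pi of the caption

colored = {i for i, sets in repair.items()
           if any(all(pos[j] < pos[i] for j in R) for R in sets)}
uncolored = set(range(1, 6)) - colored

assert colored == {2, 3}       # node 3 colored red, node 2 colored green
assert len(uncolored) == 3     # hence M <= q^3, as stated in the caption
```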

Lemma 10. Let C be an (n, M, dmin ) code having availability parameters


(r, t) and constant repair-set size, i.e., |Rij | = r for all {i, j}. Then there
exists a coordinate permutation π : [n] → [n] such that under the
ordering of code symbols determined by π, the number |U | of colored
nodes, i.e., nodes that are assigned at least one color as per the method
of assigning colors described above, satisfies the lower bound:

|U | ≥ n ( 1 − 1 / ∏_{j=1}^{t} (1 + 1/(jr)) ) .

Proof. Let us pick a permutation π randomly from the n! possibilities.


We color the nodes of the recovery graph associated to the code as
described above, for the ordering defined by π. Let Aij be the event
that node i, i.e., the node associated to code symbol ci , is colored with
color j. Let U denote the set of colored nodes. Clearly, U is a function
of the particular realization of the random permutation π(·). We have

P (i ∈ U ) = P (∪_{j=1}^{t} Aij ),

where P (·) denotes the probability function. We can employ the inclusion-
exclusion principle to calculate the above probability if we know
P (∩j∈S Aij ) for every subset S ⊆ [t]. This latter probability is the proba-
bility of the event that node i is colored with all colors in the set S which
698 Codes with Availability

implies the event π(ℓ) < π(i) for all ℓ ∈ Rij , j ∈ S. It follows that in the
linear ordering determined by π, the code symbol ci must necessarily
appear to the right of all the code symbols cm , m ∈ ∪j∈S Rij . If we
restrict attention to the set of code symbols {cm | m ∈ {i} ∪ (∪_{j∈S} Rij )},
all orderings of these symbols are equally likely. Hence the probability
that code symbol ci ends up in the rightmost position within this set,
is given by:

P (∩j∈S Aij ) = 1 / ( | ∪j∈S Rij | + 1 )
             = 1 / ( Σ_{j∈S} |Rij | + 1 ) = 1 / ( |S| r + 1 ),

where the second equality holds since the repair sets are disjoint.

By the inclusion-exclusion principle, we have:

P (i ∈ U ) = P (∪_{j=1}^{t} Aij )
           = Σ_{j=1}^{t} (−1)^{j−1} (t choose j) P (Ai1 ∩ Ai2 ∩ · · · ∩ Aij )
           = Σ_{j=1}^{t} (−1)^{j−1} (t choose j) · 1/(jr + 1) .

Through algebraic manipulation, this can be reduced to:

P (i ∈ U ) = 1 − 1 / ∏_{j=1}^{t} (1 + 1/(jr)) .
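The "algebraic manipulation" can be verified exactly in rational arithmetic; the sketch below checks the inclusion-exclusion sum against the product form over a grid of (r, t) values.

```python
from fractions import Fraction
from math import comb

def inclusion_exclusion(r, t):
    # sum_{j=1}^{t} (-1)^(j-1) C(t, j) / (jr + 1)
    return sum(Fraction((-1) ** (j - 1) * comb(t, j), j * r + 1)
               for j in range(1, t + 1))

def product_form(r, t):
    # 1 - 1 / prod_{j=1}^{t} (1 + 1/(jr))
    prod = Fraction(1)
    for j in range(1, t + 1):
        prod *= 1 + Fraction(1, j * r)
    return 1 - 1 / prod

# The two expressions agree for every (r, t) in a small grid.
assert all(inclusion_exclusion(r, t) == product_form(r, t)
           for r in range(1, 9) for t in range(1, 9))
```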

The expected value of the number of colored nodes is then given by:

E(|U |) = Σ_i P (i ∈ U ) = n ( 1 − 1 / ∏_{j=1}^{t} (1 + 1/(jr)) ) .

The proof is completed by observing that there exists at least one
choice of π for which |U | ≥ E(|U |).

Lemma 11. Let C be an (n, M, dmin ) code over an alphabet Aq of size


q with availability parameters (r, t) and repair sets Rij of constant size

|Rij | = r for all {i, j}. Let π be an arbitrary permutation on [n] and
let the nodes of the associated recovery graph be colored as described
above, under the ordering of nodes associated to π(·). Let U be the set
of colored nodes. Then we have the upper bound
M ≤ q^{n−|U |}
on code size.
Proof. Follows from the fact that under any ordering of code symbols,
the code symbols associated to colored nodes can be determined given
the values of the code symbols associated to uncolored nodes.

The upper bound on rate given in Theorem 8 then follows from an


application of Lemmas 10 and 11.

11.1 Linear Availability Codes

As in the case of LRCs, there is greatest interest in the linear case, and
we provide below, a formal definition of a linear availability code.
Definition 10 (Linear Availability Code). An [n, k, dmin ] code C over a
field Fq , is said to be a code with availability with parameters (r, t), if
for each code symbol ci there are t disjoint sets

{ Rij ⊆ [n] | |Rij | ≤ r, i ∉ Rij , j = 1, 2, · · · , t } ,

such that

ci = Σ_{ℓ∈Rij} aijℓ cℓ ,  aijℓ ∈ Fq ,  for j = 1, 2, · · · , t.

11.2 Constructions of Linear Availability Codes

11.2.1 The Product Code Construction


The t-fold product of the [r + 1, r] single-parity-check code gives rise
to a t-dimensional [(r + 1)^t , r^t ] product code. It is straightforward to
verify that this code is an availability code with availability parameters
(r, t), having code rate

k/n = r^t /(r + 1)^t .
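The case t = 2, r = 2 can be worked out concretely: the 2-fold product of the [3, 2] single-parity-check code is a [9, 4] code in which every symbol has two disjoint size-2 repair sets, the rest of its row and the rest of its column. A minimal sketch (the 2 × 2 message layout is just one convenient encoding convention):

```python
import itertools

# Product of two [3, 2] single-parity-check codes (r = 2, t = 2): a 3x3
# binary array in which every row and every column sums to 0 mod 2.
def encode(msg):
    # msg is a 2x2 binary message block
    c = [[msg[i][j] for j in range(2)] for i in range(2)]
    for i in range(2):
        c[i].append(c[i][0] ^ c[i][1])                   # row parities
    c.append([c[0][j] ^ c[1][j] for j in range(3)])      # column parities
    return c

# Every code symbol has two disjoint repair sets of size r = 2:
# the rest of its row and the rest of its column.
for bits in itertools.product((0, 1), repeat=4):
    msg = [list(bits[:2]), list(bits[2:])]
    c = encode(msg)
    for i in range(3):
        for j in range(3):
            row_repair = [c[i][b] for b in range(3) if b != j]
            col_repair = [c[a][j] for a in range(3) if a != i]
            assert c[i][j] == row_repair[0] ^ row_repair[1]
            assert c[i][j] == col_repair[0] ^ col_repair[1]
```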

11.2.2 A High-Rate Construction


We present here a construction due to Wang et al. [246] of an (r, t)
availability code C having parameters
n = (r+t choose t) ,    k = (r+t choose t) − (r+t−1 choose t−1) .

In comparison with the product code, this code not only has a
significantly improved rate, given by

k/n = r/(r + t) ,

it also has shorter block length.
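The rate comparison can be checked in exact arithmetic: for a few parameter pairs, the sketch below confirms that the product-code rate falls below the rate r/(r + t) of the Wang et al. construction, which in turn respects the upper bound (11.1).

```python
from fractions import Fraction

def rate_bound(r, t):
    # Upper bound (11.1): 1 / prod_{j=1}^{t} (1 + 1/(jr))
    prod = Fraction(1)
    for j in range(1, t + 1):
        prod *= 1 + Fraction(1, j * r)
    return 1 / prod

for r, t in [(2, 2), (3, 2), (4, 3)]:
    product_rate = Fraction(r, r + 1) ** t     # t-fold product code
    wang_rate = Fraction(r, r + t)             # construction of Wang et al.
    assert product_rate < wang_rate <= rate_bound(r, t)
```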
Construction 2 (Wang et al. [246]). Let us define the sets:
C = {S ⊆ [r + t] : |S| = t}
R = {S ⊆ [r + t] : |S| = t − 1}.
We will assume that the sets C and R are lexicographically ordered.
By lexicographic ordering, we mean the following. In the ordering, a
set E1 = {i1 , . . . , it } with i1 < i2 < . . . < it appears before a set
E2 = {j1 , . . . , jt } with j1 < j2 < . . . < jt iff for some ℓ ∈ [0, t − 1], ip = jp
for all p ∈ [1, ℓ] and iℓ+1 < jℓ+1 . Clearly, |C| = (r+t choose t) := n(r, t) and
|R| = (r+t choose t−1) := m(r, t). Define an (m(r, t) × n(r, t)) binary matrix H(r, t)
whose (i, j)th entry hij is given by:

hij = 1 if R(i) ⊆ C(j), and hij = 0 otherwise,
where R(i) is the ith element in R and C(j) is the jth element in C,
both under lexicographic ordering. We define an [n(r, t), k] code C with
parameters (r, t) as the linear, binary code with p-c matrix H(r, t) where
H(r, t) is as defined above.
We first establish that the code is an (r, t) availability code as
claimed. Following this, we will go on to provide an expression for the
dimension k and rate nk of the code.
Lemma 12. The code C is a code with availability with parameters
(r, t).
11.2. Constructions of Linear Availability Codes 701

Proof. Clearly, each row of H has Hamming weight (r + 1) and each


column of H has Hamming weight t. We claim that the real inner
product of any two rows of H is ≤ 1. Suppose the inner product of two
distinct rows i1 and i2 of H is ≥ 2. This is possible iff there exist two
distinct column indices j1 , j2 such that

R(i1 ) ⊆ C(j1 ) ∩ C(j2 ),
R(i2 ) ⊆ C(j1 ) ∩ C(j2 ).

But this is impossible, since |C(j1 ) ∩ C(j2 )| ≤ t − 1 forces

R(i1 ) = C(j1 ) ∩ C(j2 ) = R(i2 ),

contradicting the distinctness of the two rows. Thus the t rows through
any given column j provide t repair sets of size r for the corresponding
code symbol, and these repair sets are pairwise disjoint. It follows that
C is an availability code having parameters (r, t).

The two lemmas below will show that the code C has dimension

k = n(r, t) − (r−1+t choose t−1) = (r+t choose t) − (r−1+t choose t−1),

and hence rate

k/n = [ (r+t choose t) − (r−1+t choose t−1) ] / (r+t choose t) = r/(r + t).
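Both the availability weights and the dimension formula can be checked mechanically. The sketch below relies on the fact that Python's itertools.combinations enumerates subsets of a sorted ground set in exactly the lexicographic order required by the construction.

```python
from itertools import combinations
from math import comb

def H(r, t):
    # Binary inclusion matrix of Construction 2: rows indexed by (t-1)-subsets,
    # columns by t-subsets of [r+t], both lexicographic; h_ij = 1 iff R(i) ⊆ C(j).
    cols = list(combinations(range(1, r + t + 1), t))
    rows = list(combinations(range(1, r + t + 1), t - 1))
    return [[1 if set(R) <= set(Cj) else 0 for Cj in cols] for R in rows]

def gf2_rank(M):
    # Gauss-Jordan elimination over F_2.
    M = [row[:] for row in M]
    rank = 0
    for col in range(len(M[0])):
        piv = next((i for i in range(rank, len(M)) if M[i][col]), None)
        if piv is None:
            continue
        M[rank], M[piv] = M[piv], M[rank]
        for i in range(len(M)):
            if i != rank and M[i][col]:
                M[i] = [a ^ b for a, b in zip(M[i], M[rank])]
        rank += 1
    return rank

for r, t in [(2, 2), (3, 2), (2, 3), (3, 3)]:
    Hrt = H(r, t)
    n = comb(r + t, t)
    assert all(sum(row) == r + 1 for row in Hrt)    # each p-c has weight r + 1
    assert all(sum(c) == t for c in zip(*Hrt))      # each symbol lies in t checks
    # Dimension n - rank equals binom(r+t, t) - binom(r+t-1, t-1).
    assert n - gf2_rank(Hrt) == comb(r + t, t) - comb(r + t - 1, t - 1)
```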
Lemma 13. The matrix H(r, t) has the following recursive structure:

H(r, t) = [ H(r, t − 1)   0           ]
          [ I             H(r − 1, t) ]

where the first block of columns comprises the first n(r, t − 1) columns
and the second block the remaining n(r, t) − n(r, t − 1) columns.

Proof. There are four blocks in the above recursive structure for H(r, t).
We prove the presence of these four blocks separately.
• By lexicographic ordering, the first n(r, t − 1) column subsets, i.e.,
the sets C(i), 1 ≤ i ≤ n(r, t − 1), all contain the element 1.
Similarly, the first m(r, t − 1) row subsets, i.e., the sets R(i),
1 ≤ i ≤ m(r, t − 1), all contain the element 1. Hence 1 is fixed in
all these sets and we can think of them as if t has been reduced by
one. The first n(r, t − 1) columns and first m(r, t − 1) rows of
H(r, t) thus have the form H(r, t − 1).

• The rest of the column subsets C(i), n(r, t − 1) + 1 ≤ i ≤ n(r, t),
are sets that do not contain 1, i.e., they are subsets of [2, r + t]
of size t. Likewise, the rest of the row subsets R(i),
m(r, t − 1) + 1 ≤ i ≤ m(r, t), are subsets of [2, r + t], of size t − 1.
This is equivalent to saying that r has been reduced by one. Hence
H(r, t) restricted to these columns and rows corresponds to H(r − 1, t).

• The first m(r, t − 1) rows of the columns C(i), n(r, t − 1) + 1 ≤
i ≤ n(r, t), have zeros, as these columns correspond to subsets that
do not contain 1, while the rows correspond to subsets that do contain 1.

• The rest of the row subsets R(i), m(r, t − 1) + 1 ≤ i ≤ m(r, t), are
sets that do not contain 1, i.e., subsets of [2, r + t] of size t − 1.
Each such subset, when augmented with {1}, forms a unique subset
of size t containing 1. Hence we get the identity part.

Lemma 14. By Lemma 13, the matrix H(r, t) has the recursive
structure:

H(r, t) = [ H(r, t − 1)   0           ]
          [ I             H(r − 1, t) ]

In this recursive structure we have that

rank(H(r, t)) = rank(H′ ),

where

H′ = [ I   H(r − 1, t) ] .

It follows that the dimension k of the availability code C is given by:

k = n(r, t) − (r−1+t choose t−1) .

Proof. The row-reduction process in block-matrix form applied to

H(r, t) = [ H(r, t − 1)   0           ]
          [ I             H(r − 1, t) ]

gives us the matrix

[ 0   −H(r, t − 1)H(r − 1, t) ]
[ I   H(r − 1, t)             ] .

We claim that

H(r, t − 1)H(r − 1, t) = 0.

Clearly, if we can show this, this establishes the Lemma. We will show
the product H(r, t − 1)H(r − 1, t) to be the zero matrix by showing that
each inner product of a row in H(r, t − 1) and a column in H(r − 1, t)
over F2 equals 0. Each row of H(r, t − 1) is associated to a subset D
of size (t − 2) drawn from a set of size (r + t − 1). Each column of
H(r − 1, t) is associated to a subset F of size t drawn from a set of
size (r + t − 1). The inner product is precisely equal to the number of
subsets E that satisfy

D ⊆ E ⊆ F,

given that

|D| = (t − 2), |E| = (t − 1), |F | = t.

It follows that this number is either 0 (when D ⊄ F ) or 2 (when D ⊆ F ,
since E must then consist of D together with exactly one of the two
elements of F \ D). Thus the inner product in either case is equal to
0 (mod 2). Hence

H(r, t − 1)H(r − 1, t) = 0.
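The identity H(r, t − 1)H(r − 1, t) = 0, together with the 0-or-2 counting argument behind it, is easy to confirm for small parameters; a self-contained sketch:

```python
from itertools import combinations

def H(r, t):
    # Same inclusion matrix as in Construction 2 (lexicographic ordering).
    cols = list(combinations(range(1, r + t + 1), t))
    rows = list(combinations(range(1, r + t + 1), t - 1))
    return [[1 if set(R) <= set(Cj) else 0 for Cj in cols] for R in rows]

for r, t in [(2, 2), (3, 2), (2, 3), (3, 3)]:
    A, B = H(r, t - 1), H(r - 1, t)    # both defined on the ground set [r+t-1]
    # Each entry of A·B counts {E : D ⊆ E ⊆ F} with |D| = t-2, |E| = t-1,
    # |F| = t, which is 0 or 2 -- hence zero over F_2.
    prod = [[sum(a * b for a, b in zip(rowA, colB)) % 2 for colB in zip(*B)]
            for rowA in A]
    assert all(v == 0 for row in prod for v in row)
```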

11.3 Upper Bounds on dmin of Linear Availability Codes

11.3.1 Bounds Depending upon Field Size q


Most of the upper bounds in the literature on the minimum distance
of a linear code C with availability, and that take into account the
size q of the underlying finite field, are based on the following code-
shortening approach. Let G be a generator matrix of an (n, k, r, t, q)
linear availability code C with maximum possible minimum distance.
Let S ⊆ [n] and let G|S denote the corresponding sub-matrix of G. Let
s = |S| and let ν be an upper bound on the rank of G|S . Let

C^S = {c|[n]\S : c ∈ C, c|S = 0}

denote the code of block length (n − s) obtained by shortening C with
respect to S. Then we can upper bound the minimum distance of C via

dmin (n, k, r, t, q) = dmin (C) ≤ min_{S⊆[n], ν<k} dmin (C^S )
                               ≤ min_{S⊆[n], ν<k} dmin (n − s, k − ν, r, t, q),

where dmin (n, k, r, t, q) is the maximum possible minimum distance of


an [n, k] linear code over Fq with availability with parameters (r, t). The
bound presented in Theorem 9 below is derived in terms of generalized
Hamming weights (GHW) of the dual code C ⊥ . GHWs are defined
below.

Definition 11. [250] The ith generalized Hamming weight1 (GHW) di
of a code C is the smallest support size of an i-dimensional subcode of C.
We use d⊥_i to denote the ith GHW of the dual code C⊥ .

The shortening approach and the approach via GHW are closely
connected. Knowing the GHW of the dual code makes it easier to
identify candidate sets S to be used in conjunction with the shortening
approach. However, in practice, the GHW of the dual code may not
be precisely known, while upper bounds to the GHW of the dual code,
might be more easily available. For this reason, the bound in Theorem 9
below is phrased in terms of upper bounds d⊥_i ≤ ei , i = 1, 2, · · · , b, on
the first 1 ≤ b ≤ n − k GHWs of the dual code.

Theorem 9. [14] Let C be an [n, k] linear availability code with
parameters (r, t) over the field Fq . Then

dmin (n, k, r, t, q) ≤ min_{i∈T} dmin (n − ei , k + i − ei , r, t, q),

where T := {i : ei − i < k, 1 ≤ i ≤ b} and b ∈ [1, n − k], and where the
integers ei for 1 ≤ i ≤ b must have the property that they upper bound
the corresponding GHW, i.e., d⊥_i ≤ ei .

1
Generalized Hamming weights are at times referred to in the literature as
minimum support weights; see for example [94].

Proof. As noted above, the {ei }_{i=1}^{b} play the role of known upper bounds
on the GHW of the dual code, as in most cases, the exact GHW of
the dual code will be unknown. Let Si be the support of a subcode of C⊥
of dimension i, where |Si | = d⊥_i . Let some arbitrary indices of code
symbols be added to Si so that the augmented set satisfies |Si | = ei .
Next, let C be shortened at the co-ordinates indexed by Si i.e.,
Cshorten = {c|[n]\Si : c ∈ C, c|Si = 0}.
It follows that Cshorten is also an availability code having parameters
(r, t) and block length (n − ei ). We claim that Cshorten has dimension
≥ n − ei − (n − k − i) = (k + i − ei ). This can be seen as follows. From
the definition of GHW, the p-c matrix of the code C can be written in
the following form:

H = [ Hi   0  ]   } i rows
    [ A    H′ ]   } (n − k − i) rows

where the first block of columns corresponds to the co-ordinates in Si
(ei columns) and the second to the remaining (n − ei ) columns.

In the above, H′ is the p-c matrix of Cshorten . Clearly, rank(H′ ) ≤ (n−k−i)
and it follows that Cshorten has dimension ≥ n−ei −(n−k−i) = k+i−ei .
We also have:

dmin (C) ≤ dmin (Cshorten ) ≤ dmin (n − ei , k + i − ei , r, t, q).
The bound follows. The restriction ei − i < k appearing in the definition
of the set T in the Theorem is to ensure that the code whose minimum
distance appears on the right has dimension ≥ 1.

For the choice of b and ei , 1 ≤ i ≤ b appearing in equation (11.5)


below, the bound in Theorem 9 is tighter than the corresponding
bounds based on the shortening approach, that appear in [109], [133]
for t ≥ 2, r ≥ 2. A different upper bound on minimum distance, also
based on GHW, appears in [133].

11.3.2 Field-Size-Independent Bounds


The field-size dependent bound appearing in Theorem 9, can be con-
verted into one that is independent of field size, and is given in the
corollary below.

Corollary 2. [14] Let dmin (n, k, r, t) be the maximum-possible minimum
distance of an [n, k] linear code C with availability with parameters
(r, t). Then

dmin (n, k, r, t) ≤ min_{i∈T} dmin (n − ei , k + i − ei , r, t)
                 ≤ min_{i∈T} (n − k − i + 1),    (11.3)

where T = {i : ei − i < k, 1 ≤ i ≤ b} and b ∈ [1, n − k], and d⊥_i ≤ ei for
1 ≤ i ≤ b.

Proof. From Theorem 9 we have that

dmin (n, k, r, t, q) ≤ min_{i∈T} dmin (n − ei , k + i − ei , r, t, q)
⇒ max_q dmin (n, k, r, t, q) ≤ min_{i∈T} max_q dmin (n − ei , k + i − ei , r, t, q)
⇒ dmin (n, k, r, t) ≤ min_{i∈T} dmin (n − ei , k + i − ei , r, t).

Applying the Singleton bound to dmin (n − ei , k + i − ei , r, t), we obtain
the bound in (11.3).

Remark 10. Most of the field-size-independent upper bounds in the


literature on the minimum distance of a linear code C with availability,
rely first on finding a set S ⊂ [n] such that rank(G|S ) ≤ k − 1 where G
is the generator matrix and G|S is the generator matrix restricted to
columns indexed by S. This then leads to the upper bound,

dmin ≤ n − |S|.

We show below how the very same bound can be obtained by applying
Corollary 2 with i = |S| − k + 1, and ei = |S|. The motivation for
making this connection, is that this will make it easier to compare prior
bounds in the literature, with the bound appearing in Corollary 2.
The dual of the restriction C|S of the code C to the set S is a
shortened version of the dual C⊥ , having dimension ≥ |S| − k + 1.
The definition of the GHWs of the dual C⊥ then allows us to conclude
that d⊥_{|S|−k+1} ≤ |S|. This now allows us to apply the bound

dmin ≤ min_{i∈T} (n − k − i + 1)
11.3. Upper Bounds on dmin of Linear Availability Codes 707

in Corollary 2 with i = |S| − k + 1, and ei = |S| since the required


conditions

d⊥
|S|−k+1 ≤ ei = |S|,
ei − i < k,

are both satisfied. We end up, as mentioned earlier, with the very same
bound:

dmin ≤ min n − k + 1 − (|S| − k + 1) = n − |S|.


i∈T

Remark 11. By the remark above, prior bounds appearing in the
literature correspond to different choices of i in the bound in (11.3).
All choices of i can be shown to satisfy the requirements

    d⊥_i ≤ e_i,
    e_i − i < k,

for a suitable value of e_i. We identify below the value of i employed
in the various bounds. Let t ≥ 2, r ≥ 2.

1. The bound in [241] can be obtained by setting:

       i = ⌈(t(k−1) + 1)/(t(r−1) + 1)⌉ − 1 = ⌈t(k−r)/(t(r−1) + 1)⌉
         ≤ ⌈(k−r)/(r−1)⌉
         ≤ { (k−2)/(r−1),     if (r−1) | (k−1),
           { ⌊(k−1)/(r−1)⌋,   otherwise.

2. The bound presented above in Theorem 8 and appearing in [230]
   is applicable to the case of a general nonlinear code. However,
   when specialized to the linear case, the same minimum distance
   bound can be obtained by setting:

       i = Σ_{i=1}^{t} ⌊(k−1)/r^i⌋ ≤ { (k−2)/(r−1),     if (r−1) | (k−1),
                                     { ⌊(k−1)/(r−1)⌋,   otherwise.

3. A particular bound in [133] can be obtained by setting:

       i = ⌈(k−2)/(r−1)⌉.

4. The tightest known bound for t ≥ 2, r ≥ 2 that is derived from
   this approach appears in [14]. Here, the authors use:

       b = ⌈n − nRmax⌉,                                           (11.4)
       e_b = n,   e_{j−1} = e_j − ⌈2e_j/j⌉ + r + 1, ∀j ∈ [2, b],  (11.5)
       i = max {j ∈ [b] : e_j − j < k},

   where Rmax is the maximum rate of a code with availability with
   parameters (r, t). One can substitute an upper bound for Rmax in
   the event that the precise value of Rmax is unknown. For example,
   one can substitute the upper bound on rate given in Theorem 8
   in the place of Rmax.
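The recursion in (11.4)-(11.5) is easy to evaluate numerically. The sketch below (Python; the function names are ours) computes the resulting choice of i and the corresponding bound n − k − i + 1, substituting for the unknown Rmax the rate bound of Theorem 8, assumed here to take the form 1/∏_{j=1}^{t}(1 + 1/(jr)).

```python
import math

def rate_ub_thm8(r, t):
    # Assumed form of the Theorem 8 availability rate bound:
    # R_max <= 1 / prod_{j=1}^{t} (1 + 1/(j*r)).
    prod = 1.0
    for j in range(1, t + 1):
        prod *= 1 + 1 / (j * r)
    return 1 / prod

def dmin_ub(n, k, r, t):
    # Evaluate the bound of [14]: build b and the sequence e_b, ..., e_1
    # via (11.4)-(11.5), choose i = max{ j in [b] : e_j - j < k },
    # and plug that i into (11.3).
    b = math.ceil(n - n * rate_ub_thm8(r, t))   # (11.4), R_max replaced by its bound
    e = {b: n}                                  # (11.5): e_b = n
    for j in range(b, 1, -1):
        e[j - 1] = e[j] - math.ceil(2 * e[j] / j) + r + 1
    i = max(j for j in range(1, b + 1) if e[j] - j < k)
    return n - k - i + 1
```

For instance, dmin_ub(28, 14, 3, 2) returns a value that is necessarily no larger than the Singleton bound n − k + 1 = 15.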

Constructions achieving any of the known upper bounds on minimum


distance for codes with availability are known only for the cases of very
small rate [22], [222].

11.4 Strict Availability

Every linear availability code with parameter set (r, t) possesses a p-c
matrix H with associated rowspace H, having the property that
associated with each coordinate i ∈ [n], there is a collection T_i ⊆ H
of row vectors of size |T_i| = t, where each vector in T_i has Hamming
weight ≤ (r + 1) and a non-zero value in the i-th coordinate, and where
the supports of any two rows in T_i intersect precisely in the single
coordinate i.
Linear codes with strict availability are a further-constrained subset
of linear availability codes, as defined below. We note that both the
product-code construction and Construction 2 by Wang et al. are
examples of linear codes with strict availability. Imposing the condition
of strict availability makes it possible to derive tighter bounds on
minimum distance and rate.

Definition 12. An [n, k, dmin] code C is said to be a code with strict
availability with parameters (r, t), and denoted as an (n, k, r, t)sa code,
if there exists an (nt/(r+1) × n) matrix Hsa having the following properties:

• Each row has Hamming weight r+1 and each column has Hamming
weight t,

• The support of any two rows of Hsa intersect in at most one index
and

• C lies in the nullspace of Hsa .

With respect to the definition above, we note that for strict avail-
ability to hold, we need that (r + 1) | nt. We do not require the rows
of the matrix Hsa to be linearly independent. It is straightforward to
verify that a code with strict availability is also a code with availability.
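The conditions of Definition 12 are simple to check mechanically. The sketch below (Python; function names are ours) verifies them for the p-c matrix of the two-dimensional product-code construction mentioned above, in which the n symbols are arranged in an array and every row and column sums to zero; a 3 × 3 array gives an (n = 9, r = 2, t = 2)sa code.

```python
from itertools import combinations

def is_strict_availability_pc(H, r, t):
    # Check the conditions of Definition 12 on a 0/1 p-c matrix H (list of rows).
    n = len(H[0])
    if len(H) * (r + 1) != n * t:                 # H must be (nt/(r+1)) x n
        return False
    if any(sum(row) != r + 1 for row in H):       # each row has weight r+1
        return False
    if any(sum(row[j] for row in H) != t for j in range(n)):  # column weight t
        return False
    for row1, row2 in combinations(H, 2):         # supports meet in <= 1 index
        if sum(a & b for a, b in zip(row1, row2)) > 1:
            return False
    return True

# Row and column parities of a 3x3 array of symbols.
m = 3
n = m * m
row_checks = [[1 if j // m == i else 0 for j in range(n)] for i in range(m)]
col_checks = [[1 if j % m == i else 0 for j in range(n)] for i in range(m)]
H_sa = row_checks + col_checks                    # (nt/(r+1)) x n = 6 x 9
print(is_strict_availability_pc(H_sa, r=2, t=2))  # True
```

Here a row check and a column check intersect in exactly one cell of the array, while two row (or two column) checks are disjoint, so the pairwise-intersection condition holds.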
We will now derive the upper bound on the rate of codes with strict
availability appearing in [14].

Theorem 10. [14] Let us define

    S_{r,t} = {(k, n) : an (n, k, r, t)sa code exists over a field Fq},
    R(r, t) = sup_{(k,n) ∈ S_{r,t}} k/n.

Then R(r, t) satisfies the functional equation and upper bound given
respectively by:

    R(r, t) = 1 − t/(r+1) + (t/(r+1)) R(t−1, r+1),
    R(r, t) ≤ 1 − t/(r+1) + (t/(r+1)) · 1/( ∏_{j=1}^{r+1} (1 + 1/(j(t−1))) ).   (11.6)

Proof. Let a block length n be fixed. Let us define:

    S_{n,r,t} = {k : an (n, k, r, t)sa code exists over some field Fq},
    R(r, t, n) = max_{k ∈ S_{n,r,t}} k/n.

It follows that R(r, t) = sup_n R(r, t, n). Next, we pick an integer n such
that R(r, t, n) > 0. Note that by Construction 2, such an n does exist.
Let C be an (n, k, r, t)sa code having rate R(r, t, n). Since our interest
is in deriving an upper bound on code rate, we can without loss of
generality, assume that C has a p-c matrix Hsa that satisfies the conditions laid

out in Definition 12. It follows that the rank of this p-c matrix satisfies
rank(Hsa) = n − nR(r, t, n). Next, we note that Hsaᵀ is the p-c matrix of
a code with strict availability having parameters r′ = (t − 1), t′ = (r + 1)
and block length n′ = nt/(r+1). But Hsaᵀ may not define an availability
code having the maximum possible rate R(t − 1, r + 1, n′). It follows that

    n′ − n′R(r′, t′, n′) = n′ − n′R(t − 1, r + 1, n′) ≤ n − nR(r, t, n)
    ⇒ nR(r, t, n) ≤ n − n′ + n′R(t − 1, r + 1, n′)
    ⇒ R(r, t, n) ≤ 1 − t/(r+1) + (t/(r+1)) R(t − 1, r + 1, n′)
    ⇒ R(r, t, n) ≤ 1 − t/(r+1) + (t/(r+1)) sup_{n′} R(t − 1, r + 1, n′)
    ⇒ sup_n R(r, t, n) ≤ 1 − t/(r+1) + (t/(r+1)) sup_{n′} R(t − 1, r + 1, n′)
    ⇒ R(r, t) ≤ 1 − t/(r+1) + (t/(r+1)) R(t − 1, r + 1).
r+1 r+1
Next, reversing the roles of Hsa and Hsaᵀ, that is, taking Hsaᵀ this time
to be the p-c matrix of an availability code having block length
n′ = nt/(r+1) and rate R(r′, t′, n′), we would obtain the inequality in
the reverse direction:

    R(r, t) ≥ 1 − t/(r+1) + (t/(r+1)) R(t − 1, r + 1).

This gives us the desired functional equation:

    R(r, t) = 1 − t/(r+1) + (t/(r+1)) R(t − 1, r + 1).

Applying Theorem 8 to upper bound the rate R(t − 1, r + 1), we obtain
the upper bound appearing in (11.6).
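As a quick numerical sanity check on (11.6), the short sketch below evaluates its right-hand side; the function name is ours.

```python
def strict_avail_rate_ub(r, t):
    # Right-hand side of (11.6):
    # 1 - t/(r+1) + (t/(r+1)) / prod_{j=1}^{r+1} (1 + 1/(j*(t-1))).
    prod = 1.0
    for j in range(1, r + 2):
        prod *= 1 + 1 / (j * (t - 1))
    return 1 - t / (r + 1) + (t / (r + 1)) / prod

print(strict_avail_rate_ub(2, 3))   # ≈ 0.4571, i.e., 16/35
```

For (r, t) = (2, 3) the first two terms cancel, and the bound reduces to 1/∏_{j=1}^{3}(1 + 1/(2j)) = 16/35.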

Notes

1. Upper bounds on the rate of binary codes with strict availability:
   In [114], [115], Kadhe and Calderbank provide the following upper
   bound on the rate of binary codes with strict availability for t = 3
   and any r:

       R(r, 3) ≤ (r−2)/(r+1) + (3/(r+1)) H2(1/(r+2)),

   where H2(p) = −p log2(p) − (1 − p) log2(1 − p). For the specific
   case t = 3, any r and n = (r+1)(2r+3)/3, they derive the result:

       R(r, 3) ≤ 1 − 3/(r+1) + 3 log(2r+4)/((r+1)(2r+3)).        (11.7)

   The bound in (11.7) can be achieved by constructing a code that
   makes use of the incidence matrix of a Steiner triple system as
   the p-c matrix, a construction pointed out by [12] and [245] in the
   availability context; thus the upper bound in (11.7) is tight.
   In the same paper [115], the authors provide the following upper
   bound on the rate of binary codes with strict availability for the
   case r = 2 and arbitrary t:

       R(2, t) ≤ H2(1/(t+1)).

2. Codes with availability constructed from AG codes can be found


in [19], [21], [93], [111]. The constructions in [19], [21], [93] make
use of fiber products of curves. The construction in [111] is based
on automorphisms of rational function fields.

3. Asymptotic lower bounds on rate as a function of relative minimum


distance, with parameters (r, t) appear in [19], [133], [230].

4. High-rate binary constructions of availability codes: The construction
   given in [246] and described above in this section has rate
   r/(r+t) for any (r, t), which is the highest rate among the known
   general constructions. For some sporadic parameters, codes with
   rate larger than r/(r+t) are known:

   (a) The cyclic code construction given in [245] has the following
       parameters:

           n = 2^t − 1,  k = 2^(t−1) − 1,  r = t − 1,

       and hence has rate (2^(t−1) − 1)/(2^t − 1), which is larger than
       r/(r+t) = (t−1)/(2t−1).

   (b) The constructions obtained by using the incidence matrix of
       a balanced incomplete block design (BIBD) as p-c matrix,
       pointed out in [12] and [245] in the context of availability
       codes, have rate exceeding r/(r+t). As an example, for t = 3,
       the construction using the incidence matrix of the Steiner
       triple system (STS), a special case of a BIBD, as p-c matrix
       has the following parameters:

           n = (2^s − 1)(2^s − 2)/6,  k = n − (2^s − 1 − s),  r = 2^(s−1) − 2,

       and hence rate

           k/n = 1 − 6(2^s − 1 − s)/((2^s − 1)(2^s − 2)) ≥ r/(r+t) = 1 − 3/(2^(s−1) + 1)

       for s ≥ 2. Note that the rate of this code based on the STS
       achieves the upper bound on rate given in equation (11.7).
       The authors of [245] also generalize this construction to derive
       codes for other values of r for t = 3 by shortening.
   (c) In [109], the authors point out a class of majority-logic
       decodable codes as examples of codes with availability. Some
       of these codes have rate better than r/(r+t).

5. Constructions with minimum distance > (t + 1): In [222], the
   authors present constructions, obtained by puncturing the generator
   matrix of the Simplex code in positions indicated by the columns
   of the generator matrix of an anti-code, that are optimal w.r.t.
   the alphabet-dependent bounds given in [30] and the Griesmer
   bound for linear codes, for r ∈ {2, 3}. Constructions of codes with
   minimum distance > (t + 1) can also be found in [21], [22], [42],
   [93], [111], [153], [228], [245], [247], [262].

6. Lower bound on block length: In [12], the authors provide the
   following lower bound on the block length of a code with strict
   availability:

       n ≥ (r+1)² − r(r+1)/t,

   and show that the above lower bound can be achieved with equality
   iff the p-c matrix of the code can be expressed as the incidence
   matrix of a BIBD.
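The equality case of this bound can be checked directly for a smallest example: for (r = 2, t = 3) the bound reads n ≥ 9 − 2 = 7, and it is met by taking as p-c matrix the incidence matrix of the Fano plane, the Steiner triple system on 7 points. A sketch:

```python
from itertools import combinations

# Lines of the Fano plane (STS on 7 points), points labeled 1..7.
lines = [(1, 2, 3), (1, 4, 5), (1, 6, 7), (2, 4, 6), (2, 5, 7), (3, 4, 7), (3, 5, 6)]
n, r, t = 7, 2, 3
H = [[1 if p in line else 0 for p in range(1, n + 1)] for line in lines]

assert all(sum(row) == r + 1 for row in H)                    # row weight r+1
assert all(sum(row[j] for row in H) == t for j in range(n))   # column weight t
assert all(len(set(l1) & set(l2)) == 1 for l1, l2 in combinations(lines, 2))

# The block-length lower bound of [12] is met with equality by this BIBD:
assert n == (r + 1) ** 2 - r * (r + 1) // t                   # 9 - 2 = 7
```

Any two lines of the Fano plane meet in exactly one point, which is why the pairwise support-intersection condition of Definition 12 holds with equality.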

7. Tight upper bounds on the minimum distance: A tight upper


bound on the minimum distance of codes possessing the availabil-
ity property only for information symbols and that also satisfy an
additional restriction on the structure of the code can be found in
[193]. An asymptotically tight upper bound on minimum distance
of codes with availability under a certain restriction on the struc-
ture of the parity checks giving rise to the availability property,
can be found in [8].

Open Problem 12. Determine the smallest possible block length n for
which an (n, M ≥ q k ) availability code over an alphabet of size q exists,
having availability parameters (r, t).

Open Problem 13. Determine the maximum rate of an availability


code having parameters (r, t).

Open Problem 14. Derive a tight upper bound to the minimum distance
dmin of an (n, M ≥ q k ) availability code over an alphabet of size q having
parameters (r, t).
12
LRCs with Sequential Recovery

LRCs with sequential recovery are a subclass of linear LRCs introduced


by Prakash et al. in [174]. This class of codes is designed to recover
from a set of t ≥ 1 erasures via a t-step sequential process. In each
step, an additional erased code symbol is recovered as a function of
r code symbols that have either not been erased, or else, have been
recovered in a prior round. We will use the notation seq-LRC to denote
an LRC with sequential recovery, more specifically, an (r, t) seq-LRC.
In comparison with other classes of LRCs designed for recovery from
multiple erasures, such as availability codes or (r, δ) codes, sequential
recovery imposes the least restriction on the recovery process. Thus an
(r, t) availability code is also an (r, t) seq-LRC and an (r, δ) all-symbol
LRC is also a seq-LRC with parameters (r, δ − 1).
In this section, we begin by formally defining a seq-LRC. We then
present a tight upper bound on the rate of an (r, t) seq-LRC. The bound
is tight as there is a matching construction by Balaji et al. [13], that
achieves the upper bound on code rate. We prove the rate bound in this
section only for the cases t = 2, t = 3 and refer the reader to [13] for
the proof in the general case. Similarly, we provide illustrative, example
constructions for the parameter sets (r = 6, t = 4) and (r = 3, t = 5).


Definition 13. An [n, k] linear code over Fq is said to be a seq-LRC
with parameters (r, t) if any set {c_{i1}, ..., c_{it}} of t erased code symbols
can be recovered in a sequential manner, where the j-th erased symbol
c_{ij}, j = 1, 2, ..., t, can be recovered using an equation of the form:

    c_{ij} = Σ_{ℓ ∈ S_j} a_{ℓ,j} c_ℓ,   for some a_{ℓ,j} ∈ Fq,

where S_j ⊆ [n] \ {i_j, i_{j+1}, ..., i_t} and |S_j| ≤ r.
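For a binary code described by a p-c matrix whose rows have weight ≤ (r + 1), sequential recoverability can be tested by erasure peeling: repeatedly find a p-c row whose support contains exactly one erased coordinate, and solve for that coordinate. The sketch below (Python; function names are ours) implements this check. Note that it is a sufficient test only: Definition 13 permits any dual vector of weight ≤ (r + 1), so peeling restricted to the rows of one fixed matrix H can in principle miss recovery options.

```python
from itertools import combinations

def peel(H, erased, r):
    # Attempt sequential recovery of the coordinates in `erased` using the
    # p-c rows of H; each repair may use at most r unerased symbols,
    # i.e., only rows of support size <= r + 1 are used.
    erased = set(erased)
    progress = True
    while erased and progress:
        progress = False
        for row in H:
            support = {j for j, x in enumerate(row) if x}
            hit = support & erased
            if len(hit) == 1 and len(support) <= r + 1:
                erased -= hit          # this symbol is now recovered
                progress = True
    return not erased                  # True iff every erasure was repaired

def is_seq_lrc(H, n, r, t):
    # Sufficient check of Definition 13 for a binary p-c matrix H.
    return all(peel(H, E, r) for E in combinations(range(n), t))

# The single parity check code of length 4 is a seq-LRC for (r = 3, t = 1)
# but cannot handle t = 2 erasures.
H_spc = [[1, 1, 1, 1]]
assert is_seq_lrc(H_spc, 4, r=3, t=1)
assert not is_seq_lrc(H_spc, 4, r=3, t=2)
```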

12.1 Recovery from Two or Three Erasures

We now present an upper bound due to Prakash et al. [174] on the rate
k/n of a seq-LRC for the case t = 2. The bound takes on the form of a
lower bound on block length for given code dimension k.


Theorem 11. [174] Let C be an [n, k] seq-LRC over Fq having parameters
(r, t = 2). Then

    k + ⌈2k/r⌉ ≤ n.
Proof. Let B be the vector space spanned by all codewords with
Hamming weight ≤ (r + 1) in the dual C⊥, i.e.,

    B = < h1, ..., hℓ >,

where each hi is a codeword in the dual code C⊥ of Hamming weight
≤ (r + 1). Without loss of generality, we assume that the set {h1, ..., hℓ}
is a linearly independent set of vectors over Fq. Let us set

    H0 = [ h1 ]
         [ h2 ]
         [  ⋮ ]
         [ hℓ ] .

Note that the nullspace of H0 is a vector space that contains C, but could
potentially be of larger dimension. Let w1 be the number of columns in
H0 having Hamming weight 1. Counting the number of non-zero entries
in H0 in two different ways, row-wise and column-wise, we obtain

    w1 + 2(n − w1) ≤ ℓ(r + 1).        (12.1)
Since B contains all codewords in C⊥ having weight ≤ (r + 1), the dual
of B is also an (r, t = 2) seq-LRC Cseq, whose p-c matrix is precisely
the matrix H0. Since Cseq can recover from 2 erasures, it follows that
dmin(Cseq) ≥ 3. It follows that we cannot have two distinct columns of
Hamming weight 1 in H0 having the same support set of size 1. Hence
we have that w1 ≤ ℓ. Substituting this inequality in equation (12.1), we
obtain

    2n − ℓ ≤ ℓ(r + 1)
    ⇒ 2n/(r + 2) ≤ ℓ.        (12.2)

Since B ⊆ C⊥ and the dimension of C⊥ is n − k, we have that ℓ ≤ n − k.
Substituting this in equation (12.2), we get

    2n/(r + 2) ≤ ℓ ≤ n − k
    ⇒ k ≤ nr/(r + 2)
    ⇒ k(r + 2)/r ≤ n.

From the above equation, since n and k are integers,

    k + ⌈2k/r⌉ ≤ n.

The construction below, due to [174], achieves the above lower bound
on block length n.

Construction 3. [174] Let 2k = ur + b, 1 ≤ b ≤ r. Let G be a graph with
u + 1 nodes, where u of the nodes have degree r and the remaining node
has degree b. Let each edge represent a message symbol; thus there are
a total of k message symbols. Let each node represent a parity symbol
storing the binary sum of the message symbols corresponding to the edges
incident on that node; thus there are (u + 1) parity symbols. The message
and parity symbols put together yield an [n = k + u + 1, k] seq-LRC
with parameters (r, t = 2). Furthermore, since n = k + u + 1 = k + ⌈2k/r⌉,
by Theorem 11, this code has the minimum possible block length, and
hence maximum possible rate, for the given parameter set (k, r, t = 2).

For any code based on the above construction, it is straightforward
to verify graphically that the constructed code is a seq-LRC with
parameters (r, t = 2). We do this verification as follows. If two message
symbols are erased, then, since the two corresponding edges can share at
most one endpoint, at least one endpoint of one of these edges is a node
at which only a single erased message symbol is incident. The parity
check at that node recovers this symbol, after which the second erased
symbol can be recovered in turn. We can similarly recover from the
erasure of one parity symbol and one message symbol, or from the
erasure of two parity symbols.
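The smallest instance of Construction 3 with r = 2 and k = 3 is instructive: 2k = 6 = ur + b with u = 2, b = 2, so all three nodes have degree 2 and the graph is a triangle. The sketch below (Python) builds the resulting [6, 3] binary code and confirms, by erasure peeling, recovery from every pattern of t = 2 erasures.

```python
from itertools import combinations

# Triangle graph: 3 edges (message symbols c0, c1, c2), 3 node parities (c3, c4, c5).
edges = [(0, 1), (1, 2), (2, 0)]
k, r = 3, 2
n = k + 3

# Parity check of node v: its incident edges plus its own parity symbol.
H = []
for v in range(3):
    row = [0] * n
    for idx, (a, b) in enumerate(edges):
        if v in (a, b):
            row[idx] = 1
    row[k + v] = 1
    H.append(row)

def peel(H, erased, r):
    # Repeatedly repair a symbol via a check with exactly one erased coordinate.
    erased = set(erased); progress = True
    while erased and progress:
        progress = False
        for row in H:
            sup = {j for j, x in enumerate(row) if x}
            hit = sup & erased
            if len(hit) == 1 and len(sup) <= r + 1:
                erased -= hit; progress = True
    return not erased

assert n == k + -(-2 * k // r)        # n = k + ceil(2k/r) = 6, as in Theorem 11
assert all(peel(H, E, r) for E in combinations(range(n), 2))
```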
We now present the upper bound on the rate of a seq-LRC having
parameters (r, t = 3), presented by Song et al. in [226]. The derivation
presented here is taken from [13].

Theorem 12. [226] Let C be an [n, k] seq-LRC over Fq having parameters
(r ≥ 3, t = 3). Then

    k/n ≤ ( r/(r+1) )².
Proof. The following proof is based on [13]. As in the proof of Theorem 11,
let B be the span of all codewords with Hamming weight ≤ (r + 1)
in the dual C⊥, i.e.,

    B = < h1, ..., hℓ >,

where each hi is a codeword in C⊥ of Hamming weight ≤ (r + 1). Without
loss of generality, we assume that {h1, ..., hℓ} form a linearly independent
set of vectors. Let

    H = [ h1 ]
        [ h2 ]
        [  ⋮ ]
        [ hℓ ] .

Without loss of generality, we assume that H is of the form:

    H = [ H1  H2  Hrest ],

where the submatrix H1 contains the columns of H having Hamming
weight 1, H2 the columns of weight 2, and Hrest the remaining columns.
By permuting the rows of H and by permuting the columns within the
matrices H1 and H2, the matrix H can be brought into the form:

    H = [ D0  A1   0         ]
        [  0  D1       Hrest ]        (12.3)
        [  0   0   C         ]

where

1. D0 is an (a0 × a0) diagonal matrix,

2. A1 is an (a0 × a1) matrix with each row of weight ≤ r and each
   column of weight ∈ {1, 2},

3. D1 is a (ρ1 × a1) matrix with each column of weight ≤ 1,

4. the weight of each column of the stacked matrix [ A1 ; D1 ] is
   exactly 2, and

5. C is a ((ρ1 + p) × a2) matrix with each column of weight exactly 2,
   occupying the last (ρ1 + p) rows of H.
We now draw some conclusions relating to the submatrices of H. Each
column in A1 must have weight exactly equal to 1: if a column in A1
had Hamming weight 2, this would imply that a column of D1 had
Hamming weight 0, and it would then be possible to find a set of three
linearly dependent columns in H, including the column of H corresponding
to this column of D1, contradicting the fact that dmin ≥ 4.
It follows that each column in D1 also has Hamming weight 1. Counting
the non-zero entries in the matrix A1 row-wise and column-wise, we obtain

    a1 ≤ a0 r.        (12.4)

Counting the non-zero entries row-wise and column-wise in the matrix
[ [D1 ; 0]  C ], i.e., the last (ρ1 + p) rows of H restricted to the columns
of H2, we get

    a1 + 2a2 ≤ (ρ1 + p)(r + 1).        (12.5)

By equating the number of rows in H to the sum of the numbers of rows
in D0 and C, we get

    ℓ = a0 + ρ1 + p.        (12.6)
Substituting equation (12.6) in (12.5), we get

    a1 + 2a2 ≤ (ℓ − a0)(r + 1).        (12.7)
Counting the non-zero entries in the matrix D1 row-wise and column-wise,

    a1 ≤ ρ1 (r + 1).

Substituting the above in equation (12.6),

    ℓ = a0 + ρ1 + p ≥ a0 + a1/(r + 1) + p.

Substituting equation (12.4) in the above equation,

    ℓ ≥ a1/r + a1/(r + 1) + p.        (12.8)
All the above inequalities hold even if any of D0, A1, D1, C are empty
matrices. Counting the non-zero entries in the matrix H row-wise and
column-wise,

    a0 + 2(a1 + a2) + 3(n − (a0 + a1 + a2)) ≤ ℓ(r + 1).

Substituting equation (12.7) in the above equation, we have

    3n − 2a0 − ( a1 + ((ℓ − a0)(r + 1) − a1)/2 ) ≤ ℓ(r + 1)
    ⇒ 3n + a0 ( (r + 1)/2 − 2 ) − a1/2 ≤ 3ℓ(r + 1)/2.

Since r ≥ 3, we have (r + 1)/2 − 2 ≥ 0. Hence, substituting equation
(12.4) in the above equation, we have

    3n + (a1/r) ( (r + 1)/2 − 2 ) − a1/2 ≤ 3ℓ(r + 1)/2
    ⇒ 3n − (3/2)(a1/r) ≤ 3ℓ(r + 1)/2.

Substituting equation (12.8) in the above equation, we have

    3n − ( 3ℓ(r + 1)/(2(2r + 1)) − 3p(r + 1)/(2(2r + 1)) ) ≤ 3ℓ(r + 1)/2.
Using p ≥ 0 in the above equation,

    n ≤ ℓ(r + 1)/2 + ℓ(r + 1)/(2(2r + 1))
    ⇒ ℓ ≥ n(2r + 1)/(r + 1)².

Using ℓ ≤ n − k in the above equation, we get the bound given in the
theorem.

We now present a code [226] achieving the above upper bound on


rate.

Construction 4 (Product-Code Construction [226]). Let n = (r + 1)².
Arrange the n code symbols in the form of an (r + 1) × (r + 1) array.
We impose the constraint that the sum of the code symbols in any row or
column equals 0. The code symbols in the first r rows and r columns
can be chosen arbitrarily from Fq. Given these r² values, the values of
the code symbols in the rest of the array are determined. Thus the code has
dimension k = r². The fact that this is a seq-LRC with parameters
(r, t = 3) can be seen as follows. If there are 3 erased symbols, then
there must exist either a row or a column with exactly 1 erased code
symbol, which can be recovered from the remaining entries in that row
or column. The remaining symbols can be similarly recovered thereafter.
By Theorem 12, this code has the maximum possible rate (r/(r+1))².
This code is commonly known as the product code in two dimensions.

Although the 2-dimensional product code is well known in the literature,
[226] was the first paper to point out this construction in the context of
seq-LRCs.
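The recovery argument above can be verified exhaustively for small r. The sketch below (Python) builds the 3 × 3 product code (r = 2, n = 9, k = 4) with its row and column parity checks and confirms by peeling that every pattern of t = 3 erasures is sequentially recoverable.

```python
from itertools import combinations

r = 2
m = r + 1                     # (r+1) x (r+1) array, n = 9, k = r^2 = 4
n = m * m
# Row checks followed by column checks; one of the 6 checks is redundant.
H = [[1 if j // m == i else 0 for j in range(n)] for i in range(m)] + \
    [[1 if j % m == i else 0 for j in range(n)] for i in range(m)]

def peel(H, erased, r):
    # Repeatedly repair a symbol via a check with exactly one erased coordinate.
    erased = set(erased); progress = True
    while erased and progress:
        progress = False
        for row in H:
            sup = {j for j, x in enumerate(row) if x}
            hit = sup & erased
            if len(hit) == 1 and len(sup) <= r + 1:
                erased -= hit; progress = True
    return not erased

# Every pattern of t = 3 erasures is sequentially recoverable.
assert all(peel(H, E, r) for E in combinations(range(n), 3))
# The rate (with rank(H) = 2m - 1) meets the Theorem 12 bound (r/(r+1))^2.
assert abs((n - len(H) + 1) / n - (r / (r + 1)) ** 2) < 1e-12
```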

12.2 The General Case

An upper bound for the case of general (r, t) derived in Balaji et al. [13]
is presented below in Theorem 13. Matching constructions establishing
that this bound is tight are also given in the same paper. The bound also
establishes the correctness of a conjecture due to Song et al. appearing
in [226] and that is stated in the notes subsection.

12.2.1 Rate Bound

Theorem 13. [13] Let C be an [n, k] seq-LRC over the finite field Fq
having parameters (r ≥ 3, t). Then

    k/n ≤ r^(s+1) / ( r^(s+1) + 2 Σ_{i=0}^{s} r^i ),        for t even,
                                                             (12.9)
    k/n ≤ r^(s+1) / ( r^(s+1) + 2 Σ_{i=1}^{s} r^i + 1 ),    for t odd,

where s = ⌊(t−1)/2⌋.

The proof is along the same lines as the proofs used to bound the
rate for the cases t = 2 and t = 3. The bound is tight, as it is possible
to construct seq-LRCs that achieve this rate bound. Details, including
code constructions, can be found in [13].
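The bound (12.9) can be evaluated exactly with rational arithmetic. The sketch below (Python; the function name is ours) also confirms that it reduces to the t = 2 and t = 3 bounds of Theorems 11 and 12, and that it matches the two example constructions described next.

```python
from fractions import Fraction

def seq_lrc_rate_ub(r, t):
    # The tight rate bound (12.9) of [13], with s = floor((t-1)/2).
    s = (t - 1) // 2
    num = Fraction(r ** (s + 1))
    if t % 2 == 0:
        den = num + 2 * sum(Fraction(r ** i) for i in range(0, s + 1))
    else:
        den = num + 2 * sum(Fraction(r ** i) for i in range(1, s + 1)) + 1
    return num / den

assert seq_lrc_rate_ub(5, 2) == Fraction(5, 7)        # t = 2: r/(r+2)
assert seq_lrc_rate_ub(4, 3) == Fraction(16, 25)      # t = 3: (r/(r+1))^2
assert seq_lrc_rate_ub(6, 4) == Fraction(126, 175)    # the (n, k) = (175, 126) example
assert seq_lrc_rate_ub(3, 5) == Fraction(27, 52)      # the (n, k) = (52, 27) example
```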

12.2.2 Example Constructions

The seq-LRC construction provided in [13] that achieves the rate bound
in (12.9) takes on a slightly different form for the cases t even and
t odd. We present below illustrative examples of code constructions
corresponding to the parameter sets (r = 6, t = 4), representing the case
of t even, and (r = 3, t = 5), representing the case of t odd. Additional
details can be found in [13].

Illustrative Example for t Even

Let C be a binary code with binary p-c matrix:

    H = [ D0  A1  0 ]
        [  0  D1  C ] ,        (12.10)

where

1. D0 is an (a0 × a0) diagonal matrix with nonzero diagonal entries,

2. D1 is an (a0r × a0r) diagonal matrix with nonzero diagonal entries,

3. A1 is an (a0 × a0r) matrix with each row of weight r and each
   column of weight 1,

4. C is an (a0r × a0r²/2) matrix with each row of weight r and each
   column of weight 2.

Hence the block length is given by n = a0(1 + r + r²/2). Since the diagonal
entries of D0, D1 are nonzero, it follows that the rank of H is equal to
the number of rows, and hence that the dimension k of the code equals
a0r²/2.
Let us form an augmented matrix H∞ by adding a row to H at the
very top; this row is the binary sum of the rows of H. Thus H∞ is given by:

    H∞ = [ 1   0   0 ]
         [ D0  A1  0 ]        (12.11)
         [  0  D1  C ] ,

where

1. 1 is a (1 × a0) row vector with each coordinate equal to 1,

2. the matrices D0, D1, A1, C remain as before.
Clearly, H∞ is also a valid p-c matrix for the code C. Each column
of H∞ has Hamming weight exactly 2. Hence this matrix H∞ can be
interpreted as the edge-vertex incidence matrix of a graph G∞ having
n edges and (1 + a0 + a0r) nodes (the number of rows in H∞). Fig.
12.1 shows the graph G∞, a Moore graph, corresponding to the values
(a0 = 7, r = 6), for a certain choice of matrices D0, D1, A1, C
ensuring that the girth of G∞ is ≥ t + 1 = 5.

Each edge in G∞ represents a distinct code symbol, while each vertex
represents a parity-check on the code symbols represented by the edges
attached to the vertex. Thus each vertex is associated to a row in the
p-c matrix H∞, and each edge to a column of the p-c matrix. Each
column of the p-c matrix H∞ has Hamming weight 2, and the locations
of the two 1s within the column indicate the vertices to which the edge
is connected. In Fig. 12.1, the edges at the very top, which are colored
in red, correspond to the first a0 = 7 columns of H∞. The edges which
are colored in black and blue correspond, respectively, to the columns
of H∞ corresponding to the sub-matrices

    [  0 ]        [ 0 ]
    [ A1 ]  and   [ 0 ] .
    [ D1 ]        [ C ]

Figure 12.1: The figure shows the graphical interpretation of a binary, rate-optimal
seq-LRC C having parameter set (n, k, r, t) = (175, 126, 6, 4). Each of the 175 edges of
the graph represents a distinct code symbol and each of the 50 vertices represents a
parity-check of the code symbols represented by edges incident on it. This is a regular
graph with a total of 50 vertices, each of degree r + 1 = 7 and is an example of a
Moore graph called the Hoffman-Singleton graph. This graph has girth 5, which is a
necessity for the associated binary code to be able to recover from t = 4 erasures. The
code has redundancy 49 and not 50 since it turns out that the overall parity-check
at the very top is redundant.

The sequential recovery property of this binary code can be deduced
from the girth of G∞ and the fact that all nodes in G∞ have degree
exactly r + 1. It turns out that the girth of our example graph G∞ is
equal to 5. Hence if any ≤ 4 symbols are erased and, in G∞, only the
edges corresponding to erased symbols are retained, the retained edges
cannot contain a cycle and thus form a forest; there will therefore be
at least one vertex, i.e., parity-check, of degree 1, and hence the erased
symbols can be recovered one by one.
This connection between girth and sequential recovery, namely that
a girth ≥ t + 1 guarantees sequential recovery from t erasures, was to
our knowledge, first pointed out in the context of LRCs by [192] and
the graphs discussed in [192] can also be used to construct LRCs with
sequential recovery.
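The girth-based recovery argument can be tested on any (r + 1)-regular graph of girth ≥ t + 1. The sketch below (Python) uses the Petersen graph (3-regular, girth 5) in place of the Hoffman-Singleton graph, giving a smaller binary code with (n, r, t) = (15, 2, 4), and confirms by peeling that every set of 4 erased edges is recoverable.

```python
from itertools import combinations

# Petersen graph: outer 5-cycle, inner pentagram, five spokes; 3-regular, girth 5.
outer = [(i, (i + 1) % 5) for i in range(5)]
inner = [(5 + i, 5 + (i + 2) % 5) for i in range(5)]
spokes = [(i, i + 5) for i in range(5)]
edges = outer + inner + spokes          # 15 edges = 15 code symbols

def peels(edges, erased):
    # Peel erased edges via vertices incident to exactly one erased edge.
    erased = set(erased); progress = True
    while erased and progress:
        progress = False
        for v in range(10):
            hit = {e for e in erased if v in edges[e]}
            if len(hit) == 1:
                erased -= hit; progress = True
    return not erased

assert all(sum(1 for e in edges if v in e) == 3 for v in range(10))  # degree r+1 = 3
# Girth 5 >= t + 1 guarantees recovery from any t = 4 erasures.
assert all(peels(edges, E) for E in combinations(range(15), 4))
```

Since fewer than 5 edges cannot contain a cycle in a graph of girth 5, every erasure pattern of size ≤ 4 induces a forest, and peeling always finds a leaf.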
It can be verified that the rate of the binary code C achieves the
upper bound on rate given in Theorem 13 for (r = 6, t = 4).

Illustrative Example for t Odd

We now present an illustrative example of a seq-LRC with (r = 3, t = 5).
Let C be a binary code having binary p-c matrix given by:

    H = [ D0  A1   0 ]
        [  0  D1  A2 ]        (12.12)
        [  0   0   P ] ,
where

1. D0 is an (a0 × a0) diagonal matrix with non-zero diagonal entries,

2. A1 is an (a0 × a0r) matrix with each row of weight r and each
   column of weight 1,

3. D1 is an (a0r × a0r) diagonal matrix with non-zero diagonal entries,

4. A2 is an (a0r × a0r²) matrix with each row of weight r and each
   column of weight 1,

5. P is an (a0r²/(r+1) × a0r²) matrix with each row of weight r + 1 and
   each column of weight 1.

Hence the block length is given by n = a0(1 + r + r²). Since D0, D1 are
diagonal and each column of A1, A2, P has Hamming weight exactly 1,
we have that the rank of the above p-c matrix is equal to the number
of rows. Hence k = a0r² − a0r²/(r+1).
As in the case t even, let us form an augmented matrix H∞ by
adding a row to H at the very top; this row is the binary sum of the rows
of H. Thus H∞ is given by:

    H∞ = [ 1   0    0 ]
         [ D0  A1   0 ]
         [  0  D1  A2 ]        (12.13)
         [  0   0   P ] ,

where

1. 1 is a (1 × a0) vector with each coordinate equal to 1,

2. the matrices D0, A1, D1, A2, P remain as before.


Clearly, as in the case t even, H∞ is also a valid p-c matrix for the
code C. Each column of H∞ has Hamming weight exactly 2. Hence this
matrix H∞ can be interpreted as the edge-vertex incidence matrix of a
2
graph with n edges and 1 + a0 + a0 r + ar+1 0r
nodes (number of rows
in H∞ ). The graph G∞ with H∞ as the node-edge incidence matrix
is shown in Fig. 12.2, corresponding to the values (a0 = 4, r = 3) for
a certain choice of the matrices {D0 , D1 , A1 , A2 , P } such that girth of
G∞ is ≥ t + 1 = 6.

Figure 12.2: The figure shows the graphical interpretation of a binary, rate-optimal
seq-LRC C having parameter set (n, k, r, t) = (52, 27, 3, 5). Each of the 52 edges of
the graph represents a distinct code symbol and each of the 26 vertices represents
a parity-check on the code symbols represented by edges incident on it. This is a
regular graph with a total of 26 vertices, each of degree r + 1 = 4 and is an example
Moore graph for (r = 3, t = 5) corresponding to the projective plane of order r = 3.
This graph has girth 6, which is a necessity for the associated binary code to be able
to recover from t = 5 erasures. The code has redundancy 25 and not 26 since it turns
out that the overall parity-check at the very top is redundant.

As in the example case of t = 4 even above, each edge in G∞
represents a distinct code symbol, while each vertex represents a
parity-check on the code symbols represented by the edges attached to
the vertex. Thus each vertex is associated to a row in the p-c matrix H∞,
and each edge to a column of the p-c matrix. Each column of the p-c
matrix H∞ has Hamming weight 2, and the locations of the two 1s within
the column indicate the vertices to which the edge is connected. In Fig.
12.2, the edges at the very top, which are colored in red, correspond to
the first a0 columns of H∞. The edges which are colored in black and
blue correspond, respectively, to the columns of H∞ corresponding to
the sub-matrices

    [  0 ]        [  0 ]
    [ A1 ]  and   [  0 ] .
    [ D1 ]        [ A2 ]
    [  0 ]        [  P ]
The sequential recovery property follows by noting that the girth
of G∞ is ≥ 6 and that all nodes in G∞ have degree exactly r + 1. It
can be seen that the rate of this code achieves the upper bound on rate
given in Theorem 13 for (r = 3, t = 5).

Notes

1. Conjecture on code rate by Song et al.: As discussed above, the
   tight upper bound on code rate appearing in Theorem 13 for
   general t, r ≥ 3 establishes correctness of the conjecture below:

   Conjecture 1. [226] Let C be an [n, k] seq-LRC over the finite field Fq
   having parameters (r ≥ 3, t). Then

       k/n ≤ 1 / ( 1 + Σ_{i=1}^{m} a_i/r^i ),

   where m = ⌈log_r(k)⌉, and the integers {a_i} satisfy the conditions
   a_i ≥ 0, Σ_{i=1}^{m} a_i = t.
P

2. High rate codes having smaller block length: The construction of


seq-LRCs for any (r, t) having smaller block length and high rate,
including some codes that are rate optimal, can be found in [12],
[226]. Constructions of codes with smaller block length possessing
both the sequential recovery and availability properties can be
found in [252].

3. Upper bound on dmin for seq-LRCs: An upper bound on the
   minimum distance of an [n, k] seq-LRC for t = 2 and any r can be
   found in [174]. Codes achieving this upper bound are also given
   in [174] for n = (r+β)(r+2)/2, β | r. Constructions of seq-LRCs for
   general (r, t) with large minimum distance, but which are not
   necessarily minimum-distance optimal, can be found in [192].

4. Source of upper bound on dmin for codes with availability: The


upper bound on minimum distance for codes with availability
obtained from equation (11.3) by applying (11.5) is based on
the upper bound on minimum distance for seq-LRCs with t = 2
mentioned above and given in [174].

5. In Fig. 12.3, we compare the tight bound in (12.9) on the rate of
   a seq-LRC with the upper bound in (11.1), due to Tamo et al., on
   the rate of an availability code. The plots suggest that codes with
   sequential recovery offer a significant rate advantage. Availability
   codes, of course, have the advantage of offering multiple disjoint
   repair sets.

Figure 12.3: Comparison of rate bounds on codes with sequential recovery (12.9)
and codes with availability (11.1) for t = 12.
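The comparison in Fig. 12.3 can be reproduced numerically. In the sketch below (Python; function names are ours), the availability rate bound (11.1) is assumed to take the form 1/∏_{j=1}^{t}(1 + 1/(jr)).

```python
def seq_rate_ub(r, t):
    # The tight seq-LRC rate bound (12.9), s = floor((t-1)/2).
    s = (t - 1) // 2
    num = r ** (s + 1)
    if t % 2 == 0:
        return num / (num + 2 * sum(r ** i for i in range(s + 1)))
    return num / (num + 2 * sum(r ** i for i in range(1, s + 1)) + 1)

def avail_rate_ub(r, t):
    # Assumed form of the availability rate bound (11.1):
    # R <= 1 / prod_{j=1}^{t} (1 + 1/(j*r)).
    prod = 1.0
    for j in range(1, t + 1):
        prod *= 1 + 1 / (j * r)
    return 1 / prod

# For t = 12, the sequential-recovery bound exceeds the availability bound.
t = 12
for r in (4, 10, 20):
    assert seq_rate_ub(r, t) > avail_rate_ub(r, t)
```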

Open Problem 15. The tight upper bound on rate presented in this
section does not depend on block length n and depends only on the pair
(r, t). It can be shown that the block length n of a rate-optimal code must
necessarily satisfy n ≥ r^((t−2)/2), on account of the tree-like structure
forced upon the graphical representation of these codes, as discussed in
[13]. The open problem in this context is to derive an upper bound on the
dimension k of a seq-LRC for given (n, r, t) and identify constructions
achieving the upper bound.

Open Problem 16. Clearly, the minimum distance dmin of a seq-LRC


designed to recover from t erasures satisfies the lower bound dmin ≥ (t +
1). The open problem here is to derive an upper bound on the minimum
distance of seq-LRCs for general (n, k, r, t) and obtain constructions
achieving this upper bound.
13
Hierarchical Locality

Discussions of hierarchical codes can be found in the early papers by Huang et al. [102], [104] and Duminuco and Biersack [52], [53], and the notes section provides additional details on these papers. In a more recent paper by Sasidharan et al. [199], an upper bound on the minimum distance of hierarchical codes is derived and optimal constructions provided for certain parameter sets. In the present section, we present the bound on minimum distance appearing in [199], [201], as well as an example of a construction of a hierarchical code that is optimal with respect to the distance bound, also drawn from the same papers. We restrict our attention throughout to the case of linear hierarchical codes.

Motivation To reduce the repair degree while maintaining a relatively low storage overhead, the Windows Azure Storage system employs an [18, 14, 4] pyramid-like code with information-symbol locality and locality parameters (r = 7, δ = 2). This code is illustrated in Fig. 13.1. Every code symbol except the global parities P1, P2 can be recovered by accessing r = 7 other code symbols. While the code is well-suited to handle single node-failures, the failure, for example, of two nodes within the support of the local X-code will mean that local node repair is no longer possible and hence the repair degree will jump from r = 7 to k = 14. In such a scenario, codes with hierarchical locality can step in to provide a more gradual degradation in repair degree.

Figure 13.1: Illustrating the [18, 14, 4] code with information-symbol locality and locality parameters (r = 7, δ = 2) employed in the Windows Azure storage system.

Figure 13.2: A [24, 14, 6] code having 2-level hierarchical locality. The top-level [24, 14, 6] code is a subcode of the disjoint union of two [12, 8, 3] codes. The [12, 8, 3] codes are in turn subcodes of the disjoint union of three [4, 3, 2] codes. As a result, the [24, 14, 6] code can recover from any single-node failure with repair degree 3 by making use of the codes at the bottom level and any double-node failure with repair degree r = 8 by making use of the middle codes.
An example of a code with hierarchical locality is presented in Fig. 13.2. The figure shows a [24, 14, 6] code having 2-level hierarchical locality. The top-level [24, 14, 6] code is a subcode of the disjoint union of two [12, 8, 3] codes. The [12, 8, 3] codes, called middle codes, are in turn subcodes of the disjoint union of three [4, 3, 2] codes.
As a result, the [24, 14, 6] code can recover from any single-node failure
with repair degree 3 by making use of the codes at the bottom level
and any double-node failure with repair degree r = 8 by making use
of the middle codes. It is only with 3 or more node failures, that the
repair degree jumps to 14. We note that the Windows Azure code has
information-symbol locality, while the [24, 14, 6] code has all-symbol
locality.

The [24, 14, 6] code is also an example of a code with two-level (all-symbol) hierarchical locality. Clearly, this has a natural extension to multi-level hierarchical locality, encompassing more than 2 layers. In this context, the conventional LRC may be regarded as a code having a single level of hierarchy. In the present section, we restrict our attention, for simplicity, to codes with two-level, all-symbol, hierarchical locality and refer the reader to [199], [201] for an extension to the case of multi-level hierarchical locality.
We present below a formal definition of two-level, all-symbol hierar-
chical locality. When we speak in this section of a code with hierarchical
locality, we will mean a linear code with all-symbol, two-level hierarchical
locality.

Definition 14. [199] An [n, k, d] linear code C is a code with hierarchical locality having locality parameters [(r1, δ1), (r2, δ2)] if associated to each symbol ci, 1 ≤ i ≤ n, there exists a code Ci obtained by puncturing C such that ci ∈ Supp(Ci) and the following conditions hold:

1. Ci has block length ≤ (r1 + δ1 − 1),

2. dmin (Ci ) ≥ δ1 ,

3. Ci is itself, a code with (r2 , δ2 )-locality.

As in the example, the code Ci associated to ci will be referred to as the middle code associated to ci. The local codes that are part of each middle code will simply be referred to as local codes.

Remark 12. Each local code (of a middle code) is a code with (r2, δ2)-locality, meaning that it is a code of block length ≤ (r2 + δ2 − 1) and minimum distance ≥ δ2. By the Singleton bound, this means that the dimension of each local code is ≤ r2. Similarly, the dimension of each middle code Ci is ≤ r1.

Remark 13. We can extend the definition recursively to h-level hierarchical locality by modifying the last constraint. In the case of h-level hierarchy, the code Ci would be required to be a code with (h − 1)-level hierarchical locality having locality parameters [(r2, δ2), (r3, δ3), · · · , (rh, δh)].

13.1 An Upper Bound on dmin

The upper bound on the minimum distance of a (linear) code with hierarchical locality derived by Sasidharan et al. in [199], [201] will now be presented. While the result given in these papers is for the general h-level hierarchy case, for simplicity, we present it here only for the case h = 2.

Theorem 14. Let C be an [n, k, d] linear code with hierarchical locality having locality parameters [(r1, δ1), (r2, δ2)]. Then

    dmin ≤ n − k + 1 − (⌈k/r2⌉ − 1)(δ2 − 1) − (⌈k/r1⌉ − 1)(δ1 − δ2).    (13.1)

Proof. The proof will identify a code CS obtained by restricting the code
C to a subset S ⊆ [n], where S has large size and dim(CS ) = (k − 1).
The bound

dmin ≤ n − |S|, (13.2)

then follows by invoking Lemma 9. We note that if G is a generator matrix for C, then dim(CS) can alternately be described as rank(G|S).
Algorithm 1 given below is used to identify a candidate punctured code CS having sufficiently large support |S|. We will use Li and Mj to denote the supports of the local and middle codes respectively. We will assume an ordering of indices such that the algorithm below picks the Li and Mj in order of their index. By this, we mean that in the jth iteration of the outer loop indexed by j, the algorithm identifies a middle code C_Mj that accumulates additional rank. Within the inner loop, indexed by the variable i, the algorithm picks up a local code C_Li whose support Li is contained in the support Mj of the middle code, i.e., Li ⊆ Mj, and that accumulates additional rank. Thus, the inner loop index keeps incrementing without resetting, regardless of whether or not the outer loop index associated with the middle code has been incremented. The algorithm terminates once the accumulated rank equals k.
Let iend and jend respectively denote the values of the indices i
and j at which the algorithm terminates. We will use S to denote a
running support set that is incremented whenever either index i or j is
incremented. When we speak of the rank associated with support set S,

we will mean the rank of the matrix G |S . Let ai denote the incremental
rank and si denote the incremental support size when adding the support
of a local code Li to the existing support set S by replacing S by the
union S ∪ Li . Then we have si ≥ ai + (δ2 − 1), 1 ≤ i ≤ iend , since the
rank condition (i.e., Line 3 in Algorithm 1) ensures that ai > 0 in every
iteration.
Let Vi denote the column space of the matrix G|Li . Let ij denote the
index of the last local code C Lij added for fixed value j of the outer loop
index. As noted above, the support Lij of the code C Lij is contained
in Mj . Within the jth outer iteration, if there are no more local codes
having support contained in the support Mj of the current middle code,
and which will result in an increase in rank of the associated matrix
G|S , then the support Lij of the last local code added with Lij ⊆ Mj
is deleted from S and the support set is incremented by taking instead,
the union with Mj . Thus, in place of replacing the existing support set
S by S ∪ Lij , we replace S by S ∪ Mj . This has the effect of increasing
the support size during the ij th inner iteration by an amount

    s_{ij} ≥ a_{ij} + (δ1 − 1) = a_{ij} + (δ2 − 1) + (δ1 − δ2), 1 ≤ j ≤ jend.

Thus, for any j, 1 ≤ j ≤ jend, we have

    si ≥ ai + (δ2 − 1) for i ≠ ij,    and    s_{ij} ≥ a_{ij} + (δ2 − 1) + (δ1 − δ2) for i = ij.

The rank accumulates to k during iteration iend of the inner loop, corresponding to value jend of the outer loop's index. This can only happen if

    iend ≥ ⌈k/r2⌉   and   jend ≥ ⌈k/r1⌉.    (13.3)
After adding (iend − 1) local codes, we would have arrived at a support set S having accumulated rank

    rank(G|S) = Σ_{i=1}^{iend−1} ai ≤ (k − 1).

Algorithm 1 (for the proof of Theorem 14)

1: Let j = 0, i = 0, S = ∅.
2: while (∃ a middle code C_Mj having support Mj ⊆ [n] such that rank(G|S∪Mj) > rank(G|S)) do
3:     while (∃ a local code C_Li having support Li ⊆ Mj such that rank(G|S∪Li) > rank(G|S)) do
4:         S = S ∪ Li
5:         i = i + 1
6:     end while
7:     S = (S \ L_{i−1}) ∪ Mj
8:     j = j + 1
9: end while

Clearly, we can augment S by adding a set J containing (k − 1) − Σ_{i=1}^{iend−1} ai indices to the support of S to ensure that the rank of G|S∪J equals (k − 1). This will ensure that

    |J| + Σ_{i=1}^{iend−1} ai = (k − 1).

In a final step, we replace S by S ∪ J. With this we have

    |S| ≥ |J| + Σ_{i=1}^{iend−1} si
        ≥ |J| + Σ_{i=1}^{iend−1} (ai + δ2 − 1) + Σ_{j=1}^{jend−1} (δ1 − δ2)
        = (k − 1) + (iend − 1)(δ2 − 1) + (jend − 1)(δ1 − δ2).

It follows from our estimates (13.3) of iend and jend that

    |S| ≥ (k − 1) + (⌈k/r2⌉ − 1)(δ2 − 1) + (⌈k/r1⌉ − 1)(δ1 − δ2),
leading to the bound on minimum distance appearing in the theorem.
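As a quick numerical sanity check of Theorem 14 — the snippet and the function name below are ours, not drawn from [199] — the bound (13.1) can be evaluated for the [24, 14, 6] example of Fig. 13.2, where it is met with equality:

```python
import math

def dmin_bound(n, k, r1, delta1, r2, delta2):
    """Evaluate the upper bound (13.1) on d_min for a code with
    two-level hierarchical locality [(r1, delta1), (r2, delta2)]."""
    return (n - k + 1
            - (math.ceil(k / r2) - 1) * (delta2 - 1)
            - (math.ceil(k / r1) - 1) * (delta1 - delta2))

# The [24, 14, 6] example: (r1, delta1) = (8, 3), (r2, delta2) = (3, 2)
print(dmin_bound(24, 14, 8, 3, 3, 2))  # → 6, achieved by the example code
```

Note that with r1 = r2 = r and δ1 = δ2 = δ, the expression reduces to the familiar single-level bound n − k + 1 − (⌈k/r⌉ − 1)(δ − 1).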

13.2 Optimal Constructions

Constructions for codes with hierarchical locality for any arbitrary level
h of hierarchy can be found in [199], [201]. Our focus here is on the
case of 2-level hierarchy. Let (n1 , n2 ) denote block lengths of the middle
and local codes respectively. Then the construction presented in [199] is
optimal when n2 | n1 | n and r2 | r1 | k. The construction is shown to be
optimal under certain other numerical constraints as well. In [18], the
authors provide optimal constructions based on algebraic curves and
elliptic curves. The constructions provided in [18] also assume numerical
conditions such as for example, δ2 = 2 or r2 | r1 | k. In [260], the authors
first construct a family of generalized RS-based optimal LRCs and then
use the resulting LRCs to construct optimal codes with hierarchical
locality. The constructions presented in [260], are less restrictive in their
choice of parameters in comparison with the constructions in [199] and
[18]. We now present an illustrative example of the optimal construction
in [199] for 2-level hierarchy.

Example Construction of an Optimal Code with 2-Level Hierarchy


The code presented here has parameters given by:

[n = 24, k = 14, (r1 , δ1 ) = (8, 3), (r2 , δ2 ) = (3, 2)].

As will be seen, the code will turn out to be optimal despite the fact
that r2 ∤ r1 . We choose n1 = 12 and n2 = 4 as the block lengths of the
middle and local codes respectively and note that n2 | n1 | n. In the
construction, there are two middle codes having disjoint support sets
M1 and M2 . In turn, each middle code contains three support-disjoint
local codes, i.e., Mi = Li1 ∪ Li2 ∪ Li3 , i = 1, 2 where Lij denotes the
support of a local code.
The underlying finite field Fq in the construction is selected to be
the field F25 , which ensures that n | (q − 1). Let G, H with H ⊆ G be
subgroups of F∗q with sizes given by |G| = n1 = 12, |H| = n2 = 4. Let α
be a primitive element of Fq , thus α has order 24. We set

G = {1, α2 , α4 , . . . , α22 }, H = {1, α6 , α12 , α18 }.



We have the two coset decompositions:


F∗q = G ∪ αG and G = H ∪ α2 H ∪ α4 H.
The annihilator polynomials associated with each of these cosets are identified below:

    P_{α^j H}(x) = Π_{θ ∈ α^j H} (x − θ) = x^4 − α^{4j},  0 ≤ j ≤ 5,

and

    P_{α^i G}(x) = Π_{θ ∈ α^i G} (x − θ) = x^12 − α^{12i},  i = 0, 1.

For j = 0, 1, 2, let fj(x) and gj(x) denote the message polynomials of degree (r2 − 1) = 2 associated to the local codes CL1j and CL2j respectively. This will ensure that each of the local codes obtained by evaluating these message polynomials has minimum distance δ2 = 2. The polynomials can thus be expressed in the form:

    fj(x) = aj2 x^2 + aj1 x + aj0,
    gj(x) = bj2 x^2 + bj1 x + bj0.

Next, let f(x), g(x) be polynomials satisfying:

    f(x) ≡ fj(x) (mod P_{α^{2j} H}(x)),  0 ≤ j ≤ 2,
    g(x) ≡ gj(x) (mod P_{α^{2j+1} H}(x)),  0 ≤ j ≤ 2.
By the Chinese Remainder Theorem, we have the following closed-form expressions for f and g:

    f(x) = f0(x)Q00(x) + f1(x)Q10(x) + f2(x)Q20(x),    (13.4)
    g(x) = g0(x)Q01(x) + g1(x)Q11(x) + g2(x)Q21(x),    (13.5)

where

    Q00(x) = (x^4 − α^8)(x^4 − α^16) / ((1 − α^8)(1 − α^16)),
    Q10(x) = (x^4 − 1)(x^4 − α^16) / ((α^8 − 1)(α^8 − α^16)),
    Q20(x) = (x^4 − 1)(x^4 − α^8) / ((α^16 − 1)(α^16 − α^8)),
    Q01(x) = (x^4 − α^12)(x^4 − α^20) / ((α^4 − α^12)(α^4 − α^20)),
    Q11(x) = (x^4 − α^4)(x^4 − α^20) / ((α^12 − α^4)(α^12 − α^20)),
    Q21(x) = (x^4 − α^4)(x^4 − α^12) / ((α^20 − α^4)(α^20 − α^12)).

From (13.4), (13.5), it can be seen that the monomials in both f (·) and
g(·) above have degree belonging to the set {0, 1, 2, 4, 5, 6, 8, 9, 10}. Since
the middle code is required to have minimum distance δ1 = 3, we would
like to make sure that the coefficient of x10 in both f (x) and g(x) equals
zero. This can be ensured by making sure that the message coefficients
{aij, bij | i, j ∈ {0, 1, 2}} are pre-coded to satisfy the conditions:

    a02/((1 − α^8)(1 − α^16)) + a12/((α^8 − 1)(α^8 − α^16)) + a22/((α^16 − 1)(α^16 − α^8)) = 0,

    b02/((α^4 − α^12)(α^4 − α^20)) + b12/((α^12 − α^4)(α^12 − α^20)) + b22/((α^20 − α^4)(α^20 − α^12)) = 0.

We now proceed to identify an overall message polynomial m(x) satisfying

    m(x) ≡ f(x) (mod P_G(x)),    m(x) ≡ g(x) (mod P_{αG}(x)).

Again, by the Chinese Remainder Theorem, we have

    m(x) = f(x)T0(x) + g(x)T1(x),

where

    T0(x) = (x^12 − α^12)/(1 − α^12),    T1(x) = (x^12 − 1)/(α^12 − 1).
The monomials in the polynomial m(x) have degrees belonging to the set {0, 1, 2, 4, 5, 6, 8, 9, 12, 13, 14, 16, 17, 18, 20, 21}. However, we are interested in constructing an overall block code having dimension k = 14. We have already imposed two constraints on the 18 message coefficients {aij, bij}. Thus we are in a position to impose two further constraints. In the interest of ensuring that the minimum distance is as large as possible, we restrict m(x) to have degree at most 18, by setting the

corresponding coefficients (of x^21 and x^20) to zero, which turns out to correspond to imposing the precoding constraints

    (1/(1 − α^12)) Σ_{j=0}^{2} [ aj1/((α^{8j} − α^{8(j+1)})(α^{8j} − α^{8(j+2)})) − bj1/((α^{8j+4} − α^{8(j+1)+4})(α^{8j+4} − α^{8(j+2)+4})) ] = 0,

    (1/(1 − α^12)) Σ_{j=0}^{2} [ aj0/((α^{8j} − α^{8(j+1)})(α^{8j} − α^{8(j+2)})) − bj0/((α^{8j+4} − α^{8(j+1)+4})(α^{8j+4} − α^{8(j+2)+4})) ] = 0,

where the exponents of α are to be read modulo 24.
In this way, we have ensured that the overall code has minimum distance ≥ 6, since m(x) has degree at most 18 and is evaluated at n = 24 distinct points. It turns out from the bound in (13.1) that this is the best possible.

Notes

1. Early work on hierarchical codes: The idea of hierarchical codes was introduced by Huang et al. [102], [104], with the help of an example of a [20, 12] code having a two-level hierarchy and two global parities. The authors refer to the code as a multi-hierarchical extension of the Pyramid code, and note that this class of codes permits decoding at the lowest level of local codes, gradually moving up the hierarchy of local codes, and at the final step, making use of global parities. In the papers [52], [53], published subsequently, Duminuco and Biersack present hierarchical codes as a means of achieving a reduced average repair degree. They investigate the average repair degree for a hierarchical code of block length 64, with a probabilistically varying number of erased nodes.

Open Problem 17. Provide constructions of optimal codes having h-level hierarchical locality for all possible parameter sets (n, k, (r1, δ1), (r2, δ2), · · · , (rh, δh)), without being restricted by numerical constraints involving code parameters.


14
Maximally Recoverable Codes

Background The notion of a maximally recoverable code (MRC) was introduced by Chen et al. [40]. Subsequent early papers on the topic include those by Gopalan et al. [71], [72] and Blaum et al. [27]. There is considerable variation in the definition of an MRC within the literature. In this section, we introduce and treat MRCs in what is perhaps the most basic setting. More general settings are discussed in the notes subsection. In the basic setting, there is a parent [n, kL] code CL over a field Fq, and the goal is to identify a k-dimensional subcode C of CL that is maximal in the following sense: C should be capable of recovering from every erasure pattern from which recovery is possible, given that C is a subcode of CL of dimension k. It turns out that if the underlying field size q is large enough, such an MRC is guaranteed to exist.
More general definitions of an MRC do not assume a fully-specified
parent code C L . They may assume for instance, that C L is defined
by an (n − kL × n) p-c matrix of rank (n − kL ) that is only partially
identified. For instance, the location of the non-zero elements within
the p-c matrix could be partially or fully specified, but not the entries
themselves. A simple example of this is when the desired MRC C is
required to have disjoint locality, corresponding to a specific partitioning


of the coordinate set [n]. From a geometric point of view, one could say that the topology of the parity-checks (by which we mean the support sets of the parity-checks) has been identified, but not the specific parity-checks themselves. The rest of the definition remains unchanged, and an [n, k] MRC is defined in this more general topological setting as any subcode of CL that is capable of recovering from any erasure pattern from which it is possible for an [n, k] subcode of such a code CL to recover.

Motivation All of the discussion in this section will be restricted to linear codes. We begin with an informal description of an MRC. Let CL be a linear [n, kL] code over a finite field Fq, where each codeword satisfies locality constraints imposed by the linearly independent rows of an (n − kL) × n p-c matrix HL. Consider a situation where one would like to impose additional parity constraints so as to arrive at a subcode C of CL having dimension k < kL, that is capable of recovering from a larger number of erasure patterns. How should one go about designing these additional p-c equations?
It turns out that for a given reduced code dimension k, there are
certain excluded erasure patterns from which the subcode C cannot
possibly recover, simply by virtue of being a subcode of the parent code
C L . If q is sufficiently large, it is possible to identify a subcode C of
C L that is capable of recovering from every erasure pattern that is not
an excluded erasure pattern. Such a subcode C is called an MRC with
respect to the parent code C L . As noted at the start of this section,
there are more general definitions of an MRC in the literature, and the
notes subsection discusses some of these. A more formal definition of
MRCs (for the basic setting), appears below.

14.1 Recoverable Erasure Patterns

Throughout this section, we will identify an erasure pattern with the corresponding subset E ⊆ [n] that specifies the coordinates of erased code symbols. We will use S to denote the complement S = [n] \ E of E.

Definition 15. A subset E ⊆ [n] is called a recoverable erasure pattern of an [n, k] code C, if a codeword c ∈ C is uniquely determined by its restriction c|S to the complement S = [n] \ E of E.

Theorem 15. Let C be an [n, k] linear code and let G, H, of size (k × n) and (n − k × n) respectively, be a generator and p-c matrix for C. Let E be an erasure pattern and let S = [n] \ E denote its complement. Let G|S and H|E denote the restrictions of G and H to the index sets S and E respectively. Then

(a) E is a recoverable erasure pattern iff rank(G|S ) = k,

(b) E is a recoverable erasure pattern iff rank(H|E ) = |E|.

Proof: Follows from the encoding and p-c equations given by

    u^T G = c^T,    Hc = 0,

where u, c denote the message and code vector respectively. □
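The rank test in part (b) is straightforward to apply in code. The following sketch (a hypothetical [4, 2] binary code of our own choosing, not an example from the text) checks recoverability of an erasure pattern by computing the rank of H|E over F2:

```python
def rank_gf2(vectors):
    """Rank of a set of F_2 vectors, each packed into a Python int bitmask."""
    basis = {}
    for v in vectors:
        while v:
            msb = v.bit_length() - 1
            if msb in basis:
                v ^= basis[msb]      # reduce by the basis vector with this pivot
            else:
                basis[msb] = v
                break
    return len(basis)

def recoverable(H, E):
    """Theorem 15(b): E is recoverable iff the columns H|E are
    linearly independent, i.e. rank(H|E) = |E|."""
    cols = [sum(H[i][j] << i for i in range(len(H))) for j in E]
    return rank_gf2(cols) == len(E)

# A toy [4, 2] binary code with p-c matrix H
H = [[1, 1, 1, 0],
     [0, 1, 1, 1]]
print(recoverable(H, [0, 1]))   # True:  columns 0, 1 are independent
print(recoverable(H, [1, 2]))   # False: columns 1, 2 are equal
```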


The set of recoverable erasure patterns of a block code C can be
partially ordered by inclusion. Recoverable erasure patterns that are
maximal with respect to this partial ordering, will be termed as maxi-
mal recoverable erasure patterns. Clearly, knowledge of the maximal
recoverable erasure patterns from which C can recover, characterizes the
set of all erasure patterns from which the code can recover. In the case
of an [n, k] linear block code over a finite field Fq , by Theorem 15 above,
recoverable erasure patterns are in 1-1 correspondence with subsets
of the columns of the p-c matrix H that form linearly independent
sets. Since the p-c matrix has rank (n − k) it follows that all maximal
recoverable erasure patterns are of size (n − k).

14.1.1 Excluded Erasure Patterns


Let C be an [n, k] subcode of an [n, kL ] code CL , with k < kL . By
the discussion above, the subcode C cannot recover from any erasure
pattern of size > (n − k). There are certain other erasure patterns that
C cannot possibly recover from, simply by virtue of being a subcode
of the parent code CL . We will refer to these latter erasure patterns as
excluded erasure patterns. In the theorem below, we characterize these

in two different ways, from the perspective of the generator and p-c
matrices of the code C L .
Theorem 16 (Excluded Erasure Patterns). Let C L be a linear [n, kL ]
code over a finite field Fq , having generator matrix GL of size (kL × n)
and p-c matrix HL of size (n − kL × n). Let C be a subcode of C L of
dimension k < kL . Then it is not possible for C to recover from an
erasure pattern E of size |E| ≤ (n − k) if either of the following two
equivalent conditions are satisfied:
(a) rank(GL |S ) < k, where S = [n] \ E,

(b) rank(HL |E ) < |E| − (kL − k).


Proof: We assume without loss of generality, that E = {1, 2, · · · , |E|},
i.e., that the first |E| code symbols have been erased. The proof of (a)
is straightforward. To see that (b) is equivalent to (a), we begin by
partitioning GL as below:
GL = [GL |E GL |S ].
We will first assume (a) and show that (a) implies (b). Since |S| = n − |E| ≥ k and rank(GL|S) < k, it follows that we can replace the p-c matrix HL by a row-reduced version H′L that takes on the following form

    H′L = [H′L|E  H′L|S] = [ A    B
                             [0]  D ],

with rank(D) = |S| − rank(GL|S) > |S| − k ≥ 0, and where the rows of A are linearly independent. It follows that

    rank(H′L|E) = (n − kL) − rank(D) < (n − kL) − (|S| − k) = |E| − (kL − k).

One can reverse the arguments above to show that rank(H′L|E) < |E| − (kL − k) implies rank(GL|S) < k. We thus obtain

    rank(GL|S) < k iff rank(HL|E) < |E| − (kL − k),

where we have used the fact that rank(HL|E) = rank(H′L|E). □
In the corollary below, we single out the case when an erasure
pattern has maximal size (n − k).

Corollary 3. In the setting of Theorem 16 above, it is not possible for C to recover from an erasure pattern E of size |E| = (n − k) if either of the following two equivalent conditions are satisfied:

(a) rank(GL |S ) < k, where S = [n] \ E,

(b) rank(HL |E ) < (n − kL ).

14.2 Defining Maximally Recoverable Codes

Motivated by the theorem and corollary above, we make the following definition:

Definition 16 (Maximally Recoverable Codes). Let CL be a linear [n, kL] code over a finite field Fq, having generator matrix GL of size (kL × n) and p-c matrix HL of size (n − kL × n). Let C be a subcode of CL of dimension k < kL. We define an erasure pattern E to be an excluded erasure pattern (EEP) for C if E is such that the equivalent conditions (a) and (b) of Theorem 16 above are satisfied. If further, E is of size |E| = (n − k), we will say that E is a maximal EEP (m-EEP). We will say that C is an MRC (with respect to parent code CL) iff C is able to recover from any erasure pattern of size ≤ (n − k) that is not an EEP.

Remark 14 (Observations on MRCs). We make the following observations concerning an MRC:

1. An MRC is maximal with respect to the property of recoverability from erasure patterns.

2. As noted in the discussion above, knowledge of the maximal recoverable erasure patterns from which C can recover characterizes the set of all erasure patterns from which the code can recover. It follows that we may also define an MRC as follows: C is an MRC (with respect to parent code CL) iff C is able to recover from any erasure pattern of size equal to (n − k) that is not an m-EEP.

3. An MDS code can recover the entire codeword c given access to any restriction c|S where S is of size k. Analogously, by Corollary 3 above, an MRC can recover the entire codeword c given access to any restriction c|S where S is of size k and where, in addition, S is such that rank(GL|S) = k. In this sense, an [n, k] MRC is "as MDS as possible" (given that it is a subcode of a parent code CL having generator matrix GL).

The corollary below presents a test for an MRC that follows from
Theorem 15, Corollary 3, Definition 16 and Remark 14.

Corollary 4 (Test for MRCs). In the setting of Theorem 16 above, an [n, k] code C is an MRC with respect to the [n, kL] code CL iff C has the ability to recover from any erasure pattern E of size |E| = (n − k) that satisfies either of the equivalent conditions:

(a) rank(HL|E) = (n − kL),

(b) rank(GL|S) = k, where S = [n] \ E.

Consequently, a subcode C of CL is an MRC with respect to CL iff C has a p-c matrix H satisfying:

    rank(H|E) = (n − k) whenever rank(HL|E) = (n − kL).
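This test can be applied mechanically. The sketch below (our own toy example over the prime field F7, with a parent code consisting of two disjoint single-parity groups; none of this is drawn from the text) enumerates all erasure patterns of size (n − k) and applies the rank test:

```python
from itertools import combinations

def rank_mod_q(rows, q):
    """Gaussian elimination over the prime field F_q."""
    rows = [list(r) for r in rows]
    rank = 0
    for c in range(len(rows[0])):
        piv = next((i for i in range(rank, len(rows)) if rows[i][c] % q), None)
        if piv is None:
            continue
        rows[rank], rows[piv] = rows[piv], rows[rank]
        inv = pow(rows[rank][c], q - 2, q)      # Fermat inverse, q prime
        rows[rank] = [x * inv % q for x in rows[rank]]
        for i in range(len(rows)):
            if i != rank and rows[i][c] % q:
                f = rows[i][c]
                rows[i] = [(x - f * y) % q for x, y in zip(rows[i], rows[rank])]
        rank += 1
    return rank

def is_mrc(HL, H, n, k, kL, q):
    """Corollary 4: H defines an MRC w.r.t. HL iff rank(H|E) = n - k for
    every erasure pattern E of size n - k with rank(HL|E) = n - kL."""
    for E in combinations(range(n), n - k):
        sub = lambda M: [[row[j] for j in E] for row in M]
        if rank_mod_q(sub(HL), q) == n - kL and rank_mod_q(sub(H), q) != n - k:
            return False
    return True

# Hypothetical parent code over F_7: two disjoint single-parity groups
HL = [[1, 1, 1, 0, 0, 0],
      [0, 0, 0, 1, 1, 1]]
good = HL + [[1, 2, 3, 4, 5, 6]]     # entries distinct within each group
bad  = HL + [[1, 1, 1, 1, 1, 1]]     # repeated entries within a group
print(is_mrc(HL, good, 6, 3, 4, 7))  # True
print(is_mrc(HL, bad,  6, 3, 4, 7))  # False
```

The `good` global row succeeds because its entries are distinct within each local group; the `bad` row fails exactly on those erasure patterns that place two erasures inside a single group.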

The definition of an MRC does not make it clear whether or not MRCs exist for any given choice of [n, kL] parent code CL and desired subcode dimension k. We answer this in the affirmative in this section. We begin with an example.

Example 14.1 (An Example MRC). Let CL be the parent [n = 15, kL = 12] code satisfying the locality constraints associated to the p-c matrix

    HL = [ 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0
           0 0 0 0 0 1 1 1 1 1 0 0 0 0 0
           0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 ].

Let k = 10, so that it is desired to construct a [15, 10] subcode C of CL that is an MRC. Let {θi}_{i=1}^{15} be a set of 15 distinct elements in the finite field Fq, partitioned into the three sets:

    Am = {θ_{5m+i} | i = 1, 2, 3, 4, 5},  m = 0, 1, 2.

Let the {θi} be further chosen so that

    θi + θj ≠ θk + θl,

for any two pairs (θi ∈ Au, θj ∈ Au), (θk ∈ Av, θl ∈ Av), with u ≠ v. It is possible to identify such a set of 15 elements, for example, in the finite field Fq with q = 2^8. It turns out that under these conditions, the subcode C defined by the augmented p-c matrix

    H = [ 1     1     1     1     1     0     0     0     0     0      0      0      0      0      0
          0     0     0     0     0     1     1     1     1     1      0      0      0      0      0
          0     0     0     0     0     0     0     0     0     0      1      1      1      1      1
          θ1    θ2    θ3    θ4    θ5    θ6    θ7    θ8    θ9    θ10    θ11    θ12    θ13    θ14    θ15
          θ1^2  θ2^2  θ3^2  θ4^2  θ5^2  θ6^2  θ7^2  θ8^2  θ9^2  θ10^2  θ11^2  θ12^2  θ13^2  θ14^2  θ15^2 ]

is an MRC. This can be viewed as an instance of constructing an MRC


over a field extension, (see Remark 15 below), if the parent code C L ,
prior to field extension, is interpreted as a code over F2 .
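The conditions of Example 14.1 can be realized and verified by computer. The sketch below is our own: it assumes the AES-style representation F2^8 = F2[x]/(x^8 + x^4 + x^3 + x + 1) and a greedy choice of the θi, neither of which is prescribed by the text. The 15 elements are chosen so that within-group pair sums (XOR, since the characteristic is 2) never collide across groups, after which the MRC property is confirmed via the rank test of Corollary 4:

```python
from itertools import combinations

def gmul(a, b):  # multiplication in F_256 = F_2[x]/(x^8 + x^4 + x^3 + x + 1)
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r

def ginv(a):     # inverse via a^254 (the multiplicative group has order 255)
    r, e = 1, 254
    while e:
        if e & 1:
            r = gmul(r, a)
        a = gmul(a, a)
        e >>= 1
    return r

def grank(rows):
    """Rank over F_256; addition is XOR, division via ginv."""
    rows = [list(r) for r in rows]
    rank = 0
    for c in range(len(rows[0])):
        piv = next((i for i in range(rank, len(rows)) if rows[i][c]), None)
        if piv is None:
            continue
        rows[rank], rows[piv] = rows[piv], rows[rank]
        inv = ginv(rows[rank][c])
        rows[rank] = [gmul(inv, x) for x in rows[rank]]
        for i in range(len(rows)):
            if i != rank and rows[i][c]:
                fct = rows[i][c]
                rows[i] = [x ^ gmul(fct, y) for x, y in zip(rows[i], rows[rank])]
        rank += 1
    return rank

# Greedily pick groups A_0, A_1, A_2 of five distinct elements whose
# within-group pair sums never collide across distinct groups.
groups, sums = [[], [], []], [set(), set(), set()]
cand = 1
for g in range(3):
    while len(groups[g]) < 5:
        new = {cand ^ m for m in groups[g]}
        others = sums[(g + 1) % 3] | sums[(g + 2) % 3]
        if not (new & others):
            groups[g].append(cand)
            sums[g] |= new
        cand += 1
theta = groups[0] + groups[1] + groups[2]

# Augmented p-c matrix of Example 14.1: three locality rows, then theta, theta^2
HL = [[1 if 5 * m <= j < 5 * m + 5 else 0 for j in range(15)] for m in range(3)]
H = HL + [theta, [gmul(t, t) for t in theta]]

# Corollary 4 check over all 3003 erasure patterns of size n - k = 5
ok = all(
    grank([[row[j] for j in E] for row in H]) == 5
    for E in combinations(range(15), 5)
    if grank([[row[j] for j in E] for row in HL]) == 3
)
print(ok)  # True: the construction is maximally recoverable
```

Only erasure patterns touching all three groups pass the filter rank(HL|E) = 3; patterns of shape (3, 1, 1) are handled by distinctness of the θi alone, while shape (2, 2, 1) is where the cross-group pair-sum condition is needed.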

14.3 Existence of MRCs

Theorem 17. Let CL be an [n, kL] code over Fq. Let the field size q satisfy q > (n−1 choose n−k−1). Then there exists an [n, k] subcode C ⊆ CL over Fq which is an MRC with respect to CL.

Proof. Let H be the p-c matrix of an [n, k] code C ⊆ CL of the form:

    H = [ HL
          HGlob ],

where HL is an (n − kL × n) p-c matrix for CL and where HGlob is a (kL − k × n) matrix. Let xij denote the entry in the ith row and jth column of HGlob. Let E be an erasure pattern of size (n − k). The restriction of H to E is of the form:

    H|E = [ HL|E
            HGlob|E ].

Let the erasure pattern E also satisfy the property that rank(HL |E ) =
(n−kL ). The theorem will then follow from Corollary 4 if we can establish
that rank(H|E ) = (n − k) for every such erasure pattern. For any given

fixed erasure pattern E of size (n − k) satisfying rank(HL|E) = (n − kL), H|E can be made to have rank (n − k) by suitably assigning values to the variables {xij}. This follows since there clearly exists a choice of values for the variables {xij} ensuring that the (|E| × |E|) matrix H|E is of full rank, as one can always extend the linearly independent rows of HL|E to a basis for Fq^{|E|}. It follows that H|E is an (|E| × |E|) square matrix whose determinant pE({xij}) = det(H|E) is a non-zero polynomial in the variables {xij}. Note that pE({xij}) is a polynomial that is of degree ≤ 1 in each of the variables {xij}.
Our aim is to identify an assignment of values to {xij} such that the determinants pE({xij}) evaluate to a non-zero value for all E ⊆ [n] of size (n − k) such that rank(HL|E) = n − kL. Towards this, we form the product polynomial:

    P({xij}) = Π_{E ⊆ [n], |E| = (n−k), rank(HL|E) = n−kL} pE({xij}).

Clearly, by our argument above, each of the constituent polynomials in this product is a nonzero polynomial. The next step is to identify a set of values of the variables {xij} for which the product polynomial P({xij}) is non-zero. We note that P({xij}) is a polynomial in the variables {xij} such that its degree in any of the individual variables is ≤ (n−1 choose n−k−1). This is because there are at most (n−1 choose n−k−1) subsets E of size |E| = (n − k) such that H|E contains a given variable xij. By the Combinatorial Nullstellensatz [5], an assignment of values for {xij} can always be found if q > (n−1 choose n−k−1).
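The existence argument suggests a simple randomized construction: draw the entries {xij} uniformly at random and test. The sketch below uses a hypothetical small parent code over the prime field F11 of our own choosing (q = 11 > (5 choose 2) = 10 satisfies the hypothesis of the theorem) and typically succeeds within a few trials:

```python
import random
from itertools import combinations

def rank_mod_q(rows, q):
    """Gaussian elimination over the prime field F_q."""
    rows = [list(r) for r in rows]
    rank = 0
    for c in range(len(rows[0])):
        piv = next((i for i in range(rank, len(rows)) if rows[i][c] % q), None)
        if piv is None:
            continue
        rows[rank], rows[piv] = rows[piv], rows[rank]
        inv = pow(rows[rank][c], q - 2, q)      # Fermat inverse, q prime
        rows[rank] = [x * inv % q for x in rows[rank]]
        for i in range(len(rows)):
            if i != rank and rows[i][c] % q:
                f = rows[i][c]
                rows[i] = [(x - f * y) % q for x, y in zip(rows[i], rows[rank])]
        rank += 1
    return rank

# Hypothetical parent code: n = 6, kL = 4 (two disjoint single-parity groups);
# target dimension k = 3, so kL - k = 1 global parity row is to be found.
n, kL, k, q = 6, 4, 3, 11
HL = [[1, 1, 1, 0, 0, 0],
      [0, 0, 0, 1, 1, 1]]

def is_mrc(H):
    for E in combinations(range(n), n - k):
        sub = lambda M: [[row[j] for j in E] for row in M]
        if rank_mod_q(sub(HL), q) == n - kL and rank_mod_q(sub(H), q) != n - k:
            return False
    return True

random.seed(0)
trials = 0
while True:
    trials += 1
    H = HL + [[random.randrange(q) for _ in range(n)] for _ in range(kL - k)]
    if is_mrc(H):
        break
print("MRC found after", trials, "random trial(s)")
```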

Remark 15 (Necessity of Field Extension). It can happen that the locality constraints represented by the p-c matrix HL are over a field Fb, so that CL is an [n, kL] code over the field Fb, but that a maximally-recoverable
C L is an [n, kL ] code over the field Fb , but that a maximally-recoverable
subcode having dimension k over the very same field Fb either does not
exist or else, is hard to identify. It is possible in such situations, to pass
on to an extension field Fq , q = bm of Fb with m > 1, such that if one
now replaces C L with the code Ĉ L over the extension field Fq defined
by the same p-c matrix HL , a maximally-recoverable subcode C of Ĉ L
over the field Fq can be found.

14.4 MRCs Constructed using Linearized Polynomials

We now present a method of explicitly identifying the variables {xij} using linearized polynomials, in place of calling upon the Combinatorial Nullstellensatz. We illustrate for the case when the p-c matrix HL and the code CL are both defined over a finite field Fb having characteristic 2 and of size b = 2^ℓ, but this approach extends to other characteristics as well. We claim that the (kL − k) × n matrix HGlob appearing in the proof of Theorem 17 can be selected to be of the form:
proof of Theorem 17 can be selected to be of the form:

HGlob = Lp (α1 , . . . , αn , kL − k)
 
α1 α2 ... αn
ℓ ℓ ℓ
 α12 α22 ... αn2 
 (14.1)
 
≜  .. .. .. ..
. . . .
 
 
ℓ(kL −k−1) ℓ(kL −k−1) ℓ(kL −k−1)
α12 α22 . . . αn2

where {α1, α2, . . . , αn} is a set of n elements, drawn from a degree-n field extension Fq, q = b^n, of the field Fb, that are linearly independent over Fb. We are thus in need of a field extension here, as described in Remark 15 above; under this construction technique, the required field size is exponential in the block length n.
We now proceed to explain the claim made above, namely that the submatrix HGlob providing global parity can be taken to have the form in (14.1). Let E be an erasure pattern of size |E| = (n − k) such that rank(HL|E) = (n − kL). Without loss of generality, we assume that E = {1, 2, . . . , (n − k)}. We have:

$$H_{\mathrm{Glob}}|_E \;=\; L_p(\alpha_1, \ldots, \alpha_{|E|}, k_L - k) \;=\;
\begin{bmatrix}
\alpha_1 & \alpha_2 & \cdots & \alpha_{|E|} \\
\alpha_1^{2^{\ell}} & \alpha_2^{2^{\ell}} & \cdots & \alpha_{|E|}^{2^{\ell}} \\
\vdots & \vdots & \ddots & \vdots \\
\alpha_1^{2^{\ell(k_L-k-1)}} & \alpha_2^{2^{\ell(k_L-k-1)}} & \cdots & \alpha_{|E|}^{2^{\ell(k_L-k-1)}}
\end{bmatrix}$$

and after row reduction of HL |E , H|E can be written as,



$$H|_E \;=\; \begin{bmatrix} H_L|_E \\ H_{\mathrm{Glob}}|_E \end{bmatrix} \;=\;
\begin{bmatrix}
1 & 0 & \cdots & 0 & a_{1,1} & \cdots & a_{1,|E|-m} \\
0 & 1 & \cdots & 0 & a_{2,1} & \cdots & a_{2,|E|-m} \\
\vdots & \vdots & \ddots & \vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 1 & a_{m,1} & \cdots & a_{m,|E|-m} \\
\alpha_1 & \alpha_2 & \cdots & \alpha_m & \alpha_{m+1} & \cdots & \alpha_{|E|} \\
\alpha_1^{2^{\ell}} & \alpha_2^{2^{\ell}} & \cdots & \alpha_m^{2^{\ell}} & \alpha_{m+1}^{2^{\ell}} & \cdots & \alpha_{|E|}^{2^{\ell}} \\
\vdots & \vdots & & \vdots & \vdots & & \vdots \\
\alpha_1^{2^{\ell(k_L-k-1)}} & \alpha_2^{2^{\ell(k_L-k-1)}} & \cdots & \alpha_m^{2^{\ell(k_L-k-1)}} & \alpha_{m+1}^{2^{\ell(k_L-k-1)}} & \cdots & \alpha_{|E|}^{2^{\ell(k_L-k-1)}}
\end{bmatrix}$$

where a_{i,j} ∈ F_{2^ℓ} and where we have set m = (n − k_L). If we prove that H|_E is full rank, then the above choice of H_Glob yields an MRC from Corollary 4.
Suppose H|_E has rank < |E|; then there is a non-zero vector v = [v_1, …, v_{|E|}] in the left null space of H|_E. Hence vH|_E = 0, and this implies that for 1 ≤ i ≤ (n − k_L),
$$v_i \;=\; \sum_{j=0}^{k_L-k-1} v_{j+m+1}\, \alpha_i^{2^{\ell j}} \;=\; f(\alpha_i),$$

where we have defined f(·) to be the linearized polynomial given by
$$f(x) \;=\; \sum_{j=0}^{k_L-k-1} v_{j+m+1}\, x^{2^{\ell j}}.$$

Then for n − k_L + 1 ≤ i ≤ n − k,
$$f(\alpha_i) \;=\; \sum_{j=1}^{n-k_L} a_{ji}\, v_j \;=\; \sum_{j=1}^{n-k_L} a_{ji}\, f(\alpha_j).$$

Since a_{ji} ∈ F_{2^ℓ}, by the properties of linearized polynomials we have
$$f\!\left(\alpha_i + \sum_{j=1}^{n-k_L} a_{ji}\,\alpha_j\right) = 0, \qquad n - k_L + 1 \le i \le n - k.$$

Since α_1, …, α_{|E|} are linearly independent over F_{2^ℓ}, this implies that f(x) has at least 2^{ℓ(k_L−k)} zeros, since any F_{2^ℓ}-linear combination of the set
$$\left\{\alpha_i + \sum_{j=1}^{n-k_L} a_{ji}\,\alpha_j \;:\; n-k_L+1 \le i \le (n-k)\right\}$$
of (k_L − k) elements that are linearly independent over F_{2^ℓ} is also a zero of f(x). There are 2^{ℓ(k_L−k)} such linear combinations. However, the degree of f(x) is ≤ 2^{ℓ(k_L−k−1)}. It follows that f(x) ≡ 0, i.e., f(x) must be the all-zero polynomial. This implies that the vector v = 0, a contradiction. Hence H|_E is full rank and this choice of H_Glob yields an MRC.
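The full-rank argument above can be checked computationally. Here is a minimal sketch in Python; the parameters (two disjoint [4, 3] single-parity local codes, n = 8, k_L = 6, k = 4, ground field F_2, and extension field F_{2^8}) are chosen purely for illustration and are not from the text. It verifies that the linearized-polynomial choice of H_Glob makes H|_E full rank for every erasure pattern E with rank(H_L|_E) = n − k_L:

```python
from itertools import combinations

MOD = 0x11B  # GF(2^8) modulus: x^8 + x^4 + x^3 + x + 1

def gf_mul(a, b):
    """Carry-less multiplication in GF(2^8), reduced modulo MOD."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= MOD
    return r

def gf_inv(a):
    """Inverse via a^(2^8 - 2), by square-and-multiply."""
    r, e = 1, 254
    while e:
        if e & 1:
            r = gf_mul(r, a)
        a = gf_mul(a, a)
        e >>= 1
    return r

def rank(mat):
    """Rank of a matrix over GF(2^8), by Gaussian elimination."""
    mat = [row[:] for row in mat]
    nrows, ncols, r = len(mat), len(mat[0]), 0
    for c in range(ncols):
        piv = next((i for i in range(r, nrows) if mat[i][c]), None)
        if piv is None:
            continue
        mat[r], mat[piv] = mat[piv], mat[r]
        inv = gf_inv(mat[r][c])
        mat[r] = [gf_mul(inv, x) for x in mat[r]]
        for i in range(nrows):
            if i != r and mat[i][c]:
                f = mat[i][c]
                mat[i] = [x ^ gf_mul(f, y) for x, y in zip(mat[i], mat[r])]
        r += 1
    return r

n, kL, k = 8, 6, 4  # two [4, 3] local parities; kL - k = 2 global parities
HL = [[1, 1, 1, 1, 0, 0, 0, 0],
      [0, 0, 0, 0, 1, 1, 1, 1]]
alphas = [1 << i for i in range(n)]  # basis 1, z, ..., z^7: F2-linearly independent
HGlob = [alphas, [gf_mul(a, a) for a in alphas]]  # rows alpha_i^(2^j), j = 0, 1
H = HL + HGlob

recoverable = 0
for E in combinations(range(n), n - k):
    if rank([[row[j] for j in E] for row in HL]) < n - kL:
        continue  # E misses a local group entirely; excluded by Corollary 4
    assert rank([[row[j] for j in E] for row in H]) == n - k
    recoverable += 1
print(recoverable)  # 68 patterns hit both groups, and all yield full-rank H|_E
```

Note that the field size used here, 2^8 = 2^n, matches the exponential field-size requirement discussed next.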
The construction described above has a large field size primarily because of the choice of H_Glob, which contains n elements {α_1, …, α_n} that are linearly independent over F_{2^ℓ}. It turns out that, for the same choice of H_Glob, it is possible to reduce the field size for a special case of H_L by selecting the set {α_1, …, α_n} more intelligently. This is described below.

14.5 Reduced Field-Size Construction for the Disjoint Locality Case

We present in this subsection a construction due to Gopalan et al. [71], along the lines of the linearized-polynomial construction appearing in the previous subsection, for a specific case of all-symbol locality. The reduced field size comes about through an intelligent choice of the {α_i}_{i=1}^n, lying in an extension field F_q of the ground field F_b of suitable size, that does not require all of them to be linearly independent. We retain the notation of the previous subsection.
We assume in this construction that (r + 1)|n. More formally, let CL
be an [n, kL ], (r, δ = 2) LRC with all-symbol locality, having disjoint
repair sets and p-c matrix given by:

$$H_L = \begin{bmatrix}
\mathbf{1}_{r+1} & 0 & \cdots & 0 \\
0 & \mathbf{1}_{r+1} & \cdots & 0 \\
\vdots & & \ddots & \vdots \\
0 & 0 & \cdots & \mathbf{1}_{r+1}
\end{bmatrix}, \qquad (14.2)$$
where 1r+1 is a row vector of length r + 1 with all components equal to
1. Note that the matrix HL is a binary matrix, so that in reference to
the previous subsection, we have that the ground field here is Fb = F2 .

Let the matrix H_Glob identifying global parity-checks be given by H_Glob = L_p(α_1, …, α_n, k_L − k). Set
$$m = \frac{n}{r+1} = (n - k_L), \qquad S_g = \{(r+1)(g-1)+1, \ldots, (r+1)g\}, \quad 1 \le g \le m.$$

It follows that (n − k) = m + (k_L − k). From Corollary 4, it follows that in determining whether or not a code is an MRC, it suffices to focus attention on those erasure patterns of size (n − k) that are such that rank(H_L|_E) = n − k_L = m. From equation (14.2), it can be seen that
$$\mathrm{rank}(H_L|_E) = |\{\, j : |S_j \cap E| > 0,\ j \in [m] \,\}|,$$

for any E ⊆ [n]. It follows that rank(H_L|_E) = n − k_L = m iff we choose the erasure pattern E such that there is at least one erasure within the support of each local code. Accordingly, we restrict our attention from here on to erasure patterns of size (n − k) that have at least one element in the support S_g of each local code.
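The counting identity above is easy to check computationally; the sketch below (in Python, with illustrative parameters m = 3 groups and r = 2, not taken from the text) verifies that rank(H_L|_E) over F_2 equals the number of local groups touched by E, for every nonempty E ⊆ [n]:

```python
from itertools import combinations

r, m = 2, 3
n = (r + 1) * m  # n = 9; H_L as in (14.2): one all-ones block row per group

HL = [[1 if (r + 1) * g <= j < (r + 1) * (g + 1) else 0 for j in range(n)]
      for g in range(m)]

def rank_gf2(rows):
    """Rank over F2, maintaining an echelon basis of row bitmasks."""
    basis = {}  # leading-bit position -> basis vector
    for row in rows:
        v = int("".join(map(str, row)), 2)
        while v:
            top = v.bit_length() - 1
            if top not in basis:
                basis[top] = v
                break
            v ^= basis[top]
    return len(basis)

checked = 0
for size in range(1, n + 1):
    for E in combinations(range(n), size):
        HLE = [[row[j] for j in E] for row in HL]
        groups_hit = len({j // (r + 1) for j in E})
        assert rank_gf2(HLE) == groups_hit
        checked += 1
print(checked)  # 2^9 - 1 = 511 erasure patterns verified
```

The identity holds because the restricted block rows have disjoint supports, so the nonzero ones are automatically linearly independent.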
We now show how it is possible to select the {αi } in such a way that
H|E is invertible for any such E, i.e., has rank (n − k). This will ensure
that the code having H as p-c matrix is an MRC. Hence without loss of
generality, we can assume that the erasure pattern E has the structure
shown below for some pairs {(i1 , j1 ), . . . , (im , jm )} satisfying

[ig , jg ] ⊆ Sg , g ∈ [m],
i1 ≤ j1 , i2 ≤ j2 , · · · , im ≤ jm ,
E = [i1 , j1 ] ∪ · · · ∪ [im , jm ],
(j1 − i1 + 1) + · · · + (jm − im + 1) = |E| = m + kL − k.

The restriction H|E of H to E takes on the form:


$$H|_E = \begin{bmatrix} H_L|_E \\ H_{\mathrm{Glob}}|_E \end{bmatrix} =
\begin{bmatrix}
1 \cdots 1 & 0 \cdots 0 & \cdots & 0 \cdots 0 \\
0 \cdots 0 & 1 \cdots 1 & \cdots & 0 \cdots 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 \cdots 0 & 0 \cdots 0 & \cdots & 1 \cdots 1 \\
\alpha_{i_1} \cdots\, \alpha_{j_1} & \alpha_{i_2} \cdots\, \alpha_{j_2} & \cdots & \alpha_{i_m} \cdots\, \alpha_{j_m} \\
\alpha_{i_1}^{2} \cdots\, \alpha_{j_1}^{2} & \alpha_{i_2}^{2} \cdots\, \alpha_{j_2}^{2} & \cdots & \alpha_{i_m}^{2} \cdots\, \alpha_{j_m}^{2} \\
\vdots & \vdots & & \vdots \\
\alpha_{i_1}^{2^{k_L-k-1}} \cdots\, \alpha_{j_1}^{2^{k_L-k-1}} & \alpha_{i_2}^{2^{k_L-k-1}} \cdots\, \alpha_{j_2}^{2^{k_L-k-1}} & \cdots & \alpha_{i_m}^{2^{k_L-k-1}} \cdots\, \alpha_{j_m}^{2^{k_L-k-1}}
\end{bmatrix}$$

Note that for some values of u ∈ [m], it could be that i_u = j_u. It follows that for some 0 < ℓ ≤ m, we can, without loss of generality, write:
$$j_u - i_u > 0, \quad \forall u \in [\ell], \qquad
j_u - i_u = 0, \quad \forall u \in [m] \setminus [\ell], \qquad
\sum_{u=1}^{\ell} (j_u - i_u) = k_L - k.$$

The last equation implies that ℓ ≤ k_L − k, a fact we will make use of below. Next, we perform some column operations on H|_E. We add the column i_u to each of the columns (i_u + 1, …, j_u), for u = 1, …, ℓ. Clearly this column operation does not change the rank of H|_E. The resulting matrix after column operations is given by:
$$H'_E = \begin{bmatrix} H_{L,E} \\ H'_{\mathrm{glob},E} \end{bmatrix}$$

where
$$H'_{\mathrm{glob},E} = L_p\big(\alpha_{i_1}, (\alpha_{i_1}{+}\,\alpha_{i_1+1}), \ldots, (\alpha_{i_1}{+}\,\alpha_{j_1}),\;
\alpha_{i_2}, (\alpha_{i_2}{+}\,\alpha_{i_2+1}), \ldots, (\alpha_{i_2}{+}\,\alpha_{j_2}),\;
\ldots,\; \alpha_{i_\ell}, (\alpha_{i_\ell}{+}\,\alpha_{i_\ell+1}), \ldots, (\alpha_{i_\ell}{+}\,\alpha_{j_\ell}),\;
\alpha_{i_{\ell+1}}, \alpha_{i_{\ell+2}}, \ldots, \alpha_{i_m},\; k_L - k\big)$$

and where H_{L,E} is obtained from H_L|_E by the same column operations. Row u of H_{L,E}, for u ∈ [ℓ], now has a single nonzero entry, namely a 1 in the column corresponding to coordinate i_u, while the remaining (m − ℓ) rows form an identity matrix I_{m−ℓ} in the columns corresponding to the singleton erasures {i_{ℓ+1}, …, i_m}:
$$H_{L,E} = \begin{bmatrix}
1 & 0 \cdots 0 & 0 & 0 \cdots 0 & \cdots & 0 & 0 \cdots 0 & \\
0 & 0 \cdots 0 & 1 & 0 \cdots 0 & \cdots & 0 & 0 \cdots 0 & \\
\vdots & & & & \ddots & & & \\
0 & 0 \cdots 0 & 0 & 0 \cdots 0 & \cdots & 1 & 0 \cdots 0 & \\
 & & & & & & & I_{m-\ell}
\end{bmatrix}.$$

Clearly, the rank of the above matrix H'_E is m + rank(J), where J is the square (k_L − k) × (k_L − k) matrix given by

J = Lp ((αi1 + αi1 +1 ), . . . , (αi1 + αj1 ), (αi2 + αi2 +1 ), . . . , (αi2 + αj2 ),


. . . , (αiℓ + αiℓ +1 ), . . . , (αiℓ + αjℓ ), kL − k).

For H'_E to be of full rank, we need that J have full rank k_L − k. Let
$$T \triangleq \{(\alpha_{i_1}{+}\,\alpha_{i_1+1}), \ldots, (\alpha_{i_1}{+}\,\alpha_{j_1}),\; (\alpha_{i_2}{+}\,\alpha_{i_2+1}), \ldots, (\alpha_{i_2}{+}\,\alpha_{j_2}),\; \ldots,\; (\alpha_{i_\ell}{+}\,\alpha_{i_\ell+1}), \ldots, (\alpha_{i_\ell}{+}\,\alpha_{j_\ell})\}.$$

It follows from the definition in (14.1) that it suffices to select the {α_i}_{i=1}^n in such a way that the (k_L − k) elements in T are linearly independent over F_2. To achieve this, we define, for 1 ≤ i ≤ m,
$$\{\alpha_{(r+1)(i-1)+1}, \ldots, \alpha_{(r+1)i}\} = \lambda_i \{\zeta_1, \ldots, \zeta_{r+1}\} := \{\lambda_i \zeta_1, \ldots, \lambda_i \zeta_{r+1}\},$$

where the scale factors {λ_1, λ_2, …, λ_m} ⊆ F_q are chosen to be a set of m elements from F_q such that any subset of size (k_L − k) is a linearly independent set over F_{2^r}, where F_{2^r} is a subfield of the code-symbol alphabet F_q, and where further {ζ_1, …, ζ_r} ⊆ F_{2^r} are a set of elements that are linearly independent over F_2, and where we set ζ_{r+1} = 0. Thus we must have that r divides the degree [F_q : F_2] of the extension F_q/F_2, i.e., r | [F_q : F_2]. The precise value of q will be


indicated at a later point in the proof. We will now show that the
elements in T are linearly independent over F2 . Let us assume to the
contrary, that the elements in T are linearly dependent. This would
imply that for some choice of coefficients {bus ∈ F2 } with at least one
of them being non-zero, we have
$$\sum_{u=1}^{\ell} \sum_{s=1}^{j_u - i_u} b_{us}\,(\alpha_{i_u} + \alpha_{i_u+s}) \;=\; \sum_{u=1}^{\ell} \lambda_u \sum_{s=1}^{j_u - i_u} b_{us}\,(\zeta_1 + \zeta_{s+1}) \;=\; 0.$$
Since
$$\sum_{s=1}^{j_u - i_u} b_{us}\,(\zeta_1 + \zeta_{s+1}) \in \mathbb{F}_{2^r}$$

and λ_1, …, λ_ℓ are linearly independent over F_{2^r} as 0 < ℓ ≤ (k_L − k), it must be that
$$\sum_{s=1}^{j_u - i_u} b_{us}\,(\zeta_1 + \zeta_{s+1}) = 0,$$

for all u. However, since 0 < (j_u − i_u) ≤ r for u ∈ [ℓ], and ζ_1, …, ζ_r are linearly independent over F_2, this is possible only if b_{us} = 0 for all 1 ≤ u ≤ ℓ and 1 ≤ s ≤ j_u − i_u, which contradicts the linear-dependence assumption. Hence, for the choice
$$\{\alpha_{(r+1)(i-1)+1}, \ldots, \alpha_{(r+1)i}\} = \lambda_i \{\zeta_1, \ldots, \zeta_{r+1}\},$$
we have that the code constructed is maximally recoverable. We now present an explicit choice for the λ_i. We choose the {λ_i} to be drawn from a finite field of size
$$q = 2^{g(k_L - k)},$$
where the integer g is chosen to satisfy r | g and m ≤ 2^g. We present the choice of {λ_i} in the form of a vector with (k_L − k) components over the field F_{2^g}:
$$\lambda_i := [1,\; \beta_i,\; \beta_i^2,\; \ldots,\; \beta_i^{(k_L-k)-1}]^T,$$
where β_1, β_2, …, β_m are m distinct elements from F_{2^g}. Now {ζ_1, …, ζ_r} are any set of r elements in F_{2^r} ⊆ F_{2^g} which are linearly independent over F_2. Hence the required field size for this choice of the α_i's is q = 2^{g(k_L − k)}.
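As a worked instance of the field-size computation (the parameter values below are chosen purely for illustration and are not from the text):

```latex
\begin{align*}
r = 3,\; n = 12,\; k = 7 &\;\Rightarrow\; m = \tfrac{n}{r+1} = 3,\quad k_L = n - m = 9,\quad k_L - k = 2,\\
r \mid g,\; m \le 2^g &\;\Rightarrow\; g = 3 \text{ suffices, since } 3 \mid 3 \text{ and } 3 \le 2^3,\\
q = 2^{g(k_L - k)} &= 2^{6} = 64,
\end{align*}
```

versus the field size 2^n = 2^12 = 4096 that the linearized-polynomial construction of Section 14.4 would require for the same block length.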

Notes

1. MRC constructions for the case of uniform, disjoint locality: the discussion here pertains to a more general definition of an MRC, corresponding to the setting where the parent code is only partially specified, as described below.
Let (r + δ − 1) | n and let C_L be an [n, k_L], (r, δ) LRC over the finite field F_q having all-symbol locality, in which the n code symbols are divided into disjoint groups containing (r + δ − 1) code symbols each. The locality constraint in this case is that each local code must be an [r + δ − 1, r] MDS code. There is freedom, however, in selecting the particular MDS code employed. The goal, as in the basic setting, is to construct an [n, k] subcode C of C_L that is an MRC, i.e., a subcode C that can recover from any erasure pattern from which recovery is possible under the given constraints.
Such codes were introduced in [27] under the name of partial-MDS (P-MDS) codes. Field extensions are permitted, meaning that it is allowed to identify a subcode C over an extension field F_{q^e} of F_q; see Remark 15. An important goal in this thread of research is to identify the smallest size of extension field F_{q^e} for which it is possible to find an MRC. Some papers in the literature assume a specific p-c matrix for the parent code C_L, in which case the setting reverts to the basic setting addressed in this section.
Reduced field-size constructions of MRCs for general (r, δ) can be found in [35], [36], [70], [78], [83], [158], [159], [232], apart from the existential result appearing in Theorem 17. These constructions yield upper bounds on the required field size. Lower bounds on the field size required to construct an MRC in this setting are presented in [79], [80]. The explicit construction described in Section 14.5, based on [71], corresponds to the case δ = 2. Apart from the constructions in [35], [36], [70], [78], [83], [158], [159], [232], additional constructions of MRCs, including constructions for the case δ = 2, can be found in [20], [27], [96], [147]. Constructions of MRCs corresponding to specific values of the parameters r, h, where h := (k_L − k) represents the number of global parities, can be found in [15], [24], [28], [29], [39]. Weight enumerators, GHWs and higher support weights for MRCs with δ = 2 can be found in [135].

2. MRCs for the case when the locality constraints correspond to a grid-like topology: In [73], the authors initiated the study of MRCs
under a grid-like topology. Under the grid-like topology framework,
each codeword of a code C over a field F of characteristic 2 is
expressed in the form of an (m × n) matrix. Each row satisfies
parity-checks imposed by an (a × n) p-c matrix Ha ; each column
satisfies parity-checks imposed by a second (b × m) p-c matrix
Hb and a third (h × mn) matrix Hgl imposes global parities. An
erasure pattern E is said to be recoverable, if there is a code
(i.e., a set of matrices Ha , Hb , Hgl , all with entries drawn from F)
which is capable of recovering from E. An MRC is then a code
that can recover from all possible recoverable erasure patterns.
In [73], the authors derive a super-polynomial lower bound on
the size of the finite field F required to construct an MRC with
respect to the grid-like topology. They also derive a necessary
and sufficient condition, termed as the regularity condition, for an
erasure pattern to be recoverable for the case when (a = 1, h = 0)
and arbitrary b. In [217], the authors extend the study to consider
a ∈ {1, 2}, h = 0, and arbitrary b, and characterize a subset of
recoverable erasure patterns using an alternate proof technique.

There are many open problems on the topic of MRCs that one could
list. A basic open problem is listed below.
Open Problem 18. Determine the minimum field size required to construct an MRC with respect to a parent code C_L having disjoint locality specified by the p-c matrix H_L given in (14.2).
More generally, one can raise the same question as above with
the single-parity-check local codes associated to (14.2) replaced by
MDS codes constructed say, using Vandermonde matrices. One could
generalize this further by leaving the p-c matrices of the individual MDS
codes unspecified. One could also ask similar questions with respect to
other topologies.
15
Codes with Combined Locality and Regeneration

In the previous sections, we have seen that RGCs minimize the repair
bandwidth, whereas LRCs have low repair degree. A natural question
to ask is, do there exist codes that simultaneously have low repair
bandwidth as well as low repair degree? Working independently, Kamath
et al. [117], [134] and Rawat et al. [189], [220] arrived at the same class
of codes that answered this question in the affirmative. These codes
have the property that the local codes are RGCs and for this reason, are
termed as locally regenerating codes (LRGCs). It follows that LRGCs
share the same vector symbol alphabet Fαq as RGCs.
In this section, we present a minimum distance bound that applies
to LRGCs. We also describe in brief, constructions that achieve the
bound. We follow the approach adopted in [117].

15.1 Locality of a Code with Vector Alphabet

In this section, we will refer to a linear code having a vector alphabet


as a vector code.

Definition 17 (Vector Codes). A vector code C over F_q^α is a linear code, i.e., a code that is closed under vector addition and multiplication by scalars from F_q, and where each codeword c ∈ C is of the form c = (c_0 c_1 … c_{n−1}), with c_i ∈ F_q^α for all i.
Clearly, every linear RGC is an example of a vector code. Note that
every codeword can also be viewed as an (α × n) array and for this
reason, such codes are also referred to as array codes [23], [26] (see also
the notes subsection of Section 2).
We associate with the vector code C a scalar code C_s of length nα, obtained from C by concatenating the code symbols c_i of each codeword (c_0 c_1 … c_{n−1}) as shown below,
$$(c_0^T\; c_1^T\; \ldots\; c_{n-1}^T),$$
to obtain a codeword in C_s. We use K to denote the dimension of C_s, i.e., the dimension of C_s when viewed as a vector space over F_q.

Thin and Thick Columns


Let G = [G0 G1 . . . Gn−1 ] be a generator matrix for Cs , where each
sub-matrix Gi , is of size (K × α). We will refer to the ordered set of
α columns making up the ith sub-matrix Gi , as the ith thick column.
Each of the nα columns of G will be called a thin column. Thus, the
matrix G can be viewed as being comprised of nα thin columns; it can
also be viewed as being comprised of n thick columns, where each thick
column consists of α thin columns.
We will use [[n, K, dmin , α]] to denote the parameters of the vector
code C, where dmin is the minimum distance of C when viewed as a code
over the vector symbol alphabet Fαq . While the underlying finite field
is not identified within the notation, we will assume throughout, that
the underlying finite field is Fq . Given a subset S ⊆ [0, n − 1], we use
C|S to denote the restriction of C to the coordinates in S. We use G||S
to denote the (K × |S|α) matrix obtained by restricting G to the thick
columns associated to the coordinates in S. The double bars indicate
that the restriction is to a set of thick columns.

Locality
Definition 18. Let C be a vector code of block length n over Fαq . The ith
vector code symbol ci for 0 ≤ i ≤ (n − 1), is said to have (r, δ) locality,
758 Codes with Combined Locality and Regeneration

if there exists a subset Si ⊆ [0, n − 1] such that i ∈ Si , |Si | ≤ (r + δ − 1)


and the minimum distance of the code C|Si is greater than or equal to
δ. Any such code C|Si , will be referred to as a local code.

Definition 19 ((r, δ) Information-Symbol Locality). An [[n, K, dmin , α]]


vector code is said to have (r, δ) information-symbol locality if there
exists a subset I ⊆ [0, n − 1] such that:

• rank(G||I ) = K, and

• for any i ∈ I, the ith vector code symbol ci , has (r, δ) locality.

One can further extend the definition as follows. A vector code is


said to have (r, δ) all-symbol locality, if for all i ∈ [0, n − 1], the ith
vector symbol ci , has (r, δ) locality. Furthermore, if for a code having
(r, δ) all-symbol locality, the subsets {Si | i ∈ [0, n − 1]} are either
identical or else, disjoint, i.e., Si = Sj or else, |Si ∩ Sj | = 0, for i ̸= j,
0 ≤ i, j ≤ n − 1, then the vector code will be said to have all-symbol
disjoint locality. All the code constructions discussed in this section will
have the disjoint-locality property.

15.2 Codes with MSR/MBR Locality

Both MSR and MBR codes belong to a class of vector codes that we
term here as uniform rank accumulation (URA) codes.

15.2.1 Uniform Rank Accumulation Property


Definition 20 (Uniform Rank Accumulation Codes). An [[n, K, dmin, α]] vector code having associated generator matrix G of size (K × nα) is said to be a uniform rank accumulation code if there exists a sequence of n non-negative integers {a_1, a_2, …, a_n}, referred to as the rank profile of the code, having the following properties:

(i) a1 = α, and

(ii) $\mathrm{rank}(G||_I) = \sum_{j=1}^{i} a_j$,

for all I ⊆ [0, n − 1] such that |I| = i.



The rank profile of an {(n, k, d), (α, β), K, F_q} MSR code is given by (see for example, [211]):
$$a_i = \begin{cases} \alpha & 1 \le i \le k, \\ 0 & (k+1) \le i \le n. \end{cases} \qquad (15.1)$$

Note that
$$\sum_{i=1}^{n} a_i = k\alpha = K,$$
as expected.
In the case of an {(n, k, d), (α, β), K, F_q} MBR code, the rank profile is given by [211]:
$$a_i = \begin{cases} \alpha - (i-1)\beta & 1 \le i \le k, \\ 0 & (k+1) \le i \le n. \end{cases} \qquad (15.2)$$

Once again, we see that
$$\sum_{i=1}^{n} a_i = k\alpha - \binom{k}{2}\beta = \left(dk - \binom{k}{2}\right)\beta = K,$$
as expected.

MSR and MBR Locality


An LRGC with MSR (similarly, MBR) locality [117], [189] is an [[n, K, dmin, α]] vector code with (r, δ) locality, where the local codes are MSR (similarly, MBR) codes having identical parameters
$$\{(n_\ell,\, r,\, d),\; (\alpha, \beta),\; K_\ell,\; \mathbb{F}_q\},$$
where n_ℓ := (r + δ − 1) and K_ℓ ≤ K. Note that the local codes will have identical rank profiles, as the parameters of either an MSR or an MBR code determine its rank profile uniquely.
We now present the minimum distance upper bound appearing in
[117], for a class of vector codes having URA codes as local codes.
Minimum distance upper bounds for codes with MSR and MBR locality
follow from this bound.

Minimum Distance Bound


We restrict our attention here to [[n, K, dmin , α]] vector codes C with
(r, δ) information-symbol locality, where the local codes are URA codes
having identical parameters [[nℓ = r + δ − 1, Kℓ , δ, α]] and identical rank
profile {a1 , a2 , . . . , anℓ }. It follows that the rank profile {a1 , a2 , . . . , anℓ }
of each local code has the property that ai = 0 for i ≥ (r + 1).
Next, let us construct the semi-infinite, periodic sequence b_1, b_2, b_3, …, where b_{i+jn_ℓ} ≜ a_i, for 1 ≤ i ≤ n_ℓ and j ≥ 0. For s ≥ 1, we set
$$P(s) = \sum_{i=1}^{s} b_i. \qquad (15.3)$$

For y ≥ 1, set P (inv) (y) = x, where x is the smallest integer such that
P (x) ≥ y. The minimum distance of C is then upper bounded by the
following theorem (see Theorem 4.1 in [117]):

Theorem 18. Let C be an [[n, K, dmin , α]] code with (r, δ) information-
symbol locality, where the local codes are URA codes having identi-
cal [[nℓ = r + δ − 1, Kℓ , δ, α]] parameters and identical rank profile
{a1 , . . . , anℓ }. Then, we have:

dmin ≤ n − P (inv) (K) + 1. (15.4)

With respect to Theorem 18, we make the following definitions:

• A code satisfying (15.4) with equality is said to be minimum-


distance-optimal. Note that in this case, we will have K ≤ P (n −
dmin + 1).

• A code that is minimum-distance-optimal is said to be rate-optimal


if K = P (n − dmin + 1).

For a code with MSR locality, one can simplify (15.4) using (15.1) to
obtain ([117], [189]):
$$d_{\min} \le n - \left\lceil \frac{K}{\alpha} \right\rceil + 1 - \left( \left\lceil \frac{K}{\alpha r} \right\rceil - 1 \right)(\delta - 1).$$

15.2.2 Constructions for Codes with MSR Locality


In [189], the authors present an explicit construction of minimum-
distance-optimal LRGCs with MSR all-symbol locality having parame-
ters

[[n = νnℓ , K ≤ νKℓ , dmin , α]].

Here, ν is an integer satisfying ν ≥ 2 and each local code is an MSR


code having common parameters {(nℓ = (r + δ − 1), r, d), (α, β), Kℓ =
rα}, where δ is the minimum distance of the local MSR code. The
construction requires a field size that is exponential in n. In [117], the authors establish the existence of a minimum-distance-optimal LRGC with MSR all-symbol locality, having the reduced field-size requirement n^µ, with all parameters remaining the same, apart from placing the additional constraint that K = µα for some integer µ such that r ≤ µ ≤ νr.

15.2.3 Constructions for Codes with MBR Locality


Local Codes are MBR codes with General (nℓ , r, d) Parameters
Given an arbitrary MBR code CMBR having block length nℓ , an explicit
construction of a minimum-distance-optimal LRGC with MBR all-
symbol locality and block length n that is a multiple of nℓ is presented
in [116], in which each local code is the same MBR code, namely the
code CMBR . The construction makes use an approach based on pre-coding
using Gabidulin codes [49], [69] and has a field-size requirement that is
exponential in the block length n.
The product-matrix construction for MBR codes described in Sec-
tion 4.2, yields MBR codes for any parameter sets (nℓ , r, d) and requires
an O(nℓ ) field size. In [129], the authors present an explicit construction
of a minimum-distance-optimal LRGC having block length n and MBR
all-symbol locality, in which the local codes are identical and correspond
to a product-matrix MBR code of block length nℓ , with nℓ |n. This
construction makes use of the scalar Tamo-Barg all-symbol locality code
construction described in Section 10.7, and has only a linear field-size
requirement.

Local Codes are Polygonal MBR Codes The polygonal MBR code
construction described in Section 4.1.1, yields MBR codes having pa-
rameter sets of the form
$$\left\{ (n_\ell,\, r,\, d = n_\ell - 1),\; (\alpha = n_\ell - 1,\, \beta = 1),\; K_\ell = r\alpha - \binom{r}{2} \right\}$$

for any pair (nℓ , r) with nℓ > r ≥ 1. The construction makes use of
a scalar MDS code precoder and the resultant MBR code possesses
the RBT property. In [117], the authors present the construction of a
minimum-distance-optimal LRGC with MBR all-symbol locality where
the local MBR codes are polygonal MBR codes. Interestingly, this LRGC
construction may be regarded as replacing the scalar MDS precoder
appearing in the construction of the polygonal MBR code, with a scalar
all-symbol locality code having optimal minimum distance such as the
Tamo-Barg code (see Section 10.7). The resultant LRGC code has
parameters

[[n = νnℓ , K ≤ νKℓ , dmin , α = nℓ − 1]]

where ν ≥ 2, and the local codes are polygonal MBR codes having
parameters
$$\left\{ (n_\ell,\, r,\, d = n_\ell - 1),\; (\alpha = n_\ell - 1,\, \beta = 1),\; K_\ell = r\alpha - \binom{r}{2} \right\}.$$

This explicit construction has an O(n2 ) field-size requirement.

An Example LRGC with Polygonal MBR Locality In this example, we illustrate the above construction for the case ν = 3. The parameters of the overall LRGC are given by:

[[n = 15, K = 20, dmin = 5, α = 4]]

and the local polygonal MBR codes have parameters given by

{(nℓ = 5, r = 3, d = 4), (α = 4, β = 1), Kℓ = 9}.

The construction proceeds as follows. We begin by using the pentagon


MBR construction to realize each local code (see Fig. 15.1). Each


Figure 15.1: The upper portion of the figure shows an example LRGC C having
parameters [[n = 15, K = 20, dmin = 5, α = 4]] that is optimal with respect to (15.4).
There are 3 disjoint local codes, each of which is a pentagon-MBR code having parameter set {(5, 3, 4), (4, 1), 9, F_31}. The set of 30 scalar symbols shown in the
bottom portion of the figure form a [30, 20, 9] LRC with all-symbol locality (optimal
with respect to (10.1)), where each local code is a [10, 9, 2] MDS code. The 30 symbols
of the LRC are used to label the (3 × 10) = 30 edges of the 3 pentagons.

pentagon MBR code is made up of $N_\ell := \binom{n_\ell}{2} = 10$ scalar symbols over F_31. Owing to the data collection property, the contents of each
pentagon should be decodable from the contents of any r = 3 nodes.
This calls for the 10 scalar symbols populating the MBR code to form a
[10, 9, 2] MDS code. By concatenating three codewords of the [10, 9, 2]
MDS code, we can populate the three pentagons and we will in this way,
have satisfied the node repair and data collection properties required to
say that each local code is an MBR code having the desired parameters
given above. The periodic sequence {bi } in this case is given by

$$\underbrace{4, 3, 2, 0, 0}_{\text{first period}},\; \underbrace{4, 3, 2, 0, 0}_{\text{second period}},\; \underbrace{4, 3, 2, 0, 0}_{\text{third period}}.$$

The associated sum sequence {P (s)} is given by

$$\big(P(1), P(2), \ldots, P(15)\big) = \big(4, 7, 9, 9, 9, 13, 16, 18, 18, 18, 22, 25, 27, 27, 27\big).$$
Since the desired minimum distance of the LRGC is dmin = 5, we have


that n − dmin + 1 = 15 − 5 + 1 = 11. Hence, equivalently, we should be
able to recover the data file of size K = 20 from the contents of any

11 among the 15 nodes. If we are able to construct such a code, it will be optimal with respect to the minimum distance, since the minimum distance bound (15.4) states that
$$d_{\min} \le n - P^{(\mathrm{inv})}(20) + 1 = 15 - 11 + 1 = 5.$$

Note that the number of distinct scalar symbols obtained by contacting any set of 11 nodes is no smaller than (4+3+2+1+0+4+3+2+1+0+4) = 24. Let the set of 30 scalar symbols form a Tamo-Barg code of length
30 and dimension K = 20 that is comprised of three support-disjoint,
[10, 9, 2] scalar local codes. The symbols of each of these local codes,
populate the nodes associated to the three MBR codes as shown in
Fig. 15.1. The minimum distance of such a Tamo-Barg code (which
meets the bound (10.1)) is given by
$$d_{\min} = (30 - 20 + 1) - \left( \left\lceil \frac{20}{9} \right\rceil - 1 \right)(2 - 1) = 11 - 2 = 9.$$
Hence if one has access to any set of 30 − 9 + 1 = 22 scalar symbols, one
can recover all the data. On the other hand, we have access to 24 scalar
symbols. Thus the data file can be recovered by contacting any 11 nodes
and decoding the Tamo-Barg code. It follows that this construction is
optimal with respect to the minimum distance bound in Theorem 18.
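The bookkeeping in this example is easily mechanized. The following sketch recomputes the rank profile (15.2) of the pentagon local code, the periodic sequence {b_i}, the partial sums P(s) of (15.3), and the resulting bound of Theorem 18, for the parameters used above:

```python
# Pentagon MBR local code: {(n_l, r, d) = (5, 3, 4), (alpha, beta) = (4, 1), K_l = 9}
n_l, r, alpha, beta = 5, 3, 4, 1
a = [alpha - i * beta for i in range(r)] + [0] * (n_l - r)  # rank profile (15.2)

nu, K = 3, 20
n = nu * n_l                                # 15 nodes in all
b = a * nu                                  # periodic sequence b_1, ..., b_n
P = [sum(b[:s]) for s in range(1, n + 1)]   # P(s) of (15.3)

def P_inv(y):
    """Smallest x such that P(x) >= y."""
    return next(s for s in range(1, n + 1) if P[s - 1] >= y)

print(a)                 # [4, 3, 2, 0, 0]
print(P)                 # [4, 7, 9, 9, 9, 13, 16, 18, 18, 18, 22, 25, 27, 27, 27]
print(n - P_inv(K) + 1)  # d_min bound of Theorem 18: 15 - 11 + 1 = 5
```

This reproduces exactly the numbers worked out in the example.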

Open Problem 19. Construct minimum-distance-optimal, linear field-size LRGCs with MSR all-symbol locality, for general (n, K, n_ℓ, r, d).

16
Repair of Reed-Solomon Codes

In this section we consider the repair of a Reed-Solomon (RS) code. We will depart slightly from the notation employed thus far and assume that the symbol alphabet of the code is a finite field F of size q^t, i.e., F = F_{q^t}. We will use B = F_q to denote the subfield of F of size q, which we will refer to as the base field. Thus F is a vector space over B of dimension t.

16.1 Vectorization Approach

The conventional repair of a scalar [n, k, n − k + 1] MDS code over F


regards each code symbol lying in F as an indivisible unit, leading to
a total repair bandwidth of k times the amount of data stored in the
failed node, where k is the dimension of the code. RGCs, introduced in
Section 3, enable node repair with significantly reduced repair bandwidth.
This is made possible by the fact that RGCs have a vector symbol
alphabet of the form Fαq . This suggests that the repair bandwidth of an
RS code can perhaps be reduced by regarding an RS code over the field
F = Fqt , instead, as a code over the vector symbol alphabet Ftq . Since
code construction and repair schemes presented have all been linear in
nature, we will use the isomorphism of F = Fqt and Ftq as vector spaces
of dimension t over Fq in this re-interpretation of an RS code.

Remark 16. A vectorized RS code may be viewed as a special instance


of a concatenated code, where the outer code is the scalar [n, k, n−k +1]
RS code over F = Fqt , and where the inner code is the trivial [t, t, 1]
code over the base field B = Fq .

As an example of the savings in repair bandwidth that can be


achieved, consider the example of a [16, 8, 9] RS code over the field
F = F24 . Traditional repair requires the downloading of 8 symbols over
F = F24 , corresponding to a repair bandwidth of 32 bits. As we will
see later in this section, by regarding the same code instead, as a code
over the vector alphabet F42 , corresponding to the choice of base field
B = F2 , it is possible to perform single-node repair by downloading just
1 bit each from the 15 surviving nodes, for a total repair bandwidth of
15 bits.
In general, traditional repair bandwidth of an RS code equals kt
symbols over the base field B. On the other hand, the repair bandwidth
of an MSR code with d = n − 1 and sub-packetization level α = t is
given by:
$$d\beta = \frac{(n-1)\,t}{n-k}, \qquad (16.1)$$
which is in general significantly smaller. In this comparison, we have
chosen the RGC to be an MSR code to match the MDS property of an
RS code.
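For the [16, 8] example over F_{2^4} mentioned above, the two figures of merit compare as follows (a small sketch; note that the 15-bit figure quoted earlier comes from the RS repair scheme discussed later in this section, not from (16.1)):

```python
n, k, t = 16, 8, 4                 # [16, 8] RS code over GF(2^4): q = 2, t = 4
traditional = k * t                # conventional repair: kt base-field symbols = 32 bits
msr_bound = (n - 1) * t / (n - k)  # eq. (16.1) with d = n - 1, alpha = t: 7.5 bits
print(traditional, msr_bound)      # 32 7.5
```

The gap between 32 and 7.5 bits is what motivates studying the repair bandwidth of RS codes at logarithmic sub-packetization.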
Since the field size of an RS code is typically on the order of its length
n, it follows that the vector-code-symbol viewpoint corresponds to a sub-
packetization level that is logarithmic in the length n. On the other hand,
there are bounds (see notes subsection) showing that an exponential
sub-packetization level is needed to achieve the repair bandwidth of
an MSR code indicated in (16.1). Thus a study of the minimum repair
bandwidth needed to repair an RS code is not only of practical interest,
it also provides some insight into how repair bandwidth scales with
sub-packetization level.

16.2 Tools Employed

GRS Codes We recall from Section 2 that an [n, k] GRS code C over
F is a code of the form:

C = { (u1 f (α1 ), · · · , un f (αn )) | f (x) ∈ F[x], deg(f ) < k } ,

where the evaluation set E = {α_i | 1 ≤ i ≤ n} is a subset of F of size n, and where the {u_i}_{i=1}^n are a set of n nonzero elements, not necessarily
distinct, over F. We will refer to the set {ui } as the scaling set. An RS
code is a GRS code where no scaling takes place, i.e., ui = 1, for all i.
We will refer to an RS code where the evaluation set E is all of F as a
full RS code. Thus the block length n of a full RS code equals the field
size |F|.
As explained in Section 2, the dual C ⊥ of a GRS code C is also a
GRS code, having the same evaluation set E. The scaling elements
(v1 , · · · , vn ) of the dual code C ⊥ are given by

vi = ui^{−1} ∏_{j=1, j≠i}^{n} (αi − αj )^{−1} .

It may be noted that the dual of a full RS code is a full RS code.

16.2.1 Trace-Dual Basis


The trace function TrF/B is the mapping from the extension field F = F_{q^t}
to the base field B = F_q given by

TrF/B (x) = x + x^q + x^{q^2} + · · · + x^{q^{t−1}} .

It is straightforward to verify that the trace function is linear over B.


Let (γ1 , · · · , γt ) be a basis for F over B and let x be an element of F.
Then x can be uniquely recovered from knowledge of the t trace values:

TrF/B (xγi ) = ai ∈ B, i = 1, 2, · · · , t.

This can be seen using the trace-dual basis. Associated to every basis
(γ1 , · · · , γt ) for F over B, there is a second basis (γ1∗ , · · · , γt∗ ) for F over
B, known as the trace-dual basis (for instance, see [156, Ch. 4]), that
satisfies:

TrF/B (γi γj∗ ) = 1 if i = j, and 0 otherwise.

Using the trace-dual basis, we can recover x from {ai}_{i=1}^{t} via:

x = ∑_{i=1}^{t} ai γi∗ .
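These trace computations are easy to check numerically. The sketch below is our own illustration (not from the text): it works over F = F_{2^4} with B = F_2, representing field elements as 4-bit integers reduced modulo the irreducible polynomial x^4 + x + 1 (an assumed choice of modulus), and finds the trace-dual basis of (1, x, x^2, x^3) by brute-force search.

```python
MOD = 0b10011  # x^4 + x + 1, irreducible over F_2; the field is F_16 (assumed representation)

def gf_mul(a, b):
    """Multiply two elements of F_16, represented as 4-bit integers."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0b10000:
            a ^= MOD
    return r

def trace(x):
    """TrF/B(x) = x + x^2 + x^4 + x^8; takes values in B = F_2 = {0, 1}."""
    t = 0
    for _ in range(4):
        t ^= x
        x = gf_mul(x, x)
    return t

# the trace lands in the base field and is additive (F_2-linearity)
assert all(trace(x) in (0, 1) for x in range(16))
assert all(trace(x ^ y) == trace(x) ^ trace(y)
           for x in range(16) for y in range(16))

# basis (1, x, x^2, x^3) and its trace-dual basis, found by brute force
gamma = [1, 2, 4, 8]
dual = [next(d for d in range(16)
             if all(trace(gf_mul(g, d)) == (i == j)
                    for i, g in enumerate(gamma)))
        for j in range(4)]

# recovery of any x from its t = 4 trace values: x = sum_i Tr(x gamma_i) gamma_i*
for x in range(16):
    rec = 0
    for g, d in zip(gamma, dual):
        if trace(gf_mul(x, g)):
            rec ^= d
    assert rec == x
```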

16.2.2 Repair Polynomials


We continue in the setting as above, where the field F = F_{q^t} of definition
of the RS code is a degree-t extension of a base field B = Fq . We will
show in this subsection that if one can find a set of t polynomials in F[x]
of degree < n − k satisfying certain properties, then node repair in a GRS
code with smaller bandwidth is possible. We term these polynomials
repair polynomials. The lemma below identifies the desired properties
of the set of t repair polynomials and also shows how they can be used
in node repair.
Given a set of m elements {wi}_{i=1}^{m} , with all wi ∈ F, we will use
⟨w1 , · · · , wm ⟩ to denote the vector space over the base field B spanned
by the m elements.

Lemma 15 (Recovery Using a Given Set of Repair Polynomials [85]).


Let C be an [n, k] GRS code having evaluation set E = {α1 , · · · , αn }
and scaling set {ui}_{i=1}^{n} . Thus each codeword in C is of the form
(u1 f (α1 ), · · · , un f (αn )) for some polynomial f over F of degree < k.
Suppose it is possible to identify a set {g1 (x), · · · , gt (x)} of t polynomials
over F, each of degree < (n − k), that satisfy:

dimB ⟨g1 (αi ), g2 (αi ), · · · , gt (αi )⟩ = t if i = i0 , and bi if i ≠ i0 .   (16.2)

Then the i0 th code symbol ui0 f (αi0 ), for i0 ∈ [n], can be recovered by
downloading b = ∑_{i∈[n]\{i0}} bi symbols over the base field B.

Proof: Let the dual code of the [n, k] GRS code C associated to
evaluation set E and scaling set u = (u1 , · · · , un ) be the [n, n − k] GRS
code having scaling set vector v = (v1 , · · · , vn ) (and the same evaluation
set E). Since each of the repair polynomials has degree < (n − k), for
every j ∈ [t] we have

(v1 gj (α1 ), · · · , vn gj (αn )) ∈ C ⊥
   =⇒ ∑_{i=1}^{n} ui vi f (αi ) gj (αi ) = 0
   =⇒ ui0 vi0 f (αi0 ) gj (αi0 ) = − ∑_{i∈[n]\{i0}} ui vi f (αi ) gj (αi ).

Applying the trace function on both sides:

TrF/B (ui0 vi0 f (αi0 ) gj (αi0 )) = − ∑_{i∈[n]\{i0}} TrF/B (ui vi f (αi ) gj (αi )).   (16.3)

Note that since

dimB ⟨g1 (αi0 ), g2 (αi0 ), · · · , gt (αi0 )⟩ = t,

it follows that the elements {gj (αi0 )}_{j=1}^{t} form a basis for F over B. Hence,
the value of ui0 vi0 f (αi0 ), and hence that of f (αi0 ), can be determined by
making use of (16.3) for all j ∈ {1, 2, · · · , t}. This approach requires
that the ith node, i ≠ i0 , provide the t values:

{ TrF/B (ui vi f (αi ) gj (αi )) | j = 1, 2, · · · , t }.   (16.4)

However, since the space

Wi := ⟨g1 (αi ), g2 (αi ), · · · , gt (αi )⟩

has dimension bi over B, it follows that in place of t, just bi scalars from
the base field B suffice to provide the information content contained
in (16.4). More specifically, if the set {θi1 , · · · , θibi } is a basis for Wi , it
suffices for the ith node to supply the bi symbols:

{ TrF/B (ui vi f (αi ) θij ) | j = 1, 2, · · · , bi }.

It follows that node i0 can be repaired with the repair bandwidth
associated to a set of b = ∑_{i∈[n]\{i0}} bi symbols over the base field B. □

Remark 17. It turns out that the converse of Lemma 15 is also true.
Let C be an [n, k] GRS code and (u1 f (α1 ), · · · , un f (αn )) be a codeword
in C. Linear repair of the i0 th code symbol ui0 f (αi0 ) by downloading
bi symbols over the base field B from node i, for all i ∈ [n] \ {i0 }, is
possible only if there exists a set of t polynomials {g1 (x), · · · , gt (x)}
over F, each of degree < (n − k) satisfying (16.2). We refer the reader
to [85] for a proof.

16.3 Guruswami-Wootters Repair Scheme

In [85], the authors identify a set {gj (x) | j ∈ [t]} of repair polynomials
leading to a repair scheme for an RS code having parameters n ≤ q^t,
k ≤ n − q^{t−1}, where, remarkably, the repair bandwidth b = (n − 1),
measured in number of symbols over the base field B = Fq , is as small
as possible.

16.3.1 Repair Polynomials


Let {γ1 , · · · , γt } be a basis for F over B. The Guruswami-Wootters
(GW) scheme makes use of the following set of t repair polynomials for
the repair of node i0 :

gj (x) = TrF/B (γj (x − αi0 )) / (x − αi0 ),   j ∈ [t].   (16.5)

Note that, as required, the degree of each repair polynomial gj (x) equals
q^{t−1} − 1 < (n − k). The lemma below will establish that in the GW
scheme, we have

bi = 1,   1 ≤ i ≤ n, i ≠ i0 ,

so that the repair bandwidth of the GW scheme is b = (n − 1) symbols
over B. Thus, it suffices for each of the d = (n − 1) helper nodes to pass
on just a single symbol over B.

Lemma 16.

dimB ⟨g1 (αi ), g2 (αi ), · · · , gt (αi )⟩ = t if i = i0 , and 1 if i ≠ i0 .

Proof: Note that TrF/B (x) = x + x^q + · · · + x^{q^{t−1}}. It follows that
gj (αi0 ) = γj and therefore,

dimB ⟨g1 (αi0 ), · · · , gt (αi0 )⟩ = t.

For i ∈ [n] \ {i0 } we clearly have:

⟨g1 (αi ), · · · , gt (αi )⟩ = ⟨ 1/(αi − αi0 ) ⟩, as TrF/B (γj (αi − αi0 )) ∈ B.

In this we have used that

gj (αi ) = TrF/B (γj (αi − αi0 )) / (αi − αi0 ) ≠ 0,

for at least one value of j ∈ [t]. It follows that

dimB ⟨g1 (αi ), · · · , gt (αi )⟩ = 1. □
16.4 Dau-Milenkovic Repair Scheme

The Dau-Milenkovic (DM) scheme [48] generalizes the GW scheme by
making it applicable to the following larger set of RS code parameters:

n ≤ q^t ,   k ≤ n − q^s ,

for s ∈ [t − 1]. The repair bandwidth b achieved by this scheme satisfies
b ≤ (n − 1)(t − s), measured once again in units of symbols over the
base field B. When t − s = 1, the DM scheme achieves the same repair
bandwidth as the GW scheme. The generalization is carried out by
replacing the trace function with the larger class of linearized polynomials.

16.4.1 Repair Polynomials


We begin by introducing linearized polynomials; these polynomials are
called subspace polynomials in [48].

Definition 21 (Linearized Polynomials). A (monic) linearized polynomial
over the field F := F_{q^t} is a polynomial of the form

L(z) = ∑_{i=0}^{h} ℓi z^{q^i} ,

where ℓi ∈ F for all i, and ℓh = 1.


Linearized polynomials are so-called as they exhibit linear behavior
over the base field B = Fq :

L(cx + y) = cL(x) + L(y),   c ∈ B, x, y ∈ F.

The zeros of L(z) form a subspace W over B of dimension h. Conversely,
it can be shown that any polynomial over F whose zeros form a
subspace W over B is a linearized polynomial [156]. We will use LW (·) to
denote the linearized polynomial whose zeros are precisely the elements
of the subspace W . The trace function TrF/B encountered earlier is a
linearized polynomial:

TrF/B (z) = ∑_{i=0}^{t−1} z^{q^i} .

We now introduce a set of t repair polynomials that are based on
linearized polynomials. Let i0 ∈ [n] be the index of the node that we
wish to repair, so the aim is to recover f (αi0 ). Let W be a subspace
over B of dimension s. The t repair polynomials in the DM scheme are
defined as given below:

gj (x) = LW (γj (x − αi0 )) / (x − αi0 ) = γj ∏_{w∈W\{0}} (γj (x − αi0 ) − w),   (16.6)

for all j ∈ [t], where {γ1 , γ2 , · · · , γt } is a basis for F over B. Note that
deg(gj (x)) = q^s − 1 < n − k, as needed. The lemma below shows
how this choice of repair polynomial set permits recovery of f (αi0 ) by
downloading ≤ (n − 1)(t − s) symbols over B.
Lemma 17.

dimB ⟨g1 (αi ), · · · , gt (αi )⟩ = t if i = i0 , and ≤ t − s if i ≠ i0 .

Proof: By the definition of the repair polynomials in (16.6),

gj (αi0 ) = γj (−1)^{|W|−1} ∏_{w∈W\{0}} w.

Clearly, ∏_{w∈W\{0}} w ≠ 0, and it follows from this that

dimB ⟨g1 (αi0 ), · · · , gt (αi0 )⟩ = t,

as {γ1 , · · · , γt } is a basis for F over B. For the case i ∈ [n] \ {i0 }, we
have

gj (αi ) = LW (γj (αi − αi0 )) / (αi − αi0 ).

As a result,

dimB ⟨g1 (αi ), · · · , gt (αi )⟩ = dimB ⟨LW (γ1 (αi − αi0 )), · · · , LW (γt (αi − αi0 ))⟩ ≤ t − s.

The last inequality follows by regarding LW (·) as a linear mapping
from the t-dimensional space F back to F, having a kernel W of
dimension s. □
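The dimension counts of Lemma 17 can be verified numerically. In the sketch below (our own illustration; the field representation, the subspace W = span{1, 2}, and the message polynomial are assumed choices), we take the full [n = 16, k = 12] RS code over F_{2^4} with s = 2, check that the repair-polynomial values span a space of dimension ≤ t − s = 2 at every helper, and recover the failed symbol. For simplicity the decoder below consumes all t trace values per helper; by Lemma 15, only bi ≤ 2 independent bits per helper actually need to be transmitted.

```python
MOD = 0b10011  # F_16 = F_2[x]/(x^4 + x + 1); 4-bit integer representation (assumed)

def gf_mul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0b10000:
            a ^= MOD
    return r

def gf_inv(a):
    r = 1
    for _ in range(14):           # a^14 is the inverse in F_16
        r = gf_mul(r, a)
    return r

def trace(x):                     # TrF/B(x) = x + x^2 + x^4 + x^8
    t = 0
    for _ in range(4):
        t ^= x
        x = gf_mul(x, x)
    return t

def rank_f2(vecs):
    """Rank of a set of 4-bit vectors over F_2 (Gaussian elimination by pivot bit)."""
    basis, r = [0, 0, 0, 0], 0
    for v in vecs:
        for b in range(3, -1, -1):
            if not (v >> b) & 1:
                continue
            if basis[b]:
                v ^= basis[b]
            else:
                basis[b] = v
                r += 1
                break
    return r

W = [0, 1, 2, 3]                  # the s = 2 dimensional subspace span{1, 2} (assumed)
a0, gamma = 5, [1, 2, 4, 8]       # failed evaluation point and a basis of F_16

def L_W(z):                       # subspace polynomial of W: its zeros are exactly W
    r = 1
    for w in W:
        r = gf_mul(r, z ^ w)
    return r

def g(j, a):                      # repair polynomial g_j of (16.6), evaluated at a
    if a == a0:
        r = gamma[j]              # g_j(a0) = gamma_j * prod_{w != 0} w  (char. 2)
        for w in W[1:]:
            r = gf_mul(r, w)
        return r
    return gf_mul(L_W(gf_mul(gamma[j], a ^ a0)), gf_inv(a ^ a0))

# Lemma 17: full rank at the failed node, rank <= t - s = 2 at every helper
assert rank_f2([g(j, a0) for j in range(4)]) == 4
dims = [rank_f2([g(j, a) for j in range(4)]) for a in range(16) if a != a0]
assert max(dims) <= 2
print("total repair bandwidth:", sum(dims), "bits (bound: 30)")

# repair of a node of the full [n = 16, k = 12] RS code (so deg(f g_j) <= 14)
def poly_eval(f, x):
    y = 0
    for c in reversed(f):
        y = gf_mul(y, x) ^ c
    return y

f = [3, 14, 7, 0, 9, 1, 5, 11, 2, 8, 6, 13]     # arbitrary message, deg f < 12
codeword = [poly_eval(f, a) for a in range(16)]
g0 = [g(j, a0) for j in range(4)]               # a basis of F_16; its trace-dual:
dual = [next(d for d in range(16)
             if all(trace(gf_mul(x, d)) == (i == j)
                    for i, x in enumerate(g0)))
        for j in range(4)]
recovered = 0
for j in range(4):
    tj = 0
    for a in range(16):
        if a != a0:
            tj ^= trace(gf_mul(codeword[a], g(j, a)))
    if tj:
        recovered ^= dual[j]
assert recovered == codeword[a0]
```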

16.5 Bounds on Repair-Bandwidth

Let C be an [n, k] GRS code over F. We claim [48], [85] that any linear
repair scheme for C will necessarily incur a repair bandwidth of at least

b ≥ (n − 1) logq [ q^t (n − 1) / ((n − k − 1)(q^t − 1) + n − 1) ]   (16.7)

units, measured in terms of the number of symbols over the subfield B.

Proof. From Remark 17, we know that given an evaluation point αi0 ∈ F,
there exists a set {g1 (x), g2 (x), . . . , gt (x)}, gj (x) ∈ F[x], of t repair
polynomials, each having degree < (n − k), such that:

dimB ⟨g1 (αi0 ), g2 (αi0 ), · · · , gt (αi0 )⟩ = t.

Let

dimB ⟨g1 (αi ), g2 (αi ), · · · , gt (αi )⟩ = di ,

for all i ∈ [n] \ {i0 }. It follows that the repair bandwidth needed to
recover the code symbol corresponding to the evaluation point αi0 using
(16.3) is now given by ∑_{i∈[n]\{i0}} di . For i ∈ [n] \ {i0 }, let the subspace
Si ⊆ B^t over B be defined as follows:

Si := { s = (s1 , s2 , . . . , st ) ∈ B^t | ∑_{j∈[t]} sj gj (αi ) = 0 }.

We have dimB (Si ) = t − di and hence the cardinality of the set of nonzero
elements in Si is given by q^{t−di} − 1. As the next step, we determine the
average number ρ of sets {Si , i ∈ [n] \ {i0 }} that a nonzero element of
B^t belongs to:

ρ := (1/(q^t − 1)) ∑_{s∈B^t, s≠0} |{i ∈ [n] \ {i0 } : s ∈ Si }|
   = (1/(q^t − 1)) ∑_{i∈[n]\{i0}} |{s ∈ Si , s ≠ 0}|
   = (1/(q^t − 1)) ∑_{i∈[n]\{i0}} (q^{t−di} − 1).   (16.8)

Clearly, there exists a t-tuple s∗ := (s∗1 , s∗2 , . . . , s∗t ) ∈ B^t \ {0} such that
the polynomial g∗ (x) = ∑_{j∈[t]} s∗j gj (x) vanishes on at least ρ evaluation
points. Furthermore, g∗ (αi0 ) ≠ 0, as

dimB ⟨g1 (αi0 ), g2 (αi0 ), · · · , gt (αi0 )⟩ = t.

This tells us that g∗ (x) is a nonzero polynomial of degree < (n − k)
that has at least ρ zeros. It follows that

ρ ≤ n − k − 1.   (16.9)

From (16.8) and (16.9), one obtains

∑_{i∈[n]\{i0}} q^{−di} ≤ ((n − k − 1)(q^t − 1) + (n − 1)) / q^t =: ρ′.   (16.10)

The repair bandwidth b is lower bounded by the quantity

min_{di ∈[0,t]} ∑_{i∈[n], i≠i0} di

subject to (16.10). It turns out that the minimum occurs when the {di }
are balanced, and this results in the following lower bound on repair
bandwidth:

b ≥ (n − 1) logq ((n − 1)/ρ′)
  = (n − 1) logq [ q^t (n − 1) / ((n − k − 1)(q^t − 1) + n − 1) ].

Corollary 5. If n = q^t and n − k = q^s for some s ∈ [t − 1], any linear
repair scheme for an [n, k] GRS code requires a repair bandwidth of at
least (measured over B):

b ≥ (n − 1) logq (q^t / q^s) = (n − 1)(t − s).   (16.11)
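As a quick numerical sanity check (our own illustration), the bound (16.7) can be evaluated directly; for the GW and DM parameter regimes of Corollary 5 it reproduces (n − 1)(t − s):

```python
import math

def rs_repair_lower_bound(n, k, q, t):
    """Lower bound (16.7) on linear repair bandwidth, in symbols over B = F_q."""
    return (n - 1) * math.log(q**t * (n - 1) / ((n - k - 1) * (q**t - 1) + n - 1), q)

# GW regime (s = t - 1): n = q^t = 16, k = 8  ->  bound = 15 bits, met by the GW scheme
assert abs(rs_repair_lower_bound(16, 8, 2, 4) - 15) < 1e-9
# DM regime with s = 2: n = 16, k = 12  ->  bound = (n - 1)(t - s) = 30 bits
assert abs(rs_repair_lower_bound(16, 12, 2, 4) - 30) < 1e-9
```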

16.5.1 Optimality of Repair Schemes


It can be verified that

• when n = q^t and n − k = q^{t−1} , the GW scheme achieves the lower
bound presented in (16.11), and

• when n = q^t and n − k = q^s , for s ∈ [t − 1], the DM scheme
achieves the lower bound presented in (16.11).

Notice that the sub-packetization level is t = logq n in both schemes.

Notes
1. An early paper: The line of work in which scalar MDS codes
are vectorized, by treating each code symbol belonging to a field
F as a vector over a base field B, for the purpose of reducing
repair bandwidth, began with the work of Shanmugam et al. in
[215]. Here, the authors showed the existence of an efficient repair
scheme for systematic node repair that improves upon the repair
bandwidth incurred under traditional repair, for the case k = n − 2.
2. Achieving the cut-set bound: The Tamo-Ye-Barg RS repair scheme
in [234] has sub-packetization level t = e^{(1+o(1)) n log n} and achieves
the cut-set bound b ≥ t(n − 1)/(n − k) on the minimum possible repair
bandwidth b of an MDS code. It is also shown in [234] that the
sub-packetization level required for linear repair of a scalar MDS
code having minimum possible repair bandwidth must satisfy
t ≥ e^{(1+o(1)) k log k} . In [43], the authors show that given a scalar
MDS code that achieves the cut-set bound on minimal repair
bandwidth, it is possible to replace the repair scheme employed
there by an optimal-access repair scheme.
3. Trading increased sub-packetization level for reduced repair band-
width: The sub-packetization level t appearing in the GW and
DM schemes is of logarithmic order with respect to the block length
n, i.e., t = logq (n). The RS repair scheme presented in [254]
has exponential sub-packetization level given by t = (n − k)^n ,
but a smaller repair bandwidth b satisfying b < t(n + 1)/(n − k), and this
value of repair bandwidth is optimal in the limit as n → ∞. In
[44], this result is refined, resulting in a scheme having smaller
sub-packetization t = u^{m+n−1} where n − k = u^m for integers u, m,
but which continues to asymptotically achieve the cut-set bound.
The tradeoff between sub-packetization and repair bandwidth of
RS codes is further explored in [82], [140].
4. Multiple node repair: In [47], the authors extend the GW scheme
and formulate RS repair schemes for the case of two or three node
failures. The authors of [157] provide a general framework for
efficiently handling multiple erasures for scalar MDS codes. In
[234], an RS repair scheme achieving the cut-set bound for multiple
node failures is presented, though it has a large sub-packetization level.
5. Improved repair of the [14, 10] HDFS RS code: In [58], Duursma
and Dau consider the [14, 10, 5] RS code over F_{2^8} employed in
the Hadoop Distributed File System, and present a repair scheme
having a reduced repair bandwidth of 54 bits, in comparison with
the 80 bits required under conventional repair.

Open Problem 20. Determine the minimum-possible sub-packetization
level of an RS repair scheme that achieves the cut-set bound
b ≥ t(n − 1)/(n − k) on repair bandwidth with equality.

Open Problem 21. Determine the smallest repair bandwidth permitted


by an RS repair scheme for given parameters {n, k, q, t}.
17
Codes in Practice

MDS Codes
Distributed systems such as Hadoop, Google File System and Windows
Azure have evolved to support erasure codes so as to derive the benefits
of improved storage efficiency in comparison with simple replication.
MDS codes in general, and RS codes in particular, are the most common
form of erasure coding employed here. Examples include the [9, 6] RS
code employed in the Hadoop Distributed File System, the [14, 10] RS
code in Facebook’s f4 Storage System and the [11, 8] RS code employed
in Yahoo Cloud Object Storage; see [47] for additional examples.

Regenerating Codes
NCCloud: The NCCloud storage system described in [98] is one of the
earliest projects that dealt with the performance evaluation of RGCs
in practice. The NCCloud system employs an (n, k = n − 2, d = n − 1)
MSR code with functional repair. The performance evaluation is carried
out for an (n = 4, k = 2, d = 3) case and compared against RAID-6.

Codes with Inherent Double Replication: In [130], the performance of


two codes is studied in a Hadoop setting. The first code is the Pentagon


MBR code, discussed in Section 4.1. The second code is a variant of


LRGC employing MBR local codes, that is termed the Heptagon-Local
code. Both codes possess inherent double replication of code symbols,
have storage overhead slightly greater than 2, and in the study their
performance is compared against schemes that employ double and triple
replication.

PM-RBT Code: In [180], the authors present an optimal-access version


of the PM-MSR code, which they refer to as the PM-RBT code. The
results of an experimental evaluation of an (n = 12, k = 6, d = 11) PM-RBT
code on Amazon EC2 instances are presented.

Beehive Codes: In [139], the authors introduced erasure codes termed


as Beehive codes that make use of PM-MSR codes in their construction.
These codes repair multiple failures simultaneously and are implemented
in C++ using the Intel storage acceleration library. The performance
of the (n = 12, k = 6, d = 10) Beehive code for two-node repair is
compared against that of an MSR code having the same parameters
as well as against that of an [n = 12, k = 6] RS code on Amazon EC2
instances.

Butterfly Code: In [166], the authors present the evaluation of a high-


rate MSR code known as the Butterfly code in both Ceph and HDFS.
This code is a simplified version of the MSR codes presented in [65]
corresponding to the presence of two parity nodes. The code possesses
the optimal-access property except in the case of the repair of a specific
parity node, and has sub-packetization level α = 2^{k−1} . In [166],
the authors present the repair performance of Butterfly codes having
parameters (n = 7, k = 5, d = 6) and (n = 9, k = 7, d = 8).

Clay Code in Ceph: In [240], the authors present an implementation


and evaluation of the CL-MSR code (known in this context by the
acronym, Clay code) in the Ceph distributed storage system. Clay codes
are the first known implementation of MSR codes for general (n, k).
They were also made part of Ceph’s release [37] as an erasure code
plugin. As a part of this open-source work, vector code support was
added to Ceph, enabling the introduction of any other vector erasure code
plugin in the future. In [240], results of an experimental evaluation of the repair
performance of the Clay code are provided for six different code parameters.
performance of Clay code are provided for six different code parameters.

MDS Codes with Reduced Repair Bandwidth


Hitchhiker System: The Hitchhiker erasure-coded system presented
in [183] is a practical implementation of the piggybacking framework
introduced in [184]. The authors implemented the Hitchhiker in HDFS
and evaluated the performance of an (n = 14, k = 10, α = 2) code on a
data-warehouse cluster at Facebook.

Hashtag Codes: In [128], the HDFS implementation of a class of MDS


array codes called HashTag codes is discussed. The theoretical framework
of HashTag codes was presented in [127]. These codes allow low sub-
packetization levels at the expense of increased repair bandwidth and are
designed to efficiently repair systematic nodes. The repair performance
of several Hashtag codes with different sub-packetization levels is
presented in [128].

LRCs
Windows Azure Code: In [103], the authors compare performance eval-
uation results of an (n = 16, k = 12, r = 6, δ = 2) LRC with that of a
[16, 12, 5] RS code in the Azure production cluster and demonstrate the
repair savings of LRCs. Subsequently the authors [101] implemented
an (n = 18, k = 14, r = 7, δ = 2) LRC in Microsoft’s Windows Azure
Storage system and showed that this code has repair degree comparable
to that of a [9, 6, 4] RS code, but has storage overhead 1.29 versus
1.5 in the case of the RS code. This reduction in storage overhead has
reportedly resulted in significant cost savings for Microsoft [161].

HDFS-Xorbas: The authors of [207] implemented HDFS-Xorbas which


uses LRCs in place of RS codes in HDFS-RAID. The Xorbas LRC is
built on top of an RS code by adding extra local XOR parities. The
experimental evaluation of Xorbas was carried out in Amazon EC2 as
well as a cluster in Facebook. In the evaluation, the repair performance


of an (n = 16, k = 10, r = 5, δ = 2) LRC was compared against that of
a [14, 10, 5] RS code.

LRCs in Ceph: Ceph is a second distributed storage system that has


an LRC plug-in [149]. In [124], a performance comparison of different
LRCs is provided, through experimental evaluation over a Ceph cluster.
Acknowledgements

We thank the editors of Science China Information Sciences for allowing


reuse of some material from our previously-published article [10]. We
would like to thank the editor-in-chief and the publisher for the invitation
to write the monograph, as well as for being patient with respect to the
submission timeline. Thanks also go out to the editor-in-chief for the
helpful initial comments that guided the writing of this monograph.
We would like to thank the anonymous reviewer for the very careful
reading and the detailed comments, which helped significantly improve
the presentation and coverage of the material. The last author would
like to thank Kannan Ramchandran for introducing him to this research
topic. He would also like to acknowledge support received under the J
C Bose National Fellowship JCB/2017/000017.

References

[1] A. Agarwal, A. Barg, S. Hu, A. Mazumdar, and I. Tamo, “Com-


binatorial alphabet-dependent bounds for locally recoverable
codes,” IEEE Trans. Inf. Theory, vol. 64, no. 5, 2018, pp. 3481–
3492.
[2] G. K. Agarwal, B. Sasidharan, and P. V. Kumar, “An alternate
construction of an access-optimal regenerating code with opti-
mal sub-packetization level,” in Proc. Twenty First National
Conference on Communications, Mumbai, India, 2015, pp. 1–6,
2015.
[3] R. Ahlswede, N. Cai, S. R. Li, and R. W. Yeung, “Network
information flow,” IEEE Trans. Inf. Theory, vol. 46, no. 4, 2000,
pp. 1204–1216.
[4] I. Ahmad and C.-C. Wang, “When can intelligent helper node
selection improve the performance of distributed storage net-
works?” IEEE Trans. Inf. Theory, vol. 64, no. 3, 2017, pp. 2142–
2171.
[5] N. Alon, “Combinatorial Nullstellensatz,” Combinatorics, Prob-
ability and Computing, vol. 8, no. 1-2, 1999, pp. 7–29.
[6] O. Alrabiah and V. Guruswami, “An exponential lower bound
on the sub-packetization of MSR codes,” in Proc. 51st Annual
ACM SIGACT Symposium on Theory of Computing, Phoenix,
AZ, USA, 2019, pp. 979–985, 2019.


[7] O. Alrabiah and V. Guruswami, “An exponential lower bound on


the sub-packetization of minimum storage regenerating codes,”
IEEE Trans. Inf. Theory, 2021.
[8] B. S. Babu and P. V. Kumar, “Erasure codes for distributed
storage: Tight bounds and matching constructions,” CoRR,
vol. abs/1806.04474, 2018.
[9] B. S. Babu, M. Vajha, and P. V. Kumar, “On lower bounds
on sub-packetization level of MSR codes and on the struc-
ture of optimal-access MSR codes achieving the bound,” CoRR,
vol. abs/1710.05876v3, 2021.
[10] S. B. Balaji, M. N. Krishnan, M. Vajha, V. Ramkumar, B. Sasid-
haran, and P. V. Kumar, “Erasure coding for distributed stor-
age: An overview,” Science China Information Sciences, vol. 61,
no. 10, 2018, pp. 1–45.
[11] S. B. Balaji and P. V. Kumar, “A tight lower bound on the
sub-packetization level of optimal-access MSR and MDS codes,”
in Proc. IEEE International Symposium on Information Theory,
Vail, CO, USA, 2018, pp. 2381–2385, 2018.
[12] S. B. Balaji, K. P. Prasanth, and P. V. Kumar, “Binary codes
with locality for multiple erasures having short block length,”
in Proc. IEEE International Symposium on Information Theory,
ISIT 2016, Barcelona, Spain, 2016, pp. 655–659, 2016.
[13] S. B. Balaji, G. R. Kini, and P. V. Kumar, “A tight rate bound
and matching construction for locally recoverable codes with
sequential recovery from any number of multiple erasures,” IEEE
Trans. Inf. Theory, vol. 66, no. 2, 2020, pp. 1023–1052.
[14] S. B. Balaji and P. V. Kumar, “Bounds on the rate and minimum
distance of codes with availability,” in Proc. IEEE International
Symposium on Information Theory, Aachen, Germany, 2017,
pp. 3155–3159, 2017.
[15] S. B. Balaji and P. V. Kumar, “On partial maximally-recoverable
and maximally-recoverable codes,” in Proc. IEEE International
Symposium on Information Theory, Hong Kong, China, 2015,
pp. 1881–1885, 2015.

[16] S. Ball, “On sets of vectors of a finite vector space in which


every subset of basis size is a basis,” Journal of the European
Mathematical Society, vol. 14, no. 3, 2012, pp. 733–748.
[17] S. Ball and J. De Beule, “On sets of vectors of a finite vector
space in which every subset of basis size is a basis II,” Designs,
Codes and Cryptography, vol. 65, no. 1, 2012, pp. 5–14.
[18] S. Ballentine, A. Barg, and S. Vladuţ, “Codes with hierarchical
locality from covering maps of curves,” IEEE Trans. Inf. Theory,
vol. 65, no. 10, 2019, pp. 6056–6071.
[19] A. Barg, I. Tamo, and S. Vlăduţ, “Locally recoverable codes on
algebraic curves,” IEEE Trans. Inf. Theory, vol. 63, no. 8, 2017,
pp. 4928–4939.
[20] A. Barg, Z. Chen, and I. Tamo, “A construction of maximally
recoverable codes,” Designs, Codes and Cryptography, vol. 90,
no. 4, 2022, pp. 939–945.
[21] A. Barg, K. Haymaker, E. W. Howe, G. L. Matthews, and
A. Várilly-Alvarado, “Locally recoverable codes from algebraic
curves and surfaces,” in Algebraic Geometry for Coding Theory
and Cryptography, Springer, 2017, pp. 95–127.
[22] S. Bhadane and A. Thangaraj, “Unequal locality and recovery
for locally recoverable codes with availability,” in Twenty-third
National Conference on Communications, Chennai, India, 2017,
pp. 1–6, 2017.
[23] M. Blaum, P. G. Farrell, and H. C. A. van Tilborg, “Array codes,”
in Handbook of Coding Theory, V. S. Pless and W. C. Huffman,
Eds., vol. 2, Amsterdam, The Netherlands: North Holland, 1998,
pp. 1855–1909.
[24] M. Blaum, “Construction of PMDS and SD codes extending
RAID 5,” CoRR, vol. abs/1305.0032, 2013.
[25] M. Blaum, J. Brady, J. Bruck, and J. Menon, “EVENODD:
An efficient scheme for tolerating double disk failures in RAID
architectures,” IEEE Transactions on computers, vol. 44, no. 2,
1995, pp. 192–202.
[26] M. Blaum, J. Bruck, and A. Vardy, “MDS array codes with
independent parity symbols,” IEEE Trans. Inf. Theory, vol. 42,
no. 2, 1996, pp. 529–542.

[27] M. Blaum, J. L. Hafner, and S. Hetzler, “Partial-MDS codes and


their application to RAID type of architectures,” IEEE Trans.
Inf. Theory, vol. 59, no. 7, 2013, pp. 4510–4519.
[28] M. Blaum, J. S. Plank, M. Schwartz, and E. Yaakobi, “Con-
struction of partial MDS and sector-disk codes with two global
parity symbols,” IEEE Trans. Inf. Theory, vol. 62, no. 5, 2016,
pp. 2673–2681.
[29] T. Bogart, A. Horlemann-Trautmann, D. A. Karpuk, A. Neri,
and M. Velasco, “Constructing partial MDS codes from reducible
algebraic curves,” SIAM J. Discret. Math., vol. 35, no. 4, 2021,
pp. 2946–2970.
[30] V. R. Cadambe and A. Mazumdar, “Bounds on the size of locally
recoverable codes,” IEEE Trans. Inf. Theory, vol. 61, no. 11,
2015, pp. 5787–5794.
[31] V. Cadambe, S. A. Jafar, H. Maleki, K. Ramchandran, and C.
Suh, “Asymptotic interference alignment for optimal repair of
MDS codes in distributed storage,” IEEE Trans. Inf. Theory,
vol. 59, no. 5, 2013, pp. 2974–2987.
[32] V. R. Cadambe, C. Huang, and J. Li, “Permutation code: Op-
timal exact-repair of a single failed node in MDS code based
distributed storage systems,” in Proc. IEEE International Sym-
posium on Information Theory Proceedings, ISIT 2011, St. Pe-
tersburg, Russia, 2011, pp. 1225–1229, 2011.
[33] V. R. Cadambe, C. Huang, J. Li, and S. Mehrotra, “Polynomial
length MDS codes with optimal repair in distributed storage,”
in Proc. Forty Fifth Asilomar Conference on Signals, Systems
and Computers, Pacific Grove, CA, USA 2011, pp. 1850–1854,
2011.
[34] H. Cai, Y. Miao, M. Schwartz, and X. Tang, “On optimal locally
repairable codes with super-linear length,” IEEE Trans. Inf.
Theory, vol. 66, no. 8, 2020, pp. 4853–4868.
[35] H. Cai, Y. Miao, M. Schwartz, and X. Tang, “A construction
of maximally recoverable codes with order-optimal field size,”
IEEE Trans. Inf. Theory, vol. 68, no. 1, 2022, pp. 204–212.

[36] G. Calis and O. O. Koyluoglu, “A general construction for PMDS


codes,” IEEE Communications Letters, vol. 21, no. 3, 2017,
pp. 452–455.
[37] Ceph V14.1.0 Nautilus (Release Candidate 1), url: https://docs.ceph.com/en/nautilus/releases/nautilus/.
[38] B. Chen, S. T. Xia, J. Hao, and F. W. Fu, “Constructions of
optimal cyclic (r, δ) locally repairable codes,” IEEE Trans. Inf.
Theory, vol. 64, no. 4, 2018, pp. 2499–2511.
[39] J. Chen, K. W. Shum, Q. Yu, and C. W. Sung, “Sector-disk
codes and partial MDS codes with up to three global parities,”
in Proc. IEEE International Symposium on Information Theory,
Hong Kong, China, 2015, pp. 1876–1880, 2015.
[40] M. Chen, C. Huang, and J. Li, “On the maximally recoverable
property for multi-protection group codes,” in Proc. IEEE Inter-
national Symposium on Information Theory, Nice, France, 2007,
pp. 486–490, 2007.
[41] Z. Chen and A. Barg, “Explicit constructions of MSR codes for
clustered distributed storage: The rack-aware storage model,”
IEEE Trans. Inf. Theory, vol. 66, no. 2, 2019, pp. 886–899.
[42] Z. Chen and A. Barg, “Cyclic LRC codes with hierarchy and
availability,” in IEEE International Symposium on Information
Theory, Los Angeles, CA, USA, 2020, pp. 616–621, 2020.
[43] Z. Chen, M. Ye, and A. Barg, “Enabling optimal access and
error correction for the repair of Reed–Solomon codes,” IEEE
Trans. Inf. Theory, vol. 66, no. 12, 2020, pp. 7439–7456.
[44] A. Chowdhury and A. Vardy, “Improved schemes for asymptoti-
cally optimal repair of MDS codes,” IEEE Trans. Inf. Theory,
vol. 67, no. 8, 2021, pp. 5051–5068.
[45] P. Corbett, B. English, A. Goel, T. Grcanac, S. Kleiman, J.
Leong, and S. Sankar, “Row-diagonal parity for double disk
failure correction,” in Proc. 3rd USENIX Conference on File and
Storage Technologies, San Francisco, CA, pp. 1–14, 2004.
[46] A. Datta and F. E. Oggier, “An overview of codes tailor-made for
better repairability in networked distributed storage systems,”
SIGACT News, vol. 44, no. 1, 2013, pp. 89–105.

[47] H. Dau, I. M. Duursma, H. M. Kiah, and O. Milenkovic, “Repair-


ing Reed-Solomon codes with multiple erasures,” IEEE Trans.
Inf. Theory, vol. 64, no. 10, 2018, pp. 6567–6582.
[48] H. Dau and O. Milenkovic, “Optimal repair schemes for some
families of full-length Reed-Solomon codes,” in Proc. IEEE Inter-
national Symposium on Information Theory, Aachen, Germany,
2017, pp. 346–350, 2017.
[49] P. Delsarte, “Bilinear forms over a finite field, with applications
to coding theory,” J. Comb. Theory, Ser. A, vol. 25, no. 3, 1978,
pp. 226–241.
[50] A. G. Dimakis, P. B. Godfrey, Y. Wu, M. J. Wainwright, and K.
Ramchandran, “Network coding for distributed storage systems,”
IEEE Trans. Inf. Theory, vol. 56, no. 9, 2010, pp. 4539–4551.
[51] A. G. Dimakis, K. Ramchandran, Y. Wu, and C. Suh, “A survey
on network codes for distributed storage,” Proceedings of the
IEEE, vol. 99, no. 3, 2011, pp. 476–489.
[52] A. Duminuco and E. W. Biersack, “Hierarchical codes: How to
make erasure codes attractive for peer-to-peer storage systems,”
in Proc. P2P’08, Eighth International Conference on Peer-to-
Peer Computing, 2008, Aachen, Germany, pp. 89–98, IEEE
Computer Society, 2008.
[53] A. Duminuco and E. W. Biersack, “Hierarchical codes: A flexible
trade-off for erasure codes in peer-to-peer storage systems,”
Peer-to-Peer Netw. Appl., vol. 3, no. 1, 2010, pp. 52–66.
[54] I. Duursma, X. Li, and H.-P. Wang, “Multilinear algebra for
distributed storage,” SIAM Journal on Applied Algebra and
Geometry, vol. 5, no. 3, 2021, pp. 552–587.
[55] I. Duursma and H.-P. Wang, “Multilinear algebra for minimum
storage regenerating codes: A generalization of the product-
matrix construction,” Applicable Algebra in Engineering, Com-
munication and Computing, 2021, pp. 1–27.
[56] I. M. Duursma, “Shortened regenerating codes,” IEEE Trans.
Inf. Theory, vol. 65, no. 2, 2018, pp. 1000–1007.
[57] I. M. Duursma, “Outer bounds for exact repair codes,” CoRR,
vol. abs/1406.4852, 2014.
[58] I. M. Duursma and H. Dau, “Low bandwidth repair of the RS(10, 4) Reed-Solomon code,” in Proc. Information Theory and Applications Workshop, San Diego, CA, USA, 2017, pp. 1–10, 2017.
[59] S. El Rouayheb and K. Ramchandran, “Fractional repetition
codes for repair in distributed storage systems,” in Proc. 48th
Annual Allerton Conference on Communication, Control, and
Computing, pp. 1510–1517, 2010.
[60] M. Elyasi, S. Mohajer, and R. Tandon, “Linear exact repair
rate region of (k + 1, k, k) distributed storage systems: A
new approach,” in Proc. IEEE International Symposium on
Information Theory, Hong Kong, China, 2015, pp. 2061–2065,
2015.
[61] M. Elyasi and S. Mohajer, “Determinant coding: A novel frame-
work for exact-repair regenerating codes,” IEEE Trans. Inf.
Theory, vol. 62, no. 12, 2016, pp. 6683–6697.
[62] M. Elyasi and S. Mohajer, “A cascade code construction for
(n, k, d) distributed storage systems,” in Proc. IEEE Interna-
tional Symposium on Information Theory, Vail, CO, USA, 2018,
pp. 1241–1245, 2018.
[63] M. Elyasi and S. Mohajer, “Determinant codes with helper-
independent repair for single and multiple failures,” IEEE Trans.
Inf. Theory, vol. 65, no. 9, 2019, pp. 5469–5483.
[64] M. Elyasi and S. Mohajer, “Cascade codes for distributed stor-
age systems,” IEEE Trans. Inf. Theory, vol. 66, no. 12, 2020,
pp. 7490–7527.
[65] E. En Gad, R. Mateescu, F. Blagojevic, C. Guyot, and Z. Bandic,
“Repair-optimal MDS array codes over GF(2),” in Proc. IEEE In-
ternational Symposium on Information Theory, Istanbul, Turkey,
2013, pp. 887–891, 2013.
[66] T. Ernvall, T. Westerback, and C. Hollanti, “Constructions of op-
timal and almost optimal locally repairable codes,” in Proc. 4th
International Conference on Wireless Communications, Vehic-
ular Technology, Information Theory and Aerospace Electronic
Systems, 2014, pp. 1–5, 2014.
[67] T. Ernvall, “The existence of fractional repetition codes,” CoRR, vol. abs/1201.3547, 2012.
[68] M. Forbes and S. Yekhanin, “On the locality of codeword symbols
in non-linear codes,” Discrete Math., vol. 324, 2014, pp. 78–84.
[69] E. M. Gabidulin, “Theory of codes with maximum rank distance,”
Problemy Peredachi Informatsii, vol. 21, no. 1, 1985, pp. 3–16.
[70] R. Gabrys, E. Yaakobi, M. Blaum, and P. H. Siegel, “Construc-
tions of partial MDS codes over small fields,” IEEE Trans. Inf.
Theory, vol. 65, no. 6, 2019, pp. 3692–3701.
[71] P. Gopalan, C. Huang, B. Jenkins, and S. Yekhanin, “Explicit
maximally recoverable codes with locality,” IEEE Trans. Inf.
Theory, vol. 60, no. 9, 2014, pp. 5245–5256.
[72] P. Gopalan, C. Huang, H. Simitci, and S. Yekhanin, “On the
locality of codeword symbols,” IEEE Trans. Inf. Theory, vol. 58,
no. 11, 2012, pp. 6925–6934.
[73] P. Gopalan, G. Hu, S. Kopparty, S. Saraf, C. Wang, and S.
Yekhanin, “Maximally recoverable codes for grid-like topolo-
gies,” in Proc. Twenty-Eighth Annual ACM-SIAM Symposium
on Discrete Algorithms, Barcelona, Spain, pp. 2092–2108, 2017.
[74] S. Goparaju and A. R. Calderbank, “Binary cyclic codes that
are locally repairable,” in Proc. IEEE International Symposium
on Information Theory, Honolulu, HI, USA, 2014, pp. 676–680,
2014.
[75] S. Goparaju, S. El Rouayheb, A. R. Calderbank, and H. V. Poor,
“Data secrecy in distributed storage systems under exact repair,”
in Proc. International Symposium on Network Coding, Calgary,
Canada, 2013, pp. 1–6, 2013.
[76] S. Goparaju, A. Fazeli, and A. Vardy, “Minimum storage re-
generating codes for all parameters,” IEEE Trans. Inf. Theory,
vol. 63, no. 10, 2017, pp. 6318–6328.
[77] S. Goparaju, I. Tamo, and R. Calderbank, “An improved sub-
packetization bound for minimum storage regenerating codes,”
IEEE Trans. Inf. Theory, vol. 60, no. 5, 2014, pp. 2770–2779.
[78] S. Gopi and V. Guruswami, “Improved maximally recoverable
LRCs using skew polynomials,” Electron. Colloquium Comput.
Complex., 2021, p. 25.
[79] S. Gopi, V. Guruswami, and S. Yekhanin, “Maximally recoverable LRCs: A field size lower bound and constructions for few heavy parities,” IEEE Trans. Inf. Theory, vol. 66, no. 10, 2020, pp. 6066–6083.
[80] M. Grezet, T. Westerbäck, R. Freij-Hollanti, and C. Hollanti,
“Uniform minors in maximally recoverable codes,” IEEE Com-
munications Letters, vol. 23, no. 8, 2019, pp. 1297–1300.
[81] M. K. Gupta, A. Agrawal, and D. Yadav, “On weak dress codes
for cloud storage,” CoRR, vol. abs/1302.3681, 2013.
[82] V. Guruswami and H. Jiang, “Near-optimal repair of Reed-
Solomon codes with low sub-packetization,” in Proc. IEEE In-
ternational Symposium on Information Theory, Paris, France,
2019, pp. 1077–1081, 2019.
[83] V. Guruswami, L. Jin, and C. Xing, “Constructions of maximally
recoverable local reconstruction codes via function fields,” IEEE
Trans. Inf. Theory, vol. 66, no. 10, 2020, pp. 6133–6143.
[84] V. Guruswami and A. S. Rawat, “MDS code constructions with
small sub-packetization and near-optimal repair bandwidth,” in
Proc. Twenty-Eighth Annual ACM-SIAM Symposium on Discrete
Algorithms, Barcelona, Spain, 2017, pp. 2109–2122, 2017.
[85] V. Guruswami and M. Wootters, “Repairing Reed-Solomon
codes,” IEEE Trans. Inf. Theory, vol. 63, no. 9, 2017, pp. 5684–
5698.
[86] V. Guruswami, C. Xing, and C. Yuan, “How long can optimal
locally repairable codes be?” IEEE Trans. Inf. Theory, vol. 65,
no. 6, 2019, pp. 3662–3670.
[87] J. Han and L. A. Lastras-Montano, “Reliable memories with
subline accesses,” in Proc. IEEE International Symposium on
Information Theory, Nice, France, 2007, pp. 2531–2535, 2007.
[88] J. Hao, S. T. Xia, and B. Chen, “Some results on optimal locally
repairable codes,” in Proc. IEEE International Symposium on
Information Theory, Barcelona, Spain, 2016, pp. 440–444, 2016.
[89] J. Hao, S. T. Xia, and B. Chen, “On optimal ternary locally
repairable codes,” in Proc. IEEE International Symposium on
Information Theory, Aachen, Germany, 2017, pp. 171–175, 2017.
[90] J. Hao, K. Shum, S.-T. Xia, and Y.-X. Yang, “On the maximal
code length of optimal linear locally repairable codes,” in Proc.
IEEE International Symposium on Information Theory, Vail,
CO, USA, 2018, 2018.
[91] J. Hao, S. Xia, and B. Chen, “On the linear codes with (r, δ)-
locality for distributed storage,” in Proc. IEEE International
Conference on Communications, ICC 2017, Paris, France, May
21-25, 2017, pp. 1–6, IEEE.
[92] J. Hao, S. Xia, K. W. Shum, B. Chen, F. Fu, and Y. Yang,
“Bounds and constructions of locally repairable codes: Parity-
check matrix approach,” IEEE Trans. Inf. Theory, vol. 66, no. 12,
2020, pp. 7465–7474.
[93] K. Haymaker, B. Malmskog, and G. L. Matthews, “Locally
recoverable codes with availability t≥2 from fiber products of
curves,” Adv. Math. Commun., vol. 12, no. 2, 2018, pp. 317–336.
[94] T. Helleseth, T. Klove, V. I. Levenshtein, and O. Ytrehus,
“Bounds on the minimum support weights,” IEEE Trans. Inf.
Theory, vol. 41, no. 2, 1995, pp. 432–440.
[95] H. Hou, P. P. C. Lee, K. W. Shum, and Y. Hu, “Rack-aware
regenerating codes for data centers,” IEEE Trans. Inf. Theory,
vol. 65, no. 8, 2019, pp. 4730–4745.
[96] G. Hu and S. Yekhanin, “New constructions of SD and MR codes
over small finite fields,” in Proc. IEEE International Symposium
on Information Theory, Barcelona, Spain, 2016, pp. 1591–1595,
2016.
[97] P. Hu, C. W. Sung, and T. H. Chan, “Broadcast repair for
wireless distributed storage systems,” in Proc. 10th Interna-
tional Conference on Information, Communications and Signal
Processing, Singapore, 2015, pp. 1–5, 2015.
[98] Y. Hu, H. C. H. Chen, P. P. C. Lee, and Y. Tang, “NCCloud:
Applying network coding for the storage repair in a cloud-of-
clouds,” in Proc. 10th USENIX conference on File and Storage
Technologies, San Jose, CA, USA, 2012, p. 21, 2012.
[99] Y. Hu, P. P. C. Lee, and X. Zhang, “Double regenerating codes for hierarchical data centers,” in Proc. IEEE International Symposium on Information Theory, Barcelona, Spain, 2016, pp. 245–249, 2016.
[100] Y. Hu, Y. Xu, X. Wang, C. Zhan, and P. Li, “Cooperative re-
covery of distributed storage systems from multiple losses with
network coding,” IEEE Journal on Selected Areas in Communi-
cations, vol. 28, no. 2, 2010, pp. 268–276.
[101] C. Huang, “Erasure Coding in Windows Azure Storage,” talk
presented at the SNIA Storage Developer Conference, Santa
Clara, Sept 12-15, 2012 (joint work with H. Simitci, Y. Xu, A.
Ogus, B. Calder, P. Gopalan, J. Li and S. Yekhanin).
[102] C. Huang, M. Chen, and J. Li, “Pyramid codes: Flexible schemes
to trade space for access efficiency in reliable data storage sys-
tems,” in Proc. 6th IEEE Int. Symposium on Network Computing
and Applications, Cambridge, Massachusetts, USA, 2007, pp. 79–
86, 2007.
[103] C. Huang, H. Simitci, Y. Xu, A. Ogus, B. Calder, P. Gopalan, J.
Li, and S. Yekhanin, “Erasure coding in windows azure storage,”
in Proc. 2012 USENIX Annual Technical Conference, pp. 15–26,
Boston, MA, 2012.
[104] C. Huang, M. Chen, and J. Li, “Pyramid codes: Flexible schemes
to trade space for access efficiency in reliable data storage sys-
tems,” ACM Trans. Storage, vol. 9, no. 1, 2013, 3:1–3:28.
[105] K. Huang, U. Parampalli, and M. Xian, “On secrecy capacity
of minimum storage regenerating codes,” IEEE Trans. on Inf.
Theory, vol. 63, no. 3, 2017, pp. 1510–1524.
[106] K. Huang, U. Parampalli, and M. Xian, “Security concerns in
minimum storage cooperative regenerating codes,” IEEE Trans.
Inf. Theory, vol. 62, no. 11, 2016, pp. 6218–6232.
[107] K. Huang, U. Parampalli, and M. Xian, “Improved upper bounds
on systematic-length for linear minimum storage regenerating
codes,” IEEE Trans. Inf. Theory, vol. 65, no. 2, 2018, pp. 975–
984.
[108] P. Huang, E. Yaakobi, H. Uchikawa, and P. H. Siegel, “Cyclic linear binary locally repairable codes,” in Proc. IEEE Information Theory Workshop, Jerusalem, Israel, 2015, pp. 1–5, 2015.
[109] P. Huang, E. Yaakobi, H. Uchikawa, and P. H. Siegel, “Binary
linear locally repairable codes,” IEEE Trans. Inf. Theory, vol. 62,
no. 11, 2016, pp. 6268–6283.
[110] Information theory inequality prover, url: http://user-www.ie.cuhk.edu.hk/~ITIP/.
[111] L. Jin, H. Kan, and Y. Zhang, “Constructions of locally repairable
codes with multiple recovering sets via rational function fields,”
IEEE Trans. Inf. Theory, vol. 66, no. 1, 2020, pp. 202–209.
[112] L. Jin, L. Ma, and C. Xing, “Construction of optimal locally
repairable codes via automorphism groups of rational function
fields,” IEEE Trans. Inf. Theory, vol. 66, no. 1, 2019, pp. 210–
221.
[113] S. Kadhe and A. Sprintson, “Security for minimum storage re-
generating codes and locally repairable codes,” in Proc. IEEE
International Symposium on Information Theory, Aachen, Ger-
many, 2017, pp. 1028–1032, 2017.
[114] S. Kadhe and A. R. Calderbank, “Rate optimal binary lin-
ear locally repairable codes with small availability,” CoRR,
vol. abs/1701.02456, 2017.
[115] S. Kadhe and A. R. Calderbank, “Rate optimal binary linear
locally repairable codes with small availability,” in Proc. IEEE
International Symposium on Information Theory, Aachen, Ger-
many, 2017, pp. 166–170, 2017.
[116] G. M. Kamath, N. Silberstein, N. Prakash, A. S. Rawat, V.
Lalitha, O. O. Koyluoglu, P. V. Kumar, and S. Vishwanath,
“Explicit MBR all-symbol locality codes,” in Proc. IEEE Inter-
national Symposium on Information Theory, Istanbul, Turkey,
2013, IEEE, pp. 504–508, 2013.
[117] G. M. Kamath, N. Prakash, V. Lalitha, and P. V. Kumar, “Codes
with local regeneration and erasure correction,” IEEE Trans. Inf.
Theory, vol. 60, no. 8, 2014, pp. 4637–4660.
[118] A. M. Kermarrec, N. L. Scouarnec, and G. Straub, “Repairing multiple failures with coordinated and adaptive regenerating codes,” in Proc. International Symposium on Network Coding, Beijing, China, 2011, pp. 1–6, 2011.
[119] C. Kim and J. S. No, “New constructions of binary and ternary
locally repairable codes using cyclic codes,” IEEE Communica-
tions Letters, vol. 22, no. 2, 2018, pp. 228–231.
[120] M. Kleckler and S. Mohajer, “Secure determinant codes: A class
of secure exact-repair regenerating codes,” in Proc. IEEE In-
ternational Symposium on Information Theory, Paris, France,
2019, pp. 211–215, 2019.
[121] M. Kleckler and S. Mohajer, “Secure determinant codes: Type-II
security,” in Proc. IEEE International Symposium on Informa-
tion Theory, Los Angeles, CA, USA, 2020, pp. 652–657, 2020.
[122] R. Koetter and M. Médard, “An algebraic approach to network
coding,” IEEE/ACM Trans. Netw., vol. 11, no. 5, 2003, pp. 782–
795.
[123] O. Kolosov, A. Barg, I. Tamo, and G. Yadgar, “Optimal LRC
codes for all lenghts n ≤ q,” CoRR, vol. abs/1802.00157, 2018.
[124] O. Kolosov, G. Yadgar, M. Liram, I. Tamo, and A. Barg, “On
fault tolerance, locality, and optimality in locally repairable
codes,” ACM Transactions on Storage, vol. 16, no. 2, 2020,
pp. 1–32.
[125] J. C. Koo and J. T. Gill III, “Scalable constructions of fractional
repetition codes in distributed storage systems,” in Proc. 49th
Annual Allerton Conference on Communication, Control, and
Computing, Monticello, IL, USA, 2011, pp. 1366–1373, 2011.
[126] O. O. Koyluoglu, A. S. Rawat, and S. Vishwanath, “Secure
cooperative regenerating codes for distributed storage systems,”
IEEE Trans. Inf. Theory, vol. 60, no. 9, 2014, pp. 5228–5244.
[127] K. Kralevska, D. Gligoroski, and H. Øverby, “General sub-
packetized access-optimal regenerating codes,” IEEE Communi-
cations Letters, vol. 20, no. 7, 2016, pp. 1281–1284.
[128] K. Kralevska, D. Gligoroski, R. E. Jensen, and H. Øverby, “Hash-
Tag erasure codes: From theory to practice,” IEEE Transactions
on Big Data, 2017.
[129] M. N. Krishnan, A. Narayanan R, and P. V. Kumar, “Codes with combined locality and regeneration having optimal rate, dmin and linear field size,” in Proc. IEEE International Symposium on Information Theory, Vail, CO, USA, 2018, pp. 1196–1200, 2018.
[130] M. N. Krishnan, N. Prakash, V. Lalitha, B. Sasidharan, P. V.
Kumar, S. Narayanamurthy, R. Kumar, and S. Nandi, “Evalua-
tion of codes with inherent double replication for Hadoop,” in
Proc. 6th USENIX Workshop on Hot Topics in Storage and File
Systems, Philadelphia, PA, USA, 2014.
[131] M. N. Krishnan, B. Puranik, P. V. Kumar, I. Tamo, and A.
Barg, “Exploiting locality for improved decoding of binary cyclic
codes,” IEEE Trans. Commun., vol. 66, no. 6, 2018, pp. 2346–
2358.
[132] M. N. Krishnan and P. V. Kumar, “On MBR codes with replica-
tion,” in Proc. IEEE International Symposium on Information
Theory, Barcelona, Spain, 2016, pp. 71–75, 2016.
[133] S. Kruglik, K. Nazirkhanova, and A. Frolov, “New bounds and
generalizations of locally recoverable codes with availability,”
IEEE Trans. Inf. Theory, vol. 65, no. 7, 2019, pp. 4156–4166.
[134] P. V. Kumar, “Codes with local regeneration,” talk presented at
the conference on Trends in Coding Theory, Ascona, Switzerland,
Oct. 28 to Nov. 2, 2012 (joint work with G. M. Kamath, N.
Prakash, V. Lalitha).
[135] V. Lalitha and S. V. Lokam, “Weight enumerators and higher
support weights of maximally recoverable codes,” in Proc. 53rd
Annual Allerton Conference on Communication, Control, and
Computing, Monticello, IL, USA, 2015, pp. 835–842, 2015.
[136] J. Li and B. Li, “Erasure coding for cloud storage systems: A
survey,” Tsinghua Science and Technology, vol. 18, no. 3, 2013,
pp. 259–272.
[137] J. Li, X. Tang, and C. Tian, “A generic transformation for
optimal repair bandwidth and rebuilding access in MDS codes,”
in Proc. IEEE International Symposium on Information Theory,
Aachen, Germany, 2017, pp. 1623–1627, 2017.
[138] J. Li, Y. Liu, and X. Tang, “A systematic construction of MDS codes with small sub-packetization level and near-optimal repair bandwidth,” IEEE Trans. Inf. Theory, vol. 67, no. 4, 2020, pp. 2162–2180.
[139] J. Li and B. Li, “Beehive: Erasure codes for fixing multiple
failures in distributed storage systems,” IEEE Trans. Parallel
Distrib. Syst., vol. 28, no. 5, 2017, pp. 1257–1270.
[140] W. Li, Z. Wang, and H. Jafarkhani, “A tradeoff between the sub-
packetization size and the repair bandwidth for Reed-Solomon
code,” in Proc. 55th Annual Allerton Conference on Commu-
nication, Control, and Computing, Monticello, IL, USA, 2017,
pp. 942–949, 2017.
[141] X. Li, L. Ma, and C. Xing, “Construction of asymptotically good
locally repairable codes via automorphism groups of function
fields,” IEEE Trans. Inf. Theory, vol. 65, no. 11, 2019, pp. 7087–
7094.
[142] X. Li, L. Ma, and C. Xing, “Optimal locally repairable codes via
elliptic curves,” IEEE Trans. Inf. Theory, vol. 65, no. 1, 2019,
pp. 108–117.
[143] S. J. Lin, W. H. Chung, Y. S. Han, and T. Y. Al-Naffouri, “A uni-
fied form of exact-MSR codes via product-matrix frameworks,”
IEEE Trans. Inf. Theory, vol. 61, no. 2, 2015, pp. 873–886.
[144] S. Lin and W. Chung, “Novel repair-by-transfer codes and sys-
tematic exact-MBR codes with lower complexities and smaller
field sizes,” IEEE Trans. Parallel Distrib. Syst., vol. 25, no. 12,
2014, pp. 3232–3241.
[145] S. Liu and F. Oggier, “An overview of coding for distributed
storage systems,” Network Coding and Subspace Designs, 2018,
pp. 363–383.
[146] S. Liu and F. E. Oggier, “On storage codes allowing partially
collaborative repairs,” in Proc. IEEE International Symposium
on Information Theory, Honolulu, HI, USA, 2014, pp. 2440–2444,
2014.
[147] S. Liu and C. Xing, “Maximally recoverable local reconstruction
codes from subspace direct sum systems,” CoRR, vol. abs/
2111.03244, 2021.
[148] Y. Liu, J. Li, and X. Tang, “Explicit constructions of high-rate MSR codes with optimal access property over small finite fields,” IEEE Trans. Commun., vol. 66, no. 10, 2018, pp. 4405–4413.
[149] Locally repairable erasure code plugin, url: http://docs.ceph.com/docs/master/rados/operations/erasure-code-lrc/.
[150] M. Luby, “Repair rate lower bounds for distributed storage,”
IEEE Trans. Inf. Theory, vol. 67, no. 9, 2021, pp. 5711–5730.
[151] M. Luby, R. Padovani, T. J. Richardson, L. Minder, and P.
Aggarwal, “Liquid cloud storage,” ACM Trans. Storage, vol. 15,
no. 1, 2019, 2:1–2:49.
[152] M. Luby and T. Richardson, “Distributed storage algorithms
with optimal tradeoffs,” CoRR, vol. abs/2101.05223, 2021.
[153] G. Luo and X. Cao, “Constructions of optimal binary locally
recoverable codes via a general construction of linear codes,”
IEEE Transactions on Communications, vol. 69, no. 8, 2021,
pp. 4987–4997.
[154] Y. Luo, C. Xing, and C. Yuan, “Optimal locally repairable codes
of distance 3 and 4 via cyclic codes,” IEEE Trans. Inf. Theory,
vol. 65, no. 2, 2019, pp. 1048–1053.
[155] J. Ma and G. Ge, “Optimal binary linear locally repairable codes
with disjoint repair groups,” SIAM J. Discret. Math., vol. 33,
no. 4, 2019, pp. 2509–2529.
[156] F. J. MacWilliams and N. J. A. Sloane, The theory of error-
correcting codes. Elsevier, 1977.
[157] J. Mardia, B. Bartan, and M. Wootters, “Repairing multiple
failures for scalar MDS codes,” IEEE Trans. Inf. Theory, vol. 65,
no. 5, 2019, pp. 2661–2672.
[158] U. Martínez-Peñas, “A general family of MSRD codes and PMDS
codes with smaller field sizes from extended Moore matrices,”
CoRR, vol. abs/2011.14109, 2020.
[159] U. Martínez-Peñas and F. R. Kschischang, “Universal and dy-
namic locally repairable codes with maximal recoverability via
sum-rank codes,” IEEE Trans. Inf. Theory, vol. 65, no. 12, 2019,
pp. 7790–7805.
[160] M. Mehrabi and M. Ardakani, “On minimum distance of locally repairable codes,” in Proc. 15th Canadian Workshop on Information Theory, Quebec, Canada, 2017, pp. 1–5, 2017.
[161] Microsoft research blog: A better way to store data, url: https://www.microsoft.com/en-us/research/blog/better-way-store-data/.
[162] S. Mohajer and R. Tandon, “New bounds on the (n, k, d) stor-
age systems with exact repair,” in Proc. IEEE International
Symposium on Information Theory, Hong Kong, China, 2015,
pp. 2056–2060, 2015.
[163] M. Y. Nam and H. Y. Song, “Binary locally repairable codes with
minimum distance at least six based on partial t-spreads,” IEEE
Communications Letters, vol. 21, no. 8, 2017, pp. 1683–1686.
[164] F. Oggier and A. Datta, “Self-repairing homomorphic codes
for distributed storage systems,” in Proc. IEEE INFOCOM,
Shanghai, China, 2011, pp. 1215–1223, 2011.
[165] O. Olmez and A. Ramamoorthy, “Fractional repetition codes
with flexible repair from combinatorial designs,” IEEE Trans.
Inf. Theory, vol. 62, no. 4, 2016, pp. 1565–1591.
[166] L. Pamies-Juarez, F. Blagojevic, R. Mateescu, C. Guyot, E. En
Gad, and Z. Bandic, “Opening the chrysalis: On the real repair
performance of MSR codes,” in Proc. 14th USENIX Conference
on File and Storage Technologies, Santa Clara, CA, USA, 2016,
pp. 81–94, 2016.
[167] D. S. Papailiopoulos, A. G. Dimakis, and V. R. Cadambe, “Repair
optimal erasure codes through Hadamard designs,” IEEE Trans.
Inf. Theory, vol. 59, no. 5, 2013, pp. 3021–3037.
[168] D. S. Papailiopoulos and A. G. Dimakis, “Locally repairable
codes,” IEEE Trans. Inf. Theory, vol. 60, no. 10, 2014, pp. 5843–
5855.
[169] S. Pawar, S. El Rouayheb, and K. Ramchandran, “Securing
dynamic distributed storage systems against eavesdropping and
adversarial attacks,” IEEE Trans. on Inf. Theory, vol. 57, no. 10,
2011, pp. 6734–6753.
[170] S. Pawar, N. Noorshams, S. El Rouayheb, and K. Ramchandran, “DRESS codes for the storage cloud: Simple randomized constructions,” in Proc. IEEE International Symposium on Information Theory Proceedings, St. Petersburg, Russia, 2011, pp. 2338–2342, 2011.
[171] N. Prakash, V. Abdrashitov, and M. Médard, “The storage
versus repair-bandwidth trade-off for clustered storage systems,”
IEEE Trans. Inf. Theory, vol. 64, no. 8, 2018, pp. 5783–5805.
[172] N. Prakash, G. M. Kamath, V. Lalitha, and P. V. Kumar, “Op-
timal linear codes with a local-error-correction property,” in
Proc. IEEE International Symposium on Information Theory
Proceedings, Cambridge, MA, USA, 2012, pp. 2776–2780, 2012.
[173] N. Prakash and M. N. Krishnan, “The storage-repair-bandwidth
trade-off of exact repair linear regenerating codes for the case
d=k=n-1,” in Proc. IEEE International Symposium on Informa-
tion Theory, Hong Kong, 2015, pp. 859–863, 2015.
[174] N. Prakash, V. Lalitha, S. B. Balaji, and P. V. Kumar, “Codes
with locality for two erasures,” IEEE Trans. Inf. Theory, vol. 65,
no. 12, 2019, pp. 7771–7789.
[175] N. Prakash, V. Lalitha, and P. V. Kumar, “Codes with locality
for two erasures,” in Proc. IEEE International Symposium on
Information Theory, Honolulu, HI, USA, 2014, pp. 1962–1966,
2014.
[176] C. Rajput and M. Bhaintwal, “Optimal RS-like LRC codes of
arbitrary length,” Applicable Algebra in Engineering, Communi-
cation and Computing, vol. 31, no. 3, 2020, pp. 271–289.
[177] V. A. Rameshwar and N. Kashyap, “Achieving secrecy capacity
of minimum storage regenerating codes for all feasible (n, k, d)
parameter values,” in National Conference on Communications,
Bangalore, India, 2019, pp. 1–6, 2019.
[178] V. Ramkumar, M. Vajha, S. B. Balaji, M. N. Krishnan, B.
Sasidharan, and P. V. Kumar, “Codes for distributed storage,”
in Concise Encyclopedia of Coding Theory, W. C. Huffman, J.-L.
Kim, and P. Solé, Eds., Chapman and Hall/CRC, 2021, pp. 735–
761.
[179] K. V. Rashmi, N. B. Shah, P. V. Kumar, and K. Ramchandran, “Explicit construction of optimal exact regenerating codes for distributed storage,” in Proc. 47th Annual Allerton Conference on Communication, Control, and Computing, Urbana-Champaign, IL, USA, 2009, pp. 1243–1249.
[180] K. V. Rashmi, P. Nakkiran, J. Wang, N. B. Shah, and K. Ram-
chandran, “Having your cake and eating it too: Jointly optimal
erasure codes for I/O, storage, and network-bandwidth,” in Proc.
13th USENIX Conference on File and Storage Technologies,
Santa Clara, CA, USA, 2015, pp. 81–94, 2015.
[181] K. V. Rashmi, N. B. Shah, K. Ramchandran, and P. V. Kumar,
“Regenerating codes for errors and erasures in distributed stor-
age,” in Proc. IEEE International Symposium on Information
Theory, Cambridge, MA, USA, 2012, pp. 1202–1206, 2012.
[182] K. V. Rashmi, N. B. Shah, K. Ramchandran, and P. V. Kumar,
“Information-theoretically secure erasure codes for distributed
storage,” IEEE Trans. Inf. Theory, vol. 64, no. 3, 2018, pp. 1621–
1646.
[183] K. V. Rashmi, N. B. Shah, D. Gu, H. Kuang, D. Borthakur, and
K. Ramchandran, “A "Hitchhiker’s" guide to fast and efficient
data reconstruction in erasure-coded data centers,” in Proc. ACM
SIGCOMM Conference, Chicago, IL, USA, 2014, pp. 331–342,
2014.
[184] K. V. Rashmi, N. B. Shah, and K. Ramchandran, “A piggyback-
ing design framework for read-and download-efficient distributed
storage codes,” IEEE Trans. Inf. Theory, vol. 63, no. 9, 2017,
pp. 5802–5820.
[185] K. V. Rashmi, N. B. Shah, and P. V. Kumar, “Optimal exact-
regenerating codes for distributed storage at the MSR and MBR
points via a product-matrix construction,” IEEE Trans. Inf.
Theory, vol. 57, no. 8, 2011, pp. 5227–5239.
[186] N. Raviv, N. Silberstein, and T. Etzion, “Constructions of high-
rate minimum storage regenerating codes over small fields,”
IEEE Trans. Inf. Theory, vol. 63, no. 4, 2017, pp. 2015–2038.
[187] A. S. Rawat, O. O. Koyluoglu, N. Silberstein, and S. Vishwanath, “Optimal locally repairable and secure codes for distributed storage systems,” IEEE Trans. Inf. Theory, vol. 60, no. 1, 2014, pp. 212–236.
[188] A. S. Rawat, “Secrecy capacity of minimum storage regenerating
codes,” in Proc. IEEE International Symposium on Information
Theory, Aachen, Germany, 2017, pp. 1406–1410, 2017.
[189] A. S. Rawat, O. O. Koyluoglu, N. Silberstein, and S. Vishwanath,
“Optimal locally repairable and secure codes for distributed
storage systems,” IEEE Trans. Inf. Theory, vol. 60, no. 1, 2013,
pp. 212–236.
[190] A. S. Rawat, O. O. Koyluoglu, and S. Vishwanath, “Progress
on high-rate MSR codes: Enabling arbitrary number of helper
nodes,” in Proc. Information Theory and Applications Workshop,
La Jolla, CA, USA, 2016, pp. 1–6, 2016.
[191] A. S. Rawat, O. O. Koyluoglu, and S. Vishwanath, “Centralized
repair of multiple node failures with applications to communica-
tion efficient secret sharing,” IEEE Trans. Inf. Theory, vol. 64,
no. 12, 2018, pp. 7529–7550.
[192] A. S. Rawat, A. Mazumdar, and S. Vishwanath, “Cooperative
local repair in distributed storage,” EURASIP Journal on Ad-
vances in Signal Processing, vol. 2015, no. 1, 2015, pp. 1–17.
[193] A. S. Rawat, D. S. Papailiopoulos, A. G. Dimakis, and S. Vish-
wanath, “Locality and availability in distributed storage,” IEEE
Trans. Inf. Theory, vol. 62, no. 8, 2016, pp. 4481–4493.
[194] A. S. Rawat, I. Tamo, V. Guruswami, and K. Efremenko, “ϵ-MSR
codes with small sub-packetization,” in Proc. IEEE International
Symposium on Information Theory, Aachen, Germany, 2017,
pp. 2043–2047, 2017.
[195] A. S. Rawat, I. Tamo, V. Guruswami, and K. Efremenko, “MDS
code constructions with small sub-packetization and near-optimal
repair bandwidth,” IEEE Trans. Inf. Theory, vol. 64, no. 10,
2018, pp. 6506–6525.
[196] I. S. Reed and G. Solomon, “Polynomial codes over certain finite
fields,” Journal of the SIAM, vol. 8, no. 2, 1960, pp. 300–304.
[197] C. Salgado, A. Várilly-Alvarado, and J. F. Voloch, “Locally recoverable codes on surfaces,” IEEE Trans. Inf. Theory, vol. 67, no. 9, 2021, pp. 5765–5777.
[198] B. Sasidharan, K. Senthoor, and P. V. Kumar, “An improved
outer bound on the storage-repair-bandwidth tradeoff of exact-
repair regenerating codes,” in Proc. IEEE International Sympo-
sium on Information Theory, Honolulu, HI, USA, 2014, pp. 2430–
2434, 2014.
[199] B. Sasidharan, G. K. Agarwal, and P. V. Kumar, “Codes with
hierarchical locality,” in Proc. IEEE International Symposium
on Information Theory, Hong Kong, China, 2015, pp. 1257–1261,
2015.
[200] B. Sasidharan, G. K. Agarwal, and P. V. Kumar, “A high-rate
MSR code with polynomial sub-packetization level,” in Proc.
IEEE International Symposium on Information Theory, Hong
Kong, 2015, pp. 2051–2055, 2015.
[201] B. Sasidharan, G. K. Agarwal, and P. V. Kumar, “Codes with
hierarchical locality,” CoRR, vol. abs/1501.06683, 2015.
[202] B. Sasidharan and P. V. Kumar, “High-rate regenerating codes
through layering,” in Proc. IEEE International Symposium on
Information Theory, Istanbul, Turkey, 2013, pp. 1611–1615, 2013.
[203] B. Sasidharan, N. Prakash, M. N. Krishnan, M. Vajha, K.
Senthoor, and P. V. Kumar, “Outer bounds on the storage-
repair bandwidth trade-off of exact-repair regenerating codes,”
Int. Journal Inf. Coding Theory, vol. 3, no. 4, 2016, pp. 255–298.
[204] B. Sasidharan, M. Vajha, and P. V. Kumar, “An explicit, coupled-
layer construction of a high-rate MSR code with low sub-
packetization level, small field size and d<(n-1),” in Proc. IEEE
International Symposium on Information Theory, Aachen, Ger-
many, 2017, pp. 2048–2052, 2017.
[205] B. Sasidharan, M. Vajha, and P. V. Kumar, “An explicit, coupled-
layer construction of a high-rate MSR code with low sub-
packetization level, small field size and all-node repair,” CoRR,
vol. abs/1607.07335, 2016.
[206] B. Sasidharan, M. Vajha, and P. V. Kumar, “An explicit, coupled-layer construction of a high-rate regenerating code with low sub-packetization level, small field size and d<(n-1),” CoRR, vol. abs/1701.07447, 2022.
[207] M. Sathiamoorthy, M. Asteris, D. S. Papailiopoulos, A. G. Di-
makis, R. Vadali, S. Chen, and D. Borthakur, “XORing elephants:
Novel erasure codes for big data,” PVLDB, vol. 6, no. 5, 2013,
pp. 325–336.
[208] S. Schechter, “On the inversion of certain matrices,” Mathemati-
cal Tables and Other Aids to Computation, vol. 13, no. 66, 1959,
pp. 73–77.
[209] B. Segre, “Curve razionali normali e k-archi negli spazi finiti,”
Annali di Matematica Pura ed Applicata, vol. 39, no. 1, 1955,
pp. 357–379.
[210] K. Senthoor, B. Sasidharan, and P. V. Kumar, “Improved layered
regenerating codes characterizing the exact-repair storage-repair
bandwidth tradeoff for certain parameter sets,” in Proc. IEEE
Information Theory Workshop, Jerusalem, pp. 1–5, 2015.
[211] N. B. Shah, K. V. Rashmi, P. V. Kumar, and K. Ramchandran,
“Distributed storage codes with repair-by-transfer and nonachiev-
ability of interior points on the storage-bandwidth tradeoff,”
IEEE Trans. Inf. Theory, vol. 58, no. 3, 2012, pp. 1837–1852.
[212] N. B. Shah, K. V. Rashmi, P. V. Kumar, and K. Ramchandran,
“Interference alignment in regenerating codes for distributed
storage: Necessity and code constructions,” IEEE Trans. Inf.
Theory, vol. 58, no. 4, 2012, pp. 2134–2158.
[213] N. B. Shah, “On minimizing data-read and download for storage-
node recovery,” IEEE Communications Letters, vol. 17, no. 5,
2013, pp. 964–967.
[214] M. Shahabinejad, M. Khabbazian, and M. Ardakani, “A class of
binary locally repairable codes,” IEEE Transactions on Commu-
nications, vol. 64, no. 8, 2016, pp. 3182–3193.
[215] K. Shanmugam, D. S. Papailiopoulos, A. G. Dimakis, and G.
Caire, “A Repair framework for scalar MDS codes,” IEEE Jour-
nal on Selected Areas in Communications, vol. 32, no. 5, 2014,
pp. 998–1007.
[216] S. Shao, T. Liu, C. Tian, and C. Shen, “On the tradeoff region
of secure exact-repair regenerating codes,” IEEE Trans. Inf.
Theory, vol. 63, no. 11, 2017, pp. 7253–7266.
[217] D. Shivakrishna, V. A. Rameshwar, V. Lalitha, and B. Sasidha-
ran, “On maximally recoverable codes for product topologies,” in
Proc. Twenty Fourth National Conference on Communications,
IEEE, pp. 1–6, 2018.
[218] K. W. Shum and Y. Hu, “Cooperative regenerating codes,” IEEE
Trans. Inf. Theory, vol. 59, no. 11, 2013, pp. 7229–7258.
[219] N. Silberstein and A. Zeh, “Optimal binary locally repairable
codes via anticodes,” in Proc. IEEE International Symposium
on Information Theory, Hong Kong, 2015, pp. 1247–1251, 2015.
[220] N. Silberstein, “Optimal locally repairable codes via rank-metric
codes,” talk presented at the conference on Trends in Coding
Theory, Ascona, Switzerland, Oct. 28 to Nov. 2, 2012 (joint work
with A. S. Rawat and S. Vishwanath).
[221] N. Silberstein and T. Etzion, “Optimal fractional repetition
codes based on graphs and designs,” IEEE Trans. Inf. Theory,
vol. 61, no. 8, 2015, pp. 4164–4180.
[222] N. Silberstein and A. Zeh, “Anticode-based locally repairable
codes with high availability,” Designs, Codes and Cryptography,
vol. 86, Feb. 2018.
[223] R. Singleton, “Maximum distance q-nary codes,” IEEE Trans.
Inf. Theory, vol. 10, no. 2, 1964, pp. 116–118.
[224] J.-Y. Sohn, B. Choi, S. W. Yoon, and J. Moon, “Capacity of
clustered distributed storage,” IEEE Trans. Inf. Theory, vol. 65,
no. 1, 2019, pp. 81–107.
[225] W. Song, S. H. Dau, C. Yuen, and T. J. Li, “Optimal locally
repairable linear codes,” IEEE Journal on Selected Areas in
Communications, vol. 32, no. 5, 2014, pp. 1019–1036.
[226] W. Song, K. Cai, C. Yuen, K. Cai, and G. Han, “On sequential
locally repairable codes,” IEEE Trans. Inf. Theory, vol. 64, no. 5,
2018, pp. 3513–3527.
[227] C. Suh and K. Ramchandran, “Exact-repair MDS code construc-
tion using interference alignment,” IEEE Trans. Inf. Theory,
vol. 57, no. 3, 2011, pp. 1425–1442.
[228] I. Tamo and A. Barg, “A family of optimal locally recoverable
codes,” IEEE Trans. Inf. Theory, vol. 60, no. 8, 2014, pp. 4661–
4676.
[229] I. Tamo, Z. Wang, and J. Bruck, “Zigzag codes: MDS array codes
with optimal rebuilding,” IEEE Trans. Inf. Theory, vol. 59, no. 3,
2013, pp. 1597–1616.
[230] I. Tamo, A. Barg, and A. Frolov, “Bounds on the parameters
of locally recoverable codes,” IEEE Trans. Inf. Theory, vol. 62,
no. 6, 2016, pp. 3070–3083.
[231] I. Tamo, A. Barg, S. Goparaju, and A. R. Calderbank, “Cyclic
LRC codes, binary LRC codes, and upper bounds on the distance
of cyclic codes,” Int. J. Inf. Coding Theory, vol. 3, no. 4, 2016,
pp. 345–364.
[232] I. Tamo, D. S. Papailiopoulos, and A. G. Dimakis, “Optimal
locally repairable codes and connections to matroid theory,”
IEEE Trans. Inf. Theory, vol. 62, no. 12, 2016, pp. 6661–6671.
[233] I. Tamo, Z. Wang, and J. Bruck, “Access versus bandwidth in
codes for storage,” IEEE Trans. Inf. Theory, vol. 60, no. 4, 2014,
pp. 2028–2037.
[234] I. Tamo, M. Ye, and A. Barg, “The repair problem for Reed–
Solomon codes: Optimal repair of single and multiple erasures
with almost optimal node size,” IEEE Trans. Inf. Theory, vol. 65,
no. 5, 2018, pp. 2673–2695.
[235] R. Tandon, S. Amuru, T. C. Clancy, and R. M. Buehrer, “Toward
optimal secure distributed storage systems with exact repair,”
IEEE Trans. Inf. Theory, vol. 62, no. 6, 2016, pp. 3477–3492.
[236] C. Tian, B. Sasidharan, V. Aggarwal, V. Vaishampayan, and P. V.
Kumar, “Layered exact-repair regenerating codes via embedded
error correction and block designs,” IEEE Trans. Inf. Theory,
vol. 61, no. 4, 2015, pp. 1933–1947.
[237] C. Tian, “Characterizing the rate region of the (4, 3, 3) exact-
repair regenerating codes,” IEEE Journal on Selected Areas in
Communications, vol. 32, no. 5, 2014, pp. 967–975.
[238] C. Tian, “A note on the rate region of exact-repair regenerating
codes,” CoRR, vol. abs/1503.00011, 2015.
[239] M. Vajha, B. S. Babu, and P. V. Kumar, “Explicit MSR codes
with optimal access, optimal sub-packetization and small field size
for d = k+1, k+2, k+3,” in Proc. IEEE International Symposium
on Information Theory, Vail, CO, USA, pp. 2376–2380, 2018.
[240] M. Vajha, V. Ramkumar, B. Puranik, G. R. Kini, E. Lobo, B.
Sasidharan, P. V. Kumar, A. Barg, M. Ye, S. Narayanamurthy,
S. Hussain, and S. Nandi, “Clay codes: Moulding MDS codes to
yield an MSR code,” in Proc. 16th USENIX Conference on File
and Storage Technologies, Oakland, CA, USA, pp. 139–154, 2018.
[241] A. Wang and Z. Zhang, “Repair locality with multiple erasure tol-
erance,” IEEE Trans. Inf. Theory, vol. 60, no. 11, 2014, pp. 6979–
6987.
[242] A. Wang and Z. Zhang, “An integer programming-based bound
for locally repairable codes,” IEEE Trans. Inf. Theory, vol. 61,
no. 10, 2015, pp. 5280–5294.
[243] A. Wang, Z. Zhang, and D. Lin, “Bounds and constructions
for linear locally repairable codes over binary fields,” in Proc.
IEEE International Symposium on Information Theory, Aachen,
Germany, pp. 2033–2037, 2017.
[244] A. Wang and Z. Zhang, “Exact cooperative regenerating codes
with minimum-repair-bandwidth for distributed storage,” in Proc.
IEEE INFOCOM, Turin, Italy, pp. 400–404, 2013.
[245] A. Wang, Z. Zhang, and D. Lin, “Two classes of (r, t)-locally
repairable codes,” in Proc. IEEE International Symposium on
Information Theory, Barcelona, Spain, pp. 445–449, 2016.
[246] A. Wang, Z. Zhang, and M. Liu, “Achieving arbitrary locality
and availability in binary codes,” in Proc. IEEE International
Symposium on Information Theory, Hong Kong, pp. 1866–1870,
2015.
[247] G. Wang, M.-Y. Niu, and F.-W. Fu, “Constructions of (r, t)-LRC
based on totally isotropic subspaces in symplectic space over
finite fields,” International Journal of Foundations of Computer
Science, vol. 31, Apr. 2020, pp. 327–339.
[248] Z. Wang, I. Tamo, and J. Bruck, “On codes for optimal rebuilding
access,” in Proc. 49th Annual Allerton Conference on Communi-
cation, Control, and Computing, pp. 1374–1381, 2011.
[249] Z. Wang, I. Tamo, and J. Bruck, “Long MDS codes for optimal
repair bandwidth,” in Proc. IEEE International Symposium on
Information Theory, Cambridge, MA, USA, pp. 1182–1186, 2012.
[250] V. K. Wei, “Generalized Hamming weights for linear codes,”
IEEE Trans. Inf. Theory, vol. 37, no. 5, 1991, pp. 1412–1418.
[251] Y. Wu, “Existence and construction of capacity-achieving net-
work codes for distributed storage,” IEEE Journal on Selected
Areas in Communications, vol. 28, no. 2, 2010, pp. 277–288.
[252] E. Yavari and M. Esmaeili, “Locally repairable codes: Joint
sequential–parallel repair for multiple node failures,” IEEE Trans.
Inf. Theory, vol. 66, no. 1, 2020, pp. 222–232.
[253] F. Ye, K. W. Shum, and R. W. Yeung, “The rate region for
secure distributed storage systems,” IEEE Trans. Inf. Theory,
vol. 63, no. 11, 2017, pp. 7038–7051.
[254] M. Ye and A. Barg, “Explicit constructions of MDS array codes
and RS codes with optimal repair bandwidth,” in Proc. IEEE
International Symposium on Information Theory, Barcelona,
Spain, pp. 1202–1206, 2016.
[255] M. Ye and A. Barg, “Explicit constructions of high-rate MDS
array codes with optimal repair bandwidth,” IEEE Trans. Inf.
Theory, vol. 63, no. 4, 2017, pp. 2001–2014.
[256] M. Ye and A. Barg, “Explicit constructions of optimal-access
MDS codes with nearly optimal sub-packetization,” IEEE Trans.
Inf. Theory, vol. 63, no. 10, 2017, pp. 6307–6317.
[257] M. Ye and A. Barg, “Cooperative repair: Constructions of optimal
MDS codes for all admissible parameters,” IEEE Trans. Inf.
Theory, vol. 65, no. 3, 2018, pp. 1639–1656.
[258] R. W. Yeung, “A framework for linear information inequalities,”
IEEE Trans. Inf. Theory, vol. 43, no. 6, 1997, pp. 1924–1934.
[259] A. Zeh and E. Yaakobi, “Optimal linear and cyclic locally re-
pairable codes over small fields,” in Proc. IEEE Information
Theory Workshop, Jerusalem, Israel, pp. 1–5, 2015.
[260] G. Zhang and H. Liu, “Constructions of optimal codes with
hierarchical locality,” IEEE Trans. Inf. Theory, vol. 66, no. 12,
2020, pp. 7333–7340.
[261] J. Zhang, X. Wang, and G. Ge, “Some improvements on locally
repairable codes,” CoRR, vol. abs/1506.04822, 2015.
[262] M. Zhang and R. Li, “Two families of LRCs with availability
based on iterative matrix,” in Proc. 13th International Sym-
posium on Computational Intelligence and Design, Hangzhou,
China, pp. 334–337, 2020.
[263] L. Zhou and Z. Zhang, “Explicit construction of min-
imum bandwidth rack-aware regenerating codes,” CoRR,
vol. abs/2103.01533, 2021.
[264] B. Zhu, K. W. Shum, H. Li, and H. Hou, “General fractional
repetition codes for distributed storage systems,” IEEE Commun.
Lett., vol. 18, no. 4, 2014, pp. 660–663.
[265] M. Zorgui and Z. Wang, “Centralized multi-node repair regen-
erating codes,” IEEE Trans. Inf. Theory, vol. 65, no. 7, 2019,
pp. 4180–4206.
Index

(n, M, dmin) code, 558
[n, k, dmin] code, 558
ϵ-MSR code, 657, 659, 660
active limited-knowledge adversary model, 666
active omniscient adversary model, 666
algebraic geometry codes, 693
all-symbol locality, 681, 730
annihilator polynomial, 684, 736
anti-code, 712
Azure, 680, 729, 730, 780
balanced incomplete block design, 711, 712
Beehive code, 779
binary MBR codes, 590
bipartite graph, 651
Butterfly code, 779
Cascade code, 628, 633
Cauchy matrix, 562, 563, 587
Cauchy MDS codes, 562
centralized repair, 603, 665
Ceph, 779, 781
Chinese remainder theorem, 736, 737
Clay code, 779
codes with availability, 695
codes with MBR locality, 761
codes with MSR locality, 761, 762
Combinatorial Nullstellensatz, 581, 613, 614, 661, 662, 746, 747
constant-repair-matrix property, 612
cooperative
    locality, 689
    regenerating code, 665
    repair, 603, 665
corner points, 616

coset, 684, 686, 736
coupled-layer MSR code, 604, 779
cowedge multiplication, 637
cross-rack repair bandwidth, 669
cut-set bound, 579, 776, 777
data collection, 568, 570, 572, 573
data cube, 605
Determinant code, 628, 633
Diagonal MSR code, 598, 660
disjoint locality, 749, 754, 758
exact repair, 568, 569, 578
exact repair tradeoff, 577, 619, 626
excluded erasure patterns, 741
exterior product, 636
file-size bound, 572, 573, 576, 579
fractional repetition codes, 589, 662
full Reed-Solomon code, 767
full-rank condition, 644
functional repair, 568, 570, 576–578, 580, 582
functional repair tradeoff, 577, 616
generalized Hamming weight, 704
generalized Reed-Solomon codes, 553, 560, 767
girth, 722, 723, 725, 726
global parity symbols, 680
good polynomial, 684–686
grid-like topology, 755
Hamming distance, 558
Hamming weight, 558
Hashtag code, 780
HDFS, 549, 776, 779, 780
help-by-transfer, 584
helper nodes, 551, 552, 569, 663, 670
helper-set-independence property, 612, 628
Hitchhiker, 780
Hoffman-Singleton graph, 723
inclusion-exclusion, 697
information-symbol locality, 672, 730
interference alignment, 643, 644
interior point, 582, 620, 626
intersection score, 609
Lagrange interpolation, 560, 564
lazy repair, 583
lexicographic ordering, 700
linear availability codes, 699
linear LRC, 671
linear repair scheme, 773, 775, 776
linearized polynomials, 640, 747–749, 771, 772
liquid storage, 583
local parity symbols, 680
locally recoverable codes, 670
MBR codes, 578, 584
MDS array codes, 566, 578
MDS codes, 549, 559
MDS conjecture, 566
middle codes, 731
minimum bandwidth cooperative regenerating point, 666
minimum bandwidth rack-aware regenerating point, 669
minimum distance, 558
minimum storage cooperative regenerating point, 666
minimum storage rack-aware regenerating point, 669
minimum weight, 558
monic polynomial, 557
Moore graph, 722, 723, 725
Moulin code, 634
MSR codes, 578, 591
multilinear algebra, 614, 634
multiple erasures, 714, 776
NCCloud, 778
near-optimal repair bandwidth, 657
network coding, 577, 579, 581
node, 550
node repair, 551, 568, 570, 663
nonlinear LRC, 671
normalized repair bandwidth, 576
optimal regenerating code, 576
optimal-access MSR code, 591, 604, 614, 648, 779
optimal-access repair, 776
optimal-update MSR code, 603
outer bounds, 622
p-c equation, 550
pairwise
    forward transform, 607
    reverse transform, 607
parity nodes, 551
passive eavesdropper model, 666, 668
pentagon MBR code, 585, 762, 779
Permuted-Diagonal MSR code, 615
piecewise linear, 616
piggybacking framework, 658, 780
polygonal MBR code, 585
product code, 699, 720
product-matrix
    framework, 587
    MBR, 587
    MSR, 592, 779
pyramid code, 678, 680
rack-aware regenerating code, 668
RAID, 549, 780
rank profile, 758
rate of
    MBR code, 579
    PM-MSR code, 598
    RGC, 569
recoverable erasure patterns, 740
reduced field-size constructions of MRC, 754
Reed-Solomon codes, 550, 557
regenerating codes, 567
repair bandwidth, 551, 552, 567, 569, 584, 591, 756, 765, 766, 770, 773, 775, 776
repair degree, 551, 552, 670, 729, 756
repair matrix, 643, 649
repair polynomials, 768–770, 772
repair subspace, 643, 645, 649
repair-by-transfer, 584, 585, 662, 663
replacement node, 550, 568
resilient regenerating code, 667
secure
    MBR code, 668
    MSR code, 668
    regenerating code, 666
sequential recovery, 714
shortening, 596, 597, 611, 612
Signed Determinant code, 627
simplex code, 712
Singleton bound, 559, 672
Steiner system, 663
Steiner triple system, 712
storage overhead, 569, 576, 591
storage-repair-bandwidth tradeoff, 576
strict availability, 708
sub-packetization level, 554, 569, 766, 775, 777
subgroup, 684, 686
systematic code, 562, 563, 596
systematic MSR codes, 614
table-based repair, 589, 663
Tamo-Barg LRC, 682, 762
tensor product, 635
trace function, 767, 771
trace-dual basis, 767, 768
uniform rank accumulation codes, 758, 760
Vandermonde matrix, 558, 560, 587, 593, 601
vector code, 756, 757
vector symbol alphabet, 765
Xorbas, 780
Zigzag code, 613
