Distributed Data Storage
8, AUGUST 2011
I. INTRODUCTION
C. Striping of Data
The nature of the cut-set bound permits a divide-and-conquer
approach to be used in the application of optimal regenerating
codes to large file sizes, thereby simplifying system implementation. This is explained below.
Given an optimal $[n, k, d]$ regenerating code with parameter
set $(\alpha, \beta, B)$, a second optimal $[n, k, d]$ regenerating code with parameter set
$(\alpha' = \delta\alpha, \ \beta' = \delta\beta, \ B' = \delta B)$ for any positive integer $\delta$
can be constructed, by dividing the $\delta B$ message symbols into
$\delta$ groups of $B$ symbols each, and applying the $(\alpha, \beta, B)$ code to
each group independently. Secondly, a common feature of both
MSR and MBR regenerating codes is that in either case, their parameter
set $(\alpha, \beta, B)$ is such that both $\alpha$ and $B$ are multiples of $\beta$,
and further that $\alpha/\beta$, $B/\beta$ are functions only of $n$, $k$ and $d$. It follows
that if one can construct an (optimal) $[n, k, d]$ MSR or MBR
code with $\beta = 1$, then one can construct an (optimal) $[n, k, d]$
MSR or MBR code for any larger value of $\beta$. In addition, from a
practical standpoint, a code constructed through concatenation
of codes for a smaller $\beta$ will, in general, be of lesser complexity
(see Section VI-C). For these reasons, in the present paper we
design codes for the case of $\beta = 1$. Thus, throughout the remainder
of the paper, we will assume that $\beta = 1$. In the terminology
of distributed storage, such a process is called striping.
We document below the values of $\alpha$ and $B$ of MSR and MBR
codes respectively, when $\beta = 1$:
\[ \alpha = d - k + 1 \tag{5} \]
\[ B = k(d - k + 1) \tag{6} \]
for MSR codes and
\[ \alpha = d \tag{7} \]
\[ B = kd - \binom{k}{2} \tag{8} \]
in the case of MBR codes.
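The parameter relations (5)-(8) and the striping argument can be checked numerically. The following is a minimal sketch, not part of the original paper: the function names are hypothetical, and the cut-set expression $\sum_{i=0}^{k-1} \min\{\alpha, (d-i)\beta\}$ is the standard form of the bound from the regenerating-codes literature, assumed here because it is not restated in this section.

```python
from math import comb

def msr_params(k, d, beta=1):
    """(alpha, B) at the MSR point: eqs. (5)-(6), scaled by beta (striping)."""
    alpha = (d - k + 1) * beta
    return alpha, k * alpha

def mbr_params(k, d, beta=1):
    """(alpha, B) at the MBR point: eqs. (7)-(8), scaled by beta (striping)."""
    return d * beta, (k * d - comb(k, 2)) * beta

def cut_set_capacity(k, d, alpha, beta):
    """Assumed form of the cut-set bound: B <= sum_{i=0}^{k-1} min(alpha, (d-i)*beta)."""
    return sum(min(alpha, (d - i) * beta) for i in range(k))

# Both operating points should meet the bound with equality for beta = 1.
for k, d in [(3, 4), (4, 6), (5, 9)]:
    a, B = msr_params(k, d)
    assert B == cut_set_capacity(k, d, a, 1)
    a, B = mbr_params(k, d)
    assert B == cut_set_capacity(k, d, a, 1)
```

Scaling `beta` multiplies both `alpha` and `B` by the same factor, which is the striping property described above.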
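The MSR point is obtained by first minimizing $\alpha$ and then minimizing $\beta$:
\[ \alpha_{\text{MSR}} = \frac{B}{k}, \qquad \beta_{\text{MSR}} = \frac{B}{k(d-k+1)}. \tag{3} \]
Reversing the order leads to the MBR point which thus corresponds to
\[ \beta_{\text{MBR}} = \frac{2B}{k(2d-k+1)}, \qquad \alpha_{\text{MBR}} = \frac{2dB}{k(2d-k+1)}. \tag{4} \]
We define an optimal $[n, k, d]$ regenerating code as a code
with parameters $(\alpha, \beta, B)$ satisfying the twin requirements that:
1) the parameter set $(\alpha, \beta, B)$ achieves the cut-set bound with
equality;
2) decreasing either $\alpha$ or $\beta$ will result in a new parameter set
that violates the cut-set bound.
An MSR code is then defined as an $[n, k, d]$ regenerating code
whose parameters $(\alpha, \beta, B)$ satisfy (3) and similarly, an MBR
code as one with parameters $(\alpha, \beta, B)$ satisfying (4). Clearly,
both MSR and MBR codes are optimal regenerating codes.
D. Additional Terminology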
RASHMI et al.: OPTIMAL EXACT-REGENERATING CODES FOR DISTRIBUTED STORAGE AT THE MSR AND MBR POINTS
2) Systematic Regenerating Codes: A systematic regenerating code can be defined as a regenerating code designed in
such a way that the $B$ message symbols are explicitly present
amongst the $k\alpha$ code symbols stored in a select set of $k$ nodes,
termed as the systematic nodes. Clearly, in the case of systematic regenerating codes, exact-regeneration of (the systematic
portion of the data stored in) the systematic nodes is mandated.
3) Linear Regenerating Codes: A linear regenerating code is
defined as a regenerating code in which:
a) the code symbols stored in each node are linear combinations
over $\mathbb{F}_q$ of the $B$ message symbols $\{u_i\}$;
b) the symbols passed by a helper node $j$ to aid in the regeneration
of a failed node are linear over $\mathbb{F}_q$ in the $\alpha$
symbols stored in node $j$.
It follows as an easy consequence that linear operations suffice
for a data-collector to recover the data from the $k\alpha$ code
symbols stored in the $k$ nodes that it has connected to. Similarly, the replacement node for a failed node $f$ performs linear
operations on the $d\beta$ symbols passed on to it by the $d$ helper nodes
aiding in the regeneration.
E. Results of the Present Paper
While prior work is described in greater detail in Section II,
we begin by providing a context for the results presented here.
Background: To date, explicit and general constructions for
exact-regenerating codes at the MSR point have been found only
for the case $[n = d+1, \ k, \ d \geq 2k-1]$. Similarly at the MBR
point, the only explicit code to previously have been constructed
is for the case $d = n-1$. Thus, all existing code constructions
limit the total number of nodes $n$ in the system to $d+1$. This
is restrictive since in this case, the system can handle only a
single node failure at a time. Also, such a system does not permit
additional storage nodes to be brought into the system.
A second open problem in this area that has recently drawn
attention is whether or not the storage-repair bandwidth
tradeoff is achievable under the additional requirement of exact-regeneration. It has previously been shown that no linear code
with $\beta = 1$ can achieve the MSR point for any $[n, k, d]$
with $d < 2k-3$, but that the MSR point is achievable for all parameters
$[n, k, d]$ when $B$ (and hence $\beta$ as well) is allowed to approach infinity.
Results Presented in Present Paper: In this paper, (optimal)
explicit constructions of exact-regenerating MBR codes for all
values of $[n, k, d]$ and exact-regenerating MSR codes for all
$[n, \ k, \ d \geq 2k-2]$ are presented. The constructions are of a
product-matrix nature that is shown to significantly simplify
operation of the distributed storage network. The constructions
presented prove that the MBR point for exact-regeneration can
be achieved for all values of the parameters and that the MSR
point can be achieved for all parameters satisfying $d \geq 2k-2$.
In both constructions, the message size is as dictated by the cut-set
bound. The paper also contains a simpler description, in the
product-matrix framework, of an MSR code for the parameters
$[n = d+1, \ k, \ d \geq 2k-1]$ that was previously constructed in
[6], [7].
A brief overview of prior work in this field is provided in
Section II. The product-matrix framework underlying the code
construction is described in Section III. An exact-regenerating
MBR code for all feasible values of the parameters
i.e., regeneration where a part of the code is exactly regenerated, and the remaining is functionally regenerated (it is shown
subsequently in [6], [14] that exact-regeneration is not possible,
, for the set of parameters considered therein).
when
MSR codes performing a hybrid of exact and functional-regeneration are provided in [15], for the parameters
and
. The codes given even here are nonexplicit, and
have high complexity and large field-size requirement.
A code structure that guarantees exact-regeneration of just the
systematic nodes is provided in [6], for the MSR point with parameters
$[n = d+1, \ k, \ d \geq 2k-1]$. This code makes use
of interference alignment, and is termed the MISER code
in the journal-submission version [14] of [6]. Subsequently, it was
shown in [7] that for this set of parameters, the code introduced
in [6] for exact-regeneration of only the systematic nodes can
also be used to repair the nonsystematic (parity) node failures
exactly, provided repair construction schemes are appropriately
designed. Such an explicit repair scheme is indeed designed and
presented in [7]. The paper [7] also contains an exact-regenerating
MSR code for parameter set $[n = 5, \ k = 3, \ d = 4]$.
A proof of nonachievability of the cut-set bound on exact-regeneration at the MSR point with linear codes, for the parameters
$[n, \ k, \ d < 2k-3]$ when $\beta = 1$, is provided in [6], [14].
On the other hand, the MSR point is shown to be achievable in
the limiting case of $B$ approaching infinity (i.e., $\beta$ approaching
infinity) in [16], [17].
A flexible setup for regenerating codes is described in [18],
where a data-collector (or a replacement node) can perform
data-reconstruction (or regeneration) irrespective of the number
of nodes to which it connects, provided the total data downloaded exceeds a certain threshold.
In [19], the authors establish that essentially all points on the
interior of the tradeoff (i.e., points other than MSR and MBR)
are not achievable under exact-regeneration.
where $\Psi_{\text{DC}}$ is the submatrix of $\Psi$ consisting of the $k$ rows
$\{i_1, \ldots, i_k\}$. It then uses the properties of the matrices $\Psi$
and $M$ to recover the message. The precise procedure for recovering
$M$ is a function of the particular construction.
As noted above, each node in the network is associated to a
distinct $(1 \times d)$ encoding vector $\psi_i^t$. In the regeneration process,
we will need to call upon a related vector $\mu_i$ of length $\alpha$, that
contains a subset of the components of $\psi_i$. To regenerate a failed
node $f$, the node replacing the failed node connects to an arbitrary
subset $\{h_1, \ldots, h_d\}$ of $d$ storage nodes which we will refer
to as the helper nodes. Each helper node passes on the inner
product of the $\alpha$ symbols stored in it with $\mu_f$, to the replacement
node: the helper node $h_j$ passes
\[ \psi_{h_j}^t M \mu_f \]
where
MBR code is
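Let $S$ be a $(k \times k)$ matrix constructed so that the $\binom{k+1}{2}$ entries
in the upper-triangular half of the matrix are filled up by $\binom{k+1}{2}$
distinct message symbols drawn from the set $\{u_i\}_{i=1}^{B}$. The $\binom{k}{2}$
entries in the strictly lower-triangular portion of the matrix are
then chosen so as to make the matrix $S$ a symmetric matrix. The
remaining $k(d-k)$ message symbols are used to fill up a second
$(k \times (d-k))$ matrix $T$. The message matrix $M$ is then defined
as the $(d \times d)$ symmetric matrix given by
\[ M = \begin{bmatrix} S & T \\ T^t & 0 \end{bmatrix}. \tag{11} \]
The symmetry of the matrix will be found to be instrumental
when enabling node repair. Next, define the encoding matrix $\Psi$
to be any $(n \times d)$ matrix of the form
\[ \Psi = \begin{bmatrix} \Phi & \Delta \end{bmatrix}, \tag{12} \]
where $\Phi$ and $\Delta$ are $(n \times k)$ and $(n \times (d-k))$ matrices respectively,
chosen so that any $d$ rows of $\Psi$ are linearly independent and any $k$
rows of $\Phi$ are linearly independent.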
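Let $\Psi_{\text{repair}}$ denote the matrix formed by the encoding vectors of the $d$
helper nodes $h_1, \ldots, h_d$:
\[ \Psi_{\text{repair}} = \begin{bmatrix} \psi_{h_1}^t \\ \vdots \\ \psi_{h_d}^t \end{bmatrix}. \]
By construction, the $(d \times d)$ matrix $\Psi_{\text{repair}}$ is invertible. Thus,
from the $d$ symbols $\Psi_{\text{repair}} M \psi_f$ that it has received,
the replacement node recovers $M \psi_f$ through multiplication on
the left by $\Psi_{\text{repair}}^{-1}$. Since $M$ is symmetric,
\[ (M \psi_f)^t = \psi_f^t M \tag{13} \]
and this is precisely the data previously stored in the failed node.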
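The repair argument above can be traced on a small instance. The sketch below is illustrative only: it assumes $n = 6$, $k = 3$, $d = 4$ over $\mathbb{F}_7$ with a Vandermonde $\Psi$ (one choice for which any $d$ rows are linearly independent), arbitrary message symbols, and hypothetical helper names such as `mat_inv` and `mbr_repair`. It needs Python 3.8 or later for `pow(x, -1, p)`.

```python
P = 7  # assumed field size q = 7 (prime, so arithmetic is integers mod 7)

def mat_mul(A, B, p=P):
    """Matrix product over F_p."""
    return [[sum(a * b for a, b in zip(row, col)) % p for col in zip(*B)] for row in A]

def mat_inv(A, p=P):
    """Gauss-Jordan inverse over F_p; raises ValueError if A is singular."""
    n = len(A)
    aug = [[x % p for x in row] + [int(i == j) for j in range(n)]
           for i, row in enumerate(A)]
    for c in range(n):
        piv = next((r for r in range(c, n) if aug[r][c]), None)
        if piv is None:
            raise ValueError("singular matrix")
        aug[c], aug[piv] = aug[piv], aug[c]
        inv = pow(aug[c][c], -1, p)
        aug[c] = [x * inv % p for x in aug[c]]
        for r in range(n):
            if r != c and aug[r][c]:
                f = aug[r][c]
                aug[r] = [(x - f * y) % p for x, y in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def mbr_message_matrix(u, k, d):
    """M = [[S, T], [T^t, 0]] as in (11); S is symmetric (k x k), T is (k x (d-k))."""
    it = iter(u)
    M = [[0] * d for _ in range(d)]
    for i in range(k):                 # upper triangle of S, mirrored below
        for j in range(i, k):
            M[i][j] = M[j][i] = next(it)
    for i in range(k):                 # T and its transpose
        for j in range(k, d):
            M[i][j] = M[j][i] = next(it)
    return M

def mbr_repair(C, Psi, failed, helpers, p=P):
    """Regenerate the row of C stored at node `failed` from d helper symbols."""
    psi_f = [[x] for x in Psi[failed]]                          # mu_f = psi_f (column)
    received = [mat_mul([C[h]], psi_f, p)[0] for h in helpers]  # one symbol per helper
    Psi_repair = [Psi[h] for h in helpers]
    M_psi_f = mat_mul(mat_inv(Psi_repair, p), received, p)      # recovers M psi_f
    return [row[0] for row in M_psi_f]                          # (M psi_f)^t = psi_f^t M

n, k, d = 6, 3, 4
u = [3, 1, 4, 1, 5, 2, 6, 5, 3]                  # arbitrary message symbols in F_7
M = mbr_message_matrix(u, k, d)
Psi = [[pow(x, j, P) for j in range(d)] for x in range(1, n + 1)]  # Vandermonde rows
C = mat_mul(Psi, M)                              # row i holds the d symbols of node i+1
# Node 1 fails; nodes 2, 4, 5, 6 act as helpers (0-based indices below).
assert mbr_repair(C, Psi, failed=0, helpers=[1, 3, 4, 5]) == C[0]
```

Each helper sends a single field symbol, so the repair download is $d\beta = 4$ symbols, equal to the amount stored per node at the MBR point.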
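Theorem 3 (MBR Data-Reconstruction): In the code presented, all the $B$ message symbols can be recovered by connecting
to any $k$ nodes, i.e., the message symbols can be recovered through linear operations on the entries of any $k$ rows of
the matrix $C$.
Proof: Let
\[ \Psi_{\text{DC}} = \begin{bmatrix} \Phi_{\text{DC}} & \Delta_{\text{DC}} \end{bmatrix} \tag{14} \]
be the $(k \times d)$ submatrix of $\Psi$, corresponding to the $k$ rows of $\Psi$
to which the data-collector connects. Thus, the data-collector
has access to the symbols
\[ \Psi_{\text{DC}} M = \begin{bmatrix} \Phi_{\text{DC}} S + \Delta_{\text{DC}} T^t & \ \Phi_{\text{DC}} T \end{bmatrix}. \tag{15} \]
By construction, $\Phi_{\text{DC}}$ is a nonsingular matrix. Hence, by multiplying
the matrix $\Psi_{\text{DC}} M$ on the left by $\Phi_{\text{DC}}^{-1}$, one can recover
first $T$ and subsequently, $S$.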
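The two-step recovery in the proof (first $T$, then $S$) can be sketched as follows. This is a hedged illustration under assumed values, not the authors' implementation: the field $\mathbb{F}_7$, the Vandermonde $\Psi$, the sample $S$ and $T$, and the function names are assumptions, and the helper assumes $d > k$.

```python
P = 7  # assumed prime field size

def mat_mul(A, B, p=P):
    """Matrix product over F_p."""
    return [[sum(a * b for a, b in zip(row, col)) % p for col in zip(*B)] for row in A]

def mat_inv(A, p=P):
    """Gauss-Jordan inverse over F_p; raises ValueError if A is singular."""
    n = len(A)
    aug = [[x % p for x in row] + [int(i == j) for j in range(n)]
           for i, row in enumerate(A)]
    for c in range(n):
        piv = next((r for r in range(c, n) if aug[r][c]), None)
        if piv is None:
            raise ValueError("singular matrix")
        aug[c], aug[piv] = aug[piv], aug[c]
        inv = pow(aug[c][c], -1, p)
        aug[c] = [x * inv % p for x in aug[c]]
        for r in range(n):
            if r != c and aug[r][c]:
                f = aug[r][c]
                aug[r] = [(x - f * y) % p for x, y in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def reconstruct_mbr(rows, Psi_dc, k, p=P):
    """Recover (S, T) from rows = Psi_dc * M, following (15). Assumes d > k."""
    Phi = [r[:k] for r in Psi_dc]                  # Phi_DC  (k x k)
    Delta = [r[k:] for r in Psi_dc]                # Delta_DC (k x (d-k))
    Phi_inv = mat_inv(Phi, p)
    T = mat_mul(Phi_inv, [r[k:] for r in rows], p)  # right block is Phi_DC T
    T_t = [list(c) for c in zip(*T)]
    DT = mat_mul(Delta, T_t, p)                    # Delta_DC T^t, now known
    left = [[(a - b) % p for a, b in zip(r[:k], dt)] for r, dt in zip(rows, DT)]
    S = mat_mul(Phi_inv, left, p)                  # left block minus DT is Phi_DC S
    return S, T

S = [[3, 1, 4], [1, 1, 5], [4, 5, 2]]              # sample symmetric S
T = [[6], [5], [3]]                                # sample T, k x (d-k) = 3 x 1
M = [S[i] + T[i] for i in range(3)] + [[6, 5, 3, 0]]
Psi = [[pow(x, j, P) for j in range(4)] for x in range(1, 7)]
nodes = [0, 2, 5]                                  # data-collector contacts nodes 1, 3, 6
Psi_dc = [Psi[i] for i in nodes]
rows = mat_mul(Psi_dc, M)
assert reconstruct_mbr(rows, Psi_dc, k=3) == (S, T)
```

The only matrix inverted is the $(k \times k)$ block $\Phi_{\text{DC}}$, which is why the construction asks for any $k$ rows of $\Phi$ to be linearly independent.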
A. An Example for the Product-Matrix MBR Code
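Let $n = 6$, $k = 3$, $d = 4$. Then $\alpha = d = 4$ and $B = 9$. Let us
choose $q = 7$ so we are operating over $\mathbb{F}_7$. The matrices $S$ and $T$
are filled up by the 9 message symbols $\{u_i\}_{i=1}^{9}$ as follows:
\[ S = \begin{bmatrix} u_1 & u_2 & u_3 \\ u_2 & u_4 & u_5 \\ u_3 & u_5 & u_6 \end{bmatrix}, \qquad T = \begin{bmatrix} u_7 \\ u_8 \\ u_9 \end{bmatrix}. \tag{16} \]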
Fig. 1. Example for the MBR code construction: On failure of node 1, the replacement node downloads one symbol each from nodes 2, 4, 5 and 6, using which
node 1 is exactly regenerated. The notation $\langle \cdot, \underline{1} \rangle$ indicates an inner product of the stored symbols with the vector $[1 \ 1 \ 1 \ 1]$.
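The message matrix $M$ is then given by
\[ M = \begin{bmatrix} u_1 & u_2 & u_3 & u_7 \\ u_2 & u_4 & u_5 & u_8 \\ u_3 & u_5 & u_6 & u_9 \\ u_7 & u_8 & u_9 & 0 \end{bmatrix}. \tag{17} \]
We choose $\Psi$ to be the $(6 \times 4)$ Vandermonde matrix over $\mathbb{F}_7$
given by
\[ \Psi = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 2 & 4 & 1 \\ 1 & 3 & 2 & 6 \\ 1 & 4 & 2 & 1 \\ 1 & 5 & 4 & 6 \\ 1 & 6 & 1 & 6 \end{bmatrix}. \tag{18} \]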
(19)
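We define the $(d \times \alpha)$ message matrix $M$ as
\[ M = \begin{bmatrix} S_1 \\ S_2 \end{bmatrix} \tag{24} \]
where $S_1$ and $S_2$ are $(\alpha \times \alpha)$ symmetric matrices constructed
such that the $\binom{\alpha+1}{2}$ entries in the upper-triangular part of each
of the two matrices are filled up by $\binom{\alpha+1}{2}$ distinct message
symbols. Thus, all the $B = \alpha(\alpha+1)$ message symbols are
contained in the two matrices $S_1$ and $S_2$. The entries in the
strictly lower-triangular portion of the two matrices $S_1$ and $S_2$
are chosen so as to make the matrices $S_1$ and $S_2$ symmetric.
Next, we define the encoding matrix $\Psi$ to be the $(n \times d)$ matrix
given by
\[ \Psi = \begin{bmatrix} \Phi & \Lambda\Phi \end{bmatrix} \tag{25} \]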
where
is the
zero matrix, and
identity matrix, 0 is a
are matrices of sizes
respectively, such that
and
is a Cauchy
2In general, any matrix, all of whose submatrices are of full rank, will suffice.
3As mentioned previously, it is impossible to construct linear MSR codes for
the case of $d < 2k - 3$ when $\beta = 1$ (see [6], [14]).
where $\Phi$ is an $(n \times \alpha)$ matrix and $\Lambda$ is an $(n \times n)$ diagonal
matrix. The elements of $\Psi$ are chosen such that the following
conditions are satisfied:
1) any $d$ rows of $\Psi$ are linearly independent;
2) any $\alpha$ rows of $\Phi$ are linearly independent;
3) the $n$ diagonal elements of $\Lambda$ are distinct.
The above requirements can be met, for example, by choosing
$\Psi$ to be a Vandermonde matrix with elements chosen carefully
to satisfy the third condition. In this case, let the th row of
(for
) be
, which gives
. In order to satisfy the third property,
to be any field of size
or higher,
one may choose
, where is the generator of the multiplicative
with
. Note that as in the MBR code, the
group of the finite field
only constraint on the field size in this construction arises from
the above required properties of the encoding matrix .
Then under our code-construction framework, the $i$th row of
the $(n \times \alpha)$ product matrix $C = \Psi M$ contains the $\alpha$ code
symbols stored by the $i$th node. The two theorems below establish that the code presented is an
$[n, k, d]$ MSR code by establishing, respectively, the exact-regeneration and data-reconstruction properties of the code.
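To make the structure $\Psi = [\Phi \ \ \Lambda\Phi]$ and $M = [S_1; S_2]$ concrete, the sketch below builds both for $n = 6$, $k = 3$, $d = 4$ ($\alpha = 2$). The field $\mathbb{F}_{13}$, the evaluation points $x_i = i$, the message symbols, and all function names are assumptions made for illustration; with this choice $\phi_i = [1, x_i, \ldots, x_i^{\alpha-1}]$ and $\lambda_i = x_i^{\alpha}$, so that $\Psi$ is itself a Vandermonde matrix.

```python
P = 13  # assumed prime field size; chosen so that the values x^alpha are distinct

def msr_encoding_matrix(n, alpha, p=P):
    """Return (Phi, lam, Psi) with Psi = [Phi, Lambda*Phi] built from points x = 1..n."""
    xs = range(1, n + 1)
    Phi = [[pow(x, j, p) for j in range(alpha)] for x in xs]
    lam = [pow(x, alpha, p) for x in xs]                 # diagonal of Lambda
    Psi = [phi + [l * v % p for v in phi] for phi, l in zip(Phi, lam)]
    return Phi, lam, Psi

def symmetric_from_upper(u, a):
    """Fill the upper triangle of an (a x a) matrix row by row and mirror it."""
    it = iter(u)
    S = [[0] * a for _ in range(a)]
    for i in range(a):
        for j in range(i, a):
            S[i][j] = S[j][i] = next(it)
    return S

def msr_message_matrix(u, alpha):
    """M = [S1; S2] as in (24), using B = alpha*(alpha+1) message symbols."""
    half = alpha * (alpha + 1) // 2
    return symmetric_from_upper(u[:half], alpha) + symmetric_from_upper(u[half:], alpha)

def encode(Psi, M, p=P):
    """Code matrix C = Psi * M over F_p; row i is stored at node i+1."""
    return [[sum(a * b for a, b in zip(row, col)) % p for col in zip(*M)] for row in Psi]

n, k, d = 6, 3, 4
alpha = d - k + 1
Phi, lam, Psi = msr_encoding_matrix(n, alpha)
assert len(set(lam)) == n                                # condition 3: distinct lambda_i
assert Psi == [[pow(x, j, P) for j in range(d)] for x in range(1, n + 1)]
M = msr_message_matrix([1, 2, 3, 4, 5, 6], alpha)
C = encode(Psi, M)
```

Over $\mathbb{F}_7$ or $\mathbb{F}_{11}$ the squares of $1, \ldots, 6$ collide, so this particular choice of points would violate the third condition there; that is the reason for the larger field in this sketch.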
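Theorem 4 (MSR Exact-Regeneration): In the code presented, exact-regeneration of any failed node can be achieved
by downloading one symbol each from any $d$ of the
remaining $(n-1)$ nodes.
Proof: Let $[\phi_f^t \ \ \lambda_f \phi_f^t]$ be the row of $\Psi$ corresponding
to the failed node $f$. Thus the $\alpha$ symbols stored in the failed node
were
\[ [\phi_f^t \ \ \lambda_f \phi_f^t] \, M = \phi_f^t S_1 + \lambda_f \phi_f^t S_2. \tag{26} \]
The replacement for the failed node $f$ connects to an arbitrary
set $\{h_j \mid j = 1, \ldots, d\}$ of $d$ helper nodes. Upon being contacted
by the replacement node, the helper node $h_j$ computes the inner
product $\psi_{h_j}^t M \phi_f$ and passes on this value to the replacement
node. Thus, in the present construction, the vector $\mu_f$ equals $\phi_f$.
The replacement node thus obtains the $d$ symbols $\Psi_{\text{repair}} M \phi_f$
from the $d$ helper nodes, where
\[ \Psi_{\text{repair}} = \begin{bmatrix} \psi_{h_1}^t \\ \vdots \\ \psi_{h_d}^t \end{bmatrix}. \]
By construction, the $(d \times d)$ matrix $\Psi_{\text{repair}}$ is invertible. Thus
the replacement node now has access to
\[ M \phi_f = \begin{bmatrix} S_1 \phi_f \\ S_2 \phi_f \end{bmatrix}. \]
As $S_1$ and $S_2$ are symmetric matrices, the replacement node
has thus acquired, through transposition, both $\phi_f^t S_1$ and $\phi_f^t S_2$.
Using this, it can obtain
\[ \phi_f^t S_1 + \lambda_f \phi_f^t S_2, \tag{27} \]
which, by (26), is precisely the data previously stored in the failed node.
Theorem 5 (MSR Data-Reconstruction): In the code presented, all the $B$ message symbols can be recovered by connecting
to any $k$ nodes, i.e., the message symbols can be recovered through linear operations on the entries of any $k$ rows of
the matrix $C$.
Proof: Let
\[ \Psi_{\text{DC}} = \begin{bmatrix} \Phi_{\text{DC}} & \Lambda_{\text{DC}} \Phi_{\text{DC}} \end{bmatrix} \tag{28} \]
be the $(k \times d)$ submatrix of $\Psi$, corresponding to the $k$ rows of $\Psi$
to which the data-collector connects. Thus, the data-collector
has access to the symbols
\[ \Psi_{\text{DC}} M = \Phi_{\text{DC}} S_1 + \Lambda_{\text{DC}} \Phi_{\text{DC}} S_2. \tag{29} \]
The data-collector can post-multiply this term with $\Phi_{\text{DC}}^t$ to obtain
\[ \left[ \Phi_{\text{DC}} S_1 + \Lambda_{\text{DC}} \Phi_{\text{DC}} S_2 \right] \Phi_{\text{DC}}^t = \Phi_{\text{DC}} S_1 \Phi_{\text{DC}}^t + \Lambda_{\text{DC}} \Phi_{\text{DC}} S_2 \Phi_{\text{DC}}^t. \tag{30} \]
Let the matrices $P$ and $Q$ be defined as
\[ P = \Phi_{\text{DC}} S_1 \Phi_{\text{DC}}^t, \tag{31} \]
\[ Q = \Phi_{\text{DC}} S_2 \Phi_{\text{DC}}^t. \tag{32} \]
As $S_1$ and $S_2$ are symmetric, the same is true of the matrices
$P$ and $Q$. In terms of $P$ and $Q$, the data-collector has access to the
symbols of the matrix
\[ P + \Lambda_{\text{DC}} Q. \tag{33} \]
The $(i, j)$th, $1 \leq i, j \leq k$, element of this matrix is
\[ P_{ij} + \lambda_i Q_{ij} \tag{34} \]
while the $(j, i)$th element is given by
\[ P_{ji} + \lambda_j Q_{ji} = P_{ij} + \lambda_j Q_{ij} \tag{35} \]
where (35) follows from the symmetry of $P$ and $Q$. By construction, all the
$\lambda_i$ are distinct and hence using (34) and (35),
the data-collector can solve for the values of $P_{ij}$, $Q_{ij}$
for all $i \neq j$.
Consider first the matrix $P$. Let $\Phi_{\text{DC}}$ be given by
\[ \Phi_{\text{DC}} = \begin{bmatrix} \phi_1^t \\ \vdots \\ \phi_{\alpha+1}^t \end{bmatrix}. \tag{36} \]
All the nondiagonal elements of $P$ are known. The elements in the $i$th row
(excluding the diagonal element) are given by
\[ \phi_i^t S_1 \begin{bmatrix} \phi_1 & \cdots & \phi_{i-1} & \phi_{i+1} & \cdots & \phi_{\alpha+1} \end{bmatrix}. \tag{37} \]
The $(\alpha \times \alpha)$ matrix to the right is nonsingular by construction, and hence
the data-collector can obtain
\[ \left\{ \phi_i^t S_1 \mid 1 \leq i \leq \alpha+1 \right\}. \tag{38} \]
Selecting the first $\alpha$ of these, the data-collector has access to
\[ \begin{bmatrix} \phi_1^t \\ \vdots \\ \phi_\alpha^t \end{bmatrix} S_1. \tag{39} \]
The matrix on the left is also nonsingular by construction, and hence the
data-collector can recover $S_1$. Similarly, using the values of the
nondiagonal elements of $Q$, the data-collector can recover $S_2$.
As an example, let $n = 6$, $k = 3$, $d = 4$. Then $\alpha = d - k + 1 = 2$
and $B = k\alpha = 6$. Let us choose $q = 13$, so we are operating over
$\mathbb{F}_{13}$. The matrices $S_1$ and $S_2$ are filled up by the six message
symbols $\{u_i\}_{i=1}^{6}$ as follows:
\[ S_1 = \begin{bmatrix} u_1 & u_2 \\ u_2 & u_3 \end{bmatrix} \quad \text{and} \quad S_2 = \begin{bmatrix} u_4 & u_5 \\ u_5 & u_6 \end{bmatrix} \tag{40} \]
so that the message matrix $M$ is given by
\[ M = \begin{bmatrix} u_1 & u_2 \\ u_2 & u_3 \\ u_4 & u_5 \\ u_5 & u_6 \end{bmatrix}. \tag{41} \]
We choose $\Psi$ to be the $(6 \times 4)$ Vandermonde matrix over $\mathbb{F}_{13}$
given by
\[ \Psi = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 2 & 4 & 8 \\ 1 & 3 & 9 & 1 \\ 1 & 4 & 3 & 12 \\ 1 & 5 & 12 & 8 \\ 1 & 6 & 10 & 8 \end{bmatrix}. \tag{42} \]
Hence the $(6 \times 2)$ matrix $\Phi$ and the $(6 \times 6)$ diagonal matrix $\Lambda$
are
\[ \Phi = \begin{bmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 3 \\ 1 & 4 \\ 1 & 5 \\ 1 & 6 \end{bmatrix}, \qquad \Lambda = \operatorname{diag}(1, 4, 9, 3, 12, 10). \tag{43} \]
Fig. 2 shows at the top, the $(6 \times 2)$ code matrix $C = \Psi M$ with
entries expressed as functions of the message symbols $\{u_i\}$. The
rest of the figure explains how exact-regeneration of failed node
1 takes place. To regenerate node 1, the helper nodes (nodes 2,
4, 5, 6 in the example) pass on their respective inner products
$\langle \psi_\ell^t M, \phi_1 \rangle$ for $\ell = 2$, 4, 5, 6. The replacement node multiplies
the symbols it receives with $\Psi_{\text{repair}}^{-1}$, where
\[ \Psi_{\text{repair}} = \begin{bmatrix} 1 & 2 & 4 & 8 \\ 1 & 4 & 3 & 12 \\ 1 & 5 & 12 & 8 \\ 1 & 6 & 10 & 8 \end{bmatrix} \tag{44} \]
and decodes $S_1 \phi_1$ and $S_2 \phi_1$:
\[ S_1 \phi_1 = \begin{bmatrix} u_1 + u_2 \\ u_2 + u_3 \end{bmatrix}, \qquad S_2 \phi_1 = \begin{bmatrix} u_4 + u_5 \\ u_5 + u_6 \end{bmatrix}. \tag{45} \]
Finally, it processes $S_1 \phi_1$ and $S_2 \phi_1$ to obtain the data stored in
the failed node as explained in the proof of Theorem 4 above.
Fig. 2. Example for the MSR code construction: On failure of node 1, the replacement node downloads one symbol each from nodes 2, 4, 5, and 6, using which
node 1 is exactly regenerated. The notation $\langle \cdot, \underline{1} \rangle$ indicates an inner product of the stored symbols with the vector $[1 \ 1]$.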
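The regeneration and data-reconstruction procedures of the product-matrix MSR code for $d = 2k - 2$ can be exercised end to end on the $n = 6$, $k = 3$, $d = 4$ instance. The sketch below is an illustration under stated assumptions rather than the authors' implementation: it works over $\mathbb{F}_{13}$ with Vandermonde rows $[1, x, x^2, x^3]$, uses sample integers in place of the symbols $u_i$, and the function names are hypothetical.

```python
P = 13  # assumed prime field size

def mat_mul(A, B, p=P):
    """Matrix product over F_p."""
    return [[sum(a * b for a, b in zip(row, col)) % p for col in zip(*B)] for row in A]

def mat_inv(A, p=P):
    """Gauss-Jordan inverse over F_p; raises ValueError if A is singular."""
    n = len(A)
    aug = [[x % p for x in row] + [int(i == j) for j in range(n)]
           for i, row in enumerate(A)]
    for c in range(n):
        piv = next((r for r in range(c, n) if aug[r][c]), None)
        if piv is None:
            raise ValueError("singular matrix")
        aug[c], aug[piv] = aug[piv], aug[c]
        inv = pow(aug[c][c], -1, p)
        aug[c] = [x * inv % p for x in aug[c]]
        for r in range(n):
            if r != c and aug[r][c]:
                f = aug[r][c]
                aug[r] = [(x - f * y) % p for x, y in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def repair_msr(C, Psi, lam, alpha, failed, helpers, p=P):
    """Regenerate node `failed` from one symbol per helper (mu_f = phi_f)."""
    phi_f = [[v] for v in Psi[failed][:alpha]]
    received = [mat_mul([C[h]], phi_f, p)[0] for h in helpers]
    M_phi = mat_mul(mat_inv([Psi[h] for h in helpers], p), received, p)  # [S1 phi_f; S2 phi_f]
    s1 = [r[0] for r in M_phi[:alpha]]          # equals phi_f^t S1 by symmetry
    s2 = [r[0] for r in M_phi[alpha:]]          # equals phi_f^t S2 by symmetry
    return [(a + lam[failed] * b) % p for a, b in zip(s1, s2)]

def reconstruct_msr(Y, Phi_dc, lam_dc, p=P):
    """Recover (S1, S2) from Y = Psi_DC * M for k = alpha + 1 contacted nodes."""
    k = len(Phi_dc)
    alpha = k - 1
    Z = mat_mul(Y, [list(c) for c in zip(*Phi_dc)], p)        # Z = P + Lambda_DC Q
    Pm = [[0] * k for _ in range(k)]
    Qm = [[0] * k for _ in range(k)]
    for i in range(k):
        for j in range(k):
            if i != j:                                        # solve the pair (34), (35)
                q = (Z[i][j] - Z[j][i]) * pow((lam_dc[i] - lam_dc[j]) % p, -1, p) % p
                Qm[i][j] = q
                Pm[i][j] = (Z[i][j] - lam_dc[i] * q) % p

    def recover(X):
        rows = []
        for i in range(alpha):                                # first alpha rows suffice
            others = [j for j in range(k) if j != i]
            cols = [list(c) for c in zip(*[Phi_dc[j] for j in others])]  # columns phi_j
            rows.append(mat_mul([[X[i][j] for j in others]], mat_inv(cols, p), p)[0])
        return mat_mul(mat_inv(Phi_dc[:alpha], p), rows, p)   # strip the left factor

    return recover(Pm), recover(Qm)

n, k, d, alpha = 6, 3, 4, 2
Psi = [[pow(x, j, P) for j in range(d)] for x in range(1, n + 1)]  # rows [1, x, x^2, x^3]
lam = [row[alpha] for row in Psi]                                  # lambda_i = x_i^alpha
S1, S2 = [[1, 2], [2, 3]], [[4, 5], [5, 6]]                        # sample message symbols
C = mat_mul(Psi, S1 + S2)                                          # M = [S1; S2]
assert repair_msr(C, Psi, lam, alpha, failed=0, helpers=[1, 3, 4, 5]) == C[0]
nodes = [0, 2, 5]
got = reconstruct_msr([C[i] for i in nodes],
                      [Psi[i][:alpha] for i in nodes],
                      [lam[i] for i in nodes])
assert got == (S1, S2)
```

Repair inverts one $(d \times d)$ matrix, while reconstruction needs only $(\alpha \times \alpha)$ inverses plus the scalar divisions by $\lambda_i - \lambda_j$, which is where the distinctness of the $\lambda_i$ is used.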
B. Systematic Version of the Code
It was pointed out in Section III that every exact-regenerating
code has a systematic version and further, that the code could be
made systematic through a process of message-symbol remapping. In the following, we make this more explicit in the context
of the product-matrix MSR code.
Let $\Psi_k$ be the $(k \times d)$ submatrix of $\Psi$, containing the $k$ rows
of $\Psi$ corresponding to the $k$ nodes which are chosen to be made
systematic. The set of $k\alpha$ symbols stored in these nodes are
given by the elements of the $(k \times \alpha)$ matrix $\Psi_k M$. Let $U$ be a
$(k \times \alpha)$ matrix containing the $B = k\alpha$ source symbols. We map
\[ \Psi_k M = U \tag{46} \]
and solve for the entries of $M$ in terms of the symbols in $U$.
This is precisely the data-reconstruction process that takes place
respectively, so that
in code
at the MSR point. Furthermore if
in code . If is linear, so is .
The corollary below follows from Corollary 7 above.
2) Striping: The codes presented here divide the entire message
into stripes of sizes corresponding to $\beta = 1$. Since each
stripe is of minimal size, the complexity of the encoding, data-reconstruction and regeneration operations is considerably lowered, and so are the buffer sizes required at data-collectors and
replacement nodes. Furthermore, the operations that need to be
performed on each stripe are identical and independent, and
hence can be performed in parallel efficiently by a GPU/FPGA/
multi-core processor.
3) Choice of the Encoding Matrix $\Psi$: The encoding matrix
$\Psi$, for both the codes described, can be chosen as a Vandermonde matrix. Then each encoding vector can be described by
just a scalar. Moreover with this choice, the encoding, data-reconstruction, and regeneration operations are, for the most
part, identical to encoding or decoding of conventional Reed-Solomon codes.
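Both implementation points above are easy to illustrate. The sketch below uses hypothetical function names and assumes that the last stripe is zero-padded, a detail the text does not specify: it splits a message into independent stripes of $B$ symbols and expands a single scalar into a full Vandermonde encoding vector.

```python
def encoding_vector(x, d, p):
    """Vandermonde row [1, x, ..., x^(d-1)] over F_p: the scalar x describes the node."""
    return [pow(x, j, p) for j in range(d)]

def stripes(message, B):
    """Split a list of symbols into stripes of B symbols each.

    Zero-padding the final stripe is an assumption made for this sketch.
    """
    padded = message + [0] * (-len(message) % B)
    return [padded[i:i + B] for i in range(0, len(padded), B)]

# MBR example parameters: d = 4 over F_7, B = 9 symbols per stripe.
assert encoding_vector(3, 4, 7) == [1, 3, 2, 6]
parts = stripes(list(range(1, 12)), 9)
assert len(parts) == 2 and parts[1] == [10, 11, 0, 0, 0, 0, 0, 0, 0]
```

Because every stripe is encoded with the same $\Psi$ and no state is shared between stripes, a list of stripes such as `parts` can be handed to any parallel map without coordination.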
, we have from
(47)
(48)
Let be a
message symbols
matrix4 given by
(49)
Next, let be a
chosen such that
and
a scalar
(50)
Let
be the
VII. CONCLUSIONS
In this paper, an explicit MBR code for all values of the
system parameters $[n, k, d]$, and an explicit MSR code for all
parameters satisfying $d \geq 2k - 2$ are presented. Both
constructions are based on a common product-matrix framework introduced in this paper, and possess attributes that make
them attractive from an implementation standpoint. To the best
of our knowledge, these are the first explicit constructions of
exact-regenerating codes that allow $n$ to take any value independent of the other parameters; this results in a host of desirable properties such as the ability to optimally handle multiple
simultaneous node failures as well as the ability to allow the
total number of storage nodes in the system to vary with time.
Our results also prove that the MBR point on the storage-repair
bandwidth tradeoff is achievable under the additional constraint
of exact-regeneration for all values of the system parameters,
and that the MSR point is achievable under exact-regeneration
for all $d \geq 2k - 2$.
APPENDIX A
DESCRIPTION OF A PREVIOUSLY CONSTRUCTED MSR CODE IN
THE PRODUCT-MATRIX FRAMEWORK
An explicit code that performs data-reconstruction, and
exact-regeneration of the systematic nodes is provided in [6],
for the MSR point with parameters $[n = d+1, \ k, \ d \geq 2k-1]$.
Subsequently, it was shown in [7] that for this set of parameters,
the code introduced in [6] for exact-regeneration of only the
systematic nodes can also be used for exact-regeneration of
the nonsystematic (parity) nodes, provided repair construction
schemes are appropriately designed. Such an explicit repair
scheme is indeed designed and presented in [7]. In this section,
we provide a simpler description of this code in the product-matrix framework.
, since the
As in [6], [7], we begin with the case
code as well as both data-reconstruction and exact-regeneration
algorithms can be extended to larger values of by making use
of Corollary 8.
(51)
The code constructed in [6], [7] can be verified to have an
alternate description as the collection of code matrices of the
form
(52)
Note that the first $k$ nodes store the message symbols in uncoded
form and hence correspond to the systematic nodes. A simple
description of the exact-regeneration and data-reconstruction
properties of the code is presented below.
Theorem 9 (Exact-Regeneration): In the code presented,
exact-regeneration of any failed node can be achieved by downnodes.
loading one symbol each from the remaining
used in the exactProof: In this construction, the vector
regeneration of a failed node is composed of the first
symbols of
.
1) Exact-Regeneration of Systematic Nodes: Consider
regeneration of the th systematic node. The symbols thus
. The replacement
desired by the replacement node are
node obtains the following
symbols from the remaining
nodes:
(53)
matrix which is the identity matrix
where is a
with th row removed. Since is full rank by construction, the
replacement node has access to
(54)
From (53) and (54), we see that the replacement node has access
to
(55)
(60)
, the
matrix on the left is nonsingular.
Since
,
This allows the replacement node to recover the symbols
desired.
which are precisely the set of symbols
2) Exact-Regeneration of Non-Systematic Nodes: Let
be the row of corresponding to the failed node. Then the
symbols stored in the failed node are
. The replacesymbols
ment node requests and obtains the following
from the remaining nodes:
Now as
is nonsingular, being a
subma.
trix of a Cauchy matrix, the data-collector can recover
In this way, the data-collector has recovered all the entries in
the rows of indexed by , as well as all the entries in the
columns of indexed by . Clearly, the same statement holds
when is replaced by . Thus the data-collector has access to
the product:
(61)
(57)
Again,
is nonsingular, and this enables the data-collector
. It is easy to see that since
to recover
, from the diagonal elements of this matrix, all the diagonal
can be obtained. The nondiagonal elements
elements of
and
for
,
are however of the form
,
. Again since
, all the nondiagonal elements
can also be decoded. In this way, the data-collector
of
has recovered all the entries of .
(58)
APPENDIX B
EQUIVALENT CODES AND CONVERSION OF NONSYSTEMATIC
CODES TO SYSTEMATIC
(56)
where
is the submatrix of containing the
rows
corresponding to the remaining nonsystematic nodes. This gives
and therefore to
the replacement node access to
(62)
block generator matrix
where the
is composed of the component generator submatrices
each of size
, and associated to a distinct node.5 Let
denote the column-space of . A little thought will show that
a distributed storage code is an exact-regenerating code iff:
,
1) for every subset of nodes
(59)
Thus, the data-collector has access to the rows of indexed
by the entries of and consequently, has access to the correas well.
sponding columns of
Consider the columns of
indexed by .
are known, the dataSince the entries of these columns in
. Now since the rows of
collector has access to
and
2) for every subset of
the subspaces
nodes
contain a vector
,
such that
is
..
where the
pre-multiplication matrix , and the
post-multiplication block diagonal matrix comprising of
matrices
, are nonsingular. Clearly, equivthe
alent codes have identical data-reconstruction and regeneration
properties.
Systematic Version of Exact-Regenerating Codes: It also
follows that any exact-regenerating code is equivalent to a systematic, exact-regenerating code. To see this, suppose the set of
nodes to be systematic are the first nodes. Let
, there
-length vector
(63)
is the corresponding set of code symbols. It
where
follows that if we wish to encode in such a way that the code
, the input
is systematic with respect to code symbols
to be fed to the generator matrix is
(66)
Proof: Rewriting the symbols passed by the helper node
(67)
(68)
APPENDIX C
INTERFERENCE ALIGNMENT IN THE
PRODUCT-MATRIX MSR CODE
The concept of interference alignment was introduced in [21]
and [22] in the context of wireless communication. This concept
was subsequently used to construct regenerating codes in [6],
[7], [11], [14]. Furthermore, [6], [14] showed that interference
alignment is in fact, a necessary ingredient of any linear MSR
code. Since the product-matrix MSR construction provided in
the present paper does not explicitly use the concept of interference alignment, a natural question that arises is how does interference alignment manifest itself in this code. We answer this
question in the present section.
Consider repair of a failed node (say, node ) in a distributed storage system employing an MSR code, and let nodes
(69)
(70)
where (68) follows from the symmetry of matrices
By construction, the values of the scalars
distinct, which allows us to write
and
.
are
(71)
-length vectors
Also, since the
are linearly independent by construction, for
such that
exist scalars
, there
(72)
, we can write
(73)
(74)
(75)
where (74) follows from (72), and (75) follows from (71).
REFERENCES
[1] D. A. Patterson, G. Gibson, and R. H. Katz, "A case for redundant
arrays of inexpensive disks (RAID)," in Proc. ACM SIGMOD Int. Conf.
Management of Data, Chicago, IL, Jun. 1988, pp. 109–116.
[2] S. Rhea, P. Eaton, D. Geels, H. Weatherspoon, B. Zhao, and J. Kubiatowicz, "Pond: The OceanStore prototype," in Proc. 2nd USENIX
Conf. File and Storage Technologies (FAST), 2003, pp. 1–14.
[3] R. Bhagwan, K. Tati, Y. C. Cheng, S. Savage, and G. M. Voelker,
"Total recall: System support for automated availability management,"
in Proc. 1st Conf. Networked Systems Design and Implementation
(NSDI), 2004.
[4] A. G. Dimakis, P. B. Godfrey, M. Wainwright, and K. Ramchandran,
"Network coding for distributed storage systems," in Proc. 26th IEEE
Int. Conf. Computer Communications (INFOCOM), Anchorage, AK,
May 2007, pp. 2000–2008.
[5] Y. Wu, A. G. Dimakis, and K. Ramchandran, "Deterministic regenerating codes for distributed storage," in Proc. 45th Annu. Allerton
Conf. Control, Computing, and Communication, Urbana-Champaign,
IL, Sep. 2007.
[6] N. B. Shah, K. V. Rashmi, P. V. Kumar, and K. Ramchandran, "Explicit codes minimizing repair bandwidth for distributed storage," in
Proc. IEEE Information Theory Workshop (ITW), Cairo, Egypt, Jan.
2010.
[7] C. Suh and K. Ramchandran, "Exact-repair MDS codes for distributed
storage using interference alignment," in Proc. IEEE Int. Symp. Information Theory (ISIT), Austin, TX, Jun. 2010, pp. 161–165.
[8] Y. Wu, "Existence and construction of capacity-achieving network
codes for distributed storage," IEEE J. Select. Areas Commun., vol.
28, no. 2, pp. 277–288, Feb. 2010.
[9] A. G. Dimakis, P. B. Godfrey, Y. Wu, M. Wainwright, and K. Ramchandran, "Network coding for distributed storage systems," IEEE
Trans. Inf. Theory, vol. 56, no. 9, pp. 4539–4551, Sep. 2010.
[10] A. Duminuco and E. Biersack, "A practical study of regenerating codes
for peer-to-peer backup systems," in Proc. 29th IEEE Int. Conf. Distributed Computing Systems (ICDCS), Jun. 2009, pp. 376–384.
[11] Y. Wu and A. Dimakis, "Reducing repair traffic for erasure
coding-based storage via interference alignment," in Proc. IEEE
Int. Symp. Information Theory (ISIT), Seoul, South Korea, Jul. 2009,
pp. 2276–2280.
[12] K. V. Rashmi, N. B. Shah, P. V. Kumar, and K. Ramchandran, "Explicit construction of optimal exact regenerating codes for distributed
storage," in Proc. 47th Annu. Allerton Conf. Communication, Control,
and Computing, Urbana-Champaign, IL, Sep. 2009, pp. 1243–1249.
K. V. Rashmi received the M.E. degree from the Indian Institute of Science
(IISc), Bangalore, in 2010.
Her research interests include coding theory, information theory, networks,
communications and signal processing, with a current focus on coding for data
storage networks and network coding.
Nihar B. Shah received the M.E. degree from the Indian Institute of Science
(IISc), Bangalore, in 2010.
His research interests include coding and information theory, algorithms, and
statistical inference.
Mr. Shah is a recipient of the Prof. S.V.C. Aiya Medal for the best master-of-engineering student in the ECE Department at IISc, 2010.