Coding Techniques For Networked Distributed Storage Systems

The document discusses coding techniques for distributed networked storage systems. It introduces erasure codes as an efficient way to provide redundancy compared to replication. Self-repairing codes are introduced that allow a missing block to be repaired by contacting only 2 or 3 nodes, independently of which blocks are missing. Two constructions of self-repairing codes are presented - homomorphic self-repairing codes based on polynomial evaluation, and those based on projective geometry using inner products. Self-repairing codes provide higher resilience to failures than erasure codes as they allow more repair opportunities.


Coding Techniques for Networked Distributed Storage Systems


Frédérique Oggier
Joint work with Anwitaman Datta and Lluís Pàmies-Juárez
Nanyang Technological University, Singapore

MTNS 2012, Melbourne

F. Oggier (NTU), Coding for Storage, MTNS 2012, 1 / 35

Outline

Coding for Distributed Networked Storage

Self-Repairing Codes: Constructions and Properties

A Little Bit of Practice


Coding for Distributed Networked Storage

Distributed Networked Storage

A data owner wants to store data over a network of nodes (e.g. data center, back-up or archival in peer-to-peer networks).
Redundancy is essential for resilience (failure is the norm, not the exception).
Data from Los Alamos National Laboratory (Dependable Systems and Networks, 2006), gathered over 9 years, 4750 machines and 24101 CPUs. Distribution of failures:
Hardware 60%,
Software 20%,
Network/Environment/Humans 5%.

Failures occurred between once a day and once a month.


Coding for Distributed Networked Storage

What's New: More Numbers


As of June 2011, a study sponsored by the information storage company EMC estimates that the world's data is more than doubling every 2 years, reaching 1.8 zettabytes (1 zettabyte = 10^21 bytes) of data to be stored in 2011.

If you store this data on DVDs, the stack would reach from the earth to the moon and back.

http://www.emc.com/about/news/press/2011/20110628-01.htm

Coding for Distributed Networked Storage

Redundancy through Coding


Replication: good availability and durability, but very costly.
Erasure codes: good trade-off of availability, durability and storage cost.


Coding for Distributed Networked Storage

Erasure Codes

A map that takes as input k blocks of data and outputs n blocks of data, with n ≥ k, thus giving redundancy.
An (n, k) erasure code is characterized by (1) how many blocks are needed to decode (recover) the k blocks of original data: if any choice of k encoded blocks suffices, the code is called maximum distance separable (MDS), and (2) its rate k/n (or storage overhead n/k).
3-way replication is a (3, 1) erasure code.
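To make the definition concrete, here is a minimal MDS sketch in the Reed-Solomon style (an illustration, not a code discussed in this talk): the k blocks are the coefficients of a polynomial over the prime field GF(257), fragment i is an evaluation of that polynomial, and Lagrange interpolation from any k fragments recovers the data. All function names are illustrative.

```python
P = 257  # prime field GF(257); blocks are byte-valued symbols

def encode(data, n):
    """(n, k) MDS encode: fragment i is (x_i, p(x_i)) for the data polynomial p."""
    return [(x, sum(c * pow(x, j, P) for j, c in enumerate(data)) % P)
            for x in range(1, n + 1)]

def decode(frags, k):
    """Recover the k data blocks from ANY k fragments by Lagrange interpolation."""
    xs, ys = zip(*frags[:k])
    coeffs = [0] * k
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        # build the basis polynomial l_i(x) = prod_{j != i} (x - x_j) / (x_i - x_j)
        num, denom = [1], 1
        for j, xj in enumerate(xs):
            if j == i:
                continue
            new = [0] * (len(num) + 1)
            for t, c in enumerate(num):        # multiply num by (x - x_j)
                new[t] = (new[t] - xj * c) % P
                new[t + 1] = (new[t + 1] + c) % P
            num = new
            denom = denom * (xi - xj) % P
        scale = yi * pow(denom, P - 2, P) % P  # Fermat inverse of the denominator
        coeffs = [(a + scale * b) % P for a, b in zip(coeffs, num)]
    return coeffs

data = [3, 5, 7]          # k = 3 original blocks
frags = encode(data, 6)   # n = 6 encoded fragments
assert decode(frags[3:], 3) == data   # any 3 of the 6 fragments suffice
```

The MDS property shows up in the last line: the decoder is handed an arbitrary subset of k fragments, never the original blocks.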


Coding for Distributed Networked Storage

Erasure codes for communication


Coding for Distributed Networked Storage

Erasure codes for storage systems


Coding for Distributed Networked Storage

Codes for Storage: Repair

Nodes may go offline, or may fail, so that the data they store becomes unavailable.
Redundancy needs to be replenished, else data may be permanently lost over time (after multiple storage node failures).


Coding for Distributed Networked Storage

Repair process using traditional Erasure Codes


Coding for Distributed Networked Storage

Related work

J. Kubiatowicz, D. Bindel, Y. Chen, S. Czerwinski, P. Eaton, D. Geels, R. Gummadi, S. Rhea, H. Weatherspoon, W. Weimer, C. Wells, and B. Zhao, OceanStore: An Architecture for Global-Scale Persistent Storage, ASPLOS 2000.

H. Weatherspoon, J. Kubiatowicz, Erasure Coding vs. Replication: A Quantitative Comparison, Peer-to-Peer Systems, LNCS, 2002.

A. G. Dimakis, P. Brighten Godfrey, M. J. Wainwright, K. Ramchandran, The Benefits of Network Coding for Peer-to-Peer Storage Systems, NetCod 2007.

A. Duminuco, E. Biersack, Hierarchical Codes: How to Make Erasure Codes Attractive for Peer-to-Peer Storage Systems, Peer-to-Peer Computing (P2P), 2008.

K. V. Rashmi, N. B. Shah, P. V. Kumar, and K. Ramchandran, Explicit Construction of Optimal Exact Regenerating Codes for Distributed Storage, Allerton Conf. on Control, Computing and Comm., 2009.

A.-M. Kermarrec, N. Le Scouarnec, G. Straub, Repairing Multiple Failures with Coordinated and Adaptive Regenerating Codes, NetCod 2011.

K. W. Shum, Cooperative Regenerating Codes for Distributed Storage Systems, ICC 2011.

Coding for Distributed Networked Storage

Regenerating Codes
Based on network coding (a max-flow min-cut argument) on top of an MDS (n, k) erasure code.
Characterize the storage overhead versus repair bandwidth trade-off.
The number of live nodes contacted for a repair is at least k.
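The trade-off curve has two well-known endpoints, the minimum-storage (MSR) and minimum-bandwidth (MBR) regenerating points from the network-coding literature (Dimakis et al.). The slide does not state the formulas, so the sketch below is an assumption based on that literature; M is the object size, and a newcomer contacts d ≥ k live nodes.

```python
from fractions import Fraction as F

def msr_mbr(M, k, d):
    """Trade-off endpoints for a regenerating code storing an object of size M,
    where a newcomer contacts d >= k live nodes (per Dimakis et al., assumed here):
    returns (alpha, gamma) = (storage per node, total repair bandwidth)
    at the minimum-storage (MSR) and minimum-bandwidth (MBR) points."""
    assert d >= k, "a repair must contact at least k live nodes"
    msr = (F(M, k), F(M * d, k * (d - k + 1)))        # minimum storage, costlier repair
    mbr = (F(2 * M * d, k * (2 * d - k + 1)),) * 2    # at MBR, alpha == gamma
    return msr, mbr

# e.g. an object of size M = 4 split with k = 2, repairs contacting d = 3 nodes:
(msr_alpha, msr_gamma), (mbr_alpha, mbr_gamma) = msr_mbr(4, 2, 3)
```

Exact rational arithmetic (`Fraction`) keeps the endpoints free of floating-point noise when sweeping the curve.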


Coding for Distributed Networked Storage

Collaborative Regenerating Codes


Allow collaboration among newcomers.
Improve the storage overhead versus repair bandwidth trade-off.
Tolerate multiple faults.
Benefit of collaboration: the storage versus bandwidth trade-off.

Figure: storage versus repair cost, plotted for t = 1, 4, 8.

Coding for Distributed Networked Storage

Codes for Storage: Wish List

Low storage overhead,
Good fault tolerance,
Low repair bandwidth cost,
Low repair time,
Low complexity,
Low I/O,
...


Self-Repairing Codes: Constructions and Properties

Outline

Coding for Distributed Networked Storage

Self-Repairing Codes: Constructions and Properties

A Little Bit of Practice


Self-Repairing Codes: Constructions and Properties

Self-Repairing Codes (SRC)


Motivation: minimize the number of nodes necessary to repair a missing block.
The minimum is 2, which cannot be achieved without sacrificing the MDS property.
Self-repairing codes are (n, k) codes such that a fragment can be repaired from a fixed number of encoded fragments (typically 2 or 3), independently of which specific blocks are missing.
They fit in the category of locally repairable codes:
P. Gopalan, C. Huang, H. Simitci, S. Yekhanin, On the Locality of Codeword Symbols.
A. S. Rawat, S. Vishwanath, On Locality in Distributed Storage Systems.


Self-Repairing Codes: Constructions and Properties

Self-Repairing Codes (a black-box view)


Self-Repairing Codes: Constructions and Properties

Homomorphic SRC (HSRC)

A first instance of a self-repairing code, based on polynomial evaluation.
An object is cut into k pieces, which represent the coefficients of a polynomial p. The k pieces are mapped to n encoded fragments by performing n polynomial evaluations (p(α1), . . . , p(αn)).

Self-Repairing Homomorphic Codes for Distributed Storage Systems, F. Oggier, A. Datta, INFOCOM 2011.


Self-Repairing Codes: Constructions and Properties

HSRC: Encoding Illustration


Self-Repairing Codes: Constructions and Properties

HSRC: Decoding and Repair

Decoding is ensured by Lagrange interpolation.

Repair: p(a + b) = p(a) + p(b).

Computational cost of a repair: XORs.
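The repair identity p(a + b) = p(a) + p(b) holds because the encoding polynomial is linearized: all of its exponents are powers of 2, so evaluation is additive over a binary field. A minimal sketch over GF(2^4), with arbitrary illustrative coefficients (the field size and coefficients are assumptions, not taken from the slides):

```python
MOD = 0b10011  # primitive polynomial x^4 + x + 1 for GF(2^4)

def gf_mul(a, b):
    """Carry-less multiplication in GF(2^4) with reduction by MOD."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x10:
            a ^= MOD
        b >>= 1
    return r

def gf_pow(a, e):
    r = 1
    for _ in range(e):
        r = gf_mul(r, a)
    return r

def p(x, coeffs=(3, 7, 9)):
    """Linearized polynomial p(x) = c0 x + c1 x^2 + c2 x^4 (illustrative coefficients):
    only power-of-2 exponents, hence p(a + b) = p(a) + p(b), where + is XOR."""
    c0, c1, c2 = coeffs
    return gf_mul(c0, x) ^ gf_mul(c1, gf_pow(x, 2)) ^ gf_mul(c2, gf_pow(x, 4))

# a lost fragment p(a + b) is just the XOR of two live fragments p(a) and p(b):
for a in range(16):
    for b in range(16):
        assert p(a ^ b) == p(a) ^ p(b)
```

This is why the computational cost of a repair is only XORs: the new node never interpolates, it just adds two downloaded fragments.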


Self-Repairing Codes: Constructions and Properties

Self-Repairing Codes from Projective Geometry (PSRC)

A second instance of a self-repairing code, based on spreads.
A spread is a partition of the space into subspaces; nodes store inner products of the data with the basis vectors of the subspaces.

Self-Repairing Codes for Distributed Storage: A Projective Geometric Construction, F. Oggier, A. Datta, ITW 2011.


Self-Repairing Codes: Constructions and Properties

PSRC: A toy example

Object o = (o1, o2, o3, o4) to be stored.

node | basis vectors              | data stored
N1   | v1 = (1000), v2 = (0110)   | {o1, o2 + o3}
N2   | v3 = (0100), v4 = (0011)   | {o2, o3 + o4}
N3   | v5 = (0010), v6 = (1101)   | {o3, o1 + o2 + o4}
N4   | v7 = (0001), v8 = (1010)   | {o4, o1 + o3}
N5   | v9 = (1100), v10 = (0101)  | {o1 + o2, o2 + o4}
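The toy example can be checked mechanically: every stored symbol is the inner product over GF(2) of the object with a basis vector, and each node is repairable from two others. The specific repair combination below (N1 rebuilt from N2 and N5 alone) is worked out for this illustration, not given on the slide.

```python
import itertools

def inner(v, o):
    """Inner product over GF(2): XOR of the object bits selected by v."""
    return sum(vi & oi for vi, oi in zip(v, o)) % 2

basis = {
    "N1": [(1, 0, 0, 0), (0, 1, 1, 0)],
    "N2": [(0, 1, 0, 0), (0, 0, 1, 1)],
    "N3": [(0, 0, 1, 0), (1, 1, 0, 1)],
    "N4": [(0, 0, 0, 1), (1, 0, 1, 0)],
    "N5": [(1, 1, 0, 0), (0, 1, 0, 1)],
}

for o in itertools.product((0, 1), repeat=4):   # every possible 4-bit object
    stored = {n: [inner(v, o) for v in vs] for n, vs in basis.items()}
    # repair both of N1's fragments from N2 and N5 only:
    o1 = stored["N5"][0] ^ stored["N2"][0]      # (o1+o2) + o2      = o1
    o2_o3 = stored["N2"][1] ^ stored["N5"][1]   # (o3+o4) + (o2+o4) = o2+o3
    assert [o1, o2_o3] == stored["N1"]
```

Each lost fragment is rebuilt from exactly two live fragments, matching the "contact only 2 nodes" repair claim for this construction.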


Self-Repairing Codes: Constructions and Properties

Static resilience

There is at least one pair from which to repair a node, for up to (n − 1)/2 simultaneous failures.
The static resilience of a distributed storage system is the probability that an object stored in the system stays available without any further maintenance, even when a fraction of nodes become unavailable.
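For a baseline comparison, the static resilience of a plain MDS (n, k) erasure code under independent fragment losses has a closed form: the object survives iff at least k of its n fragments do. The sketch below computes this EC baseline only (the SRC curves require enumerating repair pairs and are not reproduced here); p_frag is assumed here to denote the fragment survival probability.

```python
from math import comb

def p_obj_mds(n, k, p_frag):
    """Probability the object is still decodable when each of its n fragments
    independently survives with probability p_frag: at least k must remain."""
    return sum(comb(n, i) * p_frag ** i * (1 - p_frag) ** (n - i)
               for i in range(k, n + 1))

# e.g. the EC(31, 5) baseline of the following figure at a few survival rates:
curve = [(p / 10, p_obj_mds(31, 5, p / 10)) for p in range(1, 10)]
```

Replication is the n = 3, k = 1 special case, which makes the formula handy for comparing both redundancy schemes on the same axes.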


Self-Repairing Codes: Constructions and Properties

Static resilience: SRC versus EC

Figure: Static resilience of self-repairing codes (SRC), HSRC/PSRC, in comparison with erasure codes (EC): pobj versus pfrag for EC(63,5), SRC(63,5), EC(31,5) and SRC(31,5).

Self-Repairing Codes: Constructions and Properties

More on Resilience: SRC versus EC

Figure: Static resilience of self-repairing codes (HSRC/PSRC), in comparison with erasure codes (EC).


Self-Repairing Codes: Constructions and Properties

Fast & parallel repairs using HSRC: A toy example


Consider a (15, 4) code where the nodes storing p(w^i) for i = 0, 1, 2, 3, 4, 5, 6 are missing, and nodes have an upload/download bandwidth limit of one block per time unit.

Possible pairs to repair each missing block:

fragment | suitable pairs to reconstruct
p(1)     | (p(w^7), p(w^9)); (p(w^11), p(w^12))
p(w)     | (p(w^7), p(w^14)); (p(w^8), p(w^10))
p(w^2)   | (p(w^7), p(w^12)); (p(w^9), p(w^11)); (p(w^13), p(w^14))
p(w^3)   | (p(w^8), p(w^13)); (p(w^10), p(w^12))
p(w^4)   | (p(w^9), p(w^14)); (p(w^11), p(w^13))
p(w^5)   | (p(w^7), p(w^13)); (p(w^12), p(w^14))
p(w^6)   | (p(w^7), p(w^10)); (p(w^8), p(w^14))

A parallelized schedule (which live fragment each missing node downloads):

node   | p(w^0) | p(w^1)  | p(w^2)  | p(w^3)  | p(w^4)  | p(w^5)  | p(w^6)
Time 1 | p(w^7) | p(w^8)  | p(w^9)  | p(w^13) | p(w^11) | p(w^12) | p(w^10)
Time 2 | p(w^9) | p(w^10) | p(w^11) | p(w^8)  | p(w^13) | p(w^14) | p(w^7)
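The pair table can be regenerated mechanically. Assuming w is a primitive element of GF(2^4) built from the primitive polynomial x^4 + x + 1 (an assumption; the slide does not fix the field representation), p is additive, so p(w^i) is repairable from (p(w^a), p(w^b)) exactly when w^i = w^a + w^b. The sketch also checks that the schedule pairs are valid and that each live node serves at most one download per time unit.

```python
import itertools

MOD = 0b10011  # x^4 + x + 1, a primitive polynomial for GF(2^4)

def gf_mul(a, b):
    """Carry-less multiplication in GF(2^4) with reduction by MOD."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x10:
            a ^= MOD
        b >>= 1
    return r

pw = [1]                       # powers of the primitive element w = 0b0010
for _ in range(14):
    pw.append(gf_mul(pw[-1], 0b0010))

missing, alive = range(7), range(7, 15)
# p is additive, so p(w^i) = p(w^a) XOR p(w^b) whenever w^i = w^a + w^b:
pairs = {i: [(a, b) for a, b in itertools.combinations(alive, 2)
             if pw[a] ^ pw[b] == pw[i]]
         for i in missing}
assert pairs[0] == [(7, 9), (11, 12)]        # the p(1) row of the table

# schedule: exponents served at Time 1 and Time 2, one column per missing node
schedule = {1: [7, 8, 9, 13, 11, 12, 10], 2: [9, 10, 11, 8, 13, 14, 7]}
for providers in schedule.values():          # one block per node per time unit
    assert len(set(providers)) == len(providers)
for i in missing:                            # each column is a valid repair pair
    assert tuple(sorted((schedule[1][i], schedule[2][i]))) in pairs[i]
```

Running the enumeration is also how one confirms there are exactly three suitable pairs for p(w^2) and two for every other missing fragment in this example.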

Self-Repairing Codes: Constructions and Properties

Systematic Object Retrieval using PSRC: A toy example

node | basis vectors              | data stored
N1   | v1 = (1000), v2 = (0110)   | {o1, o2 + o3}
N2   | v3 = (0100), v4 = (0011)   | {o2, o3 + o4}
N3   | v5 = (0010), v6 = (1101)   | {o3, o1 + o2 + o4}
N4   | v7 = (0001), v8 = (1010)   | {o4, o1 + o3}
N5   | v9 = (1100), v10 = (0101)  | {o1 + o2, o2 + o4}


A Little Bit of Practice

Outline

Coding for Distributed Networked Storage

Self-Repairing Codes: Constructions and Properties

A Little Bit of Practice


A Little Bit of Practice

A More Realistic Scenario

A network with 1000 (full duplex) nodes,


10 000 objects of size 1GB are stored,
Multiple failures.
Pipelined codes are also considered.


A Little Bit of Practice

Simulation Results: Storage of Multiple Objects


A Little Bit of Practice

Data Insertion

Replication: to store a new object, a source node uploads one replica to a first node, which can concurrently forward it to another storage node, etc.
Erasure codes: the source node computes and uploads the encoded fragments to the corresponding storage nodes.
Issue: insertion time, possibly worsened by mismatched temporal constraints (e.g. F2F).

In-Network Redundancy Generation for Opportunistic Speedup of Backup, L. Pamies-Juarez, A. Datta, F. Oggier, preprint.


A Little Bit of Practice

Simulation Results: In-Network Coding


IM traces for the F2F scenario.
Figure (1): storage throughput increases with node availability.
Figure (2): total traffic increases and scales with storage throughput.
Figure (3): reduction of data upload at the source, up to 40%.

Figure: three panels of percentage (%) versus scheduling policy (RndFlw, RndDta, MinFlw, MinDta), with curves for >4h, >6h and >12h.

A Little Bit of Practice

Summary

Erasure Codes for Communication vs for Networked Distributed Storage Systems
Fault Tolerance vs Storage Overhead
Repairability
Repair Bandwidth vs Local Repair
Other issues, such as Data Insertion


A Little Bit of Practice

Future/ongoing work
Efficient decoding, other instances of SRC.
Implementation & integration in a distributed storage system.
Various systems/algorithmic issues: topology-optimized placement, repair scheduling.
And design of new codes...


A Little Bit of Practice

Q&A

More information:
http://sands.sce.ntu.edu.sg/CodingForNetworkedStorage/

