Coding Techniques For Networked Distributed Storage Systems
F. Oggier (NTU)
MTNS 2012
Outline
A data owner wants to store data over a network of nodes (e.g. data
center, back-up or archival in peer-to-peer networks).
Redundancy is essential for resilience (Failure is the norm, not the
exception).
Data from Los Alamos National Laboratory (Dependable Systems and
Networks, 2006), gathered over 9 years, 4750 machines and 24101
CPUs. Distribution of failures:
Hardware 60%,
Software 20%,
Network/Environment/Humans 5%,
http://www.emc.com/about/news/press/2011/20110628-01.htm
Erasure Codes
Nodes may go offline, or may fail, so that the data they store
becomes unavailable.
Redundancy needs to be replenished, otherwise data may be permanently
lost over time (after multiple storage node failures).
Related work
Regenerating Codes
Based on Network Coding (max flow-min cut argument) on top of an
MDS (n, k) erasure code.
Characterize storage overhead - repair bandwidth trade-off.
Number of contacted live nodes to repair is at least k.
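The two extreme points of this trade-off are usually quoted from the regenerating-codes literature (Dimakis et al.) as the minimum-storage (MSR) and minimum-bandwidth (MBR) points. The sketch below evaluates those closed forms. The symbols are assumptions introduced here, not taken from the slides: M is the object size, alpha the storage per node, gamma the total repair bandwidth, and d >= k the number of live nodes contacted.

```python
from fractions import Fraction


def msr_point(M, k, d):
    """Minimum-storage regenerating point: (alpha, gamma)."""
    if d < k:
        raise ValueError("repair must contact at least k live nodes")
    alpha = Fraction(M) / k
    gamma = Fraction(M) * d / (k * (d - k + 1))
    return alpha, gamma


def mbr_point(M, k, d):
    """Minimum-bandwidth regenerating point: alpha equals gamma."""
    if d < k:
        raise ValueError("repair must contact at least k live nodes")
    value = Fraction(2 * d) * M / (2 * k * d - k * k + k)
    return value, value


if __name__ == "__main__":
    # Hypothetical parameters chosen only for illustration.
    print("MSR:", msr_point(1, 5, 10))
    print("MBR:", mbr_point(1, 5, 10))
```

With d = k the MSR repair bandwidth equals M, that is, repairing one node costs as much as downloading the whole object, which is the behaviour of a plain MDS erasure code.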
[Figure: storage versus repair cost trade-off. Vertical axis: Storage; horizontal axis: Repair cost.]
Outline
Static resilience
[Figure: static resilience. Vertical axis: pobj; horizontal axis: pfrag. Curves: EC(63,5), SRC(63,5), EC(31,5), SRC(31,5).]
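For the EC curves, static resilience can be sketched as a binomial tail. This assumes an MDS (n, k) erasure code, so the object is recoverable whenever at least k of the n fragments survive, and assumes each fragment is available independently with probability pfrag. Both assumptions are stated here for illustration and are not taken from the slides. The SRC curves need a different computation and are not covered.

```python
from math import comb


def ec_static_resilience(n, k, p_frag):
    """Probability that at least k of n fragments are available.

    Assumes an MDS (n, k) code and independent fragment availability.
    """
    return sum(
        comb(n, i) * p_frag**i * (1 - p_frag) ** (n - i)
        for i in range(k, n + 1)
    )


if __name__ == "__main__":
    # Parameters taken from the figure legend: EC(63,5) and EC(31,5).
    for n in (63, 31):
        for p in (0.1, 0.2, 0.4, 0.8):
            print(f"EC({n},5) pfrag={p}: pobj={ec_static_resilience(n, 5, p):.4f}")
```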
A parallelized schedule:
node     p(w^0)   p(w^1)   p(w^2)   p(w^3)   p(w^4)   p(w^5)   p(w^6)
Time 1   p(w^7)   p(w^8)   p(w^9)   p(w^13)  p(w^11)  p(w^12)  p(w^10)
Time 2   p(w^9)   p(w^10)  p(w^11)  p(w^8)   p(w^13)  p(w^14)  p(w^7)
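The schedule can be sanity-checked with a short script, reading each entry p(w i) as the fragment indexed by the i-th power of a primitive element w. The sketch assumes that w generates GF(16) built from x^4 + x + 1 and that p is additive, p(a + b) = p(a) + p(b), so a lost fragment is the sum of two live ones. Neither assumption is stated on the slide. It checks that each repaired index is the sum of its two sources and that no live node is contacted twice in the same time step.

```python
def gf16_powers():
    """Powers w^0..w^14 as 4-bit integers, assuming w^4 = w + 1."""
    powers, x = [], 1
    for _ in range(15):
        powers.append(x)
        x <<= 1
        if x & 0b10000:       # reduce modulo x^4 + x + 1
            x ^= 0b10011
    return powers


# node exponent -> (exponent contacted at Time 1, exponent contacted at Time 2)
SCHEDULE = {
    0: (7, 9), 1: (8, 10), 2: (9, 11), 3: (13, 8),
    4: (11, 13), 5: (12, 14), 6: (10, 7),
}


def check_schedule(schedule):
    w = gf16_powers()
    # Addition in GF(16) is bitwise XOR on the coefficient vectors.
    sums_ok = all(w[a] ^ w[b] == w[i] for i, (a, b) in schedule.items())
    # Within one time step every live node serves at most one repair.
    no_conflict = all(
        len({pair[t] for pair in schedule.values()}) == len(schedule)
        for t in (0, 1)
    )
    return sums_ok, no_conflict


if __name__ == "__main__":
    print(check_schedule(SCHEDULE))
```

Under these assumptions both checks pass, which is consistent with the seven repairs finishing in two time steps using the eight live nodes p(w^7) to p(w^14).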
node   basis vectors                  data stored
N1     v1 = (1000), v2 = (0110)       {o1, o2 + o3}
N2     v3 = (0100), v4 = (0011)       {o2, o3 + o4}
N3     v5 = (0010), v6 = (1101)       {o3, o1 + o2 + o4}
N4     v7 = (0001), v8 = (1010)       {o4, o1 + o3}
N5     v9 = (1100), v10 = (0101)      {o1 + o2, o2 + o4}
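The table can be explored with a small script. It reads each basis vector as a GF(2) coefficient vector over the blocks (o1, o2, o3, o4), derives the stored combinations, and lists, for a lost node, every way to rebuild one of its pieces as the XOR of two pieces held by two different surviving nodes. The helper names are hypothetical and the repair procedure shown is only an illustration of what the table allows, not necessarily the procedure used by the author.

```python
from itertools import combinations

# Basis vectors per node, copied from the table.
BASIS = {
    "N1": ("1000", "0110"),
    "N2": ("0100", "0011"),
    "N3": ("0010", "1101"),
    "N4": ("0001", "1010"),
    "N5": ("1100", "0101"),
}


def stored(node):
    """Symbolic description of the two pieces a node stores."""
    return [
        " + ".join(f"o{i + 1}" for i, c in enumerate(v) if c == "1")
        for v in BASIS[node]
    ]


def xor_pairs(lost):
    """For each piece of the lost node, pairs of surviving pieces
    (on two different nodes) whose XOR equals that piece."""
    survivors = [(n, v) for n, vs in BASIS.items() if n != lost for v in vs]
    result = {}
    for target in BASIS[lost]:
        result[target] = [
            (a, b)
            for a, b in combinations(survivors, 2)
            if a[0] != b[0] and int(a[1], 2) ^ int(b[1], 2) == int(target, 2)
        ]
    return result


if __name__ == "__main__":
    print(stored("N3"))
    for target, pairs in xor_pairs("N1").items():
        print(target, pairs)
```

For example, if N1 is lost, v1 = (1000) equals v3 + v9, so it can be rebuilt from one piece of N2 and one piece of N5.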
Outline
Data Insertion
[Figure: three panels. Vertical axis: percentage (%); horizontal axis: Scheduling Policy (RndFlw, RndDta, MinFlw, MinDta). Series: >4h, >6h, >12h.]
Summary
Future/ongoing work
Efficient decoding, other instances of SRC
Implementation & integration in a distributed storage system
Various systems/algorithmic issues: topology-optimized placement, repair scheduling
Q&A
More information:
http://sands.sce.ntu.edu.sg/CodingForNetworkedStorage/