CS2510 00 Distributed Storage Overview

Distributed and

Federated Storage
How to store things… in… many places... (maybe)

CS2510
Presented by: wilkie

University of Pittsburgh
Recommended Reading (or Skimming)
• NFS: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.14.473
• WAFL: https://dl.acm.org/citation.cfm?id=1267093
• Hierarchical File Systems are Dead (Margo Seltzer, 2009): https://www.eecs.harvard.edu/margo/papers/hotos09/paper.pdf
• Chord (Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, Hari Balakrishnan, 2001): https://pdos.csail.mit.edu/papers/chord:sigcomm01/chord_sigcomm.pdf
• Kademlia (Petar Maymounkov, David Mazières, 2002): https://pdos.csail.mit.edu/~petar/papers/maymounkov-kademlia-lncs.pdf
• BitTorrent Overview: http://web.cs.ucla.edu/classes/cs217/05BitTorrent.pdf
• IPFS (Juan Benet, 2014): https://ipfs.io/ipfs/QmR7GSQM93Cx5eAg6a6yRzNde1FQv7uL6X1o4k7zrJa3LX/ipfs.draft3.pdf (served via IPFS, neat)
Network File System
NFS: A Traditional and Classic Distributed File System
Problem
• Storage is cheap.
• YES. This is a problem in a classical sense.
• People are storing more stuff and want very strong storage guarantees.
• Networked (web) applications are global and people want strong availability
and stable speed/performance (wherever in the world they are.) Yikes!

• More data == Greater probability of failure

• We want consistency (correct, up-to-date data)
• We want availability (when we need it)
• We want partition tolerance (even in the presence of downtime)
• Oh. Hmm. Well, heck.
• That’s hard (technically impossible) so what can we do?
Lightning Round: Distributed Storage
• Network File System (NFS)
• We will gloss over details here, but the papers are definitely worth a read.
• It invented the Virtual File System (VFS)
• Basically, though, it is an early attempt to investigate the trade-offs for client/server file consistency
[Figure: a scale running from “Unreliable” to “Most Reliable??”]
NFS System Model
• Each client connects directly to the server. Files could be duplicated
on client-side.

[Figure: three clients, each connected directly to a single server]
NFS Stateless Protocol
Set of common operations clients can issue: (where is open? close?)

lookup    Returns file handle for filename
create    Create a new file and return handle
remove    Removes a file from a directory
getattr   Returns file attributes (stat)
setattr   Sets file attributes
read      Reads bytes from file
write     Writes bytes to file
Commands are sent to the server. (one-way)
Statelessness (Toward Availability)
• NFS implemented an open (standard, well-known) and
stateless (all actions/commands are independent) protocol.

• The open() system call is an example of a stateful protocol.

• The system call looks up a file by a path.
• It gives you a file handle (or file pointer) that represents that file.
• You give that file handle to read or write calls. (not the path)
• The file handle does not directly relate to the file. (A second call to open gives a
different file handle)
• If your machine loses power… that handle is lost… you’ll need to call open again.
Statelessness (Toward Availability)
• Other stateless protocols: HTTP (but not FTP), IP (but not TCP), www
• So, in NFS, we don’t have an open.
• Instead we have an idempotent lookup function.
• Always gives us a predictable file handle. Even if the server crashes and reboots.
• Statelessness also benefits from idempotent read/write functions.
• Sending the same write command twice in a row shouldn’t matter.
• This means ambiguity of server crashes (did it do the thing I wanted?)
doesn’t matter. Just send the command again. No big deal. (kinda)
• NFS’s way of handling duplicate requests. (See Fault Tolerance slides)
• Consider: What about mutual exclusion?? (file locking) Tricky!
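To make the idempotence argument concrete, here is a minimal sketch of a stateless server. The class name, the handle format ("fh:" plus the file name), and the in-memory storage are all hypothetical and chosen only for illustration; real NFS file handles are opaque values chosen by the server. The point is that every request carries an explicit handle and offset, so a retried write lands on the same bytes.

```python
class StatelessFileServer:
    """Toy in-memory server: each request carries all the context it needs."""

    def __init__(self):
        self.files = {}  # handle -> bytearray

    def lookup(self, name):
        # Idempotent: the same name always maps to the same handle,
        # so a client can simply repeat the call after a server reboot.
        handle = "fh:" + name  # hypothetical handle format
        self.files.setdefault(handle, bytearray())
        return handle

    def write(self, handle, offset, data):
        # The offset is explicit, so replaying the request overwrites the
        # same bytes instead of appending them a second time.
        buf = self.files[handle]
        end = offset + len(data)
        if len(buf) < end:
            buf.extend(b"\0" * (end - len(buf)))
        buf[offset:end] = data
        return "success"

    def read(self, handle, offset, count):
        return bytes(self.files[handle][offset:offset + count])


server = StatelessFileServer()
fh = server.lookup("main.c")
server.write(fh, 0, b"hello, world!!\n")   # 15 bytes at offset 0
server.write(fh, 0, b"hello, world!!\n")   # retry after a lost reply: no harm done
server.write(fh, 15, b"second block..\n")  # next 15 bytes
```

Compare this with a stateful `write(fd, data)` that advances a server-side cursor: there, a duplicated request would append the data twice.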
Statelessness And Failure (NFS) [best]
A client issues a series of writes to a file located on a particular server.
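Client → Server: lookup
Client → Server: write(fd, offset: 0, count: 15)
Server → Client: success
Client → Server: write(fd, 15, 15)
Server → Client: success
Client → Server: write(fd, 30, 15)
Server → Client: success
[Figure: the local file (client) beside the remote file (server)]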
Server-side Writes Are Slow
Problem: Writes are really slow…
(Did the server crash?? Should I try again?? Delay… delay… delay)
Client → Server: lookup
Client → Server: write(fd, offset, count)
… 1 second …
… 2 seconds? ...
Server → Client: success

Time relates to the amount of data we want to write… is there a good block size?
1KiB? 4KiB? 1MiB? (bigger == slower, harsher failures; small == faster, but more messages)
Server-side Write Cache?
Solution: Cache writes and commit them when we have time.
(Client gets a response much more quickly… but at what cost? There’s always a trade-off)
Client → Server: lookup
Client → Server: write(fd, offset, count)
… 400 milliseconds …
Server → Client: success
Write Cache (on the server): Need to write this block at some point! But what if… it doesn’t?

When should it write it back? Hmm. It is not that obvious.
(Refer to Consistency discussion from previous lectures)
Write Cache Failure (NFS)
A server must commit changes to disk if it tells client it succeeded…
If it did fail, and restarted quickly, the client would never know!
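Client → Server: lookup
Client → Server: write(fd, 0, 15)
Server → Client: success
Client → Server: write(fd, 15, 15)
Server → Client: success (but server fails before committing cache to disk)
Client → Server: write(fd, 30, 15)
Server → Client: success
[Figure: the local file beside the remote file, which no longer matches it (oops!)]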
Fault Tolerance
• So, we can allow failure, but only if we know if an operation
succeeded. (we are assuming a strong eventual consistency)
• In this case, writes… but those are really slow. Hmm.
• Hey! We’ve seen this all before…
• This is all fault tolerance basics.
• But this is our chance to see it in practice.
• [a basic conforming implementation of] NFS makes a trade-off.
It gives you distributed data that is reliably stored at the cost of
slow writes.

• Can we speed that up?

Strategies
• Problem: Slow to send data since we must wait for it to be committed.
• Also, we may write (and overwrite) data repeatedly.
• How to mitigate the performance cost?
• Possibility: Send writes in smaller chunks.
• Trade-offs: More messages to/from server.
• Possibility: We can cache writes at the client side.
• Trade-offs:
• Client side may crash.
• Accumulated writes may stall as we send more data at once.
• Overall difficulty in knowing when we writeback.
• Possibility: We mitigate likelihood of failure on server.
• Battery-backed cache, etc. Not perfect, but removes client burden.
• Make disks faster (Just make them as fast as RAM, right? NVRAM?) ☺
• Distribute writeback data to more than one server. (partitioning! Peer-to-peer!!)
File System Structure
From Classic Hierarchical to Non-Traditional
File System Layout (Classical; NFS)
• We generally are used to a very classical layout: directories and files.
• NFS introduced the Virtual File System, so some directories could be mounted as remote (or devices)
• Therefore, some file paths have more latency than others! Interesting.
• We navigate via a path that strictly relates to the layout of directories as a tree. (Hierarchical Layout)
[Figure: directory tree with root at the top, directories home and sys beneath it, and files hw1.doc, hw2.doc, main.c, main.h as leaves. Example path: /root/home/main.c]
File System Layout (Classical; NFS)
• This should be CS1550-ish OS review!
• Files are broken down into inodes that point to file data. (indirection)
• An inode is a set of pointers to blocks on disk. (it may need inodes that point to inodes to keep block sizes small)
• The smaller the block size, the more metadata (inodes) required.
• But easier to backup what changes.
• (We’ll see why in a minute)
[Figure: the file main.c and its inode pointing to data blocks]
Cheap Versioning (WAFL+NFS)
• Simply keep copies of prior inodes to maintain a simple snapshot!
We can keep around snapshots and back them up to remote systems (such as NFS) at our leisure.
Once we back them up, we can overwrite the snapshot inode with the current inode.
[Figure: a snapshot inode alongside the current inode]
Directories and Hierarchies
• Hierarchical directories are based on older types
of computers and operating systems designed
around severe limitations.
• NFS (+VFS) mounts remote servers to directories.
• This is convenient (easy to understand and
configure) for smaller storage networks.
• However, two different files may have the same
name and exist on two different machines.
• How to differentiate? How to find what you want?
Reconsidering Normal (Name-Addressed)
• Currently, many everyday file systems haven’t changed much.
• They are name-addressed, that is, you look them up by their name.
• File lookups in hierarchies require many reads from disparate parts of
disk as you open and read metadata for each directory.
• This can be slow. OSes have heavy complexity and caching for directories.
• Now, consider distributed file systems… if directories span machines!
• There are other approaches. Margo Seltzer in Hierarchical File
Systems are Dead suggests a tag-based approach more in line with
databases: offering indexing and search instead of file paths.
Content Addressing
• However, one approach “flips the script” and allows file lookups to be
done on the data of the file.
• That seems counter-intuitive: looking up a file via a representation of
its data. How do you know the data beforehand?

• With content-addressing, the file is stored with a name that is derived mathematically from its data as a hash. (md5, sha, etc)
• That yields many interesting properties we will take advantage of.
Hash Function Overview
Good Hash Functions:
• Are one-way (non-invertible)
• Cannot compute original 𝑥 from result of ℎ𝑎𝑠ℎ(𝑥)
• Are deterministic
• ℎ𝑎𝑠ℎ(𝑥) is equal to ℎ𝑎𝑠ℎ(𝑥) at any time on any other machine
• Are uniform
• All hashes have equal probability. That is:
• The set 𝐻 defined by taking a random set and applying ℎ𝑎𝑠ℎ(𝑥) results in a uniform distribution.
• Exhibit the avalanche effect (small input changes scatter the output)
• Hashing two similar numbers should result in a dramatically different hash.
• That is: ℎ𝑎𝑠ℎ(𝑥) should be unpredictably distant from ℎ𝑎𝑠ℎ(𝑥 + 1)
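Two of these properties are easy to observe directly. The sketch below assumes SHA-256 from Python's standard `hashlib` as the hash function (the slides do not prescribe one) and uses hypothetical helper names. It checks determinism and counts how many output bits differ between ℎ𝑎𝑠ℎ(𝑥) and ℎ𝑎𝑠ℎ(𝑥 + 1).

```python
import hashlib


def h(data: bytes) -> str:
    """Hex digest of the input (SHA-256 is an assumption of this sketch)."""
    return hashlib.sha256(data).hexdigest()


def differing_bits(a: str, b: str) -> int:
    """Number of bit positions in which two equal-length hex digests differ."""
    return bin(int(a, 16) ^ int(b, 16)).count("1")


x = (41).to_bytes(8, "big")
x_plus_1 = (42).to_bytes(8, "big")

deterministic = h(x) == h(x)                 # same input, same hash, anywhere
flipped = differing_bits(h(x), h(x_plus_1))  # out of 256 output bits
print(deterministic, flipped)
```

For a well-behaved 256-bit hash, roughly half of the output bits are expected to flip even though the inputs differ by one.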
Basic Hashing
• For simple integrity, we can simply hash the file.
𝑘 = ℎ𝑎𝑠ℎ(𝑓𝑖𝑙𝑒) is generated. Then key 𝑘 can be used to open the file.

• When distributing the file, one can know it got the file by simply
hashing what it received.
• Since our hash function is deterministic the hash will be the same.
• If it isn’t, our file is corrupted.

• In digital archival circles, this is called fixity.
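A minimal sketch of the idea, assuming SHA-256 and an in-memory dictionary standing in for storage. The `ContentStore` class and its method names are hypothetical. The key is derived from the data on `put`, and `get` re-hashes what it found as a fixity check.

```python
import hashlib


class ContentStore:
    """Toy content-addressed store: the key is the hash of the value."""

    def __init__(self):
        self.blobs = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        self.blobs[key] = data
        return key

    def get(self, key: str) -> bytes:
        data = self.blobs[key]
        # Fixity check: the hash is deterministic, so a mismatch means corruption.
        if hashlib.sha256(data).hexdigest() != key:
            raise ValueError("fixity check failed: data does not match its key")
        return data


store = ContentStore()
k = store.put(b"int main(void) { return 0; }\n")
intact = store.get(k)
```

Storing the same bytes twice yields the same key, so identical files are kept only once.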

Chunking
• However, it would be nice to determine which part of the file was
distributed incorrectly.
• Maybe we can ask a different source for just that part.
• Hmm… that’s an idea! (we’ll get there)

• Dividing up the file is called chunking, and there are things to consider: (trade-offs!)
• How big are the chunks… the more chunks, the more hashes; the more metadata!
• Of course, the more chunks, the smaller the chunk; therefore, the narrower the window in which a detected corruption must lie (and the less data to re-fetch)!
Chunking
• Take a file, divide it into chunks, hash each chunk.
vacation_video.mov

A B C D E F G H
Distribution (Detecting Failure)
• Client requests the hashes given. But receives chunks with hashes:

vacation_video.mov

A B C D F G H
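The detection step can be sketched as follows, assuming fixed-size chunks and SHA-256. The function names and the 4-byte chunk size are hypothetical and far smaller than anything practical. Given the expected per-chunk hashes, the receiver can name exactly which chunks need to be requested again.

```python
import hashlib


def chunk(data: bytes, size: int):
    """Split data into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def chunk_hashes(chunks):
    return [hashlib.sha256(c).hexdigest() for c in chunks]


def bad_chunks(expected_hashes, received_chunks):
    """Indexes of chunks that are missing or do not match their expected hash."""
    bad = []
    for i, want in enumerate(expected_hashes):
        got = received_chunks[i] if i < len(received_chunks) else None
        if got is None or hashlib.sha256(got).hexdigest() != want:
            bad.append(i)
    return bad


original = chunk(bytes(range(32)), 4)  # 8 chunks, standing in for A..H
expected = chunk_hashes(original)

received = list(original)
received[4] = b"oops"                  # chunk "E" arrives corrupted
to_refetch = bad_chunks(expected, received)
```

Only the chunks listed in `to_refetch` need to be requested again, possibly from a different source.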
Merkle Tree/DAG
We can organize a file such that it can be referred to by a single hash, but also be divided up into more easily shared chunks.
The hash of each node is the hash of the hashes it points to.
𝑁6 = ℎ𝑎𝑠ℎ(𝑁4 + 𝑁5)
𝑁4 = ℎ𝑎𝑠ℎ(𝑁0 + 𝑁1)    𝑁5 = ℎ𝑎𝑠ℎ(𝑁2 + 𝑁3)
𝑁0 = ℎ𝑎𝑠ℎ(𝐴 + 𝐵)    𝑁1 = ℎ𝑎𝑠ℎ(𝐶 + 𝐷)    𝑁2 = ℎ𝑎𝑠ℎ(𝐸 + 𝐹)    𝑁3 = ℎ𝑎𝑠ℎ(𝐺 + 𝐻)

                 N6
         N4              N5
     N0      N1      N2      N3
    A   B   C   D   E   F   G   H
vacation_video.mov
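A small sketch of computing the root, under a few assumptions not stated on the slide: SHA-256 as the hash, "+" meaning byte concatenation of the two child hashes, and a chunk count that is a power of two (eight here, as in the figure).

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(chunks):
    """Hash the chunks, then repeatedly hash pairs of hashes up to one root."""
    if not chunks or len(chunks) & (len(chunks) - 1):
        raise ValueError("this sketch expects a power-of-two number of chunks")
    level = [h(c) for c in chunks]  # A..H: one hash per chunk
    while len(level) > 1:
        # N0 = hash(A + B), N1 = hash(C + D), ... then one level up, and so on.
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


chunks = [b"A", b"B", b"C", b"D", b"E", b"F", b"G", b"H"]
root = merkle_root(chunks)  # plays the role of N6
```

Changing any single chunk changes the root, which is why one hash is enough to refer to the whole file.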
Merkle-based Deduplication
• Updating a chunk ripples.
• But leaves intact parts alone!

                 N9
         N4              N8
     N0      N1      N7      N3
    A   B   C   D   R   F   G   H
vacation_video.mov (chunk E replaced by R)
Deduplication
vacation_video.mov (v1): 01774f1d8f6621ccd7a7a845525e4157 (root N6)
vacation_video.mov (v2): d624ab69908b8148870bbdd0d6cd3799 (root N9)
• Both versions of the file can co-exist without duplicating their content.
N6 → {N4, N5}    N9 → {N4, N8}
N4 → {N0, N1}    N5 → {N2, N3}    N8 → {N7, N3}
N0 → {A, B}    N1 → {C, D}    N2 → {E, F}    N7 → {R, F}    N3 → {G, H}
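The saving can be counted with a toy node store. This sketch assumes SHA-256, byte concatenation of child hashes, and a hypothetical `store` dictionary keyed by hash. It stores version 1, then version 2 (chunk E replaced by R), and counts how many new nodes the second version adds.

```python
import hashlib

store = {}  # hash -> node content (chunk bytes, or two concatenated child hashes)


def put(content: bytes) -> bytes:
    key = hashlib.sha256(content).digest()
    store[key] = content  # storing an existing node again changes nothing
    return key


def put_file(chunks):
    """Store every chunk and every interior node; return the root hash."""
    level = [put(c) for c in chunks]
    while len(level) > 1:
        level = [put(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


v1 = [b"A", b"B", b"C", b"D", b"E", b"F", b"G", b"H"]
v2 = [b"A", b"B", b"C", b"D", b"R", b"F", b"G", b"H"]

root_v1 = put_file(v1)
nodes_after_v1 = len(store)            # 8 chunks + N0..N6
root_v2 = put_file(v2)
new_nodes = len(store) - nodes_after_v1
```

Here `new_nodes` covers only R, N7, N8, and N9; everything under N4, plus N3 and chunk F, is shared between the two versions.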
Distribution
• I can ask a storage server for the file at that hash.
• It will give me the sub hashes.
• At each step, I can verify the information by hashing what I downloaded!
(N6) 01774f1d8f6621ccd7a7a845525e4157 → {N4, N5}
(N4) aa7e074434e5ae507ec22f9f1f7df656 → {N0, N1}
(N1) aa7e074434e5ae507ec22f9f1f7df656 → {C, D}
(D) 495aa31ae809642160e38868adc7ee8e → D’s File Data
Distribution
• Nothing is stopping me from asking multiple servers.
• But how do I know which servers have which chunk?? Hmm.
(N6) 01774f1d8f6621ccd7a7a845525e4157 → {N4, N5}
(N4) aa7e074434e5ae507ec22f9f1f7df656 → {N0, N1}
(N1) aa7e074434e5ae507ec22f9f1f7df656 → {C, D}
(C) 0bdba65117548964bad7181a1a9f99e4 → C’s File Data
(D) 495aa31ae809642160e38868adc7ee8e → D’s File Data
Concurrently gather two chunks at once!
Peer-to-peer Systems
BitTorrent, Kademlia, and IPFS: Condemned yet Coordinated.
BitTorrent
• A basic peer-to-peer system based on block swapping.
• These days built on top of Distributed Hash Tables (DHTs)
• Known in non-technical circles for its use within software piracy.
• But it, or something similar, is used often!
• Blizzard distributes game downloads and WoW updates via BitTorrent.
• Many Linux distributions allow downloading them via BitTorrent.

• AT&T said in 2015 that BitTorrent represented around 20% of total broadband bandwidth: https://thestack.com/world/2015/02/19/att-patents-system-to-fast-lane-bittorrent-traffic/
• I’m actually a bit skeptical.
BitTorrent System Model
When a file is requested, a well-known node yields a peer list.
Our node serves as both client and server. (As opposed to unidirectional NFS)

[Figure: node D requests main.c from the well-known “Tracker” (client/server), which returns the peer list {A, B, C} and adds “D” to the list. Possibly: gossip {A, B, C} to other nodes. Possibly: gossip about D to other nodes downloading this file.]
BitTorrent Block Sharing
• Files are divided into chunks (blocks) and traded among the different peers.
• As your local machine gathers blocks, those are available for other peers, who will ask you for them.
• You can concurrently download parts of files from different sources.
• Peers can leave and join this network at any time.
[Figure: peers exchanging blocks, each acting as client/server]
Heuristics for Fairness
• How to choose who gets a block? (No right/obvious answer)
• This is two-sided. How can you trust a server to give you the right thing?
• Some peers are faster/slower than others.
• In an open system: Some don’t play fair. They take but never give back.
• You could prioritize older nodes.
• They are less likely to suddenly disappear.
• They are more likely to cooperate. (The Millennial Struggle, am I right?)
• What if everybody did this… hmm… old nodes shunning young nodes…
• You can only give if the other node gives you a block you need.
• Fair Block/Bit-swapping. Works as long as you have some data.
• Obviously punishes first-timers (who don’t have any data to give)
• Incentivizes longevity with respect to cooperation.
Centralization Problem
• “Tracker” based solution introduces unreliable centralization.

• Getting rid of that (decentralized tracking) means:

• Organizing nodes such that it is easy to find data.
• Yet, also, not requiring knowledge about where that data is.
• And therefore, allowing data to move (migrate) as it sees fit.

• Many possible solutions. Most are VERY interesting and some are
slightly counter-intuitive (hence interesting!)
Distributed Hash Tables (DHT)
• A distributed system devoted to decentralized key/value storage
across a (presumably large or global) network.

• These are “tracker”-less. They are built to not require a centralized database matching files against peers who have them.

• Early DHTs were motivated by peer-to-peer networks.

• Early systems (around 2001): Chord, Pastry, Tapestry
• All building off one another.
Distributed Hash Tables: Basics
• Files are content-addressed and stored by their hash (key).

• Fulfills one simple function: value = 𝑙𝑜𝑜𝑘𝑢𝑝(𝑘𝑒𝑦)

• However, the value could be anywhere! IN THE WORLD. Hmm.

• Many DHTs find a way to relate the key to the location of the server that holds the value.

• The goal is at most 𝑂(log 𝑁) queries to find data.

• Size of your network can increase exponentially as lookup cost increases linearly. (Good if you want to scale to millions of nodes)
Chord DHT
• Peers are given an ID as a hash of their IP address. (unique, uniform)
• Such nodes maintain information about files that have hashes that resemble their IDs. (Distance can be the difference: A-B)
• Nodes also store information about neighbors of successive distances. (very near, near, far, very far… etc)
• Organizes metadata across the network to reduce the problem to a binary search.
• Therefore needs to contact O(log N) servers.
• To find a file, contact the server with an ID equal or slightly less than the file hash.
• They will then reroute to their neighbors. Repeat.
[Figure: 16 Node Network (image via Wikipedia)]
Chord System Model
• Nodes are logically organized into a ring formation sorted by their ID (𝑛).
• IDs increase as one moves clockwise.
• IDs should have the same bit-width as the keys.
• For our purposes, keys are file hashes.
• Nodes store information about neighbors with IDs relative to their own in the form: (𝑚 is key size in bits)
• (𝑛 + 2^𝑖) mod 2^𝑚 where 0 ≤ 𝑖 < 𝑚
• Imagine a ring with millions of nodes.
• 2^𝑖 grows quickly!
• Notice how locality is encoded.
[Figure: ring with a node at ID = 𝑛 and a neighbor with ID near 𝑛 + 2^4]
Chord: Lookup
• Nodes know at most 𝑚 nodes. (one per value of 𝑖)
• Nodes know more “nearby” nodes.
• When performing 𝑙𝑜𝑜𝑘𝑢𝑝(𝑘𝑒𝑦), the node only needs to find the node closest to that key and forward the request.
• Let’s say 𝑘𝑒𝑦 is far away from us.
• We will ask the node farthest from us (with the “nearest” ID less than the key)
• This node, as before, also knows about neighbors in a similar fashion.
• Notice its own locality! It looks up the same key. Binary search… 𝑂(log 𝑁) msgs.
[Figure: ring with the lookup path marked in steps (1) to (4); one neighbor is labeled 𝑛 + 2^4]
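The lookup can be simulated in a few lines. This is a sketch under stated assumptions, not a full Chord node: the ring is a plain sorted list held in one process, 𝑚 is 6 bits, the node IDs are an arbitrary example set, and the `Ring` class and `between` helper are hypothetical names. It follows the convention of the Chord paper, where a key is the responsibility of the first node at or clockwise after it (its successor), and each hop forwards to the known neighbor that most closely precedes the key.

```python
M = 6              # key size in bits (tiny on purpose)
RING = 2 ** M


def between(x, a, b, inclusive_end=False):
    """True if x lies clockwise after a and before b on the ring."""
    if inclusive_end and x == b:
        return True
    if a < b:
        return a < x < b
    return x > a or x < b  # the interval wraps past zero


class Ring:
    def __init__(self, node_ids):
        self.nodes = sorted(node_ids)

    def successor(self, ident):
        """First node at or clockwise after ident."""
        for n in self.nodes:
            if n >= ident:
                return n
        return self.nodes[0]

    def fingers(self, n):
        """Neighbors of n: the successors of (n + 2^i) mod 2^m for 0 <= i < m."""
        return [self.successor((n + 2 ** i) % RING) for i in range(M)]

    def lookup(self, start, key):
        """Return (node responsible for key, number of forwarding hops)."""
        n, hops = start, 0
        while True:
            fingers = self.fingers(n)
            if between(key, n, fingers[0], inclusive_end=True):
                return fingers[0], hops
            # Forward to the farthest known neighbor that still precedes the key.
            n = next((f for f in reversed(fingers) if between(f, n, key)), fingers[0])
            hops += 1


ring = Ring([1, 8, 14, 21, 32, 38, 42, 48, 51, 56])
owner, hops = ring.lookup(start=8, key=54)
print(owner, hops)
```

With these example IDs, node 8 forwards to 42, which forwards to 51, whose immediate neighbor 56 is responsible for key 54.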
Chord: Upkeep, Join
• Periodically, the node must check to ensure its perception of the world (the ring structure) is accurate.
• It can ask its neighbor who their neighbor is.
• If it reports a node whose ID is closer to 𝑛 + 2^𝑖 than they are… use them as that neighbor instead.
• This is done when a node enters the system as well.
• All new neighbors receive information about, and responsibility for, nearby keys.
Join:
Lookup our node ID to find neighbors
Tell those nodes we exist
Upkeep will stabilize other nodes
Problems with Chord
• Maintaining the invariants of the distributed data structure is hard.
• That is, the ring shape.
• When new nodes enter, they dangle off of the ring until nodes see them.
• That means, it doesn’t handle short-lived nodes very well.
• Which can be very common for systems with millions of nodes!
[Figure: Stabilization isn’t immediate for new nodes. Older nodes maintain a stable ring.]
Kademlia (Pseudo Geography)
• Randomly assign yourself a node ID ☺
• Measure distance using XOR: 𝑑(𝑁1, 𝑁2) = 𝑁1 ⊕ 𝑁2 (Interesting…)
• Unlike arithmetic difference (A – B) no two nodes can have the same distance to any key.
• XOR has the same properties as Euclidean distance, but cheaper:
• Identity: 𝑑(𝑁1, 𝑁1) = 𝑁1 ⊕ 𝑁1 = 0
• Symmetry: 𝑑(𝑁1, 𝑁2) = 𝑑(𝑁2, 𝑁1) = 𝑁1 ⊕ 𝑁2 = 𝑁2 ⊕ 𝑁1
• Triangle Inequality: 𝑑(𝑁1, 𝑁2) ≤ 𝑑(𝑁1, 𝑁3) + 𝑑(𝑁2, 𝑁3)
(𝑁1 ⊕ 𝑁2) ≤ (𝑁1 ⊕ 𝑁3) + (𝑁2 ⊕ 𝑁3) … Confounding, but true.
• Once again, we store keys near similar IDs.
• This time, we minimize the distance:
• Store key 𝑘 at any node 𝑛 that minimizes 𝑑(𝑛, 𝑘)
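These properties can be checked exhaustively for a small ID space. The sketch below assumes 5-bit IDs (to match the examples on the following slides) and simply tries every combination; the variable names are hypothetical.

```python
from itertools import product


def d(a: int, b: int) -> int:
    """XOR distance between two IDs."""
    return a ^ b


ids = range(32)  # every 5-bit ID

identity = all(d(a, a) == 0 for a in ids)
symmetry = all(d(a, b) == d(b, a) for a, b in product(ids, repeat=2))
triangle = all(d(a, b) <= d(a, c) + d(b, c) for a, b, c in product(ids, repeat=3))
# For any fixed key, every node ID is at a different distance from it.
unique = all(len({d(n, k) for n in ids}) == len(ids) for k in ids)

key = 0b11011
nodes = [0b00110, 0b10001, 0b11001, 0b01100]
home = min(nodes, key=lambda n: d(n, key))  # node that should store the key
```

The triangle inequality holds because 𝑁1 ⊕ 𝑁2 = (𝑁1 ⊕ 𝑁3) ⊕ (𝑁3 ⊕ 𝑁2), and the XOR of two non-negative numbers never exceeds their sum.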
Kademlia Network Topology

• Two “neighbors” may be entirely across the planet! (or right next door)
[Figure: nodes 00110 and 00111]
Kademlia Network Topology
• Each node knows about nodes that have a distance successively larger than it.
• Recall XOR is distance, so largest distance occurs when MSB is different.
• It maintains buckets of nodes with IDs that share a prefix of 𝑘 bits (matching MSBs)
• There are a certain number of entries in each bucket. (not exhaustive)
• The number of entries relates to the replication amount.
• The overall network is a trie.
• The buckets are subtrees of that trie.

Routing Table (k-buckets) for node 00110:
0-bit   1-bit   2-bit   3-bit   4-bit
10001   01001   00011   00100   00111
10100   01100   00010   00101
10110   01010   00001
11001   01001   00000
Note: 0-bit list contains half of the overall network!
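Which bucket a contact belongs in follows directly from the XOR distance. A minimal sketch, assuming 5-bit IDs and a hypothetical `bucket_index` helper: the bucket number is the count of leading bits the two IDs share, which is the ID width minus the bit length of their XOR.

```python
BITS = 5


def bucket_index(own_id: int, other_id: int) -> int:
    """Number of leading bits shared by the two IDs (0 = they differ in the MSB)."""
    distance = own_id ^ other_id
    if distance == 0:
        raise ValueError("a node does not keep itself in a bucket")
    return BITS - distance.bit_length()


me = 0b00110
contacts = [0b10001, 0b01001, 0b00011, 0b00100, 0b00111, 0b11001, 0b00101]

table = {}
for c in contacts:
    table.setdefault(bucket_index(me, c), []).append(format(c, "05b"))
```

For node 00110, this places 10001 and 11001 in the 0-bit bucket, 01001 in the 1-bit bucket, 00011 in the 2-bit bucket, 00100 and 00101 in the 3-bit bucket, and 00111 in the 4-bit bucket, matching the routing table on the slide.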
Kademlia Routing (bucket visualization)
[Figure: binary trie of node IDs, branching on 1 and 0 at each level. The 0-bit, 1-bit, 2-bit, and 3-bit buckets are marked as subtrees, ranging from “Far Away” to “Close”.]
Kademlia Routing Algorithm
• Ask the nodes we know that are “close” to 𝑘 to tell us about nodes that are “close” to 𝑘
• Repeat by asking those nodes which nodes are “close” to 𝑘 until we get a set that says “I know 𝑘!!”
• Because of our k-bucket scheme, each step we will look at nodes that share an increasing number of bits with 𝑘.
• And because of our binary tree, we essentially divide our search space in half.
• Search: 𝑂(log 𝑁) queries.
[Figure: routing table (k-buckets) of node 00110, repeated from the Kademlia Network Topology slide]
Kademlia Routing Algorithm
• Finding 𝑘 = 00111 from node 00110.
• Easy! Starts with a similar sequence.
• It’s hopefully at our own node, node 00111, or maybe node 00100…
• Finding 𝑘 = 11011 from 00110:
• Worst case! No matching prefix!
• Ask several nodes with IDs starting with 1.
• This is, at worst, half of our network… so we have to rely on the algorithm to narrow it down.
• It hopefully returns nodes that start with 11 or better. (which eliminates another half of our network from consideration)
• Repeat until a node knows about 𝑘.
[Figure: routing table (k-buckets) of node 00110, repeated from the Kademlia Network Topology slide]
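The narrowing-down can be simulated with a simplified, single-path version of the lookup. Everything here is an assumption made for illustration: the network is a dictionary in one process, every 5-bit ID is a live node, each bucket keeps at most `K = 2` randomly chosen contacts, and only one node is asked per round (Kademlia itself queries several nodes in parallel).

```python
import random

BITS = 5
K = 2  # contacts kept per bucket (hypothetical value)


def build_tables(ids, rng):
    """Give every node at most K random contacts from each non-empty bucket."""
    tables = {}
    for n in ids:
        buckets = {}
        for other in ids:
            if other != n:
                shared = BITS - (n ^ other).bit_length()
                buckets.setdefault(shared, []).append(other)
        tables[n] = [c for b in buckets.values() for c in rng.sample(b, min(K, len(b)))]
    return tables


def lookup(tables, start, key):
    """Keep moving to the closest contact known by the current node.

    Returns (closest node found, number of nodes asked after the start)."""
    best, rounds = start, 0
    while True:
        contacts = sorted(tables[best], key=lambda c: c ^ key)
        if not contacts or contacts[0] ^ key >= best ^ key:
            return best, rounds  # nobody closer is known: we are done
        best = contacts[0]
        rounds += 1


ids = list(range(2 ** BITS))
tables = build_tables(ids, random.Random(0))
found, rounds = lookup(tables, start=0b00110, key=0b11011)
```

Because each node keeps at least one contact in every non-empty bucket, every round reaches a node sharing a longer prefix with 𝑘, so the search ends within `BITS` rounds in this fully populated network.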
Kademlia: Node Introduction
• Contrary to Chord, XOR distance means nodes know exactly where
they fit.
• How “far away” you are from any key doesn’t depend on the other nodes in
the system. (It’s always your ID ⊕ 𝑘𝑒𝑦)
• Regardless, the join process is more or less the same:
• Ask an existing node to find your ID, it returns a list of your neighbors.
• Tell your neighbors you exist and get their knowledge of the world
• That is, replicate their keys and k-buckets.
• As nodes contact you, record their ID in the appropriate bucket.
• When do you replace?? Which entries do you replace?? Hmm.
Applications
• IPFS (InterPlanetary File System)
• Divides files into hashes resembling a Merkle DAG.
• Uses a variant of Kademlia to look up each hash and find mirrors.
• Reconstructs files on the client-side by downloading from peers.
• Some very shaky stuff about using a blockchain (distributed ledger) to do
name resolution.
• Is this the next big thing??? (probably not, but it is cool ☺)
