IPFS - Content Addressed, Versioned, P2P File System
(DRAFT 3)
Juan Benet
[email protected]
ABSTRACT
The InterPlanetary File System (IPFS) is a peer-to-peer distributed file system that seeks to connect all computing devices with the same system of files. In some ways, IPFS
is similar to the Web, but IPFS could be seen as a single BitTorrent swarm, exchanging objects within one Git
repository. In other words, IPFS provides a high-throughput, content-addressed block storage model, with content-addressed hyperlinks. This forms a generalized Merkle
DAG, a data structure upon which one can build versioned
file systems, blockchains, and even a Permanent Web. IPFS
combines a distributed hashtable, an incentivized block exchange, and a self-certifying namespace. IPFS has no single
point of failure, and nodes do not need to trust each other.
1. INTRODUCTION
For example, Linux distributions use BitTorrent to transmit disk images, and Blizzard, Inc. uses it to distribute
video game content.
HTTP remains entrenched, not least because of the parties invested in the current model. But from another perspective, new protocols have emerged and gained wide use
since the emergence of HTTP. What is lacking is upgrading
design: enhancing the current HTTP web, and introducing
new functionality without degrading user experience.
Industry has gotten away with using HTTP this long because moving small files around is relatively cheap, even for
small organizations with lots of traffic. But we are entering a new era of data distribution with new challenges: (a)
hosting and distributing petabyte datasets, (b) computing
on large data across organizations, (c) high-volume high-definition on-demand or real-time media streams, (d) versioning and linking of massive datasets, (e) preventing accidental disappearance of important files, and more. Many
of these can be boiled down to lots of data, accessible everywhere. Pressed by critical features and bandwidth concerns, we have already given up HTTP for different data
distribution protocols. The next step is making them part
of the Web itself.
Orthogonal to efficient data distribution, version control
systems have managed to develop important data collaboration workflows. Git, the distributed source code version
control system, developed many useful ways to model and
implement distributed data operations. The Git toolchain
offers versatile versioning functionality that large file distribution systems severely lack. New solutions inspired by Git
are emerging, such as Camlistore [?], a personal file storage system, and Dat [?], a data collaboration toolchain
and dataset package manager. Git has already influenced
distributed filesystem design [9], as its content addressed
Merkle DAG data model enables powerful file distribution
strategies. What remains to be explored is how this data
structure can influence the design of high-throughput oriented file systems, and how it might upgrade the Web itself.
This paper introduces IPFS, a novel peer-to-peer version-controlled filesystem seeking to reconcile these issues. IPFS
synthesizes learnings from many past successful systems.
Careful interface-focused integration yields a system greater
than the sum of its parts. The central IPFS principle is
modeling all data as part of the same Merkle DAG.
2. BACKGROUND
2.1 Distributed Hash Tables
Distributed Hash Tables (DHTs) are widely used to coordinate and maintain metadata about peer-to-peer systems.
2.1.1 Kademlia DHT

2.1.2 Coral DSHT
While some peer-to-peer filesystems store data blocks directly in DHTs, this wastes storage and bandwidth, as data
must be stored at nodes where it is not needed [5]. The
Coral DSHT extends Kademlia in three particularly important ways:
1. Kademlia stores values in nodes whose ids are nearest (using XOR-distance) to the key. This does not take into account application data locality, ignores far nodes that may already have the data, and forces nearest nodes to store it, whether they need it or not. This wastes significant storage and bandwidth. Instead, Coral stores addresses of peers who can provide the data blocks. (The XOR metric is sketched below.)
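As an illustration of the XOR metric mentioned above, a minimal sketch (the 4-byte ids and helper function are purely illustrative, not Kademlia's actual 160-bit ids):

package main

import "fmt"

// xorDistance compares two equal-length ids byte by byte.
// A smaller XOR result means the ids are "closer".
func xorDistance(a, b []byte) []byte {
    d := make([]byte, len(a))
    for i := range a {
        d[i] = a[i] ^ b[i]
    }
    return d
}

func main() {
    key := []byte{0xA1, 0x00, 0x00, 0x00}
    node1 := []byte{0xA1, 0x00, 0x00, 0x0F} // distance 0x0000000f: very close to the key
    node2 := []byte{0x10, 0x00, 0x00, 0x00} // distance 0xb1000000: far from the key
    fmt.Printf("%x %x\n", xorDistance(key, node1), xorDistance(key, node2))
}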
2.1.3 S/Kademlia DHT
S/Kademlia [1] extends Kademlia to protect against malicious attacks in two particularly important ways:
1. S/Kademlia provides schemes to secure NodeId generation and prevent Sybil attacks. It requires nodes to create a PKI key pair, derive their identity from it, and sign their messages to each other. One scheme includes a proof-of-work crypto puzzle to make generating Sybils expensive.
2. S/Kademlia nodes look up values over disjoint paths,
in order to ensure honest nodes can connect to each
other in the presence of a large fraction of adversaries
in the network. S/Kademlia achieves a success rate of
0.85 even with an adversarial fraction as large as half
of the nodes.
2.2 Block Exchanges - BitTorrent
BitTorrent [3] is a widely successful peer-to-peer file-sharing system, which succeeds in coordinating networks of untrusting peers (swarms) to cooperate in distributing pieces of files to each other. Key features from BitTorrent and its ecosystem that inform IPFS design include:
1. BitTorrent's data exchange protocol uses a quasi tit-for-tat strategy that rewards nodes who contribute to each other, and punishes nodes who only leech others' resources.

2.3 Version Control Systems - Git

2.4 Self-Certified Filesystems - SFS
3. IPFS DESIGN
IPFS is a distributed file system which synthesizes successful ideas from previous peer-to-peer systems, including
DHTs, BitTorrent, Git, and SFS. The contribution of IPFS
is simplifying, evolving, and connecting proven techniques
into a single cohesive system, greater than the sum of its
parts. IPFS presents a new platform for writing and deploying applications, and a new system for distributing and
versioning large data. IPFS could even evolve the web itself.
IPFS is peer-to-peer; no nodes are privileged. IPFS nodes
store IPFS objects in local storage. Nodes connect to each
other and transfer objects. These objects represent files and
other data structures. The IPFS Protocol is divided into a
stack of sub-protocols responsible for different functionality:
1. Identities - manage node identity generation and verification. Described in Section 3.1.
2. Network - manages connections to other peers, uses
various underlying network protocols. Configurable.
Described in Section 3.2.
3. Routing - maintains information to locate specific
peers and objects. Responds to both local and remote queries. Defaults to a DHT, but is swappable.
Described in Section 3.3.
4. Exchange - a novel block exchange protocol (BitSwap)
that governs efficient block distribution. Modelled as
a market, weakly incentivizes data replication. Trade
Strategies swappable. Described in Section 3.4.
5. Objects - a Merkle DAG of content-addressed immutable objects with links. Used to represent arbitrary data structures, e.g. file hierarchies and communication systems. Described in Section 3.5.
6. Files - a versioned file system hierarchy, inspired by Git. Described in Section 3.6.
7. Naming - a self-certifying mutable name system. Described in Section 3.7.
3.1 Identities
Nodes are identified by a NodeId, the cryptographic hash of a public key. Hash digest values are stored in multihash format, which includes a short header specifying the hash function used and the digest length in bytes. This allows the system to (a) choose the best function for the use case (e.g. stronger security vs. faster performance), and (b) evolve as function choices change. Self-describing values allow using different parameter choices compatibly.
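A minimal sketch of such a self-describing value, in the spirit of multihash (the 0x12 code for sha2-256 and the single-byte length follow the multihash convention; the helper function is illustrative, not an IPFS API):

package main

import (
    "crypto/sha256"
    "encoding/hex"
    "fmt"
)

const sha2_256Code = 0x12 // multihash function code for sha2-256

// encodeMultihash prefixes a digest with its function code and length,
// so readers can tell which hash function produced it.
// (Single-byte lengths only work for digests under 128 bytes.)
func encodeMultihash(code byte, digest []byte) []byte {
    out := []byte{code, byte(len(digest))}
    return append(out, digest...)
}

func main() {
    d := sha256.Sum256([]byte("hello ipfs"))
    mh := encodeMultihash(sha2_256Code, d[:])
    fmt.Println(hex.EncodeToString(mh)) // prints 1220 followed by the digest
}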
3.2 Network
3.2.1
# an SCTP/IPv4 connection
/ip4/10.20.30.40/sctp/1234/
# an SCTP/IPv4 connection proxied over TCP/IPv4
/ip4/5.6.7.8/tcp/5678/ip4/1.2.3.4/sctp/1234/
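The addresses above use the multiaddr format. A hedged sketch of reading such an address as a stack of (protocol, value) pairs follows; the real multiaddr library is more careful (binary packing, protocols without values), so treat this parser as illustrative only:

package main

import (
    "fmt"
    "strings"
)

type component struct {
    proto string
    value string
}

// parse splits a multiaddr-style string, assuming every protocol
// component carries exactly one value.
func parse(addr string) []component {
    parts := strings.Split(strings.Trim(addr, "/"), "/")
    var comps []component
    for i := 0; i+1 < len(parts); i += 2 {
        comps = append(comps, component{proto: parts[i], value: parts[i+1]})
    }
    return comps
}

func main() {
    fmt.Println(parse("/ip4/10.20.30.40/sctp/1234/"))
    // [{ip4 10.20.30.40} {sctp 1234}]
}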
3.3 Routing
3.4 Block Exchange - BitSwap Protocol
BitSwap exchange works well when the distribution of blocks across nodes is complementary, meaning they have what the other wants. Often, this will not be the case. In some cases, nodes must work for their blocks. In the case that a node has nothing that its peers want (or nothing at all), it seeks the pieces its peers want, with lower priority than what the node wants itself. This incentivizes nodes to cache and disseminate rare pieces, even if they are not interested in them directly.
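A minimal sketch of that prioritization, using the document's Multihash type; the wantQueue helper is hypothetical, not the actual BitSwap implementation:

// Build a fetch queue: blocks this node needs come first, blocks its
// peers want come after, fetched only to barter with.
func wantQueue(ownWants, peerWants []Multihash, have map[Multihash]bool) []Multihash {
    queue := []Multihash{}
    seen := map[Multihash]bool{}
    add := func(list []Multihash) {
        for _, h := range list {
            if !have[h] && !seen[h] {
                seen[h] = true
                queue = append(queue, h)
            }
        }
    }
    add(ownWants)  // highest priority: blocks this node needs itself
    add(peerWants) // lower priority: rare pieces cached to earn credit
    return queue
}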
3.4.1 BitSwap Credit

3.4.2 BitSwap Strategy
Let the debt ratio r between a node and its peer be:

  r = bytes_sent / (bytes_recv + 1)

Given r, the probability of sending to a debtor is:

  P(send | r) = 1 - 1 / (1 + exp(6 - 3r))

(Figure: the probability of sending P(send | r) as the debt ratio r increases, falling from 1 toward 0.)
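A small sketch of the send decision this formula implies; only the formula itself is from the text, the function names and the probabilistic coin flip are illustrative:

package main

import (
    "fmt"
    "math"
    "math/rand"
)

// probSend returns P(send | r) = 1 - 1/(1 + exp(6 - 3r)),
// where r = bytes_sent / (bytes_recv + 1) for this peer's ledger.
func probSend(bytesSent, bytesRecv float64) float64 {
    r := bytesSent / (bytesRecv + 1)
    return 1 - 1/(1+math.Exp(6-3*r))
}

// shouldSend flips a biased coin so heavy debtors are rarely served.
func shouldSend(bytesSent, bytesRecv float64) bool {
    return rand.Float64() < probSend(bytesSent, bytesRecv)
}

func main() {
    fmt.Printf("%.3f\n", probSend(0, 1000))    // creditor peer: probability near 1, send freely
    fmt.Printf("%.3f\n", probSend(4000, 1000)) // heavy debtor (r near 4): probability near 0
    fmt.Println(shouldSend(0, 1000))
}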
3.4.3 BitSwap Ledger
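The Ledger record itself is not reproduced in this draft. A minimal sketch, assuming only the fields implied by the debt ratio above and the connection lifecycle below:

type Ledger struct {
  owner NodeId
  partner NodeId
  // the two parties this ledger accounts for
  bytes_sent int
  bytes_recv int
  // running totals used to compute the debt ratio r
  timestamp Timestamp
  // when this ledger was last updated
}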
3.4.4 BitSwap Specification
type Peer struct {
  nodeid NodeId
  ledger Ledger
  // Ledger between the node and this peer
  last_seen Timestamp
  // timestamp of last received message
  want_list []Multihash
  // checksums of all blocks wanted by peer
  // includes blocks wanted by peer's peers
}
// Protocol interface:
interface Peer {
open (nodeid :NodeId, ledger :Ledger);
send_want_list (want_list :WantList);
send_block (block :Block) -> (complete :Bool);
close (final :Bool);
}
Sketch of the lifetime of a peer connection:
1. Open: peers send ledgers until they agree.
2. Sending: peers exchange want_lists and blocks.
3. Close: peers deactivate a connection.
4. Ignored: (special) a peer is ignored (for the duration of a timeout) if a node's strategy avoids sending to it.

Peer.open(NodeId, Ledger).
When connecting, a node initializes a connection with a
Ledger, either stored from a connection in the past or a
new one zeroed out. Then, sends an Open message with the
Ledger to the peer.
Upon receiving an Open message, a peer chooses whether to activate the connection. If, according to the receiver's Ledger, the sender is not a trusted agent (transmission below zero, or a large outstanding debt), the receiver may opt to ignore the request. This should be done probabilistically with an ignore_cooldown timeout, so as to allow errors to be corrected and attackers to be thwarted.
If activating the connection, the receiver initializes a Peer object with the local version of the Ledger and sets the last_seen timestamp. Then, it compares the received Ledger with its own. If they match exactly, the connection has opened. If they do not match, the peer creates a new zeroed-out Ledger and sends it.
Peer.send_want_list(WantList).
While the connection is open, nodes advertise their want_list
to all connected peers. This is done (a) upon opening the
connection, (b) after a randomized periodic timeout, (c) after a change in the want_list and (d) after receiving a new
block.
Upon receiving a want_list, a node stores it. Then, it
checks whether it has any of the wanted blocks. If so, it
sends them according to the BitSwap Strategy above.
Peer.send_block(Block).
Sending a block is straightforward. The node simply transmits the block of data. Upon receiving all the data, the receiver computes the Multihash checksum to verify it matches
the expected one, and returns confirmation.
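A sketch of that receiver-side check, using a plain sha2-256 digest in place of a full multihash implementation (names are illustrative):

package main

import (
    "bytes"
    "crypto/sha256"
    "errors"
    "fmt"
)

// verifyBlock recomputes the checksum of a received block and compares it
// to the checksum the receiver originally asked for.
func verifyBlock(block, wanted []byte) error {
    sum := sha256.Sum256(block)
    if !bytes.Equal(sum[:], wanted) {
        return errors.New("block does not match the requested checksum")
    }
    return nil // confirmation can now be returned to the sender
}

func main() {
    block := []byte("some block data")
    want := sha256.Sum256(block)
    fmt.Println(verifyBlock(block, want[:])) // <nil>
}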
Peer.close(Bool).
The final parameter to close signals whether the intention to tear down the connection is the sender's or not. If false, the receiver may opt to re-open the connection immediately. This avoids premature closes.
A peer connection should be closed under two conditions:
1. a silence_wait timeout has expired without receiving any messages from the peer (default BitSwap uses 30 seconds). The node issues Peer.close(false).
2. the node is exiting and BitSwap is being shut down. In this case, the node issues Peer.close(true).
After a close message, both receiver and sender tear down
the connection, clearing any state stored. The Ledger may
be stored for the future, if it is useful to do so.
Notes.
Non-open messages on an inactive connection should be ignored. In the case of a send_block message, the receiver may check the block to see if it is needed and correct, and if so, use it. Regardless, all such out-of-order messages trigger a close(false) message from the receiver to force re-initialization of the connection.
3.5 Objects
The DHT and BitSwap allow IPFS to form a massive peerto-peer system for storing and distributing blocks quickly
and robustly. On top of these, IPFS builds a Merkle DAG, a
directed acyclic graph where links between objects are cryptographic hashes of the targets embedded in the sources.
This is a generalization of the Git data structure. Merkle
DAGs provide IPFS many useful properties, including:
1. Content Addressing: all content is uniquely identified by its multihash checksum, including links.
2. Tamper resistance: all content is verified with its
checksum. If data is tampered with or corrupted, IPFS
detects it.
3. Deduplication: all objects that hold the exact same
content are equal, and only stored once. This is particularly useful with index objects, such as git trees
and commits, or common portions of data.
The IPFS Object format is:
type IPFSObject struct {
  links []IPFSLink
  // array of links
  data []byte
  // opaque content data
}
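The link format is not reproduced in this draft. Judging from the named links with hash and size fields used in the examples of Section 3.6, a link record presumably looks like:

type IPFSLink struct {
  Name string
  // name or alias of this link
  Hash Multihash
  // cryptographic hash of target
  Size int
  // total size of target
}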
3.5.1 Paths
IPFS objects can be traversed with a string path API. Full paths are of the form:

# format
/ipfs/<hash-of-object>/<name-path-to-object>
# example
/ipfs/XLYkgq61DYaQ8NhkcqyU7rLcnSa7dSHQ16x/foo.txt
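A hedged sketch of resolving such a path: fetch the root object by hash, then follow the named link for each remaining component. The in-memory store and the literal keys ("root", "blob1") are stand-ins for content-addressed storage, not the IPFS API:

package main

import (
    "fmt"
    "strings"
)

type Multihash string

type IPFSLink struct {
    Name string
    Hash Multihash
    Size int
}

type IPFSObject struct {
    links []IPFSLink
    data  []byte
}

// store stands in for local object storage and the block exchange.
var store = map[Multihash]*IPFSObject{}

// resolve walks an /ipfs/<hash>/<name-path> string.
func resolve(path string) *IPFSObject {
    parts := strings.Split(strings.TrimPrefix(path, "/ipfs/"), "/")
    obj := store[Multihash(parts[0])] // root object, addressed by hash
    for _, name := range parts[1:] {
        if obj == nil {
            return nil
        }
        var next *IPFSObject
        for _, l := range obj.links {
            if l.Name == name {
                next = store[l.Hash] // follow the named, content-addressed link
                break
            }
        }
        obj = next
    }
    return obj
}

func main() {
    store["root"] = &IPFSObject{links: []IPFSLink{{Name: "foo.txt", Hash: "blob1"}}}
    store["blob1"] = &IPFSObject{data: []byte("hello")}
    fmt.Println(string(resolve("/ipfs/root/foo.txt").data)) // hello
}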
3.5.2 Local Objects

3.5.3 Object Pinning
Nodes who wish to ensure the survival of particular objects can do so by pinning the objects. This ensures the objects are kept in the node's local storage. Pinning can be done recursively, to pin down all linked descendant objects as well. All objects pointed to are then stored locally. This is particularly useful to persist files, including references. This also makes IPFS a Web where links are permanent, and Objects can ensure the survival of others they point to.

3.5.4 Publishing Objects

3.5.5 Object-level Cryptography
IPFS is equipped to handle object-level cryptographic operations. An encrypted or signed object is wrapped in a special frame that allows encryption or verification of the raw bytes.

type EncryptedObject struct {
  Object []bytes
  // raw object data encrypted
  Tag []bytes
  // optional tag for encryption groups
}

type SignedObject struct {
  Object []bytes
  // raw object data signed
  Signature []bytes
  // hmac signature
  PublicKey []multihash
  // multihash identifying key
}

Cryptographic operations change the object's hash, defining a different object. IPFS automatically verifies signatures, and can decrypt data with user-specified keychains. Links of encrypted objects are protected as well, making traversal impossible without a decryption key. It is possible to have a parent object encrypted under one key, and a child under another or not at all. This secures links to shared objects.
3.6 Files
IPFS also defines a set of objects for modeling a versioned filesystem on top of the Merkle DAG. This object model is similar to Git's, consisting of blobs, lists, trees, and commits.
3.6.1 blob
{
  "data": "some data here",
  // blobs have no links
}

(Figure: sample object graph containing a commit (ccc111), trees (ttt111, ttt222, ttt333), a list (lll111), and blobs (bbb111 through bbb555).)

3.6.2 list
{
"data": ["blob", "list", "blob"],
// lists have an array of object types as data
"links": [
{ "hash": "XLYkgq61DYaQ8NhkcqyU7rLcnSa7dSHQ16x",
"size": 189458 },
{ "hash": "XLHBNmRQ5sJJrdMPuu48pzeyTtRo39tNDR5",
"size": 19441 },
{ "hash": "XLWVQDqxo9Km9zLyquoC9gAP8CL1gWnHZ7z",
"size": 5286 }
// lists have no names in links
]
}
3.6.3 tree

3.6.4 commit
{ "hash":
"name":
{ "hash":
"name":
{ "hash":
"name":
"XLa1qMBKiSEEDhojb9FFZ4tEvLf7FEQdhdU",
"parent", "size": 25309 },
"XLGw74KAy9junbh28x7ccWov9inu1Vo7pnX",
"object", "size": 5198 },
"XLF2ipQ4jD3UdeX5xp1KBgeHRhemUtaA8Vm",
"author", "size": 109 }
]
}
3.6.5 Version control
3.6.6 Filesystem Paths
3.6.7 Splitting Files into Lists and Blobs
One of the main challenges with versioning and distributing large files is finding the right way to split them into independent blocks. Rather than assume it can make the right decision for every type of file, IPFS offers the following alternatives (a sketch of content-defined splitting follows the list):
(a) Use Rabin Fingerprints [?] as in LBFS [?] to pick
suitable block boundaries.
(b) Use the rsync [?] rolling-checksum algorithm, to detect
blocks that have changed between versions.
(c) Allow users to specify block-splitting functions highly
tuned for specific files.
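A rough sketch of alternative (a), content-defined splitting: a hash of a small sliding window picks block boundaries, so unchanged regions of a file keep producing the same blocks. A real implementation would use Rabin fingerprints with O(1) rolling updates; the constants and the FNV-style window hash here are only illustrative:

package main

import (
    "fmt"
    "math/rand"
)

const (
    windowSize = 16
    mask       = 0x1FFF // boundary when hash&mask == 0: roughly 8 KiB average spacing
    minBlock   = 2 << 10
    maxBlock   = 64 << 10
)

// windowHash mixes the bytes of a small window; only the window contents
// decide boundaries, so identical content re-synchronizes across versions.
func windowHash(w []byte) uint32 {
    var h uint32
    for _, b := range w {
        h = h*16777619 ^ uint32(b)
    }
    return h
}

// split cuts data into blocks at content-defined boundaries.
func split(data []byte) [][]byte {
    var blocks [][]byte
    start := 0
    for i := windowSize; i < len(data); i++ {
        size := i - start
        atBoundary := size >= minBlock && windowHash(data[i-windowSize:i])&mask == 0
        if atBoundary || size >= maxBlock {
            blocks = append(blocks, data[start:i])
            start = i
        }
    }
    return append(blocks, data[start:]) // trailing block
}

func main() {
    data := make([]byte, 1<<20)
    rand.Read(data) // stand-in for file contents
    fmt.Println("blocks:", len(split(data)))
}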
3.6.8 Path Lookup Performance
3.7 IPNS: Naming and Mutable State
3.7.1 Self-Certified Names
1. Recall that NodeId = hash(node.PubKey).
2. We assign every user a mutable namespace at:
/ipns/<NodeId>
3. A user can publish an Object to this path, signed by her private key, say at:
/ipns/XLF2ipQ4jD3UdeX5xp1KBgeHRhemUtaA8Vm/
4. When other users retrieve the object, they can check that the signature matches the public key and NodeId. This verifies the authenticity of the Object published by the user, achieving mutable state retrieval.
IPNS records are published to the Routing system:

routing.setValue(NodeId, <ns-object-hash>)
Any links in the Object published act as sub-names in
the namespace:
/ipns/XLF2ipQ4jD3UdeX5xp1KBgeHRhemUtaA8Vm/
/ipns/XLF2ipQ4jD3UdeX5xp1KBgeHRhemUtaA8Vm/docs
/ipns/XLF2ipQ4jD3UdeX5xp1KBgeHRhemUtaA8Vm/docs/ipfs
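A hedged sketch of the publish/resolve cycle described above: publish signs a namespace record and stores it in the routing system under the NodeId; resolve fetches the record and checks that the key hashes to the name and that the signature verifies. The record layout, the in-memory routing map, and the use of ed25519/sha2-256 are illustrative, not the IPFS wire format:

package main

import (
    "crypto/ed25519"
    "crypto/sha256"
    "encoding/hex"
    "fmt"
)

type record struct {
    pubKey    ed25519.PublicKey
    objHash   string // hash of the published object
    signature []byte
}

var routing = map[string]record{} // stand-in for routing.setValue/getValue

func nodeID(pub ed25519.PublicKey) string {
    sum := sha256.Sum256(pub) // NodeId = hash(node.PubKey)
    return hex.EncodeToString(sum[:])
}

// publish signs the object hash and stores the record under the NodeId.
func publish(priv ed25519.PrivateKey, pub ed25519.PublicKey, objHash string) string {
    id := nodeID(pub)
    routing[id] = record{pub, objHash, ed25519.Sign(priv, []byte(objHash))}
    return "/ipns/" + id
}

// resolve verifies that the record is self-certifying before trusting it.
func resolve(id string) (string, bool) {
    rec, ok := routing[id]
    if !ok || nodeID(rec.pubKey) != id { // key must hash to the name itself
        return "", false
    }
    if !ed25519.Verify(rec.pubKey, []byte(rec.objHash), rec.signature) {
        return "", false
    }
    return "/ipfs/" + rec.objHash, true
}

func main() {
    pub, priv, _ := ed25519.GenerateKey(nil)
    name := publish(priv, pub, "XLF2ipQ4jD3UdeX5xp1KBgeHRhemUtaA8Vm")
    fmt.Println(resolve(name[len("/ipns/"):]))
}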
3.7.2 Human Friendly Names
While IPNS is indeed a way of assigning and reassigning names, it is not very user friendly, as it exposes long
hash values as names, which are notoriously hard to remember. These work for URLs, but not for many kinds of offline
transmission. Thus, IPFS increases the user-friendliness of
IPNS with the following techniques.
Peer Links.
As encouraged by SFS, users can link other users' Objects directly into their own Objects (namespace, home, etc.). This has the benefit of also creating a web of trust (and supports the old Certificate Authority model).

DNS TLD.
Existing DNS names can also map into the IPNS namespace; for example:

# behaves as symlink
ln -s /ipns/XLF2ipQ4jD3U /ipns/fs.benet.ai
3.8 Using IPFS
4. THE FUTURE
The ideas behind IPFS are the product of decades of successful distributed systems research in academia and open
source. IPFS synthesizes many of the best ideas from the
most successful systems to date. Aside from BitSwap, which
is a novel protocol, the main contribution of IPFS is this
coupling of systems and synthesis of designs.
IPFS is an ambitious vision of new decentralized Internet
infrastructure, upon which many different kinds of applications can be built. At the bare minimum, it can be used as
a global, mounted, versioned filesystem and namespace, or
as the next generation file sharing system. At its best, it
could push the web to new horizons, where publishing valuable information does not impose hosting it on the publisher
but upon those interested, where users can trust the content
they receive without trusting the peers they receive it from,
and where old but important files do not go missing. IPFS
looks forward to bringing us toward the Permanent Web.
5. ACKNOWLEDGMENTS
6. REFERENCES TODO
7. REFERENCES