Freenet: A Distributed Anonymous Information Storage and Retrieval System
1 Introduction
Networked computer systems are rapidly growing in importance as the medium
of choice for the storage and exchange of information. However, current systems
afford little privacy to their users, and typically store any given data item in
only one or a few fixed places, creating a central point of failure. Because of a
continued desire among individuals to protect the privacy of their authorship or
readership of various types of sensitive information[28], and the undesirability
of central points of failure which can be attacked by opponents wishing to re-
move data from the system[11, 27] or simply overloaded by too much interest[1],
systems offering greater security and reliability are needed.
We are developing Freenet, a distributed information storage and retrieval
system designed to address these concerns of privacy and availability. The system
(Work of Theodore W. Hong was supported by grants from the Marshall Aid Commemoration Commission and the National Science Foundation.)
2 Related work
dummy messages to increase security, but again does not protect information
producers.
The Rewebber[26] provides a measure of anonymity for producers of web in-
formation by means of an encrypted URL service that is essentially the inverse
of an anonymizing browser proxy, but has the same difficulty of providing no
protection against the operator of the service itself. TAZ[18] extends this idea
by using chains of nested encrypted URLs that successively point to different
rewebber servers to be contacted, although this is vulnerable to traffic analysis
using replay. Both rely on a single server as the ultimate source of informa-
tion. Publius[30] enhances availability by distributing files as redundant shares
among n webservers, only k of which are needed to reconstruct a file; however,
since the identity of the servers themselves is not anonymized, an attacker might
remove information by forcing the closure of n - k + 1 servers. The Eternity pro-
posal[5] seeks to archive information permanently and anonymously, although it
lacks specifics on how to efficiently locate stored files, making it more akin to
an anonymous backup service. Free Haven[14] is an interesting anonymous pub-
lication system that uses a trust network and file trading mechanism to provide
greater server accountability while maintaining anonymity.
distributed.net[15] demonstrated the concept of pooling computer re-
sources among multiple users on a large scale for CPU cycles; other systems
which do the same for disk space are Napster[24] and Gnutella[17], although the
former relies on a central server to locate files and the latter employs an inefficient
broadcast search. Neither one replicates files. Intermemory[9] and India[16] are
cooperative distributed fileserver systems intended for long-term archival storage
along the lines of Eternity, in which files are split into redundant shares and dis-
tributed among many participants. Akamai[2] provides a service that replicates
files at locations near information consumers, but is not suitable for producers
who are individuals (as opposed to corporations). None of these systems attempt
to provide anonymity.
3 Architecture
These problems are addressed by the signed-subspace key (SSK), which en-
ables personal namespaces. A user creates a namespace by randomly generating
a public/private key pair which will serve to identify her namespace. To insert a
file, she chooses a short descriptive text string as before. The public namespace
key and the descriptive string are hashed independently, XORed together, and
then hashed again to yield the file key.
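As a concrete illustration, the following sketch shows how a signed-subspace file key could be derived; the choice of SHA-1 as the hash function and the helper names are assumptions made for the example, not details fixed by the design.

```python
import hashlib

def sha(data: bytes) -> bytes:
    # Illustrative hash function; the design only requires some fixed hash.
    return hashlib.sha1(data).digest()

def ssk_file_key(public_namespace_key: bytes, description: str) -> bytes:
    # Hash the public namespace key and the descriptive string independently,
    h_pub = sha(public_namespace_key)
    h_desc = sha(description.encode("utf-8"))
    # XOR the two digests together,
    xored = bytes(a ^ b for a, b in zip(h_pub, h_desc))
    # and hash the result again to yield the file key.
    return sha(xored)
```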
As with the keyword-signed key, the private half of the asymmetric key pair
is used to sign the file. This signature, generated from a random key pair, is
more secure than the signatures used for keyword-signed keys. The file is also
encrypted by the descriptive string as before.
To allow others to retrieve the file, the user publishes the descriptive string
together with her subspace's public key. Storing data requires the private key,
however, so only the owner of a subspace can add files to it.
The owner now has the ability to manage her own namespace. For example,
she could simulate a hierarchical structure by creating directory-like files
containing hypertext pointers to other files. A directory under the key
text/philosophy could contain a list of keys such as
text/philosophy/sun-tzu/art-of-war, text/philosophy/confucius/analects, and
text/philosophy/nozick/anarchy-state-utopia, using appropriate syntax
interpretable by a client. Directories can also recursively point to other
directories.
The third type of key is the content-hash key (CHK), which is useful for
implementing updating and splitting. A content-hash key is simply derived by
directly hashing the contents of the corresponding file. This gives every file a
pseudo-unique file key. Files are also encrypted by a randomly-generated en-
cryption key. To allow others to retrieve the file, the user publishes the content-
hash key itself together with the decryption key. Note that the decryption key
is never stored with the file but is only published with the file key, for reasons
to be explained in section 3.4.
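A minimal sketch of the content-hash key scheme follows; SHA-1, the 128-bit random key, and the function name are illustrative assumptions, and the exact interaction between hashing and encryption is not pinned down here.

```python
import hashlib
import os

def chk_insert(contents: bytes):
    # The content-hash key is derived directly from the file's contents.
    content_hash_key = hashlib.sha1(contents).digest()
    # The file is encrypted under a randomly generated key; that key is never
    # stored with the file, only published alongside the content-hash key.
    decryption_key = os.urandom(16)
    # The pair (content_hash_key, decryption_key) is what the user publishes
    # so that others can locate and decrypt the file.
    return content_hash_key, decryption_key
```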
Content-hash keys are most useful in conjunction with signed-subspace keys
using an indirection mechanism. To store an updatable file, a user first inserts
it under its content-hash key. She then inserts an indirect file under a signed-
subspace key whose contents are the content-hash key. This enables others to
retrieve the file in two steps, given the signed-subspace key.
To update a file, the owner first inserts a new version under its content-hash
key, which should be different from the old version's content hash. She then
inserts a new indirect file under the original signed-subspace key pointing to the
updated version. When the insert reaches a node which possesses the old version,
a key collision will occur. The node will check the signature on the new version,
verify that it is both valid and more recent, and replace the old version. Thus
the signed-subspace key will lead to the most recent version of the file, while old
versions can continue to be accessed directly by content-hash key if desired. (If
not requested, however, these old versions will eventually be removed from the
network; see section 3.4.) This mechanism can be used to manage directories
as well as regular files.
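The update sequence can be sketched as below, with insert_chk and insert_ssk standing in for the node's insert operations (both hypothetical names, as is the layout of the indirect file):

```python
import hashlib, os

def update_file(insert_chk, insert_ssk, description: str, new_contents: bytes):
    # 1. Insert the new version under its own content-hash key, which will
    #    differ from the old version's content hash.
    content_hash_key = hashlib.sha1(new_contents).digest()
    decryption_key = os.urandom(16)
    insert_chk(content_hash_key, new_contents, decryption_key)
    # 2. Insert an indirect file under the original signed-subspace key whose
    #    contents are just the new content-hash key (and decryption key).
    insert_ssk(description, content_hash_key + decryption_key)
    # Any node holding the old indirect file sees a key collision, checks that
    # the new version is validly signed and more recent, and replaces it.
```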
Content-hash keys can also be used for splitting files into multiple parts. For
large files, splitting can be desirable because of storage and bandwidth limita-
tions. Splitting even medium-sized files into standard-sized parts (e.g. 2^n kilo-
bytes) also has advantages in combating traffic analysis. This is easily accom-
plished by inserting each part separately under a content-hash key, and creating
an indirect file (or multiple levels of indirect files) to point to the individual
parts.
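The splitting scheme might look like the following sketch, where the part size and the representation of the indirect file are arbitrary choices made for illustration:

```python
import hashlib

PART_SIZE = 256 * 1024  # an illustrative standard part size (256 kilobytes)

def split_and_key(contents: bytes, part_size: int = PART_SIZE):
    # Cut the file into standard-sized parts, each to be inserted separately
    # under its own content-hash key.
    parts = [contents[i:i + part_size] for i in range(0, len(contents), part_size)]
    part_keys = [hashlib.sha1(p).digest() for p in parts]
    # The indirect file simply lists the part keys in order; readers retrieve
    # it first, then fetch each part by content-hash key.
    indirect_file = b"".join(part_keys)
    return parts, part_keys, indirect_file
```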
All of this still leaves the problem of finding keys in the first place. The most
straightforward way to add a search capability to Freenet is to run a hypertext
spider such as those used to search the web. While an attractive solution in many
ways, this conflicts with the design goal of avoiding centralization. A possible
alternative is to create a special class of lightweight indirect files. When a real file
is inserted, the author could also insert a number of indirect files each containing
a pointer to the real file, named according to search keywords chosen by her.
These indirect files would differ from normal files in that multiple files with the
same key (i.e. search keyword) would be permitted to exist, and requests for
such keys would keep going until a specified number of results were accumulated
instead of stopping at the first file found. Managing the likely large volume of
such indirect files is an open problem.
An alternative mechanism is to encourage individuals to create their own
compilations of favorite keys and publicize the keys of these compilations. This
is an approach also in common use on the world-wide web.
[Figure 1: a typical request sequence, showing data request, data reply, and request failed messages passing among nodes]
failure result is propagated back to the original requestor without any further
nodes being tried. Nodes may unilaterally curtail excessive hops-to-live values
to reduce network load. They may also forget about pending requests after a
period of time to keep message memory free.
Figure 1 depicts a typical sequence of request messages. The user initiates a
request at node a. Node a forwards the request to node b, which forwards it to
node c. Node c is unable to contact any other nodes and returns a backtracking
request failed message to b. Node b then tries its second choice, e, which
forwards the request to f. Node f forwards the request to b, which detects the
loop and returns a backtracking failure message. Node f is unable to contact
any other nodes and backtracks one step further back to e. Node e forwards the
request to its second choice, d, which has the data. The data is returned from d
via e and b back to a, which sends it back to the user. The data is also cached
on e, b, and a.
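The forwarding and backtracking behaviour in this example can be sketched as follows; the recursive form, the data structures, and the method names are simplifications invented for illustration (a real node handles asynchronous messages and remembers transactions by ID rather than passing a visited set around):

```python
def handle_request(node, key, hops_to_live, visited):
    if node in visited:
        return None                      # loop detected: backtracking failure
    visited = visited | {node}
    if key in node.datastore:
        return node.datastore[key]       # data found locally
    if hops_to_live <= 0:
        return None                      # request failed
    # Try neighbours in order of how close their routing-table keys are to the
    # requested key, falling back to the next choice when a branch fails.
    for neighbour in node.closest_neighbours(key):
        data = handle_request(neighbour, key, hops_to_live - 1, visited)
        if data is not None:
            node.datastore[key] = data            # cache on the return path
            # In reality the routing entry records the (possibly faked) data
            # source named in the reply, not necessarily this neighbour.
            node.routing_table[key] = neighbour
            return data
    return None  # no viable non-looping paths: propagate failure upstream
```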
This mechanism has a number of effects. Most importantly, we hypothesize
that the quality of the routing should improve over time, for two reasons. First,
nodes should come to specialize in locating sets of similar keys. If a node is listed
in routing tables under a particular key, it will tend to receive mostly requests
for keys similar to that key. It is therefore likely to gain more experience
in answering those queries and become better informed in its routing tables
about which other nodes carry those keys. Second, nodes should become similarly
specialized in storing clusters of files having similar keys. Because forwarding a
request successfully will result in the node itself gaining a copy of the requested
file, and most requests will be for similar keys, the node will mostly acquire files
with similar keys. Taken together, these two effects should improve the efficiency
of future requests in a self-reinforcing cycle, as nodes build up routing tables and
datastores focusing on particular sets of keys, which will be precisely those keys
that they are asked about.
user then sends the data to insert, which will be propagated along the path
established by the initial query and stored in each node along the way. Each
node will also create an entry in its routing table associating the inserter (as the
data source) with the new key. To avoid the obvious security problem, any node
along the way can unilaterally decide to change the insert message to claim itself
or another arbitrarily-chosen node as the data source.
If a node cannot forward an insert to its preferred downstream node because
the target is down or a loop would be created, the insert backtracks to the second-
nearest key, then the third-nearest, and so on in the same way as for requests.
If the backtracking returns all the way back to the original inserter, it indicates
that fewer nodes than asked for could be contacted. As with requests, nodes may
curtail excessive hops-to-live values and/or forget about pending inserts after a
period of time.
This mechanism has three effects. First, newly inserted files are selectively
placed on nodes already possessing files with similar keys. This reinforces the
clustering of keys set up by the request mechanism. Second, new nodes can use
inserts as a supplementary means of announcing their existence to the rest of the
network. Third, attempts by attackers to supplant existing files by inserting junk
files under existing keys are likely to simply spread the real files further, since
the originals are propagated on collision. (Note, however, that this is mostly only
relevant to keyword-signed keys, as the other types of keys are more strongly
verifiable.)
A new node can join the network by discovering the address of one or more
existing nodes through out-of-band means, then starting to send messages. As
mentioned previously, the request mechanism naturally enables new nodes to
learn about more of the network over time. However, in order for existing nodes to
discover them, new nodes must somehow announce their presence. This process is
complicated by two somewhat conflicting requirements. On one hand, to promote
efficient routing, we would like all the existing nodes to be consistent in deciding
which keys to send a new node (i.e. what key to assign it in their routing tables).
On the other hand, it would cause a security problem if any one node could
choose the routing key, which rules out the most straightforward way of achieving
consistency.
We use a cryptographic protocol to satisfy both of these requirements. A
new node joining the network chooses a random seed and sends an announce-
ment message containing its address and the hash of that seed to some existing
node. When a node receives a new-node announcement, it generates a random
seed, XORs that with the hash it received and hashes the result again to create
a commitment. It then forwards the new hash to some node chosen randomly
from its routing table. This process continues until the hops-to-live of the an-
nouncement runs out. The last node to receive the announcement just generates
a seed. Now all nodes in the chain reveal their seeds and the key for the new
node is assigned as the XOR of all the seeds. Checking the commitments enables
each node to confirm that everyone revealed their seeds truthfully. This yields
a consistent random key which cannot be influenced by a malicious participant.
Each node then adds an entry for the new node in its routing table under that
key.
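The hashing and XOR steps of this announcement protocol can be sketched as below; message passing between nodes is reduced to a loop, and SHA-1 and the 20-byte seeds are illustrative choices.

```python
import hashlib
import os

def H(data: bytes) -> bytes:
    return hashlib.sha1(data).digest()

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

# One seed per node in the announcement chain: the new node's first, then one
# for each node the announcement passes through.
seeds = [os.urandom(20) for _ in range(6)]

# The new node sends H(seed); each node XORs its own seed into the hash it
# received and hashes again, forwarding the resulting commitment.
commitments = [H(seeds[0])]
for s in seeds[1:]:
    commitments.append(H(xor(commitments[-1], s)))

# After the hops-to-live runs out, every node reveals its seed; recomputing the
# commitment chain lets each participant check that no one lied about its seed.
# The new node's routing key is the XOR of all the seeds.
key = seeds[0]
for s in seeds[1:]:
    key = xor(key, s)
```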
4 Protocol details
If the request is ultimately successful, the remote node will reply with a
Send.Data message containing the data requested and the address of the node
which supplied it (possibly faked). If the request is ultimately unsuccessful and
its hops-to-live are completely used up trying to satisfy it, the remote node
will reply with a Reply.NotFound. The sending node will then decrement the
hops-to-live of the Send.Data (or Reply.NotFound) and pass it along upstream,
unless it is the actual originator of the request. Both of these messages terminate
the transaction and release any resources held. However, if there are still hops-
to-live remaining, usually because the request ran into a dead end where no
viable non-looping paths could be found, the remote node will reply with a
Request.Continue giving the number of hops-to-live left. The sending node will
then try to contact the next-most likely node from its routing table. It will also
send a Reply.Restart upstream.
To insert data, the sending node sends a Request.Insert message specifying
a randomly-generated transaction ID, an initial hops-to-live and depth, and a
proposed key. The remote node will check its datastore for the key and if not
found, forward the insert to another node as described in section 3.3. Timers
and Reply.Restart messages are also used in the same way as for requests.
If the insert ultimately results in a key collision, the remote node will re-
ply with either a Send.Data message containing the existing data or a Re-
ply.NotFound (if existing data was not actually found, but routing table ref-
erences to it were). If the insert does not encounter a collision, yet runs out
of nodes with nonzero hops-to-live remaining, the remote node will reply with
a Request.Continue. In this case, Request.Continue is a failure result meaning
that not as many nodes could be contacted as asked for. These messages will
be passed along upstream as in the request case. Both messages terminate the
transaction and release any resources held. However, if the insert expires with-
out encountering a collision, the remote node will reply with a Reply.Insert,
indicating that the insert can go ahead. The sending node will pass along the
Reply.Insert upstream and wait for its predecessor to send a Send.Insert con-
taining the data. When it receives the data, it will store it locally and forward
the Send.Insert downstream, concluding the transaction.
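The reply handling just described can be summarised from the sending node's point of view roughly as follows; the message names are those used above, the enum representation is an assumption of the sketch, and timers, depth, and Reply.Restart handling are omitted.

```python
from enum import Enum, auto

class Reply(Enum):
    SEND_DATA = auto()         # key collision: the existing data is returned
    REPLY_NOTFOUND = auto()    # collision via routing-table references only
    REQUEST_CONTINUE = auto()  # fewer nodes than asked for could be contacted
    REPLY_INSERT = auto()      # no collision: the insert can go ahead

def handle_insert_reply(reply: Reply) -> str:
    if reply in (Reply.SEND_DATA, Reply.REPLY_NOTFOUND, Reply.REQUEST_CONTINUE):
        # Pass the message upstream, terminate the transaction, release resources.
        return "terminate"
    if reply is Reply.REPLY_INSERT:
        # Pass Reply.Insert upstream, wait for the predecessor's Send.Insert,
        # then store the data locally and forward the Send.Insert downstream.
        return "await data"
    return "ignore"
```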
5 Performance analysis
To test the adaptivity of the network routing, we created a test network of 1000
nodes. Each node had a datastore size of 50 items and a routing table size of 250
addresses. The datastores were initialized to be empty, and the routing tables
were initialized to connect the network in a regular ring-lattice topology in which
each node had routing entries for its two nearest neighbors on either side. The
keys associated with these routing entries were set to be hashes of the destination
nodes' addresses. Using hashes has the useful property that the resulting keys
are both random and consistent (that is, all references to a given node will use
the same key).
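Such a test network could be initialized roughly as in the sketch below; the node-address format and the use of SHA-1 are assumptions of the example.

```python
import hashlib

N_NODES = 1000
DATASTORE_CAPACITY = 50       # items per node
ROUTING_TABLE_CAPACITY = 250  # addresses per node

def node_key(address: str) -> bytes:
    # Initial routing keys are hashes of node addresses: random but consistent.
    return hashlib.sha1(address.encode()).digest()

addresses = [f"node-{i}" for i in range(N_NODES)]

# Regular ring lattice: each node starts with routing entries for its two
# nearest neighbours on either side.
routing_tables = {
    addresses[i]: {
        node_key(addresses[(i + off) % N_NODES]): addresses[(i + off) % N_NODES]
        for off in (-2, -1, 1, 2)
    }
    for i in range(N_NODES)
}
datastores = {a: {} for a in addresses}  # all datastores start empty
```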
Inserts of random keys were sent to random nodes in the network, inter-
spersed randomly with requests for randomly-chosen keys known to have been
previously inserted, using a hops-to-live of 20 for both. Every 100 timesteps, a
snapshot of the network was taken and its performance measured using a set
of probe requests. Each probe consisted of 300 random requests for previously-
inserted keys, using a hops-to-live of 500. We recorded the resulting distribution
of request pathlengths, the number of hops actually taken before finding the data.
If the request did not find the data, the pathlength was taken to be 500.
Figure 2 shows the evolution of the first, second, and third quartiles of the re-
quest pathlength over time, averaged over ten trials. We can see that the initially
high pathlengths decrease rapidly over time. In the beginning, few requests suc-
ceed at all, but as the network converges, the median request pathlength drops
to just six.
5.2 Scalability

[Figure: request pathlength (hops; first quartile, median, third quartile) versus network size (nodes)]
Nonetheless, even this limited network appears capable of scaling to one million
nodes with a median pathlength of just 30. Note also that the network was grown
continuously, without any steady-state convergence period.

5.3 Fault-tolerance

[Figure: request pathlength (hops; first quartile, median, third quartile) versus node failure rate (%)]
[Figure: distribution of the number of links per node (log-log scale)]
6 Security
The primary goal for Freenet security is protecting the anonymity of requestors
and inserters of files. It is also important to protect the identity of storers of
files. Although trivially anyone can turn a node into a storer by requesting a
file through it, thus identifying it as a storer, what is important is that there
remain other, unidentified, holders of the file so that an adversary cannot remove
a file by attacking all of the nodes that hold it. Files must be protected against
malicious modification, and finally, the system must be resistant to denial-of-
service attacks.
Reiter and Rubin[25] present a useful taxonomy of anonymous communica-
tion properties on three axes. The first axis is the type of anonymity: sender
anonymity or receiver anonymity, which mean respectively that an adversary
cannot determine either who originated a message, or to whom it was sent. The
second axis is the adversary in question: a local eavesdropper, a malicious node
or collaboration of malicious nodes, or a web server (not applicable to Freenet).
The third axis is the degree of anonymity, which ranges from absolute privacy
(the presence of communication cannot be perceived) to beyond suspicion (the
sender appears no more likely to have originated the message than any other
potential sender), probable innocence (the sender is no more likely to be the
originator than not), possible innocence, exposed, and provably exposed (the
adversary can prove to others who the sender was).
As Freenet communication is not directed towards specific receivers, receiver
anonymity is more accurately viewed as key anonymity, that is, hiding the key
which is being requested or inserted. Unfortunately, since routing depends on
knowledge of the key, key anonymity is not possible in the basic Freenet scheme
(but see the discussion of pre-routing below). The use of hashes as keys pro-
vides a measure of obscurity against casual eavesdropping, but is of course vul-
nerable to a dictionary attack since their unhashed versions must be widely
known in order to be useful.
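The dictionary attack is straightforward to picture; the sketch below treats an observed key as a plain hash of its descriptive string, which glosses over the actual key-derivation details but captures why hashing alone gives only obscurity.

```python
import hashlib

def dictionary_attack(observed_key: bytes, candidate_strings):
    # An eavesdropper who sees only hashed keys can still test guesses drawn
    # from descriptive strings that are widely known (they must be, to be useful).
    for guess in candidate_strings:
        if hashlib.sha1(guess.encode("utf-8")).digest() == observed_key:
            return guess
    return None
```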
Freenet's anonymity properties under this taxonomy are shown in Table 1.
Against a collaboration of malicious nodes, sender anonymity is preserved be-
yond suspicion since a node in a request path cannot tell whether its predeces-
sor in the path initiated the request or is merely forwarding it. [25] describes a
probabilistic attack which might compromise sender anonymity, using a statis-
tical analysis of the probability that a request arriving at a node a is forwarded
on or handled directly, and the probability that a chooses a particular node b
returning fictitious data. For data stored under content-hash keys or signed-
subspace keys, this is not feasible since inauthentic data can be detected unless
a node finds a hash collision or successfully forges a cryptographic signature.
Data stored under keyword-signed keys, however, is vulnerable to dictionary
attack since signatures can be made by anyone knowing the original descriptive
string.
Finally, a number of denial-of-service attacks can be envisioned. The most
significant threat is that an attacker will attempt to fill all of the network's stor-
age capacity by inserting a large number of junk files. An interesting possibility
for countering this attack is a scheme such as Hash Cash[20]. Essentially, this
scheme requires the inserter to perform a lengthy computation as payment
before an insert is accepted, thus slowing down an attack. Another alternative
is to divide the datastore into two sections, one for new inserts and one for
established files (defined as files having received at least a certain number of
requests). New inserts can only displace other new inserts, not established files.
In this way a flood of junk inserts might temporarily paralyze insert operations
but would not displace existing files. It is difficult for an attacker to artificially
legitimize her own junk files by requesting them many times, since her requests
will be satisfied by the first node to hold the data and not proceed any further.
She cannot send requests directly to the other downstream nodes holding her files
since their identities are hidden from her. However, adopting this scheme may
make it difficult for genuine new inserts to survive long enough to be requested
by others and become established.
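A minimal sketch of such a two-section datastore follows; the promotion threshold, the capacities, and the FIFO-style eviction policy are invented for illustration.

```python
from collections import OrderedDict

ESTABLISHED_AFTER = 10  # illustrative number of requests to become "established"

class TwoSectionStore:
    def __init__(self, new_capacity: int, established_capacity: int):
        self.new = OrderedDict()          # recent inserts
        self.established = OrderedDict()  # files that have proven their popularity
        self.new_capacity = new_capacity
        self.established_capacity = established_capacity
        self.hits = {}

    def insert(self, key: bytes, data: bytes) -> None:
        # New inserts can only displace other new inserts, never established files.
        self.new[key] = data
        self.hits.setdefault(key, 0)
        while len(self.new) > self.new_capacity:
            evicted, _ = self.new.popitem(last=False)
            self.hits.pop(evicted, None)

    def request(self, key: bytes):
        if key in self.new:
            data = self.new[key]
        elif key in self.established:
            data = self.established[key]
        else:
            return None
        self.hits[key] = self.hits.get(key, 0) + 1
        # Files that have received enough requests are promoted and thereafter
        # protected against floods of junk inserts.
        if key in self.new and self.hits[key] >= ESTABLISHED_AFTER:
            self.established[key] = self.new.pop(key)
            while len(self.established) > self.established_capacity:
                self.established.popitem(last=False)
        return data
```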
Attackers may attempt to displace existing files by inserting alternate ver-
sions under the same keys. Such an attack is not possible against a content-hash
key or signed-subspace key, since it requires finding a hash collision or success-
fully forging a cryptographic signature. An attack against a keyword-signed key,
on the other hand, may result in both versions coexisting in the network. The
way in which nodes react to insert collisions (detailed in section 3.3) is intended
to make such attacks more difficult. The success of a replacement attack can be
measured by the ratio of corrupt versus genuine versions resulting in the system.
However, the more corrupt copies the attacker attempts to circulate (by setting
a higher hops-to-live on insert), the greater the chance that an insert collision
will be encountered, which would cause an increase in the number of genuine
copies.
7 Conclusions
users there are or how well the insert and request mechanisms are working, but
anecdotal evidence is so far positive. We are working on implementing a simula-
tion and visualization suite which will enable more rigorous tests of the protocol
and routing algorithm. More realistic simulation is necessary which models the
effects of nodes joining and leaving simultaneously, variation in node capacity
and bandwidth, and larger network sizes. We would also like to implement a
public-key infrastructure to authenticate nodes and create a searching mecha-
nism.
8 Acknowledgements
This material is partly based upon work supported under a National Science
Foundation Graduate Research Fellowship.
References
14. R. Dingledine, M.J. Freedman, and D. Molnar, The Free Haven project: dis-
tributed anonymous storage service, in Proceedings of the Workshop on Design
Issues in Anonymity and Unobservability, Berkeley, CA, USA. Springer: New York
(2001).
15. Distributed.net, http://www.distributed.net/ (2000).
16. D.J. Ellard, J.M. Megquier, and L. Park, The INDIA protocol,
http://www.eecs.harvard.edu/~ellard/India-WWW/ (2000).
17. Gnutella, http://gnutella.wego.com/ (2000).
18. I. Goldberg and D. Wagner, TAZ servers and the rewebber network: enabling
anonymous publishing on the world wide web, First Monday 3(4) (1998).
19. D. Goldschlag, M. Reed, and P. Syverson, Onion routing for anonymous and
private Internet connections, Communications of the ACM 42(2), 39-41 (1999).
20. Hash Cash, http://www.cypherspace.org/~adam/hashcash/ (2000).
21. T. Hong, Performance, in Peer-to-Peer: Harnessing the Power of Disruptive
Technologies, ed. by A. Oram. O'Reilly: Sebastopol, CA, USA (2001).
22. B.A. Huberman and L.A. Adamic, Growth dynamics of the world-wide web,
Nature 401, 131 (1999).
23. S. Milgram, The small world problem, Psychology Today 1(1), 60-67 (1967).
24. Napster, http://www.napster.com/ (2000).
25. M.K. Reiter and A.D. Rubin, Anonymous web transactions with Crowds, Com-
munications of the ACM 42(2), 32-38 (1999).
26. The Rewebber, http://www.rewebber.de/ (2000).
27. M. Richtel and S. Robinson, Several web sites are attacked on day after assault
shut Yahoo, The New York Times, February 9, 2000.
28. J. Rosen, The eroded self, The New York Times, April 30, 2000.
29. A.S. Tanenbaum, Modern Operating Systems. Prentice-Hall: Upper Saddle River,
NJ, USA (1992).
30. M. Waldman, A.D. Rubin, and L.F. Cranor, Publius: a robust, tamper-evident,
censorship-resistant, web publishing system, in Proceedings of the Ninth USENIX
Security Symposium, Denver, CO, USA (2000).
31. D. Watts and S. Strogatz, Collective dynamics of small-world networks, Nature
393, 440-442 (1998).
32. Zero-Knowledge Systems, http://www.zks.net/ (2000).