Kademlia Protocol Succinctly
Marc Clifton
Foreword by Daniel Jebaraj
Copyright © 2018 by Syncfusion, Inc.
If you obtained this book from any other source, please register and download a free copy from
www.syncfusion.com.
The authors and copyright holders provide absolutely no warranty for any information provided.
The authors and copyright holders shall not be liable for any claim, damages, or any other
liability arising from, out of, or in connection with the information in this book.
Please do not use this book if the listed terms are unacceptable.
Table of Contents
Requirements .......................................................................................................................13
The ID class: unit tests .........................................................................................................24
Bucket-splitting .....................................................................................................................36
Implementation .....................................................................................................................43
Implementation .....................................................................................................................64
Implementation .....................................................................................................................89
Implementation .....................................................................................................................93
Bootstrapping .....................................................................................................................107
BootstrapWithinBootstrappingBucket .................................................................................110
BootstrapOutsideBootstrappingBucket ...............................................................................111
Storing key-values onto the new node when a new node registers .....................................135
Over-caching ......................................................................................................................138
Storing key-values onto the new node when a new node registers .....................................140
Serializing...........................................................................................................................144
Deserializing .......................................................................................................................144
Thread Safety .....................................................................................................................148
Parallel queries...................................................................................................................148
Request messages.............................................................................................................157
Responses .........................................................................................................................162
Server implementation........................................................................................................163
Bootstrapping .....................................................................................................................184
Conclusion ............................................................................................................................194
The Story behind the Succinctly Series
of Books
Daniel Jebaraj, Vice President
Syncfusion, Inc.
Whenever platforms or tools ship out of Microsoft, which seems to be about every other
week these days, we have to educate ourselves quickly.
While more information is becoming available on the Internet and more and more books are
being published, even on topics that are relatively new, one aspect that continues to inhibit us is
the inability to find concise technology overview books.
We are usually faced with two options: read several 500+ page books or scour the web for
relevant blog posts and other articles. Like everyone else with a job to do and customers
to serve, we find this quite frustrating.
We firmly believe, given the background knowledge such developers have, that most topics can
be translated into books that are between 50 and 100 pages.
This is exactly what we resolved to accomplish with the Succinctly series. Isn’t everything
wonderful born out of a deep desire to change things for the better?
Free forever
Syncfusion will be working to produce books on several topics. The books will always be free.
Any updates we publish will also be free.
Free? What is the catch?
There is no catch here. Syncfusion has a vested interest in this effort.
As a component vendor, our unique claim has always been that we offer deeper and broader
frameworks than anyone else on the market. Developer education greatly helps us market and
sell against competing vendors who promise to “enable AJAX support with one click,” or “turn
the moon to cheese!”
We sincerely hope you enjoy reading this book and that it helps you better understand the topic
of study. Thank you for reading.
About the Author
Marc Clifton is a former Microsoft MVP (2004–2007) and current Code Project MVP (2005–2018). He is passionate about software architecture and all things .NET, and he also enjoys
writing Python applications for the Raspberry Pi and BeagleBone single-board computers. He
loves writing (215 articles and counting on Code Project) and exploring new technology
applications (peer-to-peer networking, blockchain, big data, contextual data, declarative and
functional programming, etc.). He has been a contractor for 20+ years in numerous industries,
including commercial satellite design, boat yard management, emergency services record
management, and insurance. When not sitting in front of the computer, Marc enjoys playing the
lyre and going on adventures with his fiancée.
Chapter 1 Introduction
Kademlia, according to a paper1 published in 2015 by Xing Shi Cai and Luc Devroye, is “the de
facto standard searching algorithm for P2P (peer-to-peer) networks on the Internet.” Kademlia is
a protocol specification for decentralizing peer-to-peer network operations, efficiently storing and
retrieving data across the network.
• It is decentralized, meaning that data is not stored on a central server, but rather
redundantly stored on peers.
• It is fault tolerant, meaning that if one or more peers drop out of the network, the data,
having been stored on multiple peers, should still be retrievable.
• Complicated database engines are not required—the data stored on a P2P network is
typically stored in key-value pairs, making it suitable for even IoT devices with limited
storage to participate in the network.
Kademlia was designed by Petar Maymounkov and David Mazières in 2002. Wikipedia says this
about Kademlia:
“It specifies the structure of the network and the exchange of information through node lookups.
Kademlia nodes communicate among themselves using UDP. A virtual or overlay network is
formed by the participant nodes. Each node is identified by a number or node ID. The node ID
serves not only as identification, but the Kademlia algorithm uses the node ID to locate values
(usually file hashes or keywords). In fact, the node ID provides a direct map to file hashes and
that node stores information on where to obtain the file or resource.”2
1 http://www.tandfonline.com/doi/abs/10.1080/15427951.2015.1051674?src=recsys&journalCode=uinm20
2 https://en.wikipedia.org/wiki/Kademlia
Understanding how these protocols work is important, as blockchain is one of those revolutionary technologies that
already has, and will continue to have, an impact on software application development.
Many people think that centralized data, performance considerations aside, is on its way out.4 As
that last link states: “The more the data management industry consolidates, the more opposing
forces decentralize the market.” Peer-to-peer decentralization also has built-in redundancy,
protecting against single-point data loss and access failures. Not that decentralization doesn’t
have its own problems; security will probably be the main one, if it isn’t already.
In recognition that there are some interesting and complicated cryptocurrency and blockchain
technologies coming down the road, and that these need to be understood, protocols like
Kademlia are a good starting point for looking at any P2P DHT implementation. As to why
Kademlia specifically, the summary to the Kademlia specification says it best:
“With its novel XOR-based metric topology, Kademlia is the first peer-to-peer system to combine
provable consistency and performance, latency-minimizing routing, and a symmetric,
unidirectional topology. Kademlia furthermore introduces a concurrency parameter, α, that lets
people trade a constant factor in bandwidth for asynchronous lowest-latency hop selection and
delay-free fault recovery. Finally, Kademlia is the first peer-to-peer system to exploit the fact that
node failures are inversely related to uptime.”
4 http://sandhill.com/article/is-data-decentralization-the-new-trend/
5 https://pdos.csail.mit.edu/~petar/papers/maymounkov-kademlia-lncs.pdf
6 https://github.com/zencoders/sambatyon/tree/master/Kademlia/Kademlia
7 http://xlattice.sourceforge.net/components/protocol/kademlia/specs.html
8 https://github.com/bmuller/kademlia
When reviewing an open-source implementation, it is recommended that you inspect the code
to verify that it implements the optimizations described in the longer specification.
Other languages
I have not looked carefully at implementations in languages other than C# and Python. There
are implementations in many other languages, primarily Java, JavaScript, and Go. Looking
briefly at implementations in these other languages, it’s fairly easy to tell which version of the
specification they implement, so again, beware that depending on which specification was
used, you can have very different implementations. For example, an implementation in Java
makes a very specific (yet seemingly arbitrary) rule about bucket splitting (we’ll get to that)
that isn’t found in the spec.
Requirements
The code implemented in this book requires:
• C# 7
• .NET Framework 4.7
• Visual Studio 2017
• Improving Lookup Performance over a Widely-Deployed DHT
• Review on Detection and Mitigation of Sybil attack in the network, Procedia
Computer Science 78 (2016) 395-401
• Flowchain: A Case Study on Building a Blockchain for the IoT
• Distributed Ledger Technology: beyond block chain, a report by the UK Government
Chief Scientific Adviser
• Key collisions in key-values: While improbable, the best way to mitigate this is to
have your own peer create a random key for you.
• Peer ID collision: Again, the node ID should be created for you.
• Encrypting of values.
• Privacy of keys: While not practical with today’s technology, a malicious peer could
query for stored values across the entire 2^160 key space.
• Serialization format of packets sent over the wire.
• Ability to limit what a peer stores based on value length.
• Private peer groups: Joining a public P2P network but creating a private peer group
within the network.
• Partial participation: What if you want to participate in a peer network for storing and
retrieving key-values, but don’t want to store key-values yourself? Perhaps you’re
running an IoT device with limited storage?
• Registering multiple peer IDs from the same network address: This is of particular
concern because you can use this to degrade the performance of a peer, as
discussed in the section “Degrading a Kademlia Peer.”
• A peer tampering with a value when it propagates the value to other peers: This can
be remedied by including a public key to ensure the value hasn’t been changed. And
while it’s strange to say “a peer,” it becomes an issue when you download a peer
application and you have no idea what’s going on inside—and even if you had the
source, would you know where to look?
Regarding unit tests
Some unit tests set up specific conditions for testing code. Others use randomly generated IDs
(node IDs and keys) and verify results through a different implementation of the same algorithm.
To ensure repeatability of those tests, the Random class is seeded in debug mode with the same
value, and is exposed as a public static field so that some unit tests can perform their tests with
a range of seeds.
#if DEBUG
public static Random rnd = new Random(1);
#else
private static Random rnd = new Random();
#endif
Also, most of these unit tests are really system-level tests, or at least partial-system tests.
Actual unit tests of specific code sections in a particular method get inane rather quickly, so
you’ll see a lot of setup work being done in the higher-level tests.
The unit tests presented in this book are important not just because they test the underlying
implementation, but also because they demonstrate how to set up scenarios for testing the
Kademlia protocol under specific conditions. Thoroughly understanding the unit tests is probably
an even better way of understanding the Kademlia protocol than looking at the implementation!
Chapter 2 Key Concepts
The most complex part of the code is in the registration of new peers, because this involves
some magic numbers based on the Kademlia authors’ research into the performance of other
networks, such as Chord9 and Pastry,10 and the behavior of peers in those networks.
Kademlia terminology
Terms used specifically in the Kademlia specification are described here.
Overlay network: An overlay network is one in which each node keeps a (usually partial) list of
other nodes participating in the network.
Node ID: This is a 160-bit node identifier obtained from a SHA1 hash of some key, or is
randomly generated.
k-Bucket: A collection of at most k nodes (or contacts), also simply called a bucket. Each node
handles up to k contacts within a range of IDs. Initially, the ID range is the entire spectrum,
0 <= id <= 2^160 - 1.
Key-Value: Peers store values based on 160-bit SHA1 hashed keys. Each stored entry consists
of a key-value pair.
Router: The router manages the collection of k-buckets, and also determines into which nodes
a key-value should be stored.
Distance/Closeness: The distance between a host and the key is an XOR computation of the
host’s ID with the key. Kademlia’s most significant feature is the use of this XOR computation to
determine the distance/closeness between IDs.
Prefix: A prefix is the term used to describe the n most significant bits (MSB) of an ID.
9 https://en.wikipedia.org/wiki/Chord_(peer-to-peer)
10 https://en.wikipedia.org/wiki/Pastry_(DHT)
Depth: The depth of a bucket is defined as the shared prefix of a bucket. Because buckets are
associated with ranges from 2^i to 2^(i+1) - 1, where 0 <= i < 160, one could say that the depth of a
bucket is 160 - i. We’ll see later that this may not be the case.
Bucket Split: A bucket split potentially happens when a node’s k-bucket is full—meaning it has
k contacts—and a new contact with a given 160-bit key wants to register within the bucket’s
range for that key. At this point, an algorithm kicks in that:
• Under one condition, splits the bucket at the range midpoint into two ranges, placing
contacts into the appropriate new buckets.
• Under a second condition, splits the bucket when a specific depth qualifier is met.
• Under a third condition, replaces a peer that no longer responds with the newest
contact that is in a “pending” queue for that bucket.
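The first condition, splitting at the range midpoint, can be sketched as follows. This is a simplified illustration assuming half-open [low, high) ranges, not the book's KBucket code:

```python
def split_range(low: int, high: int):
    # Split a bucket's [low, high) ID range at its midpoint into two halves;
    # existing contacts are then redistributed into the half that covers them.
    mid = (low + high) // 2
    return (low, mid), (mid, high)

# Splitting the full 160-bit ID space yields two equal halves.
(l1, h1), (l2, h2) = split_range(0, 2 ** 160)
assert l1 == 0 and h1 == l2 == 2 ** 159 and h2 == 2 ** 160
```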
Communication protocol
The Kademlia protocol consists of four remote procedure calls (RPCs). All RPCs require that
the sender provides a random RPC ID, which must be echoed by the recipient: “In all RPCs, the
recipient must echo a 160-bit random RPC ID, which provides some resistance to address
forgery; PINGS can also be piggy-backed on RPC replies for the RPC recipient to obtain
additional assurance of the sender’s network address.”
Anytime a peer is contacted with any of the four RPCs, it goes through the process of adding or
updating the contact in its own list. The concept of “closeness” will be discussed in detail later.
Ping
“The PING RPC probes a node to see if it is online.” This is considered a “primitive” function, in
that it just returns the random RPC ID that accompanied the Ping request.
Store
STORE instructs a node to store a (key,value) pair for later retrieval. This is also considered a
“primitive” function, as it again just returns the random RPC ID that accompanied the STORE
request. “To store a (key,value) pair, a participant locates the k closest nodes to the key and
sends them STORE RPCS.” The participant does this by inspecting its own k-closest nodes to
the key.
FindNode
“FIND_NODE takes a 160-bit ID as an argument. The recipient of the RPC returns (IP address,
UDP port, Node ID) triples for the k nodes it knows about closest to the target ID. These triples
can come from a single k-bucket, or they may come from multiple k-buckets if the closest k-
bucket is not full. In any case, the RPC recipient must return k items (unless there are fewer
than k nodes in all its k-buckets combined, in which case it returns every node it knows about).”
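The "k nodes it knows about closest to the target ID" selection reduces to ordering known contacts by XOR distance to the target. A minimal sketch (my own, with contacts as bare integer IDs):

```python
import heapq

def k_closest(contacts, target, k):
    # FIND_NODE's core: the k known IDs closest to target under the XOR metric.
    return heapq.nsmallest(k, contacts, key=lambda c: c ^ target)

# Distances to 0b111 are 7, 4, 2, 1 respectively; the two closest win.
assert k_closest([0b000, 0b011, 0b101, 0b110], 0b111, k=2) == [0b110, 0b101]
```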
In an abstracted communication protocol, the recipient needs to return information about the
protocol: the kind of protocol and whatever is required to contact a peer using that protocol. If
multiple protocols are supported, we can consider two options:
• Return node information only for the protocols that the requester says it supports.
• Alternatively (a weaker option), the requester can filter out returned nodes whose
protocols aren’t supported.
There are two further considerations:
• The peer itself may support multiple protocols, so it should probably indicate what those
are when it registers with another peer.
• The peer may have a preferred protocol.
None of these multiple-protocol issues is discussed in the spec—this is purely my own
enhancement.
FindNode is typically used in two ways:
• A peer can issue this RPC on contacts it knows about, updating its own list of “close”
peers.
• A peer may issue this RPC to discover other peers on the network.
FindValue
“FIND_VALUE behaves like FIND_NODE—returning (IP address, UDP port, Node ID) triples—
with one exception. If the RPC recipient has received a STORE RPC for the key, it just returns
the stored value.”
If the FindValue RPC returns a list of other peers, it is up to the requester to continue searching
for the desired value from that list. Also, note this technique for caching key-values:
“To find a (key,value) pair, a node starts by performing a lookup to find the k nodes with IDs
closest to the key. However, value lookups use FIND_VALUE rather than FIND_NODE RPCS.
Moreover, the procedure halts immediately when any node returns the value. For caching
purposes, once a lookup succeeds, the requesting node stores the (key,value) pair at the
closest node it observed to the key that did not return the value.”
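The halt-and-cache behavior can be sketched as a simplified serial lookup. This is my own illustration: peers are integers, rpc_find_value is a stand-in for the real RPC, and the real algorithm queries α peers in parallel rather than one at a time:

```python
def find_value(start_peers, key, rpc_find_value):
    # rpc_find_value(peer, key) -> ('value', v) or ('peers', [...]).
    # Returns (value, node_to_cache_at); caching happens at the closest
    # observed node that did NOT return the value, per the quote above.
    seen, frontier, closest_without_value = set(), list(start_peers), None
    while frontier:
        frontier.sort(key=lambda p: p ^ key)   # closest first (XOR metric)
        peer = frontier.pop(0)
        if peer in seen:
            continue
        seen.add(peer)
        kind, payload = rpc_find_value(peer, key)
        if kind == 'value':
            return payload, closest_without_value  # halt immediately
        if closest_without_value is None or (peer ^ key) < (closest_without_value ^ key):
            closest_without_value = peer
        frontier.extend(payload)
    return None, None

# Peer 1 holds the value; peer 4 only knows about peer 1.
rpc = lambda p, k: ('value', 'v') if p == 1 else ('peers', [1])
assert find_value([4], 0, rpc) == ('v', 4)   # cache at 4, which lacked it
```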
Other considerations
Expiration time
“Additionally, each node republishes (key,value) pairs as necessary to keep them alive, as
described later in Section 2.5. This ensures persistence (as we show in our proof sketch) of the
(key,value) pair with very high probability. For Kademlia’s current application (file sharing), we
also require the original publisher of a (key,value) pair to republish it every 24 hours. Otherwise,
(key,value) pairs expire 24 hours after publication, so as to limit stale index information in the
system. For other applications, such as digital certificates or cryptographic hash to value
mappings, longer expiration times may be appropriate.”
If we want to consider using Kademlia in a distributed ledger implementation, it seems
necessary that key-values never expire—otherwise, this would result in an integrity loss of the
ledger data.
Over-caching
“Because of the unidirectional nature of the topology, future searches for the same key are likely
to hit cached entries before querying the closest node. During times of high popularity for a
certain key, the system might end up caching it at many nodes. To avoid “over-caching,” we
make the expiration time of a (key,value) pair in any node’s database exponentially inversely
proportional to the number of nodes between the current node and the node whose ID is closest
to the key ID. While simple LRU eviction would result in a similar lifetime distribution, there is no
natural way of choosing the cache size, since nodes have no a priori knowledge of how many
values the system will store.”
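The paper gives no closed formula for this "exponentially inversely proportional" expiration; one plausible reading (entirely my own interpretation, with a hypothetical decay function) is:

```python
import math

def cache_expiration_sec(base_sec, nodes_between):
    # Expiration decays exponentially with the number of nodes between
    # the caching node and the node closest to the key, so entries cached
    # far from the key die quickly while the closest copy lives longest.
    return base_sec * math.exp(-nodes_between)

day = 24 * 60 * 60
assert cache_expiration_sec(day, 0) == day                        # closest node: full 24h
assert cache_expiration_sec(day, 5) < cache_expiration_sec(day, 1)  # monotone decay
```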
Bucket refreshes
“Buckets are generally kept fresh by the traffic of requests traveling through nodes. To handle
pathological cases in which there are no lookups for a particular ID range, each node refreshes
any bucket to which it has not performed a node lookup in the past hour. Refreshing means
picking a random ID in the bucket’s range and performing a node search for that ID.”
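A refresh is then just a node lookup on a random ID drawn from the stale bucket's range; a one-line sketch:

```python
import random

def refresh_id(low, high, rng=random):
    # Pick a random ID in the bucket's [low, high) range; performing a
    # node lookup on this ID constitutes the bucket refresh.
    return rng.randrange(low, high)

rid = refresh_id(2 ** 10, 2 ** 11)
assert 2 ** 10 <= rid < 2 ** 11
```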
Joining a network
“To join the network, a node u must have a contact to an already participating node w. u inserts
w into the appropriate k-bucket. u then performs a node lookup for its own node ID. Finally, u
refreshes all k-buckets further away than its closest neighbor. During the refreshes, u both
populates its own k-buckets and inserts itself into other nodes’ k-buckets as necessary.”
Chapter 3 Getting Started
The 160-bit ID space is enormous, as a quick Python check shows:
>>> 2 ** 160
1461501637330902918203684832716283019655932542976L
The implementation consists of the following main classes:
• Dht: The peer’s entry point for interacting with other peers.
• Router: Manages peer (node) lookups for acquiring nearest peers and finding key-
value pairs.
• Node: Provides the Ping, Store, FindNode, and FindValue implementations.
• BucketList: Manages the contacts (peers) in each bucket and the algorithm for
adding contacts (peers) to a particular bucket.
• ID: Implements a wrapper for BigInteger, with various helper methods and operator
overloads for the XOR logic.
• Contact: Maintains the protocol that the contact (peer) uses, its ID, and LastSeen
DateTime, which is used for determining whether a peer should be tested for
eviction.
• KBucket: Retains the collection of contacts (peers) that are associated with a
specific bucket, implements the bucket splitting algorithm, and provides other useful
methods for obtaining information regarding the bucket.
• Blue: classes
• Orange: interfaces
• Purple: collections
• Green: value type fields
The ID class: initial implementation
Code Listing 3: ID Class
public class ID
{
public BigInteger Value { get { return id; } set { id = value; } }
Note: The Validate class defines some helper methods for asserting the
condition and throwing the exception specified as the generic parameter.
There are two things to note here. First, we append a 0 to the byte array to force the
BigInteger to be unsigned. If we don’t do this, any byte array where the MSB of the last
(most significant) byte is set will be treated as a negative number, which we don’t want when
we are comparing the range of a bucket. This is handled by a simple extension method, as in
Code Listing 4.
Code Listing 4: The Append0 Extension Method
Second, the byte array is in little-endian order, meaning that the least significant byte is stored
first. This can get confusing when contrasted with the way that the Kademlia specification talks
about ID “prefixes” and the way we tend to represent bits in an array. For example, this ID, in
bits, is written out in big-endian order:
11001000 00001100
The 6-bit prefix would be 110010. As a BigInteger, the array is in little-endian order, so the
bytes are ordered such that the least significant byte (the LSB) is stored first:
00001100 11001000
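The sign-forcing trick can be illustrated in Python (an analogy using int.from_bytes, not the book's C# Append0 code):

```python
# Appending a 0x00 byte in little-endian order keeps the value non-negative,
# because the appended zero becomes the most significant (sign-bearing) byte.
raw = bytes([0x0C, 0xC8])                    # little-endian: LSB first
unsigned = int.from_bytes(raw + b"\x00", "little")
assert unsigned == 0xC80C                    # 11001000 00001100 in big-endian bits

# Without the trick, the set MSB of the high byte reads as a sign bit
# (this is what .NET's BigInteger would do with the raw array):
signed = int.from_bytes(raw, "little", signed=True)
assert signed < 0
```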
Code Listing 6: Bits Extension Method
return bytes.SelectMany(GetBits);
[TestClass]
public class IDTests
{
    [TestMethod]
    public void LittleEndianTest()
    {
        byte[] test = new byte[20];
        test[0] = 1;
        Assert.IsTrue(new ID(test) == 1, "Expected value to be 1.");
    }

    [TestMethod]
    public void PositiveValueTest()
    {
        byte[] test = new byte[20];
        test[19] = 0x80;
        Assert.IsTrue(new ID(test) == BigInteger.Pow(new BigInteger(2), 159),
            "Expected value to be 2^159.");
    }

    [TestMethod, ExpectedException(typeof(IDLengthException))]
    public void BadIDTest()
    {
        byte[] test = new byte[21];
        new ID(test);
    }

    [TestMethod]
    public void BigEndianTest()
    {
        byte[] test = new byte[20];
        test[19] = 0x80;
        Assert.IsTrue(new ID(test).AsBigEndianBool[0] == true,
            "Expected big endian bit 159 to be set.");
        Assert.IsTrue(new ID(test).AsBigEndianBool[8] == false,
            "Expected big endian bit 151 to NOT be set.");
    }
}
{
protected Node node;
/// <summary>
/// Initialize a contact with its protocol and ID.
/// </summary>
public Contact(IProtocol protocol, ID contactID)
{
Protocol = protocol;
ID = contactID;
Touch();
}
/// <summary>
/// Update the fact that we’ve just seen this contact.
/// </summary>
public void Touch()
{
LastSeen = DateTime.Now;
}
public bool IsBucketFull { get { return contacts.Count == Constants.K; } }
public KBucket()
{
contacts = new List<Contact>();
low = 0;
high = BigInteger.Pow(new BigInteger(2), 160);
}
/// <summary>
/// Initializes a k-bucket with a specific ID range.
/// </summary>
public KBucket(BigInteger low, BigInteger high)
{
contacts = new List<Contact>();
this.low = low;
this.high = high;
}
}
[TestClass]
public class KBucketTests
{
[TestMethod, ExpectedException(typeof(TooManyContactsException))]
public void TooManyContactsTest()
{
KBucket kbucket = new KBucket();
}
buckets = new List<KBucket>();
}
}
bucketList = new BucketList(contact.ID);
this.storage = storage;
}
return ourContact;
}
Of note here is the interface IStorage, which abstracts the storage mechanism for key-value
pairs. Ultimately, IStorage will implement the following methods from Code Listing 15.
string Get(BigInteger key);
DateTime GetTimeStamp(BigInteger key);
void Set(ID key, string value, int expirationTimeSec = 0);
int GetExpirationTimeSec(BigInteger key);
void Remove(BigInteger key);
List<BigInteger> Keys { get; }
void Touch(BigInteger key);
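A minimal in-memory implementation of this storage contract might look like the following Python sketch (the same shape as IStorage, not the book's C# code; method names are lowercased to be idiomatic):

```python
import time

class InMemoryStorage:
    """Dict-backed key-value store mirroring the IStorage members above."""
    def __init__(self):
        self._data = {}   # key -> (value, timestamp, expiration_sec)

    def set(self, key, value, expiration_sec=0):
        self._data[key] = (value, time.time(), expiration_sec)

    def get(self, key):
        return self._data[key][0]

    def get_timestamp(self, key):
        return self._data[key][1]

    def contains(self, key):
        return key in self._data

    def touch(self, key):
        # Refresh the timestamp without changing the stored value.
        value, _, exp = self._data[key]
        self._data[key] = (value, time.time(), exp)

    def remove(self, key):
        del self._data[key]

    @property
    def keys(self):
        return list(self._data)
```

The point of the abstraction is that a peer could swap this for a disk-backed or database-backed store without touching the protocol code.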
Chapter 4 Adding Contacts
Version 2, Section 2.2 of the specification initially states this simple algorithm for dealing with
adding contacts:
“When a Kademlia node receives any message (request or reply) from another node, it updates
the appropriate k-bucket for the sender’s node ID. If the sending node already exists in the
recipient’s k-bucket, the recipient moves it to the tail of the list. If the node is not already in the
appropriate k-bucket and the bucket has fewer than k entries, then the recipient just inserts the
new sender at the tail of the list. If the appropriate k-bucket is full, however, then the recipient
pings the k-bucket’s least-recently seen node to decide what to do. If the least recently seen
node fails to respond, it is evicted from the k-bucket and the new sender inserted at the tail.
Otherwise, if the least-recently seen node responds, it is moved to the tail of the list, and the
new sender’s contact is discarded.”
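That paragraph translates almost line for line into code; here is a Python sketch (my own, with contacts as opaque values in a list ordered least- to most-recently seen, and ping a stand-in for the PING RPC):

```python
def add_contact(bucket, sender, k, ping):
    # bucket: list ordered least- to most-recently seen; ping(node) -> bool.
    if sender in bucket:
        bucket.remove(sender)
        bucket.append(sender)         # already known: move to tail
    elif len(bucket) < k:
        bucket.append(sender)         # bucket not full: insert at tail
    elif ping(bucket[0]):
        bucket.append(bucket.pop(0))  # oldest alive: move to tail, drop sender
    else:
        bucket.pop(0)                 # oldest dead: evict it...
        bucket.append(sender)         # ...and insert sender at tail
    return bucket

assert add_contact([1, 2, 3], 2, k=3, ping=lambda n: True) == [1, 3, 2]
assert add_contact([1, 2, 3], 4, k=3, ping=lambda n: False) == [2, 3, 4]
assert add_contact([1, 2, 3], 4, k=3, ping=lambda n: True) == [2, 3, 1]
```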
Figure 2: The Add Contact Algorithm
“k-buckets effectively implement a least-recently seen eviction policy, except that live nodes are
never removed from the list. This preference for old contacts is driven by our analysis of
Gnutella trace data collected by Saroiu, et. al. ... The longer a node has been up, the more likely
it is to remain up another hour. By keeping the oldest live contacts around, k-buckets maximize
the probability that the nodes they contain will remain online. A second benefit of k-buckets is
that they provide resistance to certain DoS attacks. One cannot flush the nodes’ routing state by
flooding the system with new nodes. Kademlia nodes will only insert the new nodes in the k-
buckets when old nodes leave the system.”
We also observe that this has nothing to do with binary trees, which is something version 2 of
the spec introduced. This is basically a hangover from version 1 of the spec. However, Section
2.4 states something slightly different:
“Nodes in the routing tree are allocated dynamically, as needed. Initially, a node u’s routing tree
has a single node—one k-bucket covering the entire ID space. When u learns of a new contact,
it attempts to insert the contact in the appropriate k-bucket. If that bucket is not full, the new
contact is simply inserted. Otherwise, if the k-bucket’s range includes u’s own node ID, then the
bucket is split into two new buckets, the old contents divided between the two, and the insertion
attempt repeated. If a k-bucket with a different range is full, the new contact is simply dropped.”
Bucket-splitting
The purpose of allowing a bucket to split if it contains the host’s node ID is so that the host
keeps a list of nodes that are “close to it”—closeness is defined essentially by the integer
difference of the node IDs, not the XOR difference (more on this whole XOR thing later).
Figure 3: Bucket Splitting
What happened to pinging the least-seen contact and replacing it? Again, the spec then goes
on to say:
“One complication arises in highly unbalanced trees. Suppose node u joins the system and is
the only node whose ID begins 000. Suppose further that the system already has more than k
nodes with prefix 001. Every node with prefix 001 would have an empty k-bucket into which u
should be inserted, yet u’s bucket refresh would only notify k of the nodes. To avoid this
problem, Kademlia nodes keep all valid contacts in a subtree of size at least k nodes, even if
this requires splitting buckets in which the node’s own ID does not reside. Figure 5 illustrates
these additional splits.”
• A subtree of size at least k nodes: The reason the subtree contains “at least k nodes”
is that when the parent is split, it creates two subtrees whose total number of nodes
begins at k but may grow beyond k as nodes are added to each branch (or a branch
splits again into two more branches).
• Even if this requires splitting buckets in which the node’s own ID does not reside: Not
only is this contradictory, but there’s no explanation of what “even if this requires”
means. How do you code this?
This section of the specification apparently creates much confusion—I found several links
where people were asking about this section. It’s unfortunate that the original authors
themselves do not answer these questions. Jim Dixon has a very interesting response on
The Mail Archive,11 which I present in full here:
“The source of confusion is that the 13-page version of the Kademlia uses the same
term to refer to two different data structures. The first is well-defined: k-bucket i contains
zero to k contacts whose XOR distance is [2^i..2^(i+1)). It cannot be split. The current
node can only be in bucket zero, if it is present at all. In fact, its presence would be
pointless or worse.
The second thing referred to as a k-bucket doesn’t have the same properties.
Specifically, the current node must be present, it wanders from one k-bucket to another,
these k-buckets can be split, and there are sometimes ill-defined constraints on the
characteristics of subtrees of k-buckets, such as the requirement that “Kademlia nodes
keep all valid contacts in a subtree of size of at least k nodes, even if this requires
splitting buckets in which the node’s own ID does not reside” (section 2.4, near the end).
In a generous spirit, you might say that the logical content of the two descriptions is the
same. However, for someone trying to implement Kademlia, the confusion of terms
causes headaches—and leads to a situation where all sorts of things are described as
Kademlia, because they can be said to be, if you are of a generous disposition.
However, not surprisingly, they don’t interoperate.”
So, my decision, given the ambiguity of the spec, is to ignore this, because as you will see next,
there is yet another version of how contacts are added.
In Section 4.2, on Accelerated Lookups, we have a different specification for how contacts are
added:
“Section 2.4 describes how a Kademlia node splits a k-bucket when the bucket is full and its
range includes the node’s own ID. The implementation, however, also splits ranges not
containing the node’s ID, up to b - 1 levels. If b = 2, for instance, the half of the ID space not
containing the node’s ID gets split once (into two ranges); if b = 3, it gets split at two levels into a
maximum of four ranges, etc. The general splitting rule is that a node splits a full k-bucket if the
bucket’s range contains the node’s own ID or the depth d of the k-bucket in the routing tree
satisfies d (mod b) != 0.”
Depth. According to the spec: “The depth is just the length of the prefix shared by all nodes in
the k-bucket’s range.” Do not confuse that with this statement in the spec: “Define the depth, h,
of a node to be 160 - i, where i is the smallest index of a nonempty bucket.” The former is
referring to the depth of a k-bucket, the latter the depth of the node.
11 https://www.mail-archive.com/[email protected]/msg00042.html
With regard to the definition of depth, does this mean “the length of prefix shared by any node
that would reside in the k-bucket’s range,” or does it mean “the length of a prefix shared by all
nodes currently in the k-bucket”?
def depth(self):
    sp = sharedPrefix([bytesToBitString(n.id) for n in self.nodes.values()])
    return len(sp)

def sharedPrefix(args):
    i = 0
    while i < min(map(len, args)):
        if len(set(map(operator.itemgetter(i), args))) != 1:
            break
        i += 1
    return args[0][:i]
Here, the depth is determined by the shared prefixes of the nodes. We then use the following
tests to determine whether a bucket can be split:
• First, the HasInRange test checks whether our node ID is close to the contact’s node
ID. If our node ID is in the range of the bucket associated with the contact’s node ID,
then we know the two nodes are “close” in terms of the integer difference. Initially,
the range spans the entire 2^160 ID space, so everybody is “close.” This test of
closeness is refined as new contacts are added.
• Regarding the depth mod 5 computation, I asked Brian Muller this question: “Is the
purpose of the depth to limit the number of ‘new’ nodes that a host will maintain
(ignoring for the moment the issue of pinging an old contact to see if can be
replaced)?” He replied: “Yes! The idea is that a node should know about nodes
spread across the network—though definitely not all of them. The depth is used as a
way to control how ‘deep’ a node’s understanding of the network is (and the number
of nodes it knows about).”
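As a rough illustration of the combined rule (a Python sketch with assumed names, not the book’s C# CanSplit, which appears later), the decision reduces to a range test plus a depth test:

```python
B = 5  # the spec's recommended value for b

def can_split(low, high, depth, our_id, b=B):
    # Split a full k-bucket if its range [low, high) contains our own ID,
    # or if its depth d satisfies d mod b != 0.
    return low <= our_id < high or depth % b != 0
```

With b = 5, a full bucket at depth 5 that does not contain our ID cannot split, which is exactly the failure the unit test below forces.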
The depth to which the bucket has split is based on the number of bits shared in the prefix of the
contacts in the bucket. With random IDs, this number will initially be small, but as bucket ranges
become narrower from subsequent splits, more contacts will begin to share the same prefix,
and the bucket, when split, will leave less “room” for new contacts. Eventually, when the
bucket range becomes narrow enough, the number of bits shared in the prefix of the contacts in
the bucket reaches the threshold b, which the spec says should be 5.
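This effect can be seen with a small Python sketch (the same approach as the sharedPrefix function quoted earlier): IDs drawn from a narrow bucket range share a longer prefix than IDs drawn from the full range.

```python
def shared_prefix_len(bit_strings):
    # Count leading bit positions where every string agrees.
    i = 0
    while i < min(map(len, bit_strings)) and len({s[i] for s in bit_strings}) == 1:
        i += 1
    return i

wide = ["00101100", "01110001", "00010111"]    # full-range IDs: prefix "0"
narrow = ["00101100", "00101011", "00101000"]  # narrow-bucket IDs: prefix "00101"
```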
Figure 4: Degrading Adding Contacts
We can verify this with a unit test.
Code Listing 19: Forcing Add Contact Failure
[TestMethod]
public void ForceFailedAddTest()
{
Contact dummyContact = new Contact(new VirtualProtocol(), ID.Zero);
((VirtualProtocol)dummyContact.Protocol).Node =
new Node(dummyContact, new VirtualStorage());
Assert.IsTrue(bucketList.Buckets[0].Contacts.Count == 1,
"Expected 1 contact in bucket 0.");
Assert.IsTrue(bucketList.Buckets[1].Contacts.Count == 20,
"Expected 20 contacts in bucket 1.");
// This next contact should not split the bucket, as depth == 5, and
// therefore adding the contact will fail.
// Any unique ID >= 2^159 will do.
byte[] id = new byte[20];
id[19] = 0x80;
Contact newContact = new Contact(dummyContact.Protocol, new ID(id));
bucketList.AddContact(newContact);
Assert.IsTrue(bucketList.Buckets.Count == 2,
"Bucket split should not have occurred.");
Assert.IsTrue(bucketList.Buckets[0].Contacts.Count == 1,
"Expected 1 contact in bucket 0.");
Assert.IsTrue(bucketList.Buckets[1].Contacts.Count == 20,
"Expected 20 contacts in bucket 1.");
}
What we’ve effectively done is break Kademlia, as the peer will no longer accept half of the
possible ID range. As long as the peer ID is outside the range of a bucket whose shared-prefix
length mod b is 0, we can continue this process by adding contacts with shared prefixes
(assuming b == 5) 01xxx, 001xx, 0001x, and 00001, and again for every multiple of b bits. If a
peer has a “small” ID, you can easily prevent it from accepting new contacts within half of its
bucket ranges.
• IDs should not be created by the user; they should be assigned by the library. Of course,
given the open-source nature of all these implementations, enforcing this is impossible.
• A contact’s ID should be unique for its network address—in other words, a malicious
peer should not be able to create multiple contacts simply by providing a unique ID in its
contact request.
• One might consider increasing b as i in 2^i increases. There might be some justification
for this: because the range 2^159 through 2^160 - 1 contains half the possible contacts,
one might allow the depth for bucket splitting to be greater than the recommended b = 5.
Implementation
Figure 5 shows the flowchart of what we are initially implementing.
Figure 5: Adding Contacts with Bucket Splitting
As I’m using Brian Muller’s implementation as the authority with regard to the spec, we’ll go
with how he coded the algorithm and will (eventually) incorporate the fallback in which we discard
nodes in a full k-bucket that don’t respond to a ping—but that comes later.
The BucketList class implements the algorithm to add a contact. Note that the lock
statements ensure bucket lists are manipulated synchronously, as the peer server will be
receiving commands asynchronously.
public void AddContact(Contact contact)
{
lock (this)
{
KBucket kbucket = GetKBucket(contact.ID);
if (kbucket.Contains(contact.ID))
{
// Replace the existing contact, updating the network info and
// LastSeen timestamp.
kbucket.ReplaceContact(contact);
}
else if (kbucket.IsBucketFull)
{
if (CanSplit(kbucket))
{
// Split the bucket and try again.
(KBucket k1, KBucket k2) = kbucket.Split();
int idx = GetKBucketIndex(contact.ID);
buckets[idx] = k1;
buckets.Insert(idx + 1, k2);
buckets[idx].Touch();
buckets[idx + 1].Touch();
AddContact(contact);
}
else
{
// TODO: Ping the oldest contact to see if it’s still
// around and replace it if not.
}
}
else
{
// Bucket isn’t full, so just add the contact.
kbucket.AddContact(contact);
}
}
}
Later, this will be extended to handle delayed eviction and adding new contacts that can’t fit in
the bucket to a pending queue.
}
/// <summary>
/// Returns number of bits that are in common across all contacts.
/// If there are no contacts, or no shared bits, the return is 0.
/// </summary>
public int Depth()
{
bool[] bits = new bool[0];

if (contacts.Count > 0)
{
// Start with the first contact, then keep only the bits shared
// with each subsequent contact.
bits = contacts[0].ID.Bytes.Bits().ToArray();
contacts.Skip(1).ForEach(c => bits = SharedBits(bits, c.ID));
}

return bits.Length;
}
/// <summary>
/// Returns a new bit array of just the shared bits.
/// </summary>
protected bool[] SharedBits(bool[] bits, ID id)
{
bool[] idbits = id.Bytes.Bits().ToArray();
int q = Constants.ID_LENGTH_BITS - 1;
int n = bits.Length - 1;
List<bool> sharedBits = new List<bool>();
// The prefix is in the most significant bits, so work backwards.
while (n >= 0 && bits[n] == idbits[q])
{
sharedBits.Add(idbits[q--]);
--n;
}
return sharedBits.ToArray();
}
Code Listing 22: The Split Method
BigInteger midpoint = (Low + High) / 2;
KBucket k1 = new KBucket(Low, midpoint);
KBucket k2 = new KBucket(midpoint, High);
Contacts.ForEach(c =>
{
// <, because the High value is exclusive in the HasInRange test.
KBucket k = c.ID < midpoint ? k1 : k2;
k.AddContact(c);
});
return (k1, k2);
Recall that the ID is stored as a little-endian value, and that the prefix comprises the most
significant bits, so we have to work the ID backwards, from n - 1 down to 0. Also note the
implementation of the Bytes property in the ID class:
/// <summary>
/// The array returned is in little-endian order (lsb at index 0)
/// </summary>
public byte[] Bytes
{
get
{
// Zero-pad msbs if ToByteArray length != Constants.ID_LENGTH_BYTES.
byte[] bytes = new byte[Constants.ID_LENGTH_BYTES];

// Take() removes a possible msb 0 (BigInteger's sign byte) at index 20.
byte[] partial =
id.ToByteArray().Take(Constants.ID_LENGTH_BYTES).ToArray();
partial.CopyTo(bytes, 0);
return bytes;
}
}
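For reference, here is the same little-endian layout in a short Python sketch (byte 0 holds the least significant bits, so the ID's most significant prefix bits sit at the end of the array):

```python
ID_LENGTH_BYTES = 20

def id_to_bytes(n: int) -> bytes:
    # Little-endian: index 0 is the lsb; index 19 holds the top bits.
    return n.to_bytes(ID_LENGTH_BYTES, "little")

# 2^159 sets only the msb, so byte 19 == 0x80 and all other bytes are 0 --
# exactly the id[19] = 0x80 trick used in the ForceFailedAddTest above.
```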
Unit tests
Here are a few basic unit tests. The VirtualProtocol and VirtualStorage classes will be
discussed later. Also note that the constructors for Contact don’t match the code in the basic
framework shown previously, as the unit tests here are reflective of the final implementation of
the code base.
[TestMethod]
public void UniqueIDAddTest()
{
Contact dummyContact = new Contact(new VirtualProtocol(), ID.Zero);
((VirtualProtocol)dummyContact.Protocol).Node =
new Node(dummyContact, new VirtualStorage());
[TestMethod]
public void DuplicateIDTest()
{
Contact dummyContact = new Contact(new VirtualProtocol(), ID.Zero);
((VirtualProtocol)dummyContact.Protocol).Node =
new Node(dummyContact, new VirtualStorage());
[TestMethod]
public void BucketSplitTest()
{
Contact dummyContact = new Contact(new VirtualProtocol(), ID.Zero);
((VirtualProtocol)dummyContact.Protocol).Node =
new Node(dummyContact, new VirtualStorage());
Assert.IsTrue(bucketList.Buckets.Count > 1,
"Bucket should have split into two or more buckets.");
}
Distribution tests reveal importance of randomness
What happens instead if we randomize the ID based on a random distribution of bucket slot
rather than a simple random ID? By this, we mean distributing the IDs evenly across the bucket
space, not the ID space. Some helper methods:
// TODO: Optimize
for (int i = bit + 1; i < Constants.ID_LENGTH_BITS; i++)
{
newid.ClearBit(i);
}
// TODO: Optimize
for (int i = minLsb; i < bit; i++)
{
if ((rnd.NextDouble() < 0.5) || forceBit1)
{
newid.SetBit(i);
}
}
return newid;
}
/// <summary>
/// Clears the bit n, from the little-endian LSB.
/// </summary>
public ID ClearBit(int n)
{
byte[] bytes = Bytes;
bytes[n / 8] &= (byte)((1 << (n % 8)) ^ 0xFF);
id = new BigInteger(bytes.Append0());
// for continuations.
return this;
}
/// <summary>
/// Sets the bit n, from the little-endian LSB.
/// </summary>
public ID SetBit(int n)
{
byte[] bytes = Bytes;
bytes[n / 8] |= (byte)(1 << (n % 8));
id = new BigInteger(bytes.Append0());
// for continuations.
return this;
}
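The same bit operations in Python, for reference (operating on a plain little-endian byte array rather than the book's ID class):

```python
def clear_bit(b: bytearray, n: int) -> None:
    # Clear bit n, counting from the little-endian lsb.
    b[n // 8] &= 0xFF ^ (1 << (n % 8))

def set_bit(b: bytearray, n: int) -> None:
    # Set bit n, counting from the little-endian lsb.
    b[n // 8] |= 1 << (n % 8)
```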
Code Listing 26: RandomID and RandomIDInKeySpace
return id;
}
}
/// <summary>
/// Produce a random ID.
/// </summary>
public static ID RandomID
{
get
{
byte[] buffer = new byte[Constants.ID_LENGTH_BYTES];
rnd.NextBytes(buffer);
return new ID(buffer);
}
}
Let’s look at what happens when we assign a node ID as one of 2^i, where 0 <= i < 160, and add
3,200 random integer contact IDs. Here’s the unit test, which outputs the count of contacts
added for each node ID in the set of i.
[TestMethod]
public void DistributionTestForEachPrefix()
{
Contact dummyContact = new Contact(new VirtualProtocol(), ID.Zero);
((VirtualProtocol)dummyContact.Protocol).Node =
new Node(dummyContact, new VirtualStorage());
Random rnd = new Random();
StringBuilder sb = new StringBuilder();
160.ForEach((i) =>
{
BucketList bucketList =
new BucketList(new ID(BigInteger.Pow(new BigInteger(2), i)),
dummyContact);
3200.ForEach(() =>
{
Contact contact = new Contact(new VirtualProtocol(), ID.RandomID);
((VirtualProtocol)contact.Protocol).Node =
new Node(contact, new VirtualStorage());
bucketList.AddContact(contact);
});
int contacts = bucketList.Buckets.Sum(b => b.Contacts.Count);
sb.Append(i + "," + contacts + CRLF);
});
File.WriteAllText("prefixTest.txt", sb.ToString());
Compare this with the distribution of contact counts when the contact ID is selected from a
random prefix with randomized bits after the prefix, as opposed to a random integer ID.
[TestMethod]
public void
DistributionTestForEachPrefixWithRandomPrefixDistributedContacts()
{
Contact dummyContact =
new Contact(new VirtualProtocol(), ID.Zero);
((VirtualProtocol)dummyContact.Protocol).Node =
new Node(dummyContact, new VirtualStorage());
StringBuilder sb = new StringBuilder();
160.ForEach((i) =>
{
BucketList bucketList =
new BucketList(new ID(BigInteger.Pow(new BigInteger(2), i)),
dummyContact);
File.WriteAllText("prefixTest.txt", sb.ToString());
}
Figure 7: Distributed Prefix Distribution
If there was any question as to whether to choose a node ID based on an even distribution in the
prefix space versus simply a random integer ID, I think this clearly demonstrates that a random
integer ID is the better choice.
Chapter 5 Node Lookup
“The most important procedure a Kademlia participant must perform is to locate the k closest
nodes to some given node ID. We call this procedure a node lookup. Kademlia employs a
recursive algorithm for node lookups. The lookup initiator starts by picking α nodes from its
closest non-empty k-bucket (or, if that bucket has fewer than α entries, it just takes the α closest
nodes it knows of). The initiator then sends parallel, asynchronous FIND_NODE RPCs to the α
nodes it has chosen; α is a system-wide concurrency parameter, such as 3.”
And:
“In the recursive step, the initiator resends the FIND_NODE to nodes it has learned about from
previous RPCs. (This recursion can begin before all α of the previous RPCs have returned.) Of
the k nodes the initiator has heard of closest to the target, it picks α that it has not yet queried
and resends the FIND_NODE RPC to them. Nodes that fail to respond quickly are removed
from consideration until and unless they do respond. If a round of FIND_NODES fails to return a
node any closer than the closest already seen, the initiator resends the FIND_NODE to all of the
k closest nodes it has not already queried. The lookup terminates when the initiator has queried
and gotten responses from the k closest nodes it has seen. When α = 1, the lookup algorithm
resembles Chord’s in terms of message cost and the latency of detecting failed nodes.
However, Kademlia can route for lower latency because it has the flexibility of choosing any one
of k nodes to forward a request to.”
Figure 6: The Node Lookup Flowchart (given a key, find the closest non-empty k-bucket, query up to k nodes for closer nodes, and return at most k closer nodes)
This looks complicated, but the implementation (shown later) is fairly straightforward.
Terminology: The lookup initiator: This is your own peer wanting to make a store or retrieve
call to other peers. The node lookup is performed before you store or retrieve a value so that
your peer has a reasonable set of the closest known peers to work with.
“To store a (key,value) pair, a participant locates the k closest nodes to the key and sends them
STORE RPCs.” Here the term “locates” actually means performing the lookup. Contrast with
(from the spec): “To find a (key,value) pair, a node starts by performing a lookup to find the k
nodes with IDs closest to the key,” in which the term “lookup” is specifically used.
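The distinction can be sketched in a couple of lines of Python (hypothetical helper names; the real Store RPC is implemented later): "locating" is a node lookup, after which the STORE RPC goes to each of the k closest contacts found.

```python
def store_value(key, value, lookup, send_store):
    # 'Locates' = node lookup first; then a STORE RPC to each of the
    # k closest contacts the lookup returned.
    for contact in lookup(key):
        send_store(contact, key, value)
```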
async def set_digest(self, dkey, value):
    """
    Set the given SHA1 digest key (bytes) to the given value in the
    network.
    """
    node = Node(dkey)
    nearest = self.protocol.router.findNeighbors(node)
    ...
From the spec, what does “nodes that fail to respond quickly” mean? Particularly, the term
“quickly”? This is an implementation-specific determination.
What if you, as a peer, don’t have peers in your own k-buckets? That shouldn’t happen (you
should at least have the peer you are contacting), but if that peer only has you in its k-buckets,
then there’s nothing to return.
From the spec, in “from its closest non-empty k-bucket,” what does “closest” mean? I am
assuming here that it is the XOR distance metric, but then the question is, what do we use as
the “key” for a bucket with a range of contacts? Since this is not defined, the implementation will
search all the contacts across all buckets for the initial set of contacts that are closer. Also, the
XOR distance computation means that we can’t just ping-pong in an outward search from the
bucket containing the range in which the key resides. This better matches the other condition
“or, if that bucket has fewer than α entries, it just takes the α closest nodes it knows of,” which
implies searching for all α closest nodes across all buckets.
Again from the spec: “The lookup initiator starts by picking α nodes.” What does “picking”
mean? Does this mean additionally sorting the contacts in the “closest bucket” by
closeness? It’s completely undefined.
If you want to try the “closest bucket” version, enable the #define TRY_CLOSEST_BUCKET, which
is implemented like Code Listing 30.
List<Contact> nodesToQuery = allNodes.Take(Constants.ALPHA).ToList();
Otherwise, the implementation simply gets the closest α contacts across all buckets.
Code Listing 31: Closest Contacts Across All Buckets
However, when testing with virtual nodes (where the system essentially knows every other
node), this implementation effectively gets the k closest contacts because it has searched all the
buckets in the virtual node space. So, if we want to exercise the algorithm, Code Listing 32 is better.
List<Contact> allNodes =
node.BucketList.GetKBucket(key).Contacts.Take(Constants.K).ToList();
This actually leads to the next problem. In the initial acquisition of contacts as per the previous
code, should contacts (I’m using “contact” and “node” rather interchangeably) that are closer at
this point be added to the list of closer contacts? The spec doesn’t say not to, but it doesn’t
explicitly say one should do this. Given that we pick only α contacts to start with, we definitely
don’t have the k contacts that the lookup is expected to return, so I’m implementing this as
described above—the closest of the contacts we have are added to the “closer” list, and the
farther contacts we have are added to the “farther” list.
Code Listing 33: Adding Closer/Further Contacts to the Probed Contacts List
What do we do with the contacts outside of the α we pick? Given this (from the spec): “If a round of
FIND_NODES fails to return a node any closer than the closest already seen, the initiator
resends the FIND_NODE to all of the k closest nodes it has not already queried,” does it apply
to the first query of α nodes, or only to the set of nodes returned after the query? I’m going to
assume that it applies to the remainder of the nodes not queried in the first query, which will
be at most k - α contacts.
The spec says this: “Most operations are implemented in terms of the above lookup procedure.”
Which operations, and when? We’ll have to address this later.
Therefore, the distance between a node and a key is the node ID XORed with the key.
Unfortunately, an XOR distance metric is not amenable to a pre-sorted list of IDs. The resulting
“distance” computation can be very different for two keys when XOR’d with the contact list. As
described on Stack Overflow:12
“The thing is that buckets don’t have to be full, and if you want to send, let’s say, 20 nodes in a
response, a single bucket will not suffice. So you must traverse the routing table (either sorted
based on your own node ID or by the natural distance) in ascending distance (XOR) order
relative to the target key to visit multiple buckets. Because the XOR distance metric folds at
each bit-carry (XOR == carry-less addition), it does not map nicely to any routing table layout. In
other words, visiting the nearest buckets won’t do...I figure that many people simply iterate over
the whole routing table, because for regular nodes it will only contain a few dozen buckets at
most, and a DHT node does not see much traffic, so it only has to execute this operation a few
times per second. If you implement this in a dense, cache-friendly data structure, then the lion’s
share might actually be the memory traffic and not the CPU instructions doing a few XORs and
comparisons.”
The author of that post provides an implementation13 that doesn’t require a full table scan, but in
my implementation, we’ll just do a full scan of the bucket list contacts.
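A tiny Python illustration of why a list pre-sorted by ID does not help: the XOR metric folds at each bit carry, so integer order and XOR-distance order disagree.

```python
ids = [0b0100, 0b0101, 0b0110, 0b1000]  # sorted ascending by integer value
key = 0b0111
by_distance = sorted(ids, key=lambda i: i ^ key)
# 0b1000 is adjacent to the key as an integer, yet farthest by XOR,
# because it differs from the key in the high bit.
```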
Implementation
Let’s start with a baseline implementation that, for the moment, ignores parallelism and the α
concurrency parameter:
12https://stackoverflow.com/questions/30654398/implementing-find-node-on-torrent-kademlia-routing-
table
13
https://github.com/the8472/mldht/blob/9fb056390b50e9ddf84ed7709283b528a77a0fe5/src/lbms/plugins/
mldht/kad/KClosestNodesSearch.java#L104-L170
This simplifies the implementation so that we can provide some unit tests for the basic
algorithm, then add the parallelism and the α concurrency later, and our unit tests should still
pass. Also, the methods here are all marked as virtual in case you want to override the implementation.
First, some helper methods:
return closest;
}
public
(List<Contact> contacts, Contact foundBy, string val)
RpcFindNodes(ID key, Contact contact)
{
var (newContacts, timeoutError) =
contact.Protocol.FindNode(node.OurContact, key);
// Null continuation here to support unit tests where a DHT hasn't
// been set up.
dht?.HandleError(timeoutError, contact);
return (newContacts, null, null);
}
Note that in this code we have a method for handling timeouts and other errors, which we’ll
describe later.
public bool GetCloserNodes(
ID key,
Contact nodeToQuery,
Func<
ID,
Contact,
(List<Contact> contacts,
Contact foundBy,
string val)> rpcCall,
List<Contact> closerContacts,
List<Contact> fartherContacts,
out string val,
out Contact foundBy)
{
// As in, peer's nodes:
// Null continuation is a special case primarily for unit testing when
// we have no nodes in any buckets.
var nearestNodeDistance = nodeToQuery.ID ^ key;
lock (locker)
{
closerContacts.
AddRangeDistinctBy(peersNodes.
Where(p => (p.ID ^ key) < nearestNodeDistance),
(a, b) => a.ID == b.ID);
}
lock (locker)
{
fartherContacts.
AddRangeDistinctBy(peersNodes.
Where(p => (p.ID ^ key) >= nearestNodeDistance),
(a, b) => a.ID == b.ID);
}
Note that we always exclude our own node and the nodes we’re contacting. Also note the lock
statements to synchronize manipulating the contact list.
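The partition logic above condenses to a short Python sketch: everything strictly closer to the key than the node we queried goes into the "closer" list, the rest into "farther".

```python
key, queried = 0b1010, 0b1000
nearest = queried ^ key                       # distance of the node we queried
peers = [0b1011, 0b0001, 0b1110]              # contacts returned by that node
closer = [p for p in peers if (p ^ key) < nearest]
farther = [p for p in peers if (p ^ key) >= nearest]
```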
Also:
This parameter handles calling either FindNode or FindValue, which we’ll discuss later.
This flowchart all comes together in the Lookup method in the Router class. This is where the
complexity of the flowchart is implemented, so there’s a lot here. Remember that the rpcCall is
calling either a FindNode or a FindValue, so some of the logic here has to figure out when to exit the
lookup if a value is found. There’s extensive use of tuples here as well, which hopefully makes
the code clearer! Lastly, this method is actually an abstract method in a base class because
later on, we’ll implement this algorithm as a parallel “find” (something the spec talks
about), but for now, Code Listing 38 shows the nonparallel implementation.
#if DEBUG
List<Contact> allNodes =
node.BucketList.GetKBucket(key).Contacts.Take(Constants.K).ToList();
#else
// This is a bad way to get a list of close contacts with virtual nodes
// because we're always going to get the closest nodes right at the get-go.
List<Contact> allNodes =
node.BucketList.GetCloseContacts(key,
node.OurContact.ID).Take(Constants.K).ToList();
#endif
closerContacts.AddRange(nodesToQuery.Where(
n => (n.ID ^ key) < (node.OurContact.ID ^ key)));
fartherContacts.AddRange(nodesToQuery.Where(
n => (n.ID ^ key) >= (node.OurContact.ID ^ key)));
var queryResult =
Query(key, nodesToQuery, rpcCall, closerContacts, fartherContacts);
if (queryResult.found)
{
#if DEBUG // For unit testing.
CloserContacts = closerContacts;
FartherContacts = fartherContacts;
#endif
return queryResult;
}
// Add any new closer contacts to the list we're going to return.
ret.AddRangeDistinctBy(closerContacts, (a, b) => a.ID == b.ID);
// Spec: The lookup terminates when the initiator has queried and received
// responses from the k closest nodes it has seen.
while (ret.Count < Constants.K && haveWork)
{
closerUncontactedNodes =
closerContacts.Except(contactedNodes).ToList();
fartherUncontactedNodes =
fartherContacts.Except(contactedNodes).ToList();
bool haveCloser = closerUncontactedNodes.Count > 0;
bool haveFarther = fartherUncontactedNodes.Count > 0;
haveWork = haveCloser || haveFarther;

if (haveCloser)
{
var newNodesToQuery =
closerUncontactedNodes.Take(Constants.ALPHA).ToList();
contactedNodes.AddRangeDistinctBy(closerUncontactedNodes,
(a, b) => a.ID == b.ID);
// We're about to contact these nodes.
queryResult =
Query(key, newNodesToQuery, rpcCall, closerContacts,
fartherContacts);
if (queryResult.found)
{
#if DEBUG // For unit testing.
CloserContacts = closerContacts;
FartherContacts = fartherContacts;
#endif
return queryResult;
}
}
else if (haveFarther)
{
var newNodesToQuery =
fartherUncontactedNodes.Take(Constants.ALPHA).ToList();
contactedNodes.AddRangeDistinctBy(fartherUncontactedNodes,
(a, b) => a.ID == b.ID);
queryResult =
Query(key, newNodesToQuery, rpcCall, closerContacts,
fartherContacts);
if (queryResult.found)
{
#if DEBUG // For unit testing.
CloserContacts = closerContacts;
FartherContacts = fartherContacts;
#endif
return queryResult;
}
}
}
This algorithm is iterative, not recursive as the specification describes it. Every implementation I’ve
seen uses iteration.
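For orientation, here is a deliberately simplified, single-threaded Python sketch of that iterative loop (no α batching of the farther list, no value short-circuit, no timeouts); find_node(n, key) stands in for the FIND_NODE RPC:

```python
K, ALPHA = 20, 3

def lookup(key, seeds, find_node):
    contacted = set()
    shortlist = sorted(seeds, key=lambda n: n ^ key)
    while True:
        # Pick up to ALPHA un-queried nodes closest to the key.
        batch = [n for n in shortlist if n not in contacted][:ALPHA]
        if not batch:                       # no un-queried nodes remain
            return shortlist[:K]
        for n in batch:
            contacted.add(n)
            learned = find_node(n, key)     # the FIND_NODE RPC
            shortlist = sorted(set(shortlist) | set(learned),
                               key=lambda m: m ^ key)
```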
Code Listing 39: AddRangeDistinctBy
Unit tests
Before getting into unit tests, we need to be able to create virtual nodes, which means
implementing a minimal VirtualProtocol:
/// <summary>
/// For unit testing with deferred node setup.
/// </summary>
public VirtualProtocol(bool responds = true)
{
Responds = responds;
}
/// <summary>
/// Register the in-memory node with our virtual protocol.
/// </summary>
public VirtualProtocol(Node node, bool responds = true)
{
Node = node;
Responds = responds;
}
/// <summary>
/// Get the list of contacts for this node closest to the key.
/// </summary>
public (List<Contact> contacts, RpcError error) FindNode(Contact
sender, ID key)
{
return (Node.FindNode(sender, key).contacts, NoError());
}
/// <summary>
/// Returns either contacts or null if the value is found.
/// </summary>
public (List<Contact> contacts, string val, RpcError error)
FindValue(Contact sender, ID key)
{
var (contacts, val) = Node.FindValue(sender, key);

return (contacts, val, NoError());
}
/// <summary>
/// Stores the key-value on the remote peer.
/// </summary>
public RpcError Store(
Contact sender,
ID key,
string val,
bool isCached = false,
int expTimeSec = 0)
{
Node.Store(sender, key, val, isCached, expTimeSec);
return NoError();
}
Notice that for unit testing, we have the option to simulate a nonresponding node.
We also have to implement FindNode in the Node class (the other methods we’ll implement
later).
Code Listing 41: FindNode
Validate.IsFalse<SendingQueryToSelfException>(sender.ID ==
ourContact.ID,
"Sender should not be ourself!");
SendKeyValuesIfNewContact(sender);
bucketList.AddContact(sender);
// Exclude sender.
var contacts = bucketList.GetCloseContacts(key, sender.ID);
Note that this call attempts to add the sender as a contact, which either adds (maybe) the
contact to the recipient’s bucket list or updates the contact information. If the contact is new, any
key-values that are “closer” to the new peer are sent. This is an important aspect of the
Kademlia algorithm: peers that are “closer” to the stored key-values get those key-values so that
the efficiency of lookups to find a key-value is improved.
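The "send to closer peers" test reduces to a one-line XOR comparison; a hypothetical Python sketch (not the book's SendKeyValuesIfNewContact):

```python
def keys_to_send(our_id, new_id, stored_keys):
    # A stored key migrates to the new contact if the new contact is
    # closer to that key (by XOR) than we are.
    return [k for k in stored_keys if (k ^ new_id) < (k ^ our_id)]
```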
We also need an algorithm for finding close contacts across the recipient’s bucket range.
/// <summary>
/// Brute-force distance lookup of all known contacts, sorted by distance,
/// then we take at most k (20) of the closest.
/// </summary>
/// <param name="key">The ID for which we want to find close contacts.</param>
/// <param name="exclude">The ID to exclude (the requestor's ID).</param>
public List<Contact> GetCloseContacts(ID key, ID exclude)
{
lock (this)
{
var contacts = buckets.
SelectMany(b => b.Contacts).
Where(c => c.ID != exclude).
Select(c => new { contact = c, distance = c.ID ^ key }).
OrderBy(d => d.distance).
Take(Constants.K);

return contacts.Select(c => c.contact).ToList();
}
}
This tests the sorting and the limit on the maximum number of contacts returned. Note that it’s
probably not ideal that I’m using all random IDs; however, to ensure consistent unit
testing, we seed the random number generator with the same value in DEBUG mode.
#if DEBUG
public static Random rnd = new Random(1);
#else
private static Random rnd = new Random();
#endif
[TestMethod]
public void GetCloseContactsOrderedTest()
{
Contact sender = new Contact(null, ID.RandomID);
// Verify the contacts with the smallest distances were returned from all
// possible distances.
var lastDistance = distances[distances.Count - 1];
Assert.IsTrue(others.Count() == 0,
"Expected no other contacts with a smaller distance than the greatest distance to exist.");
/// <summary>
/// Given that all the nodes we’re contacting are nodes *being* contacted,
/// the result should be no new nodes to contact.
/// </summary>
[TestMethod]
public void NoNodesToQueryTest()
{
// Setup
router = new Router(new Node(new Contact(null, ID.Mid), new
VirtualStorage()));
// Fixup protocols:
nodes.ForEach(n => n.OurContact.Protocol = new VirtualProtocol(n));
// Our contacts:
nodes.ForEach(n => router.Node.BucketList.AddContact(n.OurContact));
// Each peer needs to know about the other peers except, of course, itself.
nodes.ForEach(n => nodes.Where(nOther => nOther != n).
ForEach(nOther => n.BucketList.AddContact(nOther.OurContact)));
// Select the key such that n ^ 0 == n.
// This ensures that the distance metric uses only the node ID, which makes
// for an integer difference for distance, not an XOR distance.
key = ID.Zero;
// all contacts are in one bucket.
contactsToQuery = router.Node.BucketList.Buckets[0].Contacts;
closerContacts = new List<Contact>();
fartherContacts = new List<Contact>();
contactsToQuery.ForEach(c =>
{
router.GetCloserNodes(key, c, router.RpcFindNodes,
closerContacts,
fartherContacts,
out var _, out var _);
});
Assert.IsTrue(closerContacts.ExceptBy(contactsToQuery, c => c.ID).Count() == 0,
"No new nodes expected.");
Assert.IsTrue(fartherContacts.ExceptBy(contactsToQuery, c => c.ID).Count() == 0,
"No new nodes expected.");
}
List<Contact> closeContacts =
router.Lookup(key, router.RpcFindNodes, true).contacts;
List<Contact> contactedNodes = new List<Contact>(closeContacts);
GetAltCloseAndFar(
contactsToQuery,
closerContactsAltComputation,
fartherContactsAltComputation);
Assert.IsTrue(closeContacts.Count >=
closerContactsAltComputation.Count,
"Expected at least as many contacts.");
closerContactsAltComputation.ForEach(c =>
Assert.IsTrue(closeContacts.Contains(c)));
});
}
});
// Fixup protocols:
nodes.ForEach(n => n.OurContact.Protocol = new VirtualProtocol(n));
// Our contacts:
nodes.ForEach(n => router.Node.BucketList.AddContact(n.OurContact));
// Each peer needs to know about the other peers except of course itself.
contactsToQuery =
router.Node.BucketList.GetKBucket(key).Contacts.Take(Constants.ALPHA).ToList();
// or:
// contactsToQuery =
//     router.FindClosestNonEmptyKBucket(key).Contacts.Take(Constants.ALPHA).ToList();
fartherContactsAltComputation = new List<Contact>();
Furthermore, we use a different implementation for acquiring closer and farther contacts, so that
we can verify the algorithm under test. Code Listing 48 shows the alternate implementation.
Code Listing 48: Alternate Implementation for Getting Closer and Farther Nodes
84
foreach (Contact closeContactOfContactedNode in
closeContactsOfContactedNode)
{
// Which of these contacts are closer?
if ((closeContactOfContactedNode.ID ^ key) < distance)
{
closer.AddDistinctBy(closeContactOfContactedNode, c => c.ID.Value);
}
else
{
farther.AddDistinctBy(closeContactOfContactedNode, c => c.ID.Value);
}
}
}
}
}
SimpleCloserContacts test
This test, and the next one, exercise the part of the Lookup algorithm before the while loop.
Note how the bucket and IDs are set up.
[TestMethod]
public void SimpleAllCloserContactsTest()
{
// Setup
// By selecting our node ID as ID.Max, we ensure that the distances of all
// other nodes to the key are less than our own distance to the key.
router = new Router(new Node(new Contact(null, ID.Max),
new VirtualStorage()));
// Fixup protocols:
nodes.ForEach(n => n.OurContact.Protocol = new VirtualProtocol(n));
// Our contacts:
nodes.ForEach(n => router.Node.BucketList.AddContact(n.OurContact));
// Each peer needs to know about the other peers except, of course, itself.
nodes.ForEach(n => nodes.Where(nOther => nOther != n).
    ForEach(nOther => n.BucketList.AddContact(nOther.OurContact)));
Assert.IsTrue(contacts.Count == Constants.K, "Expected k closer contacts.");
Assert.IsTrue(router.CloserContacts.Count == Constants.K,
    "All contacts should be closer.");
Assert.IsTrue(router.FartherContacts.Count == 0,
    "Expected no farther contacts.");
}
SimpleFartherContacts test
Again, note how the bucket and IDs are set up in Code Listing 50.
/// <summary>
/// Creates a single bucket with node IDs 2^i for i in [0, K) and
Constants.K.ForEach((n) => nodes.Add(new Node(new Contact(null,
    new ID(BigInteger.Pow(new BigInteger(2), n))), new VirtualStorage())));
// Fixup protocols:
nodes.ForEach(n => n.OurContact.Protocol = new VirtualProtocol(n));
// Our contacts:
nodes.ForEach(n => router.Node.BucketList.AddContact(n.OurContact));
// Each peer needs to know about the other peers except, of course, itself.
nodes.ForEach(n => nodes.Where(nOther => nOther != n).
    ForEach(nOther => n.BucketList.AddContact(nOther.OurContact)));
Chapter 6 Value Lookup
From the spec: “FIND_VALUE behaves like FIND_NODE - returning (IP address, UDP port,
Node ID) triples - with one exception. If the RPC recipient has received a STORE RPC for the
key, it just returns the stored value.”
That seems clear enough, but we must consider this part of the spec as well:
“To find a (key,value) pair, a node starts by performing a lookup to find the k nodes with IDs
closest to the key. However, value lookups use FIND_VALUE rather than FIND_NODE RPCs.
Moreover, the procedure halts immediately when any node returns the value. For caching
purposes, once a lookup succeeds, the requesting node stores the (key,value) pair at the
closest node it observed to the key that did not return the value.”
However, the spec says this: “Most operations are implemented in terms of the above lookup
procedure.” When we’re performing a lookup, the initiator will be using the lookup call for both
finding nodes and finding values. If it’s finding values, it needs to stop if/when the value is found.
This statement from the spec can be ambiguous: “For caching purposes, once a lookup
succeeds, the requesting node stores the (key,value) pair at the closest node it observed to the
key that did not return the value.” What does “requesting node” mean? Is it the node performing
the lookup, or the node that made the GetValue request? It would seem to be the former,
because “it observed to the key that did not return the value” would otherwise not make any
sense.
We can see now the reason for passing in the RPC, as we want to use the exact same lookup
algorithm for FindNode as we do for FindValue.
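The idea of one lookup parameterized by the RPC can be sketched independently of the Router class. Everything below (the Rpc delegate shape, the Lookup helper, the int keys) is a simplified stand-in for illustration, not the book's actual signatures:

```csharp
using System;
using System.Collections.Generic;

public class LookupSketch
{
    // Simplified RPC shape: given a key, return (found, value). The real
    // router passes RpcFindNodes or RpcFindValue, which also handle contacts.
    public delegate (bool found, string val) Rpc(int key);

    public static (bool found, string val) Lookup(int key, Rpc rpc)
    {
        // One algorithm serves both operations; only the injected RPC differs.
        // FindValue halts as soon as the value turns up; FindNode never does.
        return rpc(key);
    }

    public static void Main()
    {
        var store = new Dictionary<int, string> { [42] = "hello" };
        Rpc findNode = k => (false, null);   // node lookup: never returns a value
        Rpc findValue = k => store.TryGetValue(k, out var v) ? (true, v) : (false, null);

        Console.WriteLine(Lookup(42, findNode).found);  // False
        Console.WriteLine(Lookup(42, findValue).val);   // hello
    }
}
```

The design choice is simply dependency injection: the iterative walk over contacts stays identical, and the caller decides whether a returned value terminates the walk.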
Discussing value lookup tests doesn’t make sense outside of the context of the Dht wrapper, so
the unit tests for the value lookup will be done in the Dht testing.
Implementation
First, we need to implement a simple virtual (in memory) storage mechanism. We don’t show
the implementation of the full interface, as that isn’t required yet.
{
protected ConcurrentDictionary<BigInteger, StoreValue> store;
public VirtualStorage()
{
store = new ConcurrentDictionary<BigInteger, StoreValue>();
}
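For reference, a complete standalone version of such a store might look like the following. The member names Contains, Get, and Set mirror calls used later in this chapter, but the class and the StoreValue stand-in here are a sketch, not the book's full IStorage implementation:

```csharp
using System;
using System.Collections.Concurrent;
using System.Numerics;

// Minimal stand-in for the book's StoreValue type.
public class StoreValue
{
    public string Value { get; set; }
    public DateTime RepublishTimeStamp { get; set; }
    public int ExpirationTime { get; set; }
}

public class MiniVirtualStorage
{
    protected ConcurrentDictionary<BigInteger, StoreValue> store =
        new ConcurrentDictionary<BigInteger, StoreValue>();

    public bool Contains(BigInteger key) => store.ContainsKey(key);

    public string Get(BigInteger key) => store[key].Value;

    public void Set(BigInteger key, string val, int expirationTimeSec = 0)
    {
        // Overwrites any existing entry and resets the republish timestamp.
        store[key] = new StoreValue
        {
            Value = val,
            ExpirationTime = expirationTimeSec,
            RepublishTimeStamp = DateTime.Now
        };
    }
}
```

In the book's code the key is an ID whose Value property supplies the BigInteger used here; ConcurrentDictionary keeps the store safe for the timer-driven republish events introduced later.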
We can also now implement Store and FindValue in the Node class.
bool isCached = false,
int expirationTimeSec = 0)
{
Validate.IsFalse<SendingQueryToSelfException>(sender.ID == ourContact.ID,
    "Sender should not be ourself!");
bucketList.AddContact(sender);
if (isCached)
{
cacheStorage.Set(key, val, expirationTimeSec);
}
else
{
SendKeyValuesIfNewContact(sender);
storage.Set(key, val, Constants.EXPIRATION_TIME_SECONDS);
}
}
if (storage.Contains(key))
{
return (null, storage.Get(key));
}
else if (CacheStorage.Contains(key))
{
return (null, CacheStorage.Get(key));
}
else
{
// Exclude sender.
return (bucketList.GetCloseContacts(key, sender.ID), null);
}
}
Chapter 7 The DHT Class
We use a wrapper Dht class for interacting with other peers; it will become the main entry point for our peer. The purposes of this class are:
• When storing a value, use the lookup algorithm to find other closer peers to propagate
the key-value.
• When looking up a value, if our peer doesn’t have the value, we again use the lookup
algorithm to find other closer nodes that might have the value.
• Later, we’ll add a bootstrapping method that registers our peer with another peer and
initializes our bucket list with that peer’s closest contacts.
Implementation
Code Listing 54: The Dht Class
public Dht(
ID id,
IProtocol protocol,
Func<IStorage> storageFactory,
BaseRouter router)
{
originatorStorage = storageFactory();
FinishInitialization(id, protocol, router);
}
protected void FinishInitialization(ID id, IProtocol protocol,
BaseRouter router)
{
ourId = id;
ourContact = new Contact(protocol, id);
node = new Node(ourContact);
node.Dht = this;
node.BucketList.Dht = this;
this.protocol = protocol;
this.router = router;
this.router.Node = node;
this.router.Dht = this;
}
public void Store(ID key, string val)
{
TouchBucketWithKey(key);
string ourVal;
List<Contact> contactsQueried = new List<Contact>();
(bool found, List<Contact> contacts, string val) ret = (false, null, null);
}
// else: this is where we will deal with republish and cache storage later
else
{
var lookup = router.Lookup(key, router.RpcFindValue);
if (lookup.found)
{
ret = (true, null, lookup.val);
// Find the first close contact (other than the one the value
// was found by) in which to *cache* the key-value.
var storeTo = lookup.contacts.Where(c => c != lookup.foundBy).
    OrderBy(c => c.ID ^ key).FirstOrDefault();
if (storeTo != null)
{
HandleError(error, storeTo);
}
}
}
return ret;
}
What exactly should the sender do when a value is not found? The Dht returns the nearest
nodes, but given that the lookup failed to find the value, we know these nodes also do not have
the value. As far as I’ve been able to determine, neither the spec nor a search of the web
indicates what to do.
Unit tests
LocalStoreFoundValue
To get started, let’s just make sure we can set and get values in our local store with an empty
bucket list.
[TestMethod]
public void LocalStoreFoundValueTest()
{
VirtualProtocol vp = new VirtualProtocol();
string retval = dht.FindValue(key).val;
Assert.IsTrue(retval == val, "Expected to get back what we stored");
}
ValueStoredInCloserNode
This test creates a single contact and stores the value in that contact. We set up the IDs so that
the contact’s ID is less (XOR metric) than our peer’s ID, and we use a key of ID.Zero to
prevent further complexities when computing the distance. Most of the code here is to set up the
conditions to make this test!
[TestMethod]
public void ValueStoredInCloserNodeTest()
{
VirtualProtocol vp1 = new VirtualProtocol();
VirtualProtocol vp2 = new VirtualProtocol();
VirtualStorage store1 = new VirtualStorage();
VirtualStorage store2 = new VirtualStorage();
// Ensures that all nodes are closer, because ID.Max ^ n < ID.Max when n > 0.
Dht dht = new Dht(ID.Max, vp1, new Router(), store1, store1, new VirtualStorage());
vp1.Node = dht.Router.Node;
dht.Router.Node.BucketList.AddContact(otherContact);
Assert.IsFalse(store1.Contains(key),
"Expected our peer to NOT have cached the key-value.");
Assert.IsTrue(store2.Contains(key),
"Expected other node to HAVE cached the key-value.");
// Try and find the value, given our Dht knows about the other contact.
string retval = dht.FindValue(key).val;
The method SimpleStore simply stores the value in the node’s storage—this method is
available only in DEBUG mode for unit testing.
ValueFoundInFartherNode
We can change the setup of the IDs and verify that we find the value in a farther node.
[TestMethod]
public void ValueStoredInFartherNodeTest()
{
VirtualProtocol vp1 = new VirtualProtocol();
VirtualProtocol vp2 = new VirtualProtocol();
VirtualStorage store1 = new VirtualStorage();
VirtualStorage store2 = new VirtualStorage();
// Ensures that all nodes are closer, because ID.Max ^ n < ID.Max when n > 0.
// Set the value in the other node, to be discovered by the lookup process.
string val = "Test";
otherNode.SimpleStore(key, val);
Assert.IsFalse(store1.Contains(key),
"Expected our peer to NOT have cached the key-value.");
Assert.IsTrue(store2.Contains(key),
"Expected other node to HAVE cached the key-value.");
// Try and find the value, given our Dht knows about the other contact.
string retval = dht.FindValue(key).val;
ValueStoredGetsPropagated
Here we test that when we store a value to our peer, it also gets propagated to another peer
that our peer knows about.
[TestMethod]
public void ValueStoredGetsPropagatedTest()
{
VirtualProtocol vp1 = new VirtualProtocol();
VirtualProtocol vp2 = new VirtualProtocol();
VirtualStorage store1 = new VirtualStorage();
VirtualStorage store2 = new VirtualStorage();
// Ensures that all nodes are closer, because ID.Max ^ n < ID.Max when n > 0.
Dht dht = new Dht(ID.Max, vp1, new Router(), store1, store1, new VirtualStorage());
vp1.Node = dht.Router.Node;
dht.Store(key, val);
Assert.IsTrue(store1.Contains(key),
"Expected our peer to have stored the key-value.");
Assert.IsTrue(store2.Contains(key),
"Expected the other peer to have stored the key-value.");
}
GetValuePropagatesToCloserNode
This test verifies that, given three nodes (the first of which is us), where node 2 has the value, a
get value also propagates to node 3 because a lookup was performed.
[TestMethod]
public void GetValuePropagatesToCloserNodeTest()
{
VirtualProtocol vp1 = new VirtualProtocol();
VirtualProtocol vp2 = new VirtualProtocol();
VirtualProtocol vp3 = new VirtualProtocol();
VirtualStorage store1 = new VirtualStorage();
VirtualStorage store2 = new VirtualStorage();
VirtualStorage store3 = new VirtualStorage();
VirtualStorage cache3 = new VirtualStorage();
// Ensures that all nodes are closer, because ID.Max ^ n < ID.Max when n > 0.
Dht dht = new Dht(ID.Max, vp1, new Router(), store1, store1, new VirtualStorage());
vp1.Node = dht.Router.Node;
// Setup node 2:
dht.Router.Node.BucketList.AddContact(otherContact2);
// Setup node 3:
Assert.IsTrue(cache3.Contains(key), "Key should be in the cache store.");
Assert.IsTrue(cache3.GetExpirationTimeSec(key.Value) ==
    Constants.EXPIRATION_TIME_SECONDS / 2, "Expected 12 hour expiration.");
Chapter 8 The Dht–Bootstrapping
From the spec: “To join the network, a node u must have a contact to an already participating
node w. u inserts w into the appropriate k-bucket. u then performs a node lookup for its own
node ID. Finally, u refreshes all k-buckets further away than its closest neighbor. During the
refreshes, u both populates its own k-buckets and inserts itself into other nodes’ k-buckets as
necessary.”
“The joining node inserts the bootstrap node into one of its k-buckets. The joining node then
does a FIND_NODE of its own ID against the bootstrap node (the only other node it knows).
The “self-lookup” will populate other nodes’ k-buckets with the new node ID, and will populate
the joining node’s k-buckets with the nodes in the path between it and the bootstrap node. After
this, the joining node refreshes all k-buckets further away than the k-bucket the bootstrap node
falls in. This refresh is just a lookup of a random key that is within that k-bucket range.”
By choosing a random ID within the contact’s bucket range, we are creating an ID whose prefix
determines the ordering of the contacts returned by GetCloseContacts:
This will sort the contacts such that those that are closer—those where no bits are set in the
prefix of the contact—are first in the list. Ideally, with many peers participating, we should get k
contacts that are closer.
Of particular note here is that when a peer network is small or in the throes of being born, other
contacts that nodes have will not be discovered until the bootstrapping bucket splits. We’ll see
how the network self-corrects later on. It’s also interesting to realize that “joining” actually means
contacting another node with any one of the four RPC calls. A new peer could join an existing
network with its first RPC being FindValue!
Bootstrapping implementation
Getting a random ID within a bucket range is based on knowing that bucket ranges are always
powers of 2. We use this for unit testing.
/// <summary>
/// Returns an ID within the range of the bucket's Low and High range.
/// The optional parameter forceBit1 is for our unit tests.
/// This works because the bucket low-high range will always be a power of 2!
/// </summary>
public static ID RandomIDWithinBucket(KBucket bucket, bool forceBit1 = false)
{
// Simple case:
// High = 1000
// Low = 0010
// We want random values between 0010 and 1000
return id;
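Since the listing's body is elided here, the power-of-2 idea can be shown standalone: because Low and High are powers of 2 (or 0), the prefix bits above the bucket's span are fixed and only the low-order bits need randomizing. The helper below is an illustrative sketch with plain BigInteger values, not the book's exact implementation:

```csharp
using System;
using System.Numerics;

public class RandomInBucketDemo
{
    static readonly Random rnd = new Random();

    // Returns a random value in [low, high). Assumes high is a power of 2 and
    // low is either 0 or a smaller power of 2, as is always true for k-buckets.
    public static BigInteger RandomWithinRange(BigInteger low, BigInteger high)
    {
        BigInteger span = high - low;
        // Enough random bytes to cover the span, plus a zero top byte so the
        // little-endian BigInteger stays non-negative.
        int bits = (int)BigInteger.Log(span, 2) + 1;
        byte[] bytes = new byte[(bits + 7) / 8 + 1];
        rnd.NextBytes(bytes);
        bytes[bytes.Length - 1] = 0;
        return low + new BigInteger(bytes) % span;
    }

    public static void Main()
    {
        // Bucket range [256, 1024): the high-order prefix bits stay fixed,
        // only the low-order bits vary.
        Console.WriteLine(RandomWithinRange(256, 1024));
    }
}
```

Any value produced this way lands inside the bucket's range, so a node lookup for it exercises exactly the contacts that bucket is responsible for.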
Bootstrapping
The actual bootstrap implementation is straightforward.
/// <summary>
/// Bootstrap our peer by contacting another peer, adding its contacts
/// to our list, then getting the contacts for other peers not in the
/// bucket range of our known peer we're joining.
/// </summary>
public RpcError Bootstrap(Contact knownPeer)
{
node.BucketList.AddContact(knownPeer);
var (contacts, error) = knownPeer.Protocol.FindNode(ourContact, ourId);
HandleError(error, knownPeer);
if (!error.HasError)
{
contacts.ForEach(c => node.BucketList.AddContact(c));
KBucket knownPeerBucket = node.BucketList.GetKBucket(knownPeer.ID);
// Resolve the list now, so we don't include additional contacts as we
// add additional contacts to our buckets.
var otherBuckets = node.BucketList.Buckets.Where(
b => b != knownPeerBucket).ToList();
otherBuckets.ForEach(b => RefreshBucket(b));
}
return error;
}
[TestMethod]
public void RandomWithinBucketTests()
{
// Must be powers of 2.
List<(int low, int high)> testCases = new List<(int low, int high)>()
{
(0, 256), // 7 bits should be set
(256, 1024), // 2 bits (256 + 512) should be set
(65536, 65536 * 2), // no additional bits should be set.
(65536, 65536 * 4), // 2 bits (65536 and 65536*2) should be set.
BootstrapWithinBootstrappingBucket
In the actual bootstrapping unit test, we set up a bootstrapping peer (the one we are joining) with 10 contacts. One of those contacts also knows about 10 other contacts. The joining peer will receive 10 contacts (for a total of 11: the bootstrapper + 10) and will not find any others, because the "other peers not in the known peer bucket" are all in the same bucket (the bucket hasn't split yet). The IDs of our peers are irrelevant in this scenario.
[TestMethod]
public void BootstrapWithinBootstrappingBucketTest()
{
// We need 22 virtual protocols. One for the bootstrap peer,
// 10 for the nodes the bootstrap peer knows about, 10 for the nodes
// one of those nodes knows about, and one for us to rule them all.
VirtualProtocol[] vp = new VirtualProtocol[22];
22.ForEach((i) => vp[i] = new VirtualProtocol());
// Us
Dht dhtUs = new Dht(ID.RandomID, vp[0], () => new VirtualStorage(), new Router());
vp[0].Node = dhtUs.Router.Node;
Contact c = new Contact(vp[i + 2], ID.RandomID);
n = new Node(c, new VirtualStorage());
vp[i + 2].Node = n;
dhtBootstrap.Router.Node.BucketList.AddContact(c);
});
// One of those nodes, in this case the last one we added to our
// bootstrapper for convenience, knows about 10 other contacts.
10.ForEach((i) =>
{
Contact c = new Contact(vp[i + 12], ID.RandomID);
Node n2 = new Node(c, new VirtualStorage());
vp[i + 12].Node = n2;
n.BucketList.AddContact(c); // Note we're adding these contacts to the 10th node.
});
dhtUs.Bootstrap(dhtBootstrap.Router.Node.OurContact);
Assert.IsTrue(dhtUs.Router.Node.BucketList.Buckets.Sum(
    c => c.Contacts.Count) == 11, "Expected our peer to get 11 contacts.");
}
BootstrapOutsideBootstrappingBucket
In this test, we set up 20 nodes in the bootstrap peer so that we know how the buckets split for us (20 in the left bucket, one in the right bucket), and add 10 contacts to the node in the right bucket. Because our bootstrap peer will be in our left bucket, we should have a total of 31 contacts (the bootstrapper + its 20 contacts + the other node's 10 contacts).
Code Listing 65: BootstrapOutsideBootstrappingBucketTest
[TestMethod]
public void BootstrapOutsideBootstrappingBucketTest()
{
// We need 32 virtual protocols. One for the bootstrap peer,
// 20 for the nodes the bootstrap peer knows about, 10 for the nodes
// one of those nodes knows about, and one for us to rule them all.
VirtualProtocol[] vp = new VirtualProtocol[32];
32.ForEach((i) => vp[i] = new VirtualProtocol());
// All IDs are < 2^159 except the last one, which is >= 2^159
// which will force a bucket split for _us_
if (i < 19)
{
id = ID.Zero.RandomizeBeyond(Constants.ID_LENGTH_BITS - 1);
}
else
{
id = ID.Max;
}
// To one of those nodes (specifically the last one we added to our
// bootstrapper, so that it isn't in the bucket of our bootstrapper)
// we add 10 contacts. The IDs of those contacts don't matter.
10.ForEach((i) =>
{
Contact c = new Contact(vp[i + 22], ID.RandomID);
Node n2 = new Node(c, new VirtualStorage());
vp[i + 22].Node = n2;
n.BucketList.AddContact(c); // Note we're adding these contacts to that node.
});
dhtUs.Bootstrap(dhtBootstrap.Router.Node.OurContact);
Assert.IsTrue(dhtUs.Router.Node.BucketList.Buckets.Sum(
    c => c.Contacts.Count) == 31, "Expected our peer to have 31 contacts.");
}
Chapter 9 Bucket Management
else if (kbucket.IsBucketFull)
{
if (CanSplit(kbucket))
{
// Split the bucket and try again.
(KBucket k1, KBucket k2) = kbucket.Split();
int idx = GetKBucketIndex(contact.ID);
buckets[idx] = k1;
buckets.Insert(idx + 1, k2);
buckets[idx].Touch();
buckets[idx + 1].Touch();
AddContact(contact);
}
else
{
if (error.HasError)
{
Bucket refresh
From the spec: “Buckets are generally kept fresh by the traffic of requests traveling through
nodes. To handle pathological cases in which there are no lookups for a particular ID range,
each node refreshes any bucket to which it has not performed a node lookup in the past hour.
Refreshing means picking a random ID in the bucket’s range and performing a node search for
that ID.”
The phrase “any bucket to which it has not performed a node lookup” is subject to at least two interpretations. One is “the bucket whose range contains the key in the key-value pair for a Store or FindValue operation.” Another is “the k-bucket containing the range for any contact ID queried during the lookup process.” This second approach might seem more correct because the original alpha contacts are determined from the list of closest contacts across all buckets, but it then becomes arbitrary whether to also touch the buckets containing the contacts returned by the FindNodes query that are then queried further.
I am choosing the first interpretation, which means that the bucket containing the key gets
touched in the Store and FindValue methods of the Dht class.
TouchBucketWithKey(key);
...
}
}
// Isolate in a separate list, as the contacts collection for this bucket might change.
List<Contact> contacts = bucket.Contacts.ToList();
contacts.ForEach(c =>
{
var (newContacts, timeoutError) = c.Protocol.FindNode(ourContact, rndId);
HandleError(timeoutError, c);
newContacts?.ForEach(otherContact =>
node.BucketList.AddContact(otherContact));
});
}
Note that now when a bucket is refreshed, it is always touched, which updates its “last seen”
timestamp.
Unit tests
Two unit tests for eviction verify the two possible conditions.
/// <summary>
/// Tests that a nonresponding contact is evicted after
/// Constant.EVICTION_LIMIT tries.
/// </summary>
[TestMethod]
public void NonRespondingContactEvictedTest()
{
// Create a DHT so we have an eviction handler.
Dht dht = new Dht(ID.Zero, new VirtualProtocol(), () => null, new Router());
Assert.IsTrue(bucketList.Buckets[1].Contacts.Count == 20,
"Expected 20 contacts in bucket 1.");
Constants.EVICTION_LIMIT.ForEach(() =>
bucketList.AddContact(nextNewContact));
Assert.IsTrue(bucketList.Buckets[1].Contacts.Count == 20,
"Expected 20 contacts in bucket 1.");
Assert.IsTrue(dht.PendingContacts.Count == 0,
"Pending contact list should now be empty.");
Assert.IsFalse(bucketList.Buckets.SelectMany(
b => b.Contacts).Contains(nonRespondingContact),
"Expected bucket to NOT contain non-responding contact.");
Assert.IsTrue(bucketList.Buckets.SelectMany(
b => b.Contacts).Contains(nextNewContact),
"Expected bucket to contain new contact.");
Assert.IsTrue(dht.EvictionCount.Count == 0,
"Expected no contacts to be pending eviction.");
}
/// <summary>
/// Tests that a nonresponding contact puts the new contact into a pending list.
/// </summary>
[TestMethod]
public void NonRespondingContactDelayedEvictionTest()
{
// Create a DHT so we have an eviction handler.
Dht dht = new Dht(ID.Zero, new VirtualProtocol(), () => null, new Router());
Assert.IsTrue(bucketList.Buckets.Count == 2,
"Bucket split should have occurred.");
Assert.IsTrue(bucketList.Buckets[0].Contacts.Count == 1,
"Expected 1 contact in bucket 0.");
Assert.IsTrue(bucketList.Buckets[1].Contacts.Count == 20,
"Expected 20 contacts in bucket 1.");
VirtualProtocol vpUnresponding = new VirtualProtocol(
    ((VirtualProtocol)nonRespondingContact.Protocol).Node, false);
nonRespondingContact.Protocol = vpUnresponding;
bucketList.AddContact(nextNewContact);
Assert.IsTrue(bucketList.Buckets[1].Contacts.Count == 20,
"Expected 20 contacts in bucket 1.");
Assert.IsTrue(dht.PendingContacts.Count == 1, "Expected one pending contact.");
Assert.IsTrue(dht.PendingContacts.Contains(nextNewContact),
"Expected pending contact to be the 21st contact.");
Assert.IsTrue(dht.EvictionCount.Count == 1,
"Expected one contact to be pending eviction.");
}
Both unit tests set up a “failed split” so that the eviction routines are triggered.
Assert.IsTrue(bucketList.Buckets.Count == 1,
"Bucket split should not have occurred.");
Assert.IsTrue(bucketList.Buckets[0].Contacts.Count == 1,
"Expected 1 contact in bucket 0.");
// make sure contact IDs all have the same 5-bit prefix
// and are in the 2^159 ... 2^160 - 1 space
byte[] bcontactID = new byte[20];
bcontactID[19] = 0x80;
// 1000 xxxx prefix, xxxx starts at 1000 (8)
// This ensures that all the contacts in a bucket match only the prefix,
// as only the first 5 bits are shared.
// |----| shared range
// 1000 1000 ...
// 1000 1100 ...
// 1000 1110 ...
byte shifter = 0x08;
int pos = 19;
Constants.K.ForEach(() =>
{
bcontactID[pos] |= shifter;
ID contactID = new ID(bcontactID);
dummyContact = new Contact(new VirtualProtocol(), ID.One);
((VirtualProtocol)dummyContact.Protocol).Node =
new Node(dummyContact, new VirtualStorage());
bucketList.AddContact(new Contact(dummyContact.Protocol, contactID));
shifter >>= 1;
if (shifter == 0)
{
shifter = 0x80;
--pos;
}
});
return bucketList;
Chapter 10 Key-Value Management
In this section, you will learn something that is not at all clearly stated in the Kademlia specification—there are actually three kinds of data store:
• The originator store: key-values that our peer itself published.
• The republished store: key-values received from other peers for redistribution.
• The cached store: short-lived key-values cached as a result of value lookups.
It is important to know that the Kademlia specification does not discuss this at all. The
information in this section has been gleaned from closely looking at the discussion on the eMule
project forum, particularly miniminime’s discussion of the roles that a node can take on.14
Implementing this approach requires that the receiver peer knows whether to store the key-
value in the republish store or in the cache store.
Republished store
This store contains key-values that have been republished by other peers as part of the process of distributing the data among peers. This store never contains the peer's own (originator) storage, only key-values received from other peers. Key-values in this store are republished to other peers only if a closer peer is found. This check occurs every hour and is optimized to avoid calling the lookup algorithm:
• for a particular bucket, if it has already been queried for closer nodes;
• if the key has been republished in the last hour.
Cached store
The cached store is used when republishing a FindValue request onto the next closest node.
The intention here is to avoid hotspots during lookup of popular keys by temporarily republishing
them onto other nearby peers. These temporary key-values have an expiration time. Key-values
in the cached store are never republished, and are removed after the expiration time.
14 https://forum.emule-project.net/index.php?showtopic=32335&view=findpost&p=214837
Storage mechanisms in the Dht class
The three storage mechanisms are managed in the Dht class.
The Dht class implements a few constructor options, the first primarily for aiding unit testing.
/// <summary>
/// Use this constructor to initialize the stores to the same instance.
/// </summary>
public Dht(
ID id,
IProtocol protocol,
Func<IStorage> storageFactory, BaseRouter router)
{
originatorStorage = storageFactory();
republishStorage = storageFactory();
cacheStorage = storageFactory();
FinishInitialization(id, protocol, router);
SetupTimers();
}
/// <summary>
/// Supports different concrete storage types. For example, you may want
/// the cacheStorage to be an in-memory store, the originatorStorage to be
/// a SQL database, and the republish store to be a key-value database.
/// </summary>
public Dht(
ID id,
IProtocol protocol,
BaseRouter router,
IStorage originatorStorage,
IStorage republishStorage,
IStorage cacheStorage)
{
this.originatorStorage = originatorStorage;
this.republishStorage = republishStorage;
this.cacheStorage = cacheStorage;
FinishInitialization(id, protocol, router);
SetupTimers();
}
The ability to specify different storage mechanisms can be very useful; however, this means that
a Node must store the key-value in the appropriate storage.
/// <summary>
/// Store a key-value pair in the republish or cache storage.
/// </summary>
public void Store(
Contact sender,
ID key,
string val,
bool isCached = false,
int expirationTimeSec = 0)
{
Validate.IsFalse<SendingQueryToSelfException>(sender.ID == ourContact.ID,
    "Sender should not be ourself!");
bucketList.AddContact(sender);
if (isCached)
{
cacheStorage.Set(key, val, expirationTimeSec);
}
else
{
SendKeyValuesIfNewContact(sender);
storage.Set(key, val, Constants.EXPIRATION_TIME_SECONDS);
}
Republishing key-values
From the spec: “To ensure the persistence of key-value pairs, nodes must periodically republish
keys. Otherwise, two phenomena may cause lookups for valid keys to fail. First, some of the k
nodes that initially get a key-value pair when it is published may leave the network. Second,
new nodes may join the network with IDs closer to some published key than the nodes on which
the key-value pair was originally published. In both cases, the nodes with a key-value pair must
republish it so as once again to ensure it is available on the k nodes closest to the key.
To compensate for nodes leaving the network, Kademlia republishes each key-value pair once
an hour. A naive implementation of this strategy would require many messages—each of up to k
nodes storing a key-value pair would perform a node lookup followed by k - 1 STORE RPCs
every hour.”
From Wikipedia,15 which can be helpful for understanding the spec with different phrasing:
“Periodically, a node that stores a value will explore the network to find the k nodes that are
close to the key value and replicate the value onto them. This compensates for disappeared
nodes.”
And:
“The node that is providing the file [key-value] will periodically refresh the information onto the
network (perform FIND_NODE and STORE messages). When all of the nodes having the file
[key-value] go offline, nobody will be refreshing its values (sources and keywords) and the
information will eventually disappear from the network.”
15 https://en.wikipedia.org/wiki/Kademlia
The Wikipedia write-up clarifies what is meant by “on the k nodes closest to the key.” In other
words, for each key, a FindNode is called to find closer nodes, and the value is republished.
Without the optimizations, this can be a time-consuming process if there are a lot of key-values in a node's store; this is addressed in an optimization later.
First optimization
From the spec: “Fortunately, the republishing process can be heavily optimized. First, when a
node receives a STORE RPC for a given key-value pair, it assumes the RPC was also issued to
the other k - 1 closest nodes, and thus the recipient will not republish the key-value pair in the
next hour. This ensures that as long as republication intervals are not exactly synchronized, only
one node will republish a given key-value pair every hour.”
This first optimization is simple—when receiving a store, update the timestamp on the key-value. Any key-value that has been touched within the last hour is not republished, as we can assume the STORE RPC was also issued to the other k - 1 closest nodes.
Second optimization
From the spec: “A second optimization avoids performing node lookups before republishing
keys. As described in Section 2.4, to handle unbalanced trees, nodes split k-buckets as required
to ensure they have complete knowledge of a surrounding subtree with at least k nodes. If,
before republishing key-value pairs, a node u refreshes all k-buckets in this subtree of k nodes,
it will automatically be able to figure out the k closest nodes to a given key. These bucket
refreshes can be amortized over the republication of many keys.”
This second optimization is straightforward—if we've done a bucket refresh within the last hour, we can avoid calling FindNode (the node lookup algorithm). How do we determine which bucket to test for a recent refresh? The bucket whose range contains the key should hold the closest contacts we've seen for that key. While the answer might be obvious, it's worthwhile to discuss the reasoning here.
Buckets in the bucket list are maintained in range order rather than in a tree, which naturally
orders them by their prefix.
Table 1: Bucket Range(s)

State         Bucket Range(s)                                                           Prefix(es)
Initial       0 .. 2^160                                                                1
Two Buckets   0 .. 2^159 | 2^159 .. 2^160                                               01, 1
Four Buckets  0 .. 2^158 | 2^158 .. 2^159 | 2^159 .. 2^159+2^158 | 2^159+2^158 .. 2^160  001, 01, 10, 1
When we identify the bucket for a given key, the contacts in that bucket are closest, as per the XOR computation on the prefix. For example, looking at the four buckets with prefixes 001, 01, 10, and 1, we see that the contacts in the key's bucket range are closest.
For this reason, we use the bucket for which the key is in range. Also, new key-values that are published onto closer nodes persist for 24 hours.
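The prefix argument can be checked numerically. With 4-bit IDs, a contact sharing the key's bucket prefix always has a smaller XOR distance than a contact from another bucket (the values below are our own toy example, not from the book):

```csharp
using System;

public class PrefixDistanceDemo
{
    public static void Main()
    {
        int key = 0b1011;            // key falls in the "10" bucket (range 8..12)

        int inBucket = 0b1001;       // shares the "10" prefix with the key
        int otherBucket = 0b0111;    // lives in the "01" bucket

        // XOR zeroes the shared prefix bits, so the in-bucket contact's
        // distance is bounded by the bucket's span; the out-of-bucket
        // contact's distance is dominated by the differing prefix bit.
        Console.WriteLine(inBucket ^ key);     // 2
        Console.WriteLine(otherBucket ^ key);  // 12
    }
}
```

This is why a bucket refresh within the last hour already leaves us holding the k closest contacts for any key in that bucket's range.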
Implementation
Key-values in the republish store are republished at a particular interval, typically every hour.
/// <summary>
/// Replicate key values if the key-value hasn’t been touched within
/// the republish interval. Also don't do a FindNode lookup if the bucket
/// containing the key has been refreshed within the refresh interval.
/// </summary>
protected void KeyValueRepublishElapsed(object sender, ElapsedEventArgs
e)
{
DateTime now = DateTime.Now;
republishStorage.Keys.Where(k =>
(now - republishStorage.GetTimeStamp(k)).TotalMilliseconds >=
Constants.KEY_VALUE_REPUBLISH_INTERVAL).ForEach(k=>
{
ID key = new ID(k);
StoreOnCloserContacts(key, republishStorage.Get(key));
republishStorage.Touch(k);
});
Note how a lookup is only performed if the bucket containing the key hasn’t itself been refreshed
recently (within the past hour).
/// <summary>
/// Perform a lookup if the bucket containing the key has not been refreshed;
/// otherwise, just get the k closest contacts we know about.
/// </summary>
protected void StoreOnCloserContacts(ID key, string val)
{
DateTime now = DateTime.Now;
// Bucket has been refreshed recently, so don't do a lookup, as we
// already have the k closest contacts.
contacts = node.BucketList.GetCloseContacts(key, node.OurContact.ID);
}
else
{
contacts = router.Lookup(key, router.RpcFindNodes).contacts;
}
contacts.ForEach(c =>
{
RpcError error = c.Protocol.Store(node.OurContact, key, val);
HandleError(error, c);
});
}
/// <summary>
/// Any expired keys in the republish or node's cache are removed.
/// </summary>
Originator republishing
From the spec: “For Kademlia’s current application (file sharing), we also require the original
publisher of a (key,value) pair to republish it every 24 hours. Otherwise, (key,value) pairs expire
24 hours after publication, so as to limit stale index information in the system. For other
applications, such as digital certificates or cryptographic hash to value mappings, longer
expiration times may be appropriate.”
Republishing originator data is handled in a timer event that resends the key-values in the
originator’s storage.
DateTime now = DateTime.Now;

originatorStorage.Keys.Where(
    k => (now - originatorStorage.GetTimeStamp(k)).TotalMilliseconds >=
    Constants.ORIGINATOR_REPUBLISH_INTERVAL).ForEach(k =>
{
    ID key = new ID(k);

    // Just use close contacts, don't do a lookup.
    var contacts = node.BucketList.GetCloseContacts(key, node.OurContact.ID);

    contacts.ForEach(c =>
    {
        RpcError error = c.Protocol.Store(node.OurContact, key, originatorStorage.Get(key));
        HandleError(error, c);
    });

    originatorStorage.Touch(k);
});
Interpretation:
A new node (contact) will be instructed to store key-values that exist on the bootstrapping node
(the one it's bootstrapping with) for key-values that meet the following condition: the key XOR'd
with the new node's ID is less than (closer than) the key XOR'd with the IDs of other nodes.
What does “other nodes” mean? Are these all other contacts the bootstrapping node knows
about, or just the k closest contacts in the joining node’s bucket, or some other interpretation?
We have to understand what “exploiting complete knowledge of their surrounding subtrees”
means. First, this indicates that it isn’t just the joining node’s bucket. It would make sense to
interpret this as “store the values onto the joining node for any key-value where the joining node
will be closer to that key when there are no other nodes that are closer.” If the joining node
becomes the closest node to a key-value, then it is requested to store that key-value.
It’s interesting to note that this algorithm executes regardless of whether the bootstrapping node
actually added the joining node to a k-bucket. Remember also that “joining” actually means
contacting another node with any one of the four RPC calls. When a new node registers,
republished key-values persist for 24 hours.
/// <summary>
/// For a new contact, we store values to that contact whose keys ^ ourContact
/// are less than stored keys ^ [otherContacts].
/// </summary>
protected void SendKeyValuesIfNewContact(Contact sender)
{
    List<Contact> contacts = new List<Contact>();

    if (IsNewContact(sender))
    {
        lock (bucketList)
        {
            // Clone so we can release the lock.
            contacts = new List<Contact>(bucketList.Buckets.SelectMany(b => b.Contacts));
        }

        if (contacts.Count() > 0)
        {
            // and our distance to the key < any other contact's distance to the key...
            storage.Keys.AsParallel().ForEach(k =>
            {
                // our min distance to the contact.
                var distance = contacts.Min(c => k ^ c.ID);

                // If the new contact is closer to the key than any existing contact...
                if ((k ^ sender.ID) < distance)
                {
                    // ...send it the key-value.
                    RpcError error = sender.Protocol.Store(node.OurContact, new ID(k), storage.Get(k));
                    HandleError(error, sender);
                }
            });
        }
    }
}
Annoyingly, for every stored value, there just isn’t any way to avoid performing the XOR
computation on every contact. This could get expensive, and it is currently optimized using
Linq’s parallel feature.
Determining whether a contact is new is slightly more complicated than one would think. We
need to check not only whether the contact exists in any of our buckets, but also whether it’s a
pending contact—one that wasn’t placed in a bucket because the bucket was full, but that has
nonetheless already been sent any closer key-values.
/// <summary>
/// Returns true if the contact isn't in the bucket list or the
/// pending contacts list.
/// </summary>
protected bool IsNewContact(Contact sender)
{
    bool ret;

    lock (bucketList)
    {
        // If we have the contact in a bucket, it isn't new...
        ret = bucketList.ContactExists(sender);
    }

    if (!ret)
    {
        // ...otherwise, also check the pending contacts.
        lock (pendingContacts)
        {
            ret = pendingContacts.Any(c => c.ID == sender.ID);
        }
    }

    return !ret;
}
Over-caching
From the spec: “To avoid ‘over-caching,’ we make the expiration time of a (key,value) pair in
any node’s database exponentially inversely proportional to the number of nodes between the
current node and the node whose ID is closest to the key ID.”
• Inversely proportional: the expiration time is shorter the more nodes there are between
the current node and the closest node.
• Exponentially inversely proportional: the expiration time shrinks exponentially as the
number of nodes between the current node and the closest node grows.
The specification provides no guidance as to what the calculation for “exponentially inversely
proportional” should actually be. It’s also undefined what the time constants are—what is a
baseline time for which a key-value should persist? It is assumed here that this should be a
maximum of 24 hours. We also need to track an expiration time that is separate from the key-
value republish timestamp. Furthermore, up to this point, I haven’t implemented the concept of
accelerated lookup optimization, which is where the value of b comes from. In this
implementation, where we have bucket ranges rather than a bucket-per-bit in the key space,
the accelerated lookup optimization is irrelevant, so we’ll use b == 5, which is the spec’s
recommended value for that optimization.
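One simple reading, and the one the implementation later in this chapter uses, is to halve a baseline expiration time for each node separating us from the closest node. A sketch (the 24-hour baseline is an assumption, as noted above, not a spec requirement):

```csharp
using System;

class CacheExpiration
{
    // Assumed baseline: 24 hours, expressed in seconds.
    const int EXPIRATION_TIME_SECONDS = 24 * 60 * 60;

    // Expiration is halved for each node separating us from the closest node.
    static int ExpirationTimeSec(int separatingNodes)
    {
        return (int)(EXPIRATION_TIME_SECONDS / Math.Pow(2, separatingNodes));
    }

    static void Main()
    {
        Console.WriteLine(ExpirationTimeSec(0));    // 86400 -- we are closest: full 24 hours
        Console.WriteLine(ExpirationTimeSec(1));    // 43200 -- one node closer: 12 hours
        Console.WriteLine(ExpirationTimeSec(10));   // 84    -- ten nodes: under two minutes
    }
}
```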
Also, who performs the computation “between the current node and the node whose ID is closest to
the key ID”? Is the current node:
• The sender, which is caching the key-value on another node and counts the number of
nodes between itself and the receiving node?
• The receiver, which is handling the store request and counts the number of nodes between
itself and the sender node?
As discussed earlier, the entire concept of having separate stores (originator, republished,
cached) is never discussed in the Kademlia specification. Without understanding these three
different stores, trying to understand how caching works is probably impossible.
Caching occurs in only one place—when a value being looked up (and successfully found) is
stored on a “close” node:
{
    ...
    var lookup = router.Lookup(key, router.RpcFindValue);

    if (lookup.found)
    {
        ret = (true, null, lookup.val);

        // Find the first close contact (other than the one the value was
        // found by) in which to *cache* the key-value.
        var storeTo = lookup.contacts.Where(c => c != lookup.foundBy).
            OrderBy(c => c.ID ^ key).FirstOrDefault();

        if (storeTo != null)
        {
            int separatingNodes = GetSeparatingNodesCount(ourContact, storeTo);
            int expTimeSec = (int)(Constants.EXPIRATION_TIME_SECONDS /
                Math.Pow(2, separatingNodes));
            RpcError error = storeTo.Protocol.Store(node.OurContact, key,
                lookup.val, true, expTimeSec);
            HandleError(error, storeTo);
        }
    }
Note the true flag, indicating that this RPC Store call is for caching purposes.
In a subclass of the Dht overriding this method, only the cached store expires.
Code Listing 84: TestNewContactGetsStoredContactsTest
[TestMethod]
public void TestNewContactGetsStoredContactsTest()
{
    // Set up a node at the midpoint.
    // The existing node has the ID 10000....
    Node existing = new Node(new Contact(null, ID.Mid), new VirtualStorage());
    string val1 = "Value 1";
    string valMid = "Value Mid";

    Assert.IsTrue(existing.Storage.Keys.Count == 2,
        "Expected the existing node to have two key-values.");

    Node unseen = new Node(unseenContact, new VirtualStorage());
    unseenvp.Node = unseen;    // Final fixup.

    Assert.IsTrue(unseen.Storage.Keys.Count == 0,
        "The unseen node shouldn't have any key-values!");

    // Contacts     V1          V2
    // 10000000     00...0001   10...0000
    // 01000000
    // Math:
    // c1 ^ V1      c1 ^ V2     c2 ^ V1     c2 ^ V2
    // 100...001    000...000   010...001   110...000

    Assert.IsTrue(unseen.Storage.Keys.Count == 1,
        "Expected 1 value stored in our new node.");
    Assert.IsTrue(unseen.Storage.Contains(ID.Mid),
        "Expected valMid to be stored.");
    Assert.IsTrue(unseen.Storage.Get(ID.Mid) == valMid,
        "Expected valMid value to match.");
}
Other optimizations
Ping simply responds with the random ID that was sent. Internally, the buckets are
potentially updated, and if the contact is new, Store RPC calls are made to it for any values that
it should store, as discussed above for when a new node registers.
Piggy-backed ping
In his paper, Bruno Spori writes:16
“The situation is different when the first message a node received is a request message. In this
case, the receiver cannot be sure whether the sender’s contact information [is] correct. It could
be that the request was faked. To determine this, the piggy-backed ping is used. The effect of
the piggy-backed ping is that the original sender of the request must send a ping reply upon
receiving the reply message. Thus, the receiver of the request message is able to determine the
correctness of the sender as well.”
We will instead rely on the error-handling mechanism for evicting contacts that do not respond
or respond incorrectly or with errors. Error handling will be discussed later.
16 http://pub.tik.ee.ethz.ch/students/2006-So/SA-2006-19.pdf
Chapter 11 Persisting the DHT
The bucket lists and contacts in each bucket need to be persisted so the last known state of the
DHT can be restored. This is baked into the Dht implementation, serializing the data in a JSON
file. The persistence of key-values is handled separately, and is defined by the specific
implementation needs. Note that the VirtualStorage class provided in the baseline code does
not persist its data. Internally, various properties are decorated with the JsonIgnore attribute to
prevent circular serialization, and some classes have parameter-less public constructors for
deserialization.
Serializing
This is straightforward—the only trick is enabling the type name handling in Newtonsoft.Json
so that properties with abstract and interface types also serialize their concrete type.
/// <summary>
/// Returns a JSON string of the serialized DHT.
/// </summary>
public string Save()
{
    var settings = new JsonSerializerSettings();
    settings.TypeNameHandling = TypeNameHandling.Auto;
    string json = JsonConvert.SerializeObject(this, settings);

    return json;
}
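To see why type name handling matters, here's a minimal round-trip sketch (the IProtocol/TcpProtocol names here are stand-ins, not the book's classes). Without TypeNameHandling.Auto, a property declared as an interface type cannot be deserialized back into its concrete type:

```csharp
using System;
using Newtonsoft.Json;

// Stand-in types: a property declared as an interface but holding a
// concrete implementation.
public interface IProtocol { }
public class TcpProtocol : IProtocol { public string Url { get; set; } }
public class Holder { public IProtocol Protocol { get; set; } }

class Demo
{
    static void Main()
    {
        var settings = new JsonSerializerSettings { TypeNameHandling = TypeNameHandling.Auto };
        var holder = new Holder { Protocol = new TcpProtocol { Url = "http://127.0.0.1" } };

        // Because the declared type (IProtocol) differs from the actual type,
        // a "$type" hint is emitted, so deserialization can reconstruct a
        // TcpProtocol instead of failing on the interface.
        string json = JsonConvert.SerializeObject(holder, settings);
        Holder back = JsonConvert.DeserializeObject<Holder>(json, settings);

        Console.WriteLine(back.Protocol.GetType().Name);   // TcpProtocol
    }
}
```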
Deserializing
The deserializer is equally simple; however, note the call to DeserializationFixups. This
reduces the size of the JSON by not serializing certain properties that can be obtained from
other properties. As a result, some minor fixups are necessary.
Code Listing 86: Load the DHT
public static Dht Load(string json)
{
    var settings = new JsonSerializerSettings();
    settings.TypeNameHandling = TypeNameHandling.Auto;
    Dht dht = JsonConvert.DeserializeObject<Dht>(json, settings);
    dht.DeserializationFixups();

    return dht;
}
Code Listing 87: DhtSerializationTest
[TestMethod]
public void DhtSerializationTest()
{
    TcpSubnetProtocol p1 = new TcpSubnetProtocol("http://127.0.0.1", 2720, 1);

    // Ensures that all nodes are closer, because ID.Max ^ n < ID.Max when n > 0.
    Dht dht = new Dht(ID.Max, p1, new Router(), store1, store1, new VirtualStorage());

    Assert.IsTrue(newDht.Node.BucketList.ContactExists(otherContact),
        "Expected our contact to have the other contact.");
    Assert.IsTrue(newDht.Router.Node == newDht.Node,
        "Router node not initialized.");
}
When you look at the JSON, you realize that shared objects, particularly contacts, are
deserialized into separate instances. Because there are assumptions in the code regarding
“same instance,” and also as a way of ensuring that we’re comparing contacts correctly (using
their IDs), the Contact class implements IComparable and overloads operator == and
operator !=.
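A minimal sketch of equality-by-ID (the member shapes here are illustrative, not the book's exact Contact class):

```csharp
using System;
using System.Numerics;

// Sketch: contacts deserialized into separate instances still compare
// equal when their IDs match.
public class Contact
{
    public BigInteger ID { get; set; }

    public override bool Equals(object obj) => obj is Contact c && c.ID == ID;

    public override int GetHashCode() => ID.GetHashCode();

    public static bool operator ==(Contact a, Contact b) =>
        ReferenceEquals(a, b) || (a is object && b is object && a.ID == b.ID);

    public static bool operator !=(Contact a, Contact b) => !(a == b);
}

class Demo
{
    static void Main()
    {
        var a = new Contact { ID = new BigInteger(42) };
        var b = new Contact { ID = new BigInteger(42) };   // Separate instance.
        Console.WriteLine(a == b);                 // True: compared by ID.
        Console.WriteLine(ReferenceEquals(a, b));  // False: distinct objects.
    }
}
```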
Chapter 12 Considerations for an
Asynchronous Implementation
Thread Safety
These entry points to the node must be re-entrant:
• Ping
• Store
• FindNode
• FindValue
The first issue is with adding contacts and the bucket manipulation that occurs. Collections should
not be modified or otherwise manipulated while they are being searched. We’ve seen the use of lock
statements in previous code to ensure that collections are not modified asynchronously.
There are potentially more optimized approaches, such as locking only the specific KBucket
being manipulated and only locking the BucketList when it itself is being modified; however, I
will leave those for another time.
It is also assumed that the storage implementation can be re-entrant. In the virtual storage, this
is handled by ConcurrentDictionary instances, for example:
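A minimal sketch of a ConcurrentDictionary-backed store (class and member names here are illustrative):

```csharp
using System;
using System.Collections.Concurrent;
using System.Numerics;

// Sketch of a thread-safe in-memory store; ConcurrentDictionary makes
// individual reads and writes safe without explicit locks.
public class VirtualStorageSketch
{
    protected ConcurrentDictionary<BigInteger, string> store =
        new ConcurrentDictionary<BigInteger, string>();

    public bool Contains(BigInteger key) => store.ContainsKey(key);

    public void Set(BigInteger key, string val) => store[key] = val;

    public bool TryGet(BigInteger key, out string val) => store.TryGetValue(key, out val);
}

class Demo
{
    static void Main()
    {
        var storage = new VirtualStorageSketch();
        storage.Set(new BigInteger(1), "hello");
        Console.WriteLine(storage.Contains(new BigInteger(1)));   // True
    }
}
```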
Parallel queries
From the spec: “The initiator then sends parallel, asynchronous find_node RPCs to the a [sic]
nodes it has chosen, a is a system-wide concurrency parameter, such as 3.”
In the lookup algorithm, Kademlia uses parallel, asynchronous queries to reduce timeout delays
from failed nodes. Waiting for at least some of the nodes to respond in each batch of three
closest nodes gives the system a chance to get even closer nodes to those first set of close
nodes with the hope of acquiring k closer contacts without having to explore farther contacts.
It’s not particularly clear why all the k closer contacts aren’t queried in parallel to start with.
Maybe the idea is that you want to try to get closer contacts from already close contacts.
Certainly all the contacts could be queried and from the ones that respond first, we can select k
closer ones. On the other hand, querying all the contacts simultaneously probably results in
unnecessary network traffic as many of the FindNode RPC calls will be ignored.
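The alpha-batch idea can be sketched as follows (ALPHA = 3, as the spec suggests; everything else here is illustrative):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Numerics;

class AlphaBatch
{
    const int ALPHA = 3;    // Spec-suggested concurrency parameter.

    // From the closer uncontacted nodes, pick the next alpha to query in parallel.
    static List<BigInteger> NextAlphaNodes(
        IEnumerable<BigInteger> closerUncontacted, BigInteger key)
    {
        return closerUncontacted
            .OrderBy(id => id ^ key)   // Closest first, by XOR distance.
            .Take(ALPHA)
            .ToList();
    }

    static void Main()
    {
        var key = new BigInteger(0);
        var candidates = new List<BigInteger> { 5, 1, 9, 2, 7 };
        var batch = NextAlphaNodes(candidates, key);
        Console.WriteLine(string.Join(",", batch));   // 1,2,5
    }
}
```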
The BaseRouter abstract class
For unit testing, it’s useful to keep the nonparallel implementation, but ideally, both parallel and
nonparallel calls to the Router should be made in the same way. An abstract BaseRouter class
allows for this.
The ParallelRouter
The ParallelRouter queues contacts to query, in addition to some other information each
thread needs to know about when executing the RPC call.
The ParallelRouter also initializes an internal thread pool.
Work is queued and a semaphore is released for a thread to pick up the work.
RpcCall = rpcCall,
CloserContacts = closerContacts,
FartherContacts = fartherContacts,
FindResult = findResult
});
Semaphore.Release();
if (contactQueue.TryDequeue(out item))
{
    string val;
    Contact foundBy;

    if (GetCloserNodes(
        item.Key,
        item.Contact,
        item.RpcCall,
        item.CloserContacts,
        item.FartherContacts,
        out val,
        out foundBy))
    {
        if (!stopWork)
        {
            // Possible multiple "found"
            lock (locker)
            {
                item.FindResult.Found = true;
                item.FindResult.FoundBy = foundBy;
                item.FindResult.FoundValue = val;
                item.FindResult.FoundContacts =
                    new List<Contact>(item.CloserContacts);
            }
        }
    }
}
}
}
The salient point with the previous code is that when a value is found, it takes a snapshot of the
current closer contacts and stores all the information about a closer contact in fields belonging
to the ParallelLookup class.
The ParallelRouter must terminate its search after a certain amount of time, which handles
unresponsive contacts. Whenever a response is received and new contacts are added to the list
of contacts that can be queried, a timer is reset. The Lookup call exits when a value is found (for
FindValue), or k closer contacts have been found, or the time period expires.
/// <summary>
/// Sets the time of the query to now.
/// </summary>
protected void SetQueryTime()
{
    now = DateTime.Now;
}

/// <summary>
/// Returns true if the query time has expired.
/// </summary>
protected bool QueryTimeExpired()
{
    return (DateTime.Now - now).TotalMilliseconds > Constants.QUERY_TIME;
}
The Lookup inner loop is where the work is done, as with the nonparallel version, but notice
how, instead, work is queued and we wait for responses—particularly the check for whether
we’ve waited long enough in the haveWork assignment.
...
ret.AddRangeDistinctBy(closerContacts, (a, b) => a.ID == b.ID);

// Spec: The lookup terminates when the initiator has queried and gotten
// responses from the k closest nodes it has seen.
while (ret.Count < Constants.K && haveWork)
{
    Thread.Sleep(Constants.RESPONSE_WAIT_TIME);
    return foundReturn;
}

List<Contact> closerUncontactedNodes =
    closerContacts.Except(contactedNodes).ToList();
List<Contact> fartherUncontactedNodes =
    fartherContacts.Except(contactedNodes).ToList();

bool haveCloser = closerUncontactedNodes.Count > 0;
bool haveFarther = fartherUncontactedNodes.Count > 0;

alphaNodes.ForEach(
    n => QueueWork(key, n, rpcCall, closerContacts, fartherContacts, findResult));

SetQueryTime();
}
}
We can now take the Dht tests for the nonparallel version and create parallel versions of those
tests, passing in the ParallelRouter instead. The result is that the parallel router unit tests
also pass.
Figure 9
A potential problem occurs when there are threads still waiting for a response, and that
response possibly occurs at some point after the Lookup method exits. We deal with this in
several ways:
This ensures that even if there are threads still performing work on a previous lookup, they do
not affect the results of the current lookup.
While the use of a single locker object for serializing updates to collections and to the find
result is slightly inefficient, it avoids nested locks; otherwise the thread, when it finds a
value, would have to lock both the closerContacts collection and the findResult instance.
Nested locks should be avoided. Also note that the Lookup method itself is not intended to be
re-entrant.
Chapter 13 A Basic TCP Subnet Protocol
• A single port is used along with a subnet identifier to route requests to the correct
handler. The subnet identifier makes it easier to test multiple “servers” on the same
machine as we don’t need to open (and “allow”) a unique port number per “server.”
• JSON is used as the serialization format for request and response data.
• RPC calls are synchronous, in that they wait for a response. There are other
implementations that continue based on the random ID and handler, which I chose not to
implement.
Request messages
Requests issued by the DHT are serialized to JSON from the following classes.
public BaseRequest()
{
RandomID = ID.RandomID.Value;
}
}
}
{
public int Subnet { get; set; }
}
On the server side, which receives these messages, they are handled by a common request
class.
/// <summary>
/// For passing to Node handlers with common parameters.
/// </summary>
public class CommonRequest
{
    public object Protocol { get; set; }
    public string ProtocolName { get; set; }
    public BigInteger RandomID { get; set; }
    public BigInteger Sender { get; set; }
    public BigInteger Key { get; set; }
    public string Value { get; set; }
    public bool IsCached { get; set; }
    public int ExpirationTimeSec { get; set; }
}
As the comment states, the common request simplifies the server implementation by having
RPC handler methods with the same parameter.
Request handlers
The request handlers extract the pertinent pieces of the CommonRequest and call the
appropriate method of the Node class. The important part here is that the contact protocol must
be returned as part of the FindNode and FindValue response. Note that the returns are
anonymous objects.
public object ServerFindNode(CommonRequest request)
{
return new
{
Contacts = contacts.Select(c =>
new
{
Contact = c.ID.Value,
Protocol = c.Protocol,
ProtocolName = c.Protocol.GetType().Name
}).ToList(),
RandomID = request.RandomID
};
}
return new
{
Contacts = contacts?.Select(c =>
new
{
Contact = c.ID.Value,
Protocol = c.Protocol,
ProtocolName = c.Protocol.GetType().Name
})?.ToList(),
RandomID = request.RandomID,
Value = val
};
}
Responses
JSON responses are deserialized into the following classes.
{
public BigInteger Contact { get; set; }
public object Protocol { get; set; }
public string ProtocolName { get; set; }
}
Server implementation
The server is a straightforward HttpListener; each request arrives as an HttpListenerContext,
and the subnet ID in the request is used to route it to the specific node associated with that
subnet.
string data = new StreamReader(context.Request.InputStream,
    context.Request.ContentEncoding).ReadToEnd();

if (context.Request.HttpMethod == "POST")
{
    Type requestType;
    string path = context.Request.RawUrl;

    // Remove "//"
    // Prefix our call with "Server" so that the method name is unambiguous.
    string methodName = "Server" + path.Substring(2);

    CommonRequest commonRequest =
        JsonConvert.DeserializeObject<CommonRequest>(data);
    int subnet = ((ITcpSubnet)JsonConvert.DeserializeObject(
        data, requestType)).Subnet;
    INode node;
}
else
{
    context.Response.Close();
}
Subnet = subnet,
Sender = sender.ID.Value,
Key = key.Value,
RandomID = id.Value
}, out error, out timeoutError);
try
{
var contacts = ret?.Contacts?.Select(
val => new Contact(Protocol.InstantiateProtocol(
val.Protocol, val.ProtocolName), new ID(val.Contact))).ToList();
/// <summary>
/// Attempt to find the value in the peer network.
/// </summary>
/// <returns>A null contact list is acceptable here, as it is a valid return
/// if the value is found.
/// The caller is responsible for checking the timeoutError flag to make
/// sure null contacts is not the result of a timeout error.</returns>
public (List<Contact> contacts, string val, RpcError error)
    FindValue(Contact sender, ID key)
{
    ErrorResponse error;
    ID id = ID.RandomID;
    bool timeoutError;
try
{
    var contacts = ret?.Contacts?.Select(
        val => new Contact(Protocol.InstantiateProtocol(val.Protocol,
            val.ProtocolName), new ID(val.Contact))).ToList();
}
catch (Exception ex)
{
public RpcError Store(Contact sender, ID key, string val, bool isCached = false,
    int expirationTimeSec = 0)
{
    ErrorResponse error;
    ID id = ID.RandomID;
    bool timeoutError;
The RpcError class manages the kinds of errors that we can encounter, and is instantiated in
the GetRpcError method.
Code Listing 101: Handling RPC Errors
Note that this class reflects several different errors that can occur:
• ID mismatch: The peer responded, but not with an ID that matched the sender’s random
ID.
• Peer: The peer encountered an exception, in which case the exception message is
returned to the caller.
• Deserialization: The Post method catches JSON deserialization errors, which also
indicates an error with the peer response.
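These conditions suggest an error container along the following lines (a sketch; the property names are assumptions, not necessarily the book's RpcError members):

```csharp
using System;

// Sketch of an RPC error container reflecting the conditions above,
// plus the timeout case.
public class RpcErrorSketch
{
    public bool TimeoutError { get; set; }
    public bool IDMismatchError { get; set; }
    public bool PeerError { get; set; }
    public string PeerErrorMessage { get; set; }

    public bool HasError => TimeoutError || IDMismatchError || PeerError;
}

class Demo
{
    static void Main()
    {
        var err = new RpcErrorSketch { IDMismatchError = true };
        Console.WriteLine(err.HasError);   // True
    }
}
```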
[TestClass]
public class TcpSubnetTests
{
protected string localIP = "http://127.0.0.1";
protected int port = 2720;
protected TcpSubnetServer server;
[TestInitialize]
public void Initialize()
{
server = new TcpSubnetServer(localIP, port);
}
[TestCleanup]
public void TestCleanup()
{
server.Stop();
}
...
The unit tests exercise each of the four RPC calls as well as a timeout error.
PingRouteTest
This test verifies the Ping RPC call.
[TestMethod]
public void PingRouteTest()
{
TcpSubnetProtocol p1 = new TcpSubnetProtocol(localIP, port, 1);
TcpSubnetProtocol p2 = new TcpSubnetProtocol(localIP, port, 2);
ID ourID = ID.RandomID;
Contact c1 = new Contact(p1, ourID);
Node n1 = new Node(c1, new VirtualStorage());
Node n2 = new Node(new Contact(p2, ID.RandomID), new VirtualStorage());
server.RegisterProtocol(p1.Subnet, n1);
server.RegisterProtocol(p2.Subnet, n2);
server.Start();
p2.Ping(c1);
}
Oddly, there’s no assertion here, as nothing of note happens; the point is simply that no
exceptions are thrown.
StoreRouteTest
This test verifies the Store RPC call.
[TestMethod]
public void StoreRouteTest()
{
TcpSubnetProtocol p1 = new TcpSubnetProtocol(localIP, port, 1);
TcpSubnetProtocol p2 = new TcpSubnetProtocol(localIP, port, 2);
ID ourID = ID.RandomID;
Contact c1 = new Contact(p1, ourID);
Node n1 = new Node(c1, new VirtualStorage());
Node n2 = new Node(new Contact(p2, ID.RandomID), new VirtualStorage());
server.RegisterProtocol(p1.Subnet, n1);
server.RegisterProtocol(p2.Subnet, n2);
server.Start();

ID testID = ID.RandomID;
string testValue = "Test";
p2.Store(c1, testID, testValue);

Assert.IsTrue(n2.Storage.Contains(testID),
    "Expected remote peer to have value.");
Assert.IsTrue(n2.Storage.Get(testID) == testValue,
    "Expected remote peer to contain stored value.");
}
FindNodesRouteTest
This test verifies the FindNodes RPC call.
[TestMethod]
public void FindNodesRouteTest()
{
TcpSubnetProtocol p1 = new TcpSubnetProtocol(localIP, port, 1);
TcpSubnetProtocol p2 = new TcpSubnetProtocol(localIP, port, 2);
ID ourID = ID.RandomID;
Contact c1 = new Contact(p1, ourID);
Node n1 = new Node(c1, new VirtualStorage());
Node n2 = new Node(new Contact(p2, ID.RandomID), new VirtualStorage());
server.RegisterProtocol(p1.Subnet, n1);
server.RegisterProtocol(p2.Subnet, n2);
server.Start();
ID id = ID.RandomID;
List<Contact> ret = p2.FindNode(c1, id).contacts;
}
FindValueRouteTest
This test verifies the FindValue RPC call.
[TestMethod]
public void FindValueRouteTest()
{
TcpSubnetProtocol p1 = new TcpSubnetProtocol(localIP, port, 1);
TcpSubnetProtocol p2 = new TcpSubnetProtocol(localIP, port, 2);
ID ourID = ID.RandomID;
Contact c1 = new Contact(p1, ourID);
Node n1 = new Node(c1, new VirtualStorage());
Node n2 = new Node(new Contact(p2, ID.RandomID), new VirtualStorage());
server.RegisterProtocol(p1.Subnet, n1);
server.RegisterProtocol(p2.Subnet, n2);
server.Start();
ID testID = ID.RandomID;
string testValue = "Test";
p2.Store(c1, testID, testValue);
Assert.IsTrue(n2.Storage.Contains(testID),
"Expected remote peer to have value.");
Assert.IsTrue(n2.Storage.Get(testID) == testValue,
    "Expected remote peer to contain stored value.");
}
UnresponsiveNodeTest
This test verifies that an unresponsive node results in a timeout error.
Code Listing 107: UnresponsiveNodeTest
[TestMethod]
public void UnresponsiveNodeTest()
{
TcpSubnetProtocol p1 = new TcpSubnetProtocol(localIP, port, 1);
TcpSubnetProtocol p2 = new TcpSubnetProtocol(localIP, port, 2);
p2.Responds = false;
ID ourID = ID.RandomID;
Contact c1 = new Contact(p1, ourID);
Node n1 = new Node(c1, new VirtualStorage());
Node n2 = new Node(new Contact(p2, ID.RandomID), new VirtualStorage());
server.RegisterProtocol(p1.Subnet, n1);
server.RegisterProtocol(p2.Subnet, n2);
server.Start();
ID testID = ID.RandomID;
string testValue = "Test";
RpcError error = p2.Store(c1, testID, testValue);

Assert.IsTrue(error.TimeoutError, "Expected a timeout error.");
}
Chapter 14 RPC Error Handling and
Delayed Eviction
One of the optimizations in the Kademlia protocol can now be implemented—delayed eviction.
From the spec: “When a Kademlia node receives an RPC from an unknown contact and the k-
bucket for that contact is already full with k entries, the node places the new contact in a
replacement cache of nodes eligible to replace stale k-bucket entries. The next time the node
queries contacts in the k-bucket, any unresponsive ones can be evicted and replaced with
entries in the replacement cache. The replacement cache is kept sorted by time last seen, with
the most recently seen entry having the highest priority as a replacement candidate.”
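The bookkeeping the spec describes can be sketched like this (a simplified, illustrative model, not the book's code):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sketch: candidates waiting to replace stale k-bucket entries, kept
// sorted by time last seen; most recently seen is the best candidate.
public class ReplacementCache
{
    readonly List<(string Contact, DateTime LastSeen)> cache =
        new List<(string, DateTime)>();

    public void Add(string contact, DateTime lastSeen)
    {
        cache.Add((contact, lastSeen));
        cache.Sort((a, b) => b.LastSeen.CompareTo(a.LastSeen));   // Newest first.
    }

    // Best replacement for an evicted stale entry.
    public string TakeMostRecentlySeen()
    {
        string best = cache.First().Contact;
        cache.RemoveAt(0);
        return best;
    }
}

class Demo
{
    static void Main()
    {
        var rc = new ReplacementCache();
        rc.Add("A", DateTime.Now.AddMinutes(-10));
        rc.Add("B", DateTime.Now);   // Seen most recently.
        Console.WriteLine(rc.TakeMostRecentlySeen());   // B
    }
}
```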
What happens when the peer throws an exception or the random ID that the peer responds with
doesn’t match what was sent? To make matters simple, in this implementation we’ll handle all
error conditions, including timeouts, in the same way.
Evicting contacts, and replacing them with contacts waiting to be added to the bucket, is
handled in the Dht. First, the error handler.
/// <summary>
/// Put the timed-out contact into a collection and increment the number
/// of times it has timed out.
/// If it has timed out a certain amount, remove it from the bucket and
/// replace it with the most recent pending contact that is queued for
/// that bucket.
/// </summary>
public void HandleError(RpcError error, Contact contact)
{
    // For all errors:
    int count = AddContactToEvict(contact.ID.Value);

    if (count == Constants.EVICTION_LIMIT)
    {
        ReplaceContact(contact);
    }
}
/// <summary>
/// The contact that did not respond (or had an error) gets n tries before
/// being evicted and replaced with the most recent contact that wants to
/// go into the non-responding contact's k-bucket.
/// </summary>
/// <param name="toEvict">The contact that didn't respond.</param>
...
if (count == Constants.EVICTION_LIMIT)
{
    ReplaceContact(toEvict);
}
• When a contact fails to respond to an RPC call, its eviction count is incremented; if the
limit is reached, it is removed and replaced with the most recently seen contact that
belongs in the bucket.
• Alternatively, when a contact is being added to a full bucket and the last-seen contact
fails to respond (or has an error), it is added to the eviction pool, and the new contact
wanting to be added is placed into the pending contacts pool.
return count;
}
{
EvictContact(bucket, toEvict);
ReplaceWithPendingContact(bucket);
}
}
Validate.IsTrue<BucketDoesNotContainContactToEvict>(
    bucket.Contains(toEvict.ID),
    "Bucket doesn't contain the contact to be evicted.");
bucket.EvictContact(toEvict);
}
/// <summary>
/// Find a pending contact that goes into the bucket that now has room.
/// </summary>
protected void ReplaceWithPendingContact(KBucket bucket)
{
Contact contact;
if (contact != null)
{
pendingContacts.Remove(contact);
bucket.AddContact(contact);
}
}
}
Chapter 15 Putting It Together: A Demo
Figure 10
As usual, there’s nothing like a visual demo to see what is happening. This demo uses my
open-source diagramming tool FlowSharp17 as the drawing canvas. As we see in Figure 11, we
start with 60 peers (green), five of which are known peers (red).
NUM_DHT.ForEach((n) =>
{
    IProtocol protocol = new TcpSubnetProtocol("http://127.0.0.1", 2720, n);
    Dht dht = new Dht(ID.RandomID, protocol,
        () => new VirtualStorage(), new Router());
    peerColor[dht.ID.Value] = Color.Green;
    server.RegisterProtocol(n, dht.Node);
    dhts.Add(dht);
    dhtPos.Add(new Rectangle(
        XOFFSET + rnd.Next(-JITTER, JITTER) + (n % ITEMS_PER_ROW) * XSPACING,
        YOFFSET + rnd.Next(-JITTER, JITTER) + (n / ITEMS_PER_ROW) * YSPACING,
        SIZE, SIZE));
});
}

17 https://github.com/cliftonm/FlowSharp
NUM_KNOWN_PEERS.ForEach(() =>
{
Dht knownPeer = workingList[rnd.Next(workingList.Count)];
peerColor[knownPeer.ID.Value] = Color.Red;
knownPeers.Add(knownPeer);
workingList.Remove(knownPeer);
});
Figure 11
Bootstrapping
We can now bootstrap to a random peer.
As Figure 12 shows, after each peer bootstraps with one of the known peers (randomly
selected), the peer network is established.
Figure 12
The directionality of the connection is not shown—arrows would get lost in this drawing!
Bucket Refresh
To illustrate bucket refresh, let’s start with a smaller set of peers (25).
Figure 13
A bucket refresh calls FindNode on all the contacts in each bucket. This updates the contacts
for each peer based on the k closest contacts returned by each contact.
Figure 14
The newly discovered contacts are drawn in purple. In a small network like this, just about every
peer learns about every other peer—another iteration of bucket refreshing results in only a
couple more contacts being discovered.
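The refresh walk can be modeled in a few lines. Below is a deliberately simplified, self-contained model (integer IDs, no XOR distance or bucket ranges) showing how each bucket's contacts are asked what they know, and how the answers are merged back in:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified model of a bucket refresh: for each bucket, ask every
// contact ("FindNode") what it knows, and merge what we learn.
class BucketRefreshSketch
{
    static void Main()
    {
        // Our buckets, each holding contact IDs.
        var buckets = new List<List<int>>
        {
            new List<int> { 1, 2 },
            new List<int> { 8 }
        };

        // The "network": contact ID -> contacts that peer knows about.
        var network = new Dictionary<int, List<int>>
        {
            [1] = new List<int> { 3 },
            [2] = new List<int> { 1 },
            [8] = new List<int> { 9, 10 }
        };

        foreach (var bucket in buckets)
        {
            // Snapshot so newly learned contacts aren't queried this pass.
            foreach (int contact in bucket.ToList())
            {
                foreach (int learned in network[contact])
                {
                    if (!bucket.Contains(learned))
                    {
                        bucket.Add(learned);   // Real code runs this through AddContact.
                    }
                }
            }
        }

        Console.WriteLine(string.Join(",", buckets[0]));   // 1,2,3
        Console.WriteLine(string.Join(",", buckets[1]));   // 8,9,10
    }
}
```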
Store value
When a noncached value is stored to a peer, it is republished to close peers. We can see this
by coloring the originator with yellow, the immediate peer we’re storing the value to in blue, and
the peers to which the value is republished in orange.
/// <summary>
/// Color the originator in yellow,
/// the immediate peer we're storing the value to in blue,
/// and the peers to which the value is republished in orange.
/// </summary>
private void btnPublish_Click(object sender, EventArgs e)
{
    firstContacts = new List<Dht>();
    storeKey = ID.RandomID;
    originatorDht = dhts[(int)nudPeerNumber.Value];
    originatorDht.Store(storeKey, "Test");
    System.Threading.Thread.Sleep(500);
In a small network, because the store gets published to k peers, most of the peers are involved.
Figure 15
Store republish
When we force an immediate store republish, we see one node, which is a closer contact,
getting the republished key-value.
Figure 16
In a larger network, this becomes more obvious (the affected nodes have been moved to the top
of the rendering).
Figure 17
Remember that key-values are republished only on k closer contacts, so not every peer gets the
republished key-value.
Chapter 16 Things Not Implemented
There are a few items from the specification not implemented here.
UDP dropouts
From the spec:
“A related problem is that because Kademlia uses UDP, valid contacts will sometimes fail to
respond when network packets are dropped. Because packet loss often indicates network
congestion, Kademlia locks unresponsive contacts and avoids sending them any further RPCs
for an exponentially increasing backoff interval. Because at most stages Kademlia’s lookup only
needs to hear from one of k nodes, the system typically does not retransmit dropped RPCs to
the same node.
When a contact fails to respond to 5 RPCs in a row, it is considered stale. If a fe-bucket is not
full or its replacement cache is empty, Kademlia merely flags stale contacts rather than remove
them. This ensures, among other things, that if a node’s own network connection goes down
temporarily, the node won’t completely void all of its k-buckets.”
This is true not just for UDP packets but for any connection, which may go down for a while.
This algorithm is somewhat entangled with delayed eviction: in delayed eviction, the spec
states that “any unresponsive ones can be evicted,” and it is the spec’s description of UDP
dropouts that defines what “unresponsive” means.
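As a rough sketch of how the quoted backoff rule might be tracked per contact (the class and member names here are hypothetical, not part of this book's implementation):

```csharp
using System;

// Hypothetical per-contact bookkeeping for the spec's UDP-dropout rule:
// after each consecutive failed RPC, back off for an exponentially
// increasing interval, and mark the contact stale once five RPCs in a
// row have failed.
public class ContactHealth
{
    public const int StaleThreshold = 5;
    private int consecutiveFailures;
    private DateTime blockedUntil = DateTime.MinValue;

    public bool IsStale => consecutiveFailures >= StaleThreshold;

    // True when the backoff window has elapsed and we may RPC this contact again.
    public bool CanContact(DateTime now) => now >= blockedUntil;

    public void OnRpcSucceeded()
    {
        consecutiveFailures = 0;
        blockedUntil = DateTime.MinValue;
    }

    public void OnRpcFailed(DateTime now)
    {
        consecutiveFailures++;
        // Exponentially increasing backoff: 1s, 2s, 4s, 8s, ...
        var backoff = TimeSpan.FromSeconds(Math.Pow(2, consecutiveFailures - 1));
        blockedUntil = now + backoff;
    }
}
```

Note that, per the spec, a stale contact is only flagged rather than removed when its k-bucket is not full or its replacement cache is empty; this sketch covers only the counting and backoff.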
Accelerated lookups
Again, from the spec:
“Another optimization in the implementation is to achieve fewer hops per lookup by increasing
the routing table size. Conceptually, this is done by considering IDs b bits at a time instead of
just one bit at a time. As previously described, the expected number of hops per lookup is
log2(n). By increasing the routing table’s size to an expected 2^b log_{2^b}(n) k-buckets, we
can reduce the number of expected hops to log_{2^b}(n).”
In this implementation we have bucket ranges rather than a bucket per bit prefix of the key
space; therefore, the accelerated lookup optimization is irrelevant, because the bucket ranges
typically span many prefix bits.
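To make the hop arithmetic concrete, here is a tiny helper (not part of the book's code) that evaluates the expected hop count, using the identity log_{2^b}(n) = log2(n) / b:

```csharp
using System;

// Expected lookup hops when considering IDs b bits at a time:
// log_{2^b}(n) = log2(n) / b.
public static class AcceleratedLookup
{
    public static double ExpectedHops(double n, int b) => Math.Log(n, 2) / b;
}
```

For a network of about a million nodes, b = 1 gives roughly 20 expected hops, while b = 4 brings that down to roughly 5, which is why BitTorrent-scale networks benefit from the larger routing table.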
Sybil attacks
Peer-to-peer networks are vulnerable to Sybil attacks:
“In a Sybil attack, the attacker subverts the reputation system of a peer-to-peer network by
creating a large number of pseudonymous identities, using them to gain a disproportionately
large influence. A reputation system’s vulnerability to a Sybil attack depends on how cheaply
identities can be generated, the degree to which the reputation system accepts inputs from
entities that do not have a chain of trust linking them to a trusted entity, and whether the
reputation system treats all entities identically.”18
In the current implementation, if the peer network is already well populated (most k-buckets are
full) a Sybil attack would not replace “good,” known peers—the attack would simply place the
attempt into the DHT’s pending contact buffer. In a mostly unpopulated network (most k-buckets
have room for more peers), the subsequent failure to get a response from a peer would result in
its eventual eviction. The piggyback ping approach is also a means for the recipient of the RPC
call to verify the sender.
18 https://en.wikipedia.org/wiki/Sybil_attack
Conclusion
Petar Maymounkov wrote: “...my opinion is that Kademlia is so simple, that with a modern
language like Go, a good implementation of Kademlia (the algorithm) is no more than 100 lines
of code.”19 When looking at just the algorithm (the router in particular), this may be true,
but there is a tremendous amount of detail that has to go into the architecture and
implementation of the Kademlia protocol. As one dives deep into the spec, there are
contradictions between the two versions, between implementations out there in the wild and the
spec, and numerous ambiguities that must be resolved by carefully understanding the spec and
carefully inspecting other implementations and descriptions of the protocol. Anyone adopting a
library that implements the Kademlia protocol should have a thorough understanding of these
contradictions and ambiguities, and also must go through any implementation with a fine-tooth
comb to see how these are addressed, if they are addressed at all. In particular, when looking
at other implementations, ask how each of these contradictions and ambiguities is handled, in
no particular order of importance. These are some of the issues that we should look for. Even
in the initial release of this
implementation, I have not addressed all these concerns. Regardless, if you made it this far, I
suspect you have a much better understanding of the Kademlia protocol, which also gives you
some tools for looking at other P2P protocols.
19 http://www.maymounkov.org/kademlia