DxHash: A Scalable Consistent Hashing Based on the Pseudo-Random Sequence
TABLE 1: Theoretical comparison of DxHash and other CH algorithms.

|                | Karger Ring  | Maglev        | JCH       | SACH              | AnchorHash        | DxHash               |
|----------------|--------------|---------------|-----------|-------------------|-------------------|----------------------|
| Statelessness  | ✓            | ×①            | ✓         | ×                 | ×                 | –①                   |
| Lookup         | O(log(cn))②  | O(1)          | O(log(n)) | O(1), O(log(m))⑤  | O((1+ln(n/a))²)⑥  | O(n/a)               |
| Update         | O(log(cn))   | O(m log(m))③  | O(1)④     | O(1), O(m)        | O(1)⑦             | O(n/(n−a)) or O(1)⑧  |
| Memory (Bytes) | 24cn         | 4m            | O(1)      | 4m                | 16n               | n/8 to 5n            |

① Maglev and DxHash are stateless for removals: the removal order of nodes does not affect the lookup results of these two CH algorithms.
② Karger Ring introduces virtual nodes for balance. The constant c in Karger Ring denotes the number of virtual nodes pointing to each physical node, and n is the number of physical nodes.
③ m in Maglev is the size of the lookup table. The value of m is recommended to be a prime number greater than 100n for balance and minimal disruption.
④ The updates in JCH are limited because only the last inserted node is allowed to be removed.
⑤ The updates in SACH are limited because the total number of nodes cannot exceed the initial maximum size. Similar to Maglev, m in SACH denotes the size of the lookup table, which is much larger than a. There are two update complexities, matching the two update schemes in SACH.
⑥ n is the upper bound of the cluster size, and a is the number of active nodes.
⑦ The upper bound of AnchorHash is immutable, while DxHash supports the Scale operation to double the upper bound.
⑧ The update complexity and memory footprint of DxHash are determined by the detailed implementation.
these two have respective limitations.

In this paper, we propose DxHash, a scalable consistent hashing algorithm based on the pseudo-random sequence. By iteratively selecting candidate nodes with a pseudo-random generator, DxHash provides nearly ideal performance that satisfies the mentioned six properties. In the evaluation, when the cluster size exceeds 1 million nodes and 50% of the nodes fail, DxHash can still process 13.3 million queries per second. Compared to state-of-the-art works, DxHash exhibits better lookup and update performance and improved scalability, with a smaller memory footprint. Furthermore, we combine distributed storage scenarios with DxHash to propose weighted DxHash, which adjusts the load on arbitrary nodes to make full use of hardware resources.

The rest of the paper is organized as follows. Section 2 introduces related works and motivation, comparing classical and state-of-the-art CH algorithms. Sections 3 and 4 introduce DxHash. Section 5 introduces weighted DxHash. In Section 6, we evaluate the performance of DxHash in comparison with existing CH algorithms. Finally, Section 7 concludes the paper. Special thanks go to ChatGPT for contributions to improving the writing of this paper.

2 BACKGROUND

2.1 Related Work

Karger Ring, the original CH scheme proposed in 1997 [6], maps both nodes and keys into a cyclic hash space. The ring's values increase from 0 to 2^32 in the clockwise direction, and each node is responsible for the keys in its assigned segment. While Karger Ring achieves minimal disruption, ensuring that only the affected keys in the segment are remapped during node addition or removal, it struggles to maintain balance due to variable segment lengths. To address this, virtual nodes are introduced, raising the memory footprint [17]. Attempts to redistribute data for load balancing [4] introduce extra data migration and break minimal disruption. Moreover, the O(log(n)) complexity of Karger Ring for both update and lookup raises concerns about performance.

Another CH algorithm, Highest Random Weight (HRW) [16], ensures complete balance and monotonicity. HRW assigns a unique identifier to each node and calculates random weights for mapping new items based on a combination of the item's key and the node's ID. The node with the highest weight is selected as the mapping result. However, HRW suffers from poor scalability due to significant computational overhead, resulting in an O(n) lookup complexity and making it unsuitable for large-scale clusters.

MaglevHash, proposed by Google in 2016 [3], is a high-efficiency CH that maintains large memory tables, where keys are hashed to table indexes and table contents are node IDs, allowing for O(1) queries. However, for balance, the table size is much larger than the number of nodes, introducing significant extra memory consumption. Additionally, MaglevHash struggles with minimal disruption and low update complexity.

Jump Consistent Hash (JCH) is a notable CH algorithm that leverages a Pseudo-Random Sequence (PRS) [7]. JCH calculates a pseudo-random sequence based on a key and compares it with a specified probability to determine the node to which the key belongs. Although JCH meets standard CH requirements, it does not support arbitrary node additions or removals: changes are restricted to the tail node, or else the minimal disruption property would be broken. Consequently, JCH is not suitable for scenarios involving random and frequent node updates.
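For concreteness, the entire JCH mapping is the short loop below, transcribed from the code published in [7]; the fixed multiplier is the linear congruential constant used there.

```cpp
#include <cstdint>

// Jump Consistent Hash as published in [7]: the key drives a linear
// congruential generator, and j jumps forward to the next candidate bucket
// with a probability that shrinks as the bucket index grows; the last
// bucket index below num_buckets is the result.
int32_t JumpConsistentHash(uint64_t key, int32_t num_buckets) {
  int64_t b = -1, j = 0;
  while (j < num_buckets) {
    b = j;
    key = key * 2862933555777941757ULL + 1;
    j = static_cast<int64_t>(
        (b + 1) * (static_cast<double>(1LL << 31) /
                   static_cast<double>((key >> 33) + 1)));
  }
  return static_cast<int32_t>(b);
}
```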
Recent CH proposals, such as SACH [10] and AnchorHash [9], bring new perspectives. SACH uses double hashing similar to Maglev, with two update algorithms: one fast but unbalanced, and one slow but balanced. However, SACH still faces challenges in memory footprint and update complexity, and its data skew increases with failure rates. AnchorHash, while near-ideal with an O(1 + ln(n/a)) lookup complexity (where n is its fixed upper bound on the cluster size and a the number of working nodes), faces issues with that fixed upper bound and with strict statefulness, preventing concurrent updates.
2.2 Motivation

Since the existing CH algorithms have their own problems, we propose DxHash, a stateless, scalable, and consistent hash which meets the six requirements almost perfectly. Table 1 shows the theoretical performance comparison between DxHash and the other CH algorithms. We can see that every aspect of DxHash is as good as or better than the others, except that its lookup complexity is slightly worse than that of AnchorHash. However, the experimental results in Section 6.7 will show that the practical performance of DxHash is usually higher than AnchorHash's due to a minor constant term in DxHash's lookup complexity. Besides, DxHash outperforms AnchorHash because: (1) DxHash supports the Scale operation to adjust the upper bound of the cluster size, whereas the upper bound of AnchorHash is immutable.
Fig. 1: Lookups in DxHash: (a) the initial cluster; (b) after inserting node 4; (c) after removing node 1. Each panel shows the cluster, the NSArray, and the node-ID sequences generated when querying k1 and k2.

3 DXHASH ALGORITHM

DxHash utilizes an array, called the NSArray, to represent the state of nodes in a cluster or network. The size of the array determines the upper bound of the cluster size.
DxHash removes nodes by marking the corresponding items in the NSArray as inactive, thus preparing them for future assignments. Figure 1c demonstrates the removal of node 1. Item 1 in the NSArray is set to inactive, causing a change in the mapping result of k1. Initially, k1 was mapped to node 1, but since node 1 is now inactive, k1 is remapped to node 4.

3.3 Proof of Minimal Disruption and Balance

In this section, we provide a proof for the properties of Minimal Disruption and Balance guaranteed by DxHash.

Theorem 1 (Minimal Disruption): DxHash ensures Minimal Disruption.

Proof: Let k be an arbitrary key and S be the generated node-ID sequence. We assume that the nth entry in S (S[n]) represents the original mapping result of k. To prove Minimal Disruption, we consider the cases of node removal and addition:

(i) Removal: Suppose node b is being removed. If S[n] = b, it implies that k was originally mapped to the removed node b. The change in mapping does not violate Minimal Disruption. If S[n] ≠ b, it means that k was not initially mapped to node b. Since for all m < n the state of node S[m] remains inactive and is unaffected by the removal of a node, the key k continues to be mapped to node S[n].

(ii) Addition: Let b be a newly added node. If there exists l < n such that b = S[l], we define the minimum such l as l_min. As l_min < n, node S[m] is inactive for all m < l_min. Consequently, k will be remapped to node S[l_min] = b because it becomes the first active node in S. If b ≠ S[l] for all l < n, the addition of node b does not impact the mapping of k, and therefore the mapping of k remains unchanged.

Thus, we can see that the changed node is either the original or the destination of the remapped keys. Consequently, DxHash achieves Minimal Disruption.

Theorem 2 (Balance): In DxHash, there is an equal probability for a key to be mapped to each active node.

Proof: The process of locating a key across nodes involves repeated calculations to (pseudo-)randomly generate node IDs. DxHash terminates the calculations only when a generated node ID corresponds to an active node. At the ith round, the probability distribution of S[i] among all active nodes is uniform due to the randomness of the PRNG. Since the probability distribution at each calculation round is uniform, the overall distribution is also uniform. Thus, the Balance property is proven.

3.4 Boundary Cases

Although setting this threshold affects balance and minimal disruption, we believe it is necessary to avoid the mentioned boundary cases. Moreover, even in scenarios where there is only one active node in a large cluster, the number of affected keys whose search length exceeds the threshold is tiny. The probability of a key not matching the only active node after 8n searches is:

$$P = \left(\frac{n-1}{n}\right)^{8n} \quad (1)$$

When n is sufficiently large, P is approximately e^{-8}, indicating that only about 0.03% of keys are affected by the threshold. This calculation demonstrates that terminating a search whose length exceeds 8n has no noticeable impact in practice.
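As a quick check of the 0.03% figure, Formula 1 reduces to the standard limit of (1 − 1/n)^n:

```latex
P \;=\; \left(\frac{n-1}{n}\right)^{8n}
  \;=\; \left[\left(1-\frac{1}{n}\right)^{n}\right]^{8}
  \;\xrightarrow{\;n\to\infty\;}\; e^{-8}
  \;\approx\; 3.4\times 10^{-4}
  \;\approx\; 0.03\%.
```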
Another boundary case is that the cluster needs to scale when it is already full of active nodes. Some CH algorithms [10], [9] do not support Scale operations for two reasons. First, scaling out a cluster may require a complete remapping of keys, which can be a resource-intensive task. Second, this situation can be avoided by setting a large enough initial upper bound for the cluster. In contrast, DxHash supports Scale operations to cater to broader applications. In DxHash, the cluster size is limited by the size of the NSArray. When the cluster reaches its maximum capacity and all items in the NSArray are active, DxHash behaves as a classic hash algorithm that maps objects to nodes with a single calculation. To scale out the cluster, DxHash doubles the size of the NSArray and sets the new items to inactive. Doubling the range of the classic hash algorithm means that only half of the load needs to be migrated; as a result, DxHash reduces the remapping effort by half. It is important to note that remapping half of the load can still be a significant task. Therefore, the Scale operation is suitable for inserting active nodes in batches to amortize the overhead. For scenarios involving only a few node updates, it is recommended to initialize the NSArray with a sufficiently large size.

Figure 2 illustrates an example of a Scale operation. Initially, there are 3 active nodes, and the length of the NSArray is four. After inserting a new node, the NSArray becomes full, and inserting another node would require a complete remapping. To avoid this, the length of the NSArray is expanded to 8. Items 1-4 in the array are active, while the remaining items are marked as inactive, ready for subsequent node insertions.
The NSArray is implemented as a boolean array, with each item occupying only 1 bit. This bit represents the state of the corresponding node, indicating whether it is active (1) or inactive (0). In addition, DxHash employs a 4-byte (32-bit) integer queue called the IQueue to store inactive node IDs. The functionality of the IQueue will be further discussed in §4.2.

Native DxHash supports four essential functions: Lookup, AddNode, RmNode, and Init. Algorithm 1 outlines the functions.

Algorithm 1: Native DxHash
/* This function receives a given key and outputs the corresponding node. */
1  Function Lookup(k):
       Result: A working node ID (nID)
2      r ← k;
3      repeat
4          r ← R(r);
5          nID ← r mod |NSArray|;
6      until NSArray[nID] = 1;
7      return nID;
/* This function finds an inactive node ID and resets the ID to active. */
8  Function AddNode():
       Result: An inactive node ID (nID)
9      nID ← IQueue.pop();
10     NSArray[nID] ← 1;
11     return nID;
/* This function receives an active node ID to remove it. */
12 Function RmNode(nID):
       Result: Void
13     NSArray[nID] ← 0;
14     IQueue.push(nID);
15     return;
/* This function initializes the IQueue from a given NSArray. */
16 Function Init(NSArray):
       Result: Void
17     IQueue ← ∅;
18     for i ∈ [0, |NSArray|) do
19         if NSArray[i] = 0 then
20             IQueue.push(i);
21     return;

4.1 Lookup

Lookup is the core function of DxHash, responsible for mapping a key to its corresponding node ID. DxHash utilizes a PRNG R(x) to generate pseudo-random numbers cyclically (lines 3-6), with the key serving as the random seed (line 2). By performing a modulo operation with the size of the NSArray, a random node ID is obtained (line 5). The loop continues until an active node is encountered (line 6), at which point the loop terminates and the node ID is returned as the lookup result (line 7).

The time complexity of the Lookup function significantly impacts DxHash's performance. Let n denote the length of the NSArray and a denote the number of active nodes. We use p to represent the fraction a/n, which corresponds to the active ratio in the NSArray. In each iteration of the Lookup function (lines 3-6), the probability of hitting an active node is p, while the probability of hitting an inactive node is (1 − p). The distribution of hitting an active node at each iteration follows the Bernoulli Distribution, and the number of iterations (i.e., the search length) follows the Geometric Distribution [12]. Denoting the search length as τ, the expected value of τ is:

$$E(\tau) = \frac{1}{p} \quad (2)$$

Substituting p = a/n back into Formula 2, we find:

Theorem 3 (Query Complexity): In DxHash, given the size of the NSArray, n, and the number of active nodes, a, the Average Search Length (ASL) for keys is n/a.
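To make the loop concrete, here is a minimal C++ sketch of Lookup from Algorithm 1. The mixer R() is only a stand-in for the hardware CRC32-based PRNG used in the paper's implementation (§6.1), and the 8n search-length cap from §3.4 is omitted for brevity.

```cpp
#include <cstdint>
#include <vector>

// Stand-in 32-bit mixer playing the role of R(x); the paper's implementation
// uses a hardware-accelerated CRC32 instead.
static uint32_t R(uint32_t x) {
  x ^= x >> 16; x *= 0x7feb352dU;
  x ^= x >> 15; x *= 0x846ca68bU;
  x ^= x >> 16;
  return x;
}

// ns_array has one entry per node ID, true = active. Assumes at least one
// node is active, otherwise the loop would not terminate.
uint32_t Lookup(uint32_t key, const std::vector<bool>& ns_array) {
  uint32_t r = key;                                     // key seeds the PRNG (line 2)
  uint32_t nid;
  do {
    r = R(r);                                           // next pseudo-random number (line 4)
    nid = r % static_cast<uint32_t>(ns_array.size());   // candidate node ID (line 5)
  } while (!ns_array[nid]);                             // stop at the first active node (line 6)
  return nid;                                           // lookup result (line 7)
}
```

The expected number of iterations of the do-while loop is exactly the ASL n/a of Theorem 3.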
4.2 AddNode

The AddNode function is responsible for node insertions in DxHash. When a new node joins the cluster, DxHash assigns an inactive ID to the node and adjusts the corresponding item in the NSArray. The key issue in this process is obtaining an inactive ID efficiently. Instead of performing a linear search on the NSArray, which has a time complexity of O(n), DxHash introduces a new data structure called the IQueue to expedite insertions. The IQueue is a 4-byte (32-bit) integer queue that stores all inactive node IDs for fast insertions. In the pseudocode of native DxHash (Alg. 1), lines 9-11 demonstrate how DxHash obtains an inactive node ID from the IQueue in constant time (O(1)).

4.3 RmNode

The RmNode function handles node removals. When a node is removed, DxHash updates the data structures, namely the NSArray and the IQueue. In Alg. 1, the RmNode function receives an active node ID as input. First, the corresponding item in the NSArray is set to 0 to indicate that the node is inactive. Then, the node ID is added to the IQueue for future assignments. The time complexity of the RmNode function is also O(1).
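The two O(1) updates can be sketched as follows, assuming a standard deque as the IQueue; the struct layout and names are illustrative rather than the paper's actual code.

```cpp
#include <cstdint>
#include <deque>
#include <vector>

struct DxHash {
  std::vector<bool> ns_array;    // node states: true = active (1), false = inactive (0)
  std::deque<uint32_t> iqueue;   // IDs of all inactive slots

  // AddNode (Alg. 1, lines 8-11): pop an inactive ID and mark it active.
  // Assumes the cluster is not full, i.e. the IQueue is non-empty.
  uint32_t AddNode() {
    uint32_t nid = iqueue.front();
    iqueue.pop_front();
    ns_array[nid] = true;
    return nid;
  }

  // RmNode (Alg. 1, lines 12-15): mark the ID inactive and recycle it.
  void RmNode(uint32_t nid) {
    ns_array[nid] = false;
    iqueue.push_back(nid);
  }
};
```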
4.4 Scale

DxHash supports the Scale and Shrink operations to adjust the upper bound of the cluster size. When the NSArray is full and new nodes are ready to join, the Scale operation is triggered. Alg. 2 presents the pseudocode for the Scale function. In this operation, the size of the NSArray is doubled (lines 2-3), and the new node IDs are set to be inactive (lines 4-5).

4.5 Shrink

The Shrink function is used when there are too many inactive nodes in the cluster. First, DxHash counts the number of active nodes with IDs greater than |NSArray|/2 (lines 8-12, Alg. 2). Then, the size of the NSArray is halved (line 13), and
the IQueue is rebuilt based on the updated NSArray (line 14). Finally, DxHash reassigns the same number of nodes counted in the first step (lines 15-16).

When scaling out or shrinking the cluster, DxHash operates on the NSArray and the IQueue, involving a maximum of n nodes. The time complexity of both the Scale and Shrink operations is O(n). However, the number of remapped keys caused by the Shrink operation is relatively greater than that caused by the Scale operation. This is because the Shrink operation includes additional node deletions and insertions. The Shrink operation is triggered only when the active ratio is very small (e.g., 1%).

Algorithm 2: Scale adjustment
1  Function Scale_out():
       Result: The cluster size after adjustment
2      n ← |NSArray|;
3      NSArray is resized to 2n;
4      for i ∈ [n, 2n) do
5          RmNode(i);
6      return 2n;
7  Function Shrink():
       Result: The cluster size after adjustment
8      n ← |NSArray|;
9      count ← 0;
10     for i ∈ [n/2, n) do
11         if NSArray[i] = 1 then
12             count ← count + 1;
13     NSArray is resized to n/2;
14     Init(NSArray);
15     for i ∈ [0, count) do
16         AddNode();
17     return n/2;
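For the IQueue-based variant, the Scale_out path of Algorithm 2 amounts to one resize plus one RmNode-style push per new ID; a minimal sketch under the same illustrative container choices as above:

```cpp
#include <cstdint>
#include <deque>
#include <vector>

// A minimal sketch of Scale_out (Algorithm 2, lines 1-6): the NSArray doubles
// and every new ID starts inactive, which is exactly what RmNode would do for
// it. Returns the new upper bound of the cluster size.
size_t ScaleOut(std::vector<bool>& ns_array, std::deque<uint32_t>& iqueue) {
  size_t n = ns_array.size();
  ns_array.resize(2 * n, false);              // new half is inactive (lines 2-3)
  for (size_t i = n; i < 2 * n; ++i)          // register the new IDs (lines 4-5)
    iqueue.push_back(static_cast<uint32_t>(i));
  return 2 * n;                               // line 6
}
```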
4.6 Optional Trade-off

4.6.1 New method for insertion: minimal memory footprint but slower insertions.

DxHash introduces the IQueue data structure to accelerate insertions, but this comes at the cost of storage efficiency and statelessness. The IQueue can vary in length from 0 to n, and each item in the queue requires 4 bytes of storage. On average, the expected memory footprint of the IQueue is 2n bytes. In comparison, the size of the NSArray is much smaller, only n/8 bytes. Additionally, the use of the IQueue for insertions introduces a dependency on the removal order, which compromises the statelessness property. The sequence of node IDs in the IQueue is determined by the order of removals. Consequently, the order of insertions becomes deterministic and stateful, which goes against the desired statelessness characteristic of the DxHash algorithm. To address this, we propose an alternative insertion approach for DxHash that achieves less memory and stronger statelessness without the need for an additional data structure.

The key to inserting a node is to obtain an inactive node ID for assignment. The new insertion approach is inspired by the Lookup procedure in DxHash, which accesses NSArray items pseudo-randomly and repeatedly until an inactive item is found. The pseudocode for this approach is shown in Alg. 3. In lines 2-6, we reuse the code from the Lookup procedure with two modifications. First, we change the random seed to a constant (1228, shown in line 2) instead of a given key, for reproducibility. Second, we terminate the loop when an inactive node is selected instead of an active one, as we are looking for an inactive node ID (line 6). Similar to the calculation in Formula 2, the time complexity of this insertion approach can be estimated as O(n/(n−a)).

Algorithm 3: AddNode 2
/* This function finds an inactive node ID and resets the ID to active. */
1  Function AddNode():
       Result: An inactive node ID (nID)
2      r ← 1228;
3      repeat
4          r ← R(r);
5          nID ← r mod |NSArray|;
6      until NSArray[nID] = 0;
7      NSArray[nID] ← 1;
8      return nID;

This new insertion approach strikes a balance between space footprint, statelessness, and update efficiency. Firstly, since no additional data structure is required, the memory footprint of DxHash is solely determined by the size of the NSArray, which is n/8 bytes, or 125 KB per million nodes. This memory footprint is only 0.8% of that of AnchorHash (16n bytes). Secondly, the new design achieves stronger statelessness, as the insertion order is unaffected by the history of node removals. The shortcoming is the higher insertion complexity, which is now O(n/(n−a)). Although the insertion efficiency decreases, it is still superior to most CH algorithms. The size of the NSArray, n, is initially twice the number of active nodes a. When there are only a few additions or removals, the time complexity of O(n/(n−a)) remains relatively constant and independent of the absolute cluster size. In the worst-case scenario where only one inactive node exists (n − a = 1), the time complexity is O(n), which is no worse than the method of linear search. It is worth noting that the cases where no inactive or active nodes exist are not considered in this section, as the boundary cases are discussed separately in §3.4. In summary, DxHash without the IQueue is a stateless CH algorithm with a small memory footprint and acceptable update overhead.

4.6.2 NSArray organized in Bytes: faster lookup but larger memory.

The native implementation of DxHash uses 1 bit for each node to significantly reduce the memory footprint. However, since memory devices are typically byte-addressed, performing operations on bytes is much faster than on individual bits. Therefore, when high lookup performance is desired, the NSArray in DxHash can be organized as a byte array, with each node's state represented by a single byte.
Compared to the native implementation of DxHash, this solution sacrifices some storage efficiency in exchange for improved lookup performance. However, the memory footprint is still lower than that of AnchorHash. As the NSArray becomes a byte array, its memory footprint is equal to n bytes. Taking into account the memory usage of the IQueue, which is at most 4a bytes, the total memory requirement for this approach is 5n bytes, resulting in a 69% reduction in memory compared to AnchorHash (16n bytes).

5 WEIGHTED DXHASH

Weighted DxHash is introduced to address the issue of load distribution in clusters or networks consisting of heterogeneous physical nodes. Conventional consistent hashing schemes use virtual nodes to adjust load distribution, where multiple virtual nodes point to the same physical node, effectively multiplying the load on that node. However, virtual nodes lead to an increased memory footprint and cannot accurately distribute the load. In response, we propose Weighted DxHash.

Fig. 3: An example of querying a key in a 5-node weighted cluster via weighted DxHash. The mapping result of the key is node 3.

Weighted DxHash extends the native DxHash by introducing node weights and another PRNG called H. Each node is assigned a floating-point weight between 0 and 1. Nodes with higher weights can handle more load, while the weight of an inactive node is set to 0. The PRNG H is used to generate a pseudo-random floating-point sequence within the range [0, 1], with H[i] representing the ith item in the sequence.

Figure 3 illustrates an example of querying a key using Weighted DxHash. The cluster shown in the figure consists of 5 nodes, with node 2 having a weight of 0.7 and all other nodes having a weight of 1. The cluster is represented as a weighted NSArray of length 8. The weights of the inactive items in the array (nodes 4, 6, 7) are set to 0. The lookup process in Weighted DxHash is completed in two steps.

In step 1, similar to DxHash, Weighted DxHash generates a random node ID S[i] at each calculation cycle. In the right part of Figure 3, the random node IDs generated for the key k are 6, 2, 7, 3, ... in sequence.

In step 2, Weighted DxHash generates H[i] and compares it with the weight of node S[i]. If the weight is no less than H[i], Weighted DxHash terminates the loop and returns the current node ID as the mapping result. Otherwise, Weighted DxHash proceeds to the next cycle for further searching. From the last column in Figure 3, we observe that in the first 3 cycles, the weights of the generated node IDs are always smaller than H[i]. However, at i = 4, the weight of node S[4] = 3 is 1, which is larger than H[4] = 0.9. Therefore, node 3 becomes the final mapping result for the key k.

The main idea behind Weighted DxHash is to influence the probability of a key being mapped to a node based on the node's weight. In step 1 of Weighted DxHash, keys pointing to node S[i] with weight W have a probability of W of being accepted by that node in step 2. As the weight decreases, the node becomes less likely to accept keys, resulting in a reduced load. When the weight is 0, the node rejects all keys, effectively having no load. If weights are only set to 0 and 1, Weighted DxHash degrades to native DxHash.

The time complexity of the Lookup operation in Weighted DxHash can be analyzed as follows. The probability for a key to hit a node in round i is $\frac{\sum_{x=0}^{n-1} W_x}{n}$, where n is the size of the weighted NSArray and W_x is the weight of node x. There are two calculations performed at each round, one for S and another for H. The number of calculations required to query a key, denoted as τ, follows the Geometric Distribution, and its expectation is given by:

$$E(\tau) = \frac{2n}{\sum_{i=0}^{a-1} W_i} \quad (3)$$

The expectation of the load on node b, denoted as l_b, can be calculated using the following formula:

$$E(l_b) = \frac{W_b}{\sum_{i=0}^{n-1} W_i} \cdot L \quad (4)$$

where L represents the total load across all nodes.

In terms of space complexity, the weighted NSArray in Weighted DxHash is organized as a 4-Byte (32-bit) floating-point array. The rest of the implementation remains the same. Therefore, the expected memory footprint is 8n Bytes, which includes both the 4-Byte NSArray and the 4-Byte IQueue.
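A minimal C++ sketch of the two-step weighted lookup described above; R() and H() are stand-ins for the PRNGs behind S and H (the paper derives S from a CRC32-based generator), weights[i] is 0 for inactive IDs, and at least one positive weight is assumed.

```cpp
#include <cstdint>
#include <vector>

static uint32_t R(uint32_t x) {          // stand-in mixer for the S sequence
  x ^= x >> 16; x *= 0x7feb352dU;
  x ^= x >> 15; x *= 0x846ca68bU;
  x ^= x >> 16;
  return x;
}

static float H(uint32_t x) {             // stand-in for H[i], uniform in [0, 1)
  return static_cast<float>(R(x ^ 0x9e3779b9U) * (1.0 / 4294967296.0));
}

// A candidate S[i] is accepted with probability equal to its weight, mirroring
// step 2; the strict comparison keeps zero-weight (inactive) slots from ever
// being chosen.
uint32_t WeightedLookup(uint32_t key, const std::vector<float>& weights) {
  uint32_t r = key;
  for (;;) {
    r = R(r);
    uint32_t nid = r % static_cast<uint32_t>(weights.size());  // step 1: S[i]
    if (H(r) < weights[nid])                                    // step 2: accept?
      return nid;
  }
}
```

If every weight is either 0 or 1, the acceptance test always succeeds for active slots and the loop degenerates to the native Lookup.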
Weighted DxHash overcomes the limitations of virtual nodes and provides flexibility in load distribution. By combining virtual nodes with node weights, the load distribution can be adjusted according to the performance of individual nodes. Nodes with lower performance can be assigned smaller weights to reduce their load, while high-performance nodes can have multiple virtual nodes to fully utilize their capabilities.

6 EVALUATION

In this section, we compare and evaluate the performance of different consistent hashing (CH) algorithms, including Karger Ring (Ring), MaglevHash (Maglev), AnchorHash (AH), and DxHash (DH). We also consider different trade-offs in the implementation of DxHash, such as using a 1-bit NSArray or a 1-Byte NSArray, and using the IQueue or not. These different implementations are denoted as DH-b, DH-B, and DH-IQ, respectively. Since deploying a large-scale cluster for testing is challenging, we evaluate the CH algorithms through local simulations.

To evaluate the performance of DxHash, we initialize an NSArray with 1 million entries, representing the states of 1
million mock nodes (active or inactive). We then generate batches of 32-bit integers as keys and feed them to DxHash. The algorithm returns the corresponding mock node IDs. We measure the lookup performance as the rate of successful key queries, the memory footprint as the amount of memory used by the process, and the update overhead as the time consumed to adjust the data structures when inserting new nodes. Similar evaluation methods are used for the other CH algorithms.

6.1 Environment

As shown in Table 2, all experiments are performed on the same commercial machine with an Intel Xeon E5-2620 processor at 2.00 GHz and 32 GB of memory. The system is CentOS 7.8, the kernel version is 3.10.0-1127, and the GCC version is 7.3.1. All algorithms are implemented in C++. The PRNG used to generate S is implemented as a hardware-supported CRC32 [14] hash function, which is a uniform hash function for 32-bit integers with high randomness and speed in number generation.

TABLE 2: Environment Configuration

6.2 Memory Footprint

CH algorithms have varying memory footprints depending on their implementation. Ring and Maglev allocate additional memory space for load balancing or to minimize migration after node updates. Ring uses a Red-Black Tree (RB-Tree) implementation, with each node occupying 24 Bytes of memory. In our experiments, we create 100 virtual nodes for each physical node in Ring to achieve load balance. Therefore, Ring's memory footprint is approximately 2.4n KB. Maglev utilizes a large lookup table for key routing, with each entry requiring 4 Bytes of memory. We allocate 100 times the minimum required memory for Maglev to ensure less than 1% imbalance [3], resulting in a memory footprint of 400n Bytes. AH has a memory footprint of 16n Bytes, while DH's memory footprint varies depending on the version. We implement three versions of DH. DH-b, which uses a bit array as the NSArray, requires n/8 Bytes of memory. DH-B, which uses a byte array as the NSArray, occupies n Bytes. DH-B&IQ, based on DH-B, utilizes a 32-bit IQueue to store inactive node IDs, resulting in a memory footprint of up to 5n Bytes. Theoretical analysis confirms that DH has a smaller memory footprint than the other CH algorithms.

To validate the theoretical analysis, we collected the memory footprints of the CH algorithms for different numbers of nodes and present the results in Figure 4. The x-axis represents the number of nodes, ranging from 10K to 100M. The experimental results align with our theoretical derivation. In a 100-million-node cluster, AH requires 2 GB of memory, DH-B&IQ occupies 500 MB, DH-B requires 100 MB, and DH-b only uses 17 MB. DH-b is the most memory-saving implementation, reducing the memory footprint by 98.4% compared to AH. DH-B&IQ uses the most memory of the three DH implementations, still with a 75.2% reduction in memory footprint.

Fig. 4: Memory footprint of Ring, Maglev, AH, DH-B&IQ, DH-B, and DH-b in clusters whose size varies from 10K to 100M.

6.3 Lookup Performance

Figure 5(a) compares the lookup performance of Maglev, Ring, AH, and DH-B in clusters with 1K, 10K, 100K, 1M, and 10M nodes; in this case, all nodes are active, resulting in an active ratio (a/n) of 1. Figures 5(b) and 5(c) demonstrate how the active ratio affects the lookup performance of AH and DH (including DH-b and DH-B). The active ratio is varied between 1, 0.9, 0.5, and 0.1. Figure 5b corresponds to a cluster with 1 million nodes, while Figure 5c represents a cluster with 10 million nodes. From the figures, we can make several observations:

① DH-B outperforms all other CH algorithms in terms of lookup throughput. AH performs slightly worse than DH-B, while Maglev has better lookup performance than Ring but is still inferior to DH-B and AH. This is because the lookup complexity of Ring is O(log(cn)), which is higher than the O(1) complexity of the other CH algorithms.
Fig. 5: The lookup rate when handling 100 million queries. (a) Lookup comparison of Maglev, Ring, AH and DH-B in
clusters with 1K, 10K, 100K, 1M and 10M nodes. The active ratio of AH and DH-B (a/n) is 1. (b) Lookup comparison of
AH, DH-B, and DH-b when the active ratio (a/n) varies from 1 to 0.1. The cluster size is 1 million. (c) Lookup comparison
of AH, DH-B, and DH-b when the cluster size is 10 million.
Fig. 6: Insertion overhead comparison. (a) Insertion latency of Maglev, Ring, AH and DH-IQ in clusters with 1K, 10K, 100K,
and 1M nodes. (b) Insertion latency of DH-B and DH-b when the active ratio (a/n) varies from 0.1 to 0.99. The cluster size
is 1 million.
② Maglev and AH exhibit satisfactory lookup performance when the number of nodes is small. For example, when there are 1,000 nodes, Maglev achieves a lookup rate of 41.5 MKPS, and AH achieves a lookup rate of 44.64 MKPS, which is close to the lookup rate of DH-B. However, their lookup performance decreases as the number of nodes increases. When there are 10 million nodes, Maglev's lookup rate drops to only 5.73 MKPS, and AH's lookup rate drops to 11.92 MKPS. This rapid decline is due to their excessive memory footprint, which becomes a bottleneck for quick querying as the number of nodes increases. In contrast, DH-B maintains a lookup rate of 31.06 MKPS even with a 10-million-node cluster, which is 2.6 times higher than AH and 5.4 times higher than Maglev. This high lookup throughput is due to DH's nearly constant complexity and its tiny memory footprint.

③ Figure 5b shows the lookup performance of AH, DH-B, and DH-b in a 1-million-node cluster with different active ratios. As the active ratio decreases, the lookup throughput of all three algorithms decreases. However, AH experiences a slower drop in performance compared to DH-B. When the active ratio is less than 0.5, DH-B performs worse than AH. This is because the lookup complexity of DH grows linearly as the active ratio drops (O(n/a)), while AH's grows only logarithmically (O((1 + ln(n/a))²)).

④ From Figure 5c, we observe that although AH has better lookup complexity, it performs much worse than DH-B in a 10-million-node cluster. The lookup throughput of DH-B is 2−2.7× higher than that of AH. This can be attributed to DH's smaller memory footprint, which allows it to be stored in the CPU cache for faster access. Additionally, DH has a simpler lookup method that requires fewer memory accesses than AH, as discussed in Section 6.7. The smaller and simpler data structure in DH incurs lower memory access costs than the larger and more complex data structure in AH.

Overall, DH-B demonstrates comparable lookup throughput with AH and outperforms the other CH algorithms, particularly in large-scale clusters. Its high performance can be attributed to its nearly constant complexity, minimal memory footprint, and efficient lookup method.

6.4 Update Overhead

Update overhead is another critical metric for evaluating CH algorithms. Figure 6 illustrates the time required for inserting nodes into different CH algorithms. DH is implemented in three versions based on the optional trade-offs. DH-IQ uses the IQueue for fast insertions, while DH-B and DH-b employ a slower method that accesses the NSArray repeatedly and randomly to select an inactive node ID for assignment. Although the latter method is slower, it reduces the memory footprint significantly. The efficiency of this method is related to the active ratio: smaller active ratios result in faster insertions. The difference between DH-B and DH-b lies in the data structure of the NSArray, with DH-B using a byte array and DH-b using a bit array.

In Figure 6(a), we observe the influence of cluster scale on the update cost of the CH algorithms. The x-axis represents the node number, ranging from 1K to 1M, while the y-axis represents the time taken to adjust the CH algorithm after inserting a new node. The results are averaged over 100 trials. Figure 6(b) displays the update cost of DH-B and DH-b for different active ratios. We make the following observations from the figures:

① Maglev exhibits the longest update time, taking more than 100 seconds to insert a node into a 1-million-node cluster. This high overhead is due to its high complexity. Each node insertion requires O(m log(m)) time, where m is the size of the large lookup table, which is 100× greater than the number of nodes n.
② Ring’s update time grows slowly as the cluster ex- a 100-node cluster until the number of active nodes reaches
pands. When the number of nodes increases from one thou- 1000. The cluster size ranges from 100 to 1000 in increments
sand to one million, the update time changes from 137 us to of 100. As shown in Figure 7b, the ideal remapping ratio
455 us. The update overhead of Ring is moderate compared is calculated by dividing the number of updated nodes
to other schemes, as its update complexity is O(log(n)), by the total number of nodes. The ideal remapping ratios
which is far less than that of Maglev. after each insertion are 100/200 = 0.5, 100/300 = 0.33,
③ AH and DH-IQ exhibit very short update times, each 100/400 = 0.25, and so on, represented by the bars in the
less than 1 us. With a constant update complexity, they can figure. After each insertion, we provide duplicate sets of 100
update node states on a nanosecond scale, regardless of the million keys as input to the CH algorithms and calculate
cluster scale. the corresponding node IDs. We compare the current results
④ Compared to DH-IQ, DH-B and DH-b have higher with the previous insertion’s results, count the number of
insertion overhead. As the active ratio increases from 0.1 remapped keys, and divide it by 10 million to obtain the real
to 0.99, the update time of DH-b increases from 2.7 to 245 remapping ratio. The real remapping ratios of the different
us, while that of DH-B increases from 1.1 to 80 us. DH-B CH algorithms are represented by the dashed lines in Figure
performs better than DH-b because the byte array has higher 7b. Comparing the real remapping ratios to the ideal remap-
efficiency for updates. However, both schemes perform ping ratio, we observe that all schemes, except for Maglev,
worse than DH-IQ, indicating that the IQueue significantly exhibit remapping ratios close to the ideal value. Maglev
reduces insertion overhead. deviates slightly from the ideal remapping ratio, indicating
Overall, DH-IQ demonstrates the best update perfor- that it cannot achieve complete minimal disruption, which
mance among the DH variants, while AH also performs is consistent with previous research [3].
impressively. Maglev exhibits the highest update time due
to its complex update process, while Ring maintains a
moderate update overhead. 6.7 Fault Tolerance
In this section, we test the fault tolerance of AH and DH,
6.5 Balance as they have demonstrated comparable performance in the
previous experiments. DH and AH both require multiple
We compare the load balancing performance of Ring, Ma-
searches to return mapping results, and the search length
glev, AH, and DH. It is worth noting that we do not test DH
increases as the node active ratio decreases. We measure
in different versions because the mentioned optional trade-
their fault tolerance using the average search length (ASL).
offs do not affect the load distributions. The cluster consists
Initially, the number of nodes is set to 1000, and we gradu-
of 1000 active nodes. Ring’s load balancing depends on
ally remove 100 nodes until only 100 active nodes remain.
the design of virtual nodes. Hence, we evaluate Ring with
This procedure reduces the active ratio from 1 to 0.05. Figure
each physical node matching 1, 10, 100, and 1000 virtual
8a illustrates the ASL as a function of the active ratio. DH
nodes, respectively. Maglev’s load balancing is influenced
exhibits a larger ASL than AH due to its higher lookup
by the size of the lookup table, which we set as prime
complexity (O( na )) compared to AH’s complexity (O(1 +
numbers approximately 10 times, 100 times, and 1000 times
log( na )2 )) [9]. However, Section 6.3 demonstrated that the
larger than the node size. Since AH and DH not use over-
lookup throughput of DH is higher. This is because AH is a
provisioned memory for balance, they are test once as
stateful algorithm that incurs more memory access overhead
shown in the right of Figure 7a. We randomly generate 100
to maintain the update order of nodes. AH requires four
million integers as keys to query the corresponding node
memory accesses for each search, while DH only requires
IDs. The load on each node is measured by the number
one. To compare the memory access overhead, we present
of keys assigned to it. In this case, the average load on
AH’s Average Number of Memory Access (ANMA) for each
each node is 100M/1000 = 100K queries. We quantify load
lookup in Figure 8a, where AN M AAH = 4 × ASLAH . The
balance using the standard deviation (σ ). A smaller standard
results show that AH consistently has a larger ANMA than
deviation indicates better load balance in a CH algorithm.
DH until the active ratio drops below 7%. Therefore, when
The experimental results are shown in Figure 7a. The
the active ratio is above 7%, DH consistently outperforms
x-axis represents the number of virtual nodes per physical
AH in terms of lookup performance, despite its higher
node, which impacts the load balance of Ring and Maglev.
lookup complexity.
AH and DH are evaluated in a cluster with 1000 active
nodes. The y-axis represents the standard deviation of loads
in the different CH algorithms. To facilitate comparison, all 6.8 Elasticity
results are normalized to DH’s standard deviation. From
DH supports the operations of scaling and shrinking to
the figure, we can observe that all algorithms, except for
adjust the cluster size dynamically. In this section, we
Ring, achieve good load balance. Ring exhibits the worst
specifically evaluate the shrinking operation. We compare
load balance, with a standard deviation approximately 7.8×
three schemes: AH, DH-B without shrink, and DH-B with
higher than that of DH. These results align with previous
shrink. The shrinking operation is triggered when the active
research findings [7].
ratio falls below 0.1. We measure the lookup throughput
of the three schemes in a cluster with 1 million nodes and
6.6 Minimal Disruption active ratios of 0.1, 0.01, and 0.001. The experimental results
We evaluate the remapping ratio of Ring, Maglev, AH, and are shown in Figure 8b. From the figure, we make two
DH after node insertions. We gradually insert 100 nodes into observations.
Fig. 7: (a) Load balance comparison when the number of virtual nodes per physical node varies from 10 to 1000. The Y-axis is the standard deviation normalized to DH. (b) The remapping ratio after inserting (9 × 100) nodes into a 100-node cluster. Bars represent the ideal remapping ratio, and the dashed lines represent the real remapping ratios of the CH algorithms.
Fig. 8: (a) The Average Search Length (ASL) and Average Number of Memory Accesses (ANMA) of querying keys. The
X-axis is the active ratio. (b) The lookup performance of AH, DH-B without shrinking, and DH-B with shrinking when the
failure ratio is 0.9, 0.99, and 0.999 in 1 million nodes. (c) The ASL (drawn as lines) and load per node (drawn as stacked
bars) in weighted DH. The cluster size is 1000, and the load is 100 million lookups. The weights of 512 nodes are 1, and the
weights of others are n. Here, n is set to 0.1, 0.5, and 0.9 respectively.
First, the scheme of DH without shrink performs poorly when the active ratio is low. Compared to AH, which maintains a lookup rate of 4.35 MKPS even at an active ratio of 0.001, DH-B only achieves a lookup rate of 0.03 MKPS. This confirms that DH has a higher lookup complexity than AH. Second, the shrinking operation enhances the elasticity of DH. When shrinking is triggered, DH dynamically adjusts the upper bound of the cluster based on the number of active nodes. As a result, the active ratio increases, significantly improving the lookup throughput.

It is worth noting that while scaling and shrinking operations improve lookup performance, they come at the cost of remapping a large number of keys. The remapping ratio depends on the ratio of the cluster size before and after scaling or shrinking. For example, if a cluster shrinks to 1% of its original size, the remapping ratio is approximately 99%. If a cluster doubles its size, the remapping ratio is 50%. Considering the significant remapping overhead, scaling and shrinking operations are best suited for inserting or removing active nodes in batches to amortize the remapping cost.

6.9 Weighted DxHash

In this section, we evaluate weighted DxHash and verify whether the load distribution aligns with the theoretical derivation. We construct a weighted NSArray consisting of 1024 items, divided into two halves. The weights of one half are uniformly set to 1 (referred to as 1-nodes), while the weights of the other half range from 0.1 to 0.9 in steps of 0.2 (referred to as n-nodes). We generate 10 million random keys as input for weighted DxHash. The loads on each node are measured by the number of keys mapped to that node. Additionally, we record the average search length (ASL) for all keys. Figure 8c displays the normalized loads on the two parts and the average ASL. The ASL and the load distribution align with Formulas 3 and 4 within an error margin of 0.1%. This confirms that weighted DxHash effectively adjusts the loads on nodes based on their weights, demonstrating quantitative load distribution control.

7 CONCLUSIONS

This paper introduces DxHash, an efficient, scalable, and adaptable consistent hashing algorithm. We present the
proof for DxHash. Building on naive DxHash, we propose weighted DxHash. The evaluation of DxHash, compared to other existing CH algorithms, demonstrates its ability to maintain millions of nodes while delivering a high key lookup rate, occupying a minimal memory footprint, and requiring minimal time for node additions or removals. Weighted DxHash also achieves its design objectives. Finally, the source code for DxHash and all associated tests are available as open source.

REFERENCES
[1] Elaine Barker, William Barker, William Burr, William Polk, Miles Smid, et al. Recommendation for key management: Part 1: General. National Institute of Standards and Technology, Technology Administration . . . , 2006.
[2] Chanwoo Chung, Jinhyung Koo, Junsu Im, Arvind, and Sungjin
Lee. Lightstore: Software-defined network-attached key-value
drives. In Proceedings of the Twenty-Fourth International Conference
on Architectural Support for Programming Languages and Operating
Systems, ASPLOS ’19, page 939–953, New York, NY, USA, 2019.
Association for Computing Machinery.
[3] Daniel E. Eisenbud, Cheng Yi, Carlo Contavalli, Cody Smith,
Roman Kononov, Eric Mann-Hielscher, Ardas Cilingiroglu, Bin
Cheyney, Wentao Shang, and Jinnah Dylan Hosein. Maglev: A
fast and reliable software network load balancer. In 13th USENIX
Symposium on Networked Systems Design and Implementation (NSDI
16), pages 523–535, Santa Clara, CA, March 2016. USENIX Associ-
ation.
[4] Xiang Fu, Can Peng, and Weihong Han. A consistent hashing
based data redistribution algorithm. In Xiaofei He, Xinbo Gao,
Yanning Zhang, Zhi-Hua Zhou, Zhi-Yong Liu, Baochuan Fu,
Fuyuan Hu, and Zhancheng Zhang, editors, Intelligence Science
and Big Data Engineering. Big Data and Machine Learning Techniques,
pages 559–566, Cham, 2015. Springer International Publishing.
[5] Pulkit Goel, Kumar Rishabh, and Vasudeva Varma. An alternate
load distribution scheme in dhts. In 2017 IEEE International
Conference on Cloud Computing Technology and Science (CloudCom),
pages 218–222, 2017.
[6] David Karger, Eric Lehman, Tom Leighton, Rina Panigrahy,
Matthew Levine, and Daniel Lewin. Consistent hashing and
random trees: Distributed caching protocols for relieving hot spots
on the world wide web. In Proceedings of the Twenty-Ninth Annual
ACM Symposium on Theory of Computing, STOC ’97, page 654–663,
New York, NY, USA, 1997. Association for Computing Machinery.
[7] John Lamping and Eric Veach. A fast, minimal memory, consistent
hash algorithm, 2014.
[8] Zaoxing Liu, Zhihao Bai, Zhenming Liu, Xiaozhou Li, Changhoon
Kim, Vladimir Braverman, Xin Jin, and Ion Stoica. Distcache:
Provable load balancing for large-scale storage systems with dis-
tributed caching. In 17th USENIX Conference on File and Storage
Technologies (FAST 19), pages 143–157, Boston, MA, February 2019.
USENIX Association.
[9] Gal Mendelson, Shay Vargaftik, Katherine Barabash, Dean H.
Lorenz, Isaac Keslassy, and Ariel Orda. Anchorhash: A scalable
consistent hash. IEEE/ACM Transactions on Networking, 29(2):517–
528, 2021.
[10] Yuichi Nakatani. Structured allocation-based consistent hashing
with improved balancing for cloud infrastructure. IEEE Transac-
tions on Parallel and Distributed Systems, 32(9):2248–2261, 2021.
[11] Vladimir Olteanu, Alexandru Agache, Andrei Voinescu, and
Costin Raiciu. Stateless datacenter load-balancing with beamer.
In 15th USENIX Symposium on Networked Systems Design and Im-
plementation (NSDI 18), pages 125–139, Renton, WA, April 2018.
USENIX Association.
[12] Andreas N Philippou, Costas Georghiou, and George N Philippou.
A generalized geometric distribution and some of its properties.
Statistics & Probability Letters, 1(4):171–175, 1983.
[13] Jiwu Shu, Youmin Chen, Qing Wang, Bohong Zhu, Junru Li, and
Youyou Lu. Th-dpms: Design and implementation of an rdma-
enabled distributed persistent memory storage system. ACM
Trans. Storage, 16(4), October 2020.
[14] Ronak Singhal. Inside Intel® Core microarchitecture (Nehalem). In 2008 IEEE Hot Chips 20 Symposium (HCS), pages 1–25. IEEE, 2008.
[15] I. Stoica, R. Morris, D. Liben-Nowell, D.R. Karger, M.F. Kaashoek, F. Dabek, and H. Balakrishnan. Chord: a scalable peer-to-peer lookup protocol for internet applications. IEEE/ACM Transactions on Networking, 11(1):17–32, 2003.
[16] D.G. Thaler and C.V. Ravishankar. Using name-based mappings to increase hit rates. IEEE/ACM Transactions on Networking, 6(1):1–14, 1998.
[17] Xiaoming Wang and Dmitri Loguinov. Load-balancing performance of consistent hashing: Asymptotic analysis of random node join. IEEE/ACM Transactions on Networking, 15(4):892–905, 2007.
[18] Chenggang Wu, Vikram Sreekanti, and Joseph M. Hellerstein. Autoscaling tiered cloud storage in Anna. Proc. VLDB Endow., 12(6):624–638, February 2019.