Hopscotch hashing
Table of contents:
Introduction
Hopscotch algorithm implementation
  A simplified algorithm
  The real implementation
Concurrency study
Performance analysis
Conclusion
In a few words
Maurice Herlihy, Brown University, Providence, RI
Nir Shavit, Sun Microsystems, Burlington, MA
Moran Tzafrir, Tel-Aviv University, Tel-Aviv, Israel
Presented at DISC08 (DIStributed Computing 2008, in Arcachon, France).
First linearly scalable concurrent hash map, efficient in both single-threaded and multithreaded applications.
Hash map?
Data structure that associates a key with a value, thus implementing an associative array.
Supports four operations:
  get(key): retrieve the data associated with the key
  contains(key): test whether a key is present in the hash map
  insert(key, value): insert a <key, value> pair
  remove(key): remove the <key, value> pair
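The four operations can be sketched as a small C++ interface. This is only an illustration: `SimpleMap` is a name made up here, and it delegates to `std::unordered_map`, which is not a hopscotch table.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical wrapper exposing the four operations listed above.
// std::unordered_map is a stand-in, not a hopscotch implementation.
class SimpleMap {
public:
    // insert(key, value): returns false if the key was already present.
    bool insert(const std::string& key, int value) {
        return map_.emplace(key, value).second;
    }

    // contains(key): true if the key is present.
    bool contains(const std::string& key) const {
        return map_.count(key) != 0;
    }

    // get(key): pointer to the stored value, or nullptr if the key is absent.
    // The pointer stays valid until that key is removed.
    const int* get(const std::string& key) const {
        auto it = map_.find(key);
        return it == map_.end() ? nullptr : &it->second;
    }

    // remove(key): returns true if a pair was actually removed.
    bool remove(const std::string& key) {
        return map_.erase(key) == 1;
    }

private:
    std::unordered_map<std::string, int> map_;
};
```

Returning a pointer from `get` keeps the "key not found" case explicit without throwing.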
Main goals?
The vast majority of calls are contains(key) and get(key).
Goal: fetch the value for a key as fast as possible.
Several angles of attack are possible:
  Use an appropriate hash function for a particular dataset
  Use better algorithms
  Leverage hardware specificities
  Use multithreading
Hardware prerequisite
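What is a cache line, and why does it matter?
The minimum amount of data that can be transferred between main memory and the cache (64 bytes on my Intel Core i7).
False sharing problem (on multicore/multiprocessor systems): threads that write to distinct variables sharing one cache line force that line to bounce between cores.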
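A minimal sketch of how a data layout can avoid false sharing, assuming a 64-byte cache line as on the machine mentioned above. The names `kCacheLine`, `PackedCounters`, `PaddedCounters` and `same_cache_line` are invented for this example.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Assumed cache-line size (64 bytes); other CPUs may differ.
constexpr std::size_t kCacheLine = 64;

// Two counters stored back to back: they usually land on the same line,
// so two threads updating a and b would invalidate each other's cache.
struct PackedCounters {
    std::atomic<long> a{0};
    std::atomic<long> b{0};
};

// Each counter is aligned to its own cache line.
struct PaddedCounters {
    alignas(kCacheLine) std::atomic<long> a{0};
    alignas(kCacheLine) std::atomic<long> b{0};
};

static_assert(sizeof(PaddedCounters) >= 2 * kCacheLine,
              "each counter should occupy its own cache line");

// True if both addresses fall inside the same kCacheLine-sized block.
inline bool same_cache_line(const void* p, const void* q) {
    auto a = reinterpret_cast<std::uintptr_t>(p) / kCacheLine;
    auto b = reinterpret_cast<std::uintptr_t>(q) / kCacheLine;
    return a == b;
}
```

With `PaddedCounters`, one thread incrementing `a` and another incrementing `b` no longer compete for the same line, at the cost of extra memory.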
Chained Hashing
[Figure: an array of buckets (0 to 6, at addresses 0x0000 to 0x0018) selected by index = hash(key); each bucket heads a chain of <key, value> entries.]
Extensible
Not cache friendly: closed addressing
Trivial to implement
Pointer space overhead
Needs to allocate memory on insertion (or to use a pool, ...)
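The properties above can be seen in a minimal chained hash set. This is a sketch under simplifying assumptions (integer keys, fixed bucket count, no resizing); `ChainedSet` is a name chosen for the example.

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Minimal chained hash set of ints (illustrative, not tuned).
class ChainedSet {
    struct Node {
        int key;
        std::unique_ptr<Node> next;  // pointer overhead per item
    };
    std::vector<std::unique_ptr<Node>> buckets_;

public:
    explicit ChainedSet(std::size_t bucket_count) : buckets_(bucket_count) {}

    std::size_t index(int key) const {
        return std::hash<int>{}(key) % buckets_.size();
    }

    bool contains(int key) const {
        // Each hop follows a pointer to a separately allocated node,
        // which is why chaining is not cache friendly.
        for (const Node* n = buckets_[index(key)].get(); n != nullptr;
             n = n->next.get()) {
            if (n->key == key) return true;
        }
        return false;
    }

    bool insert(int key) {
        if (contains(key)) return false;
        std::unique_ptr<Node>& head = buckets_[index(key)];
        // One heap allocation per insertion; the new node becomes the head.
        head.reset(new Node{key, std::move(head)});
        return true;
    }

    bool remove(int key) {
        std::unique_ptr<Node>* link = &buckets_[index(key)];
        while (*link) {
            if ((*link)->key == key) {
                *link = std::move((*link)->next);  // unlink and free the node
                return true;
            }
            link = &(*link)->next;
        }
        return false;
    }
};
```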
Linear probing
[Figure: a row of buckets, each holding one <key, value> item. 1 - the key hashes to a bucket that is already occupied; 2 - linear probing scans the following buckets; 3 - the item is inserted at the first free bucket.]
Cache friendly: open addressing
Inefficient when the table is rather full (above 75% load, most implementations reallocate and rehash the table)
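A minimal linear probing sketch, assuming non-negative integer keys, a fixed capacity, no deletion and a toy hash (`key % capacity`). `LinearProbingSet` and `probes` are names invented here; `probes` counts the buckets examined, which grows as the table fills up.

```cpp
#include <cstddef>
#include <vector>

// Minimal open-addressing set with linear probing (illustrative only).
class LinearProbingSet {
    struct Slot {
        int key = 0;
        bool used = false;
    };
    std::vector<Slot> slots_;  // contiguous storage: neighbors share cache lines

public:
    explicit LinearProbingSet(std::size_t capacity) : slots_(capacity) {}

    // Toy hash, assuming key >= 0.
    std::size_t home(int key) const {
        return static_cast<std::size_t>(key) % slots_.size();
    }

    bool insert(int key) {
        std::size_t i = home(key);
        for (std::size_t n = 0; n < slots_.size(); ++n) {
            Slot& s = slots_[i];
            if (!s.used) {  // first free bucket: actual insertion point
                s.key = key;
                s.used = true;
                return true;
            }
            if (s.key == key) return false;  // already present
            i = (i + 1) % slots_.size();     // probe the next bucket
        }
        return false;  // table is full
    }

    bool contains(int key) const {
        std::size_t i = home(key);
        for (std::size_t n = 0; n < slots_.size(); ++n) {
            const Slot& s = slots_[i];
            if (!s.used) return false;
            if (s.key == key) return true;
            i = (i + 1) % slots_.size();
        }
        return false;
    }

    // Number of buckets examined before finding key or a free bucket.
    std::size_t probes(int key) const {
        std::size_t i = home(key);
        std::size_t n = 0;
        while (n < slots_.size()) {
            ++n;
            const Slot& s = slots_[i];
            if (!s.used || s.key == key) break;
            i = (i + 1) % slots_.size();
        }
        return n;
    }
};
```

With a capacity of 8, inserting 0, 8 and 16 places them in buckets 0, 1 and 2, so looking up 16 already examines three buckets.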
Example
Linearization points
add(pair) and remove(key) use locks; they are deadlock-free but not livelock-free. contains(key) is obstruction-free.
add(pair): linearized when it finds the key (if the key already exists), or when the new bucket is linked into the list of buckets (when the pointers are updated).
remove(key): linearized when it fails to find the key (if the key does not exist), or when the key's table entry is overwritten.
contains(key): linearized when it finds the key, or, for an unsuccessful call, when it reaches the end of the list with the timestamp unchanged.
    bool contains(KeyType key) {
        int h = hash(key);
        Segment& segment = get_segment(h);
        Bucket* start_bucket = get_bucket(segment, h);
        int start_timestamp;
        do {
            // Snapshot the timestamp, then walk the bucket's delta-linked list.
            start_timestamp = segment.timestamp[h];
            Bucket* current_bucket = start_bucket;
            short next_delta = current_bucket->first_delta;
            // NULL_DELTA: end-of-list sentinel (name assumed here).
            while (NULL_DELTA != next_delta) {
                current_bucket += next_delta;
                if (key == current_bucket->key) {
                    return true;
                }
                next_delta = current_bucket->next_delta;
            }
            // Retry if a concurrent displacement changed the timestamp.
        } while (start_timestamp != segment.timestamp[h]);
        return false;
    }
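The timestamp check used by contains(key) can be isolated in a toy model. This is a sketch of the general pattern only, not the paper's data layout: `Neighborhood`, `kEmpty`, `kWindow`, `put` and `displace` are names invented here, keys are plain ints, and writers are assumed to be serialized by a lock that is not shown.

```cpp
#include <atomic>
#include <cstddef>

constexpr int kEmpty = -1;          // assumed marker for a free slot
constexpr std::size_t kWindow = 4;  // assumed neighborhood size

struct Neighborhood {
    std::atomic<unsigned> timestamp{0};
    std::atomic<int> slots[kWindow];

    Neighborhood() {
        for (auto& s : slots) s.store(kEmpty);
    }

    // Writer side (assumed to hold the segment lock): store a key.
    void put(std::size_t i, int key) { slots[i].store(key); }

    // Writer side: move a key to another slot. The key is written to its
    // new slot first, then the timestamp is bumped, and only then is the
    // old slot cleared, so a reader that misses the key in both slots is
    // guaranteed to observe a changed timestamp.
    void displace(std::size_t from, std::size_t to) {
        slots[to].store(slots[from].load());
        timestamp.fetch_add(1);
        slots[from].store(kEmpty);
    }

    // Reader side: scan without locking; rescan if a displacement
    // happened while scanning.
    bool contains(int key) const {
        unsigned start;
        do {
            start = timestamp.load();
            for (const auto& s : slots) {
                if (s.load() == key) return true;
            }
        } while (start != timestamp.load());
        return false;
    }
};
```

A reader only returns false when its whole scan ran against an unchanged timestamp, which corresponds to the linearization point of an unsuccessful contains(key).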
Performance analysis
Most important property: expected constant-time performance.
In the common case there are very few items per bucket. With $\alpha$ the density of the table ($0 \le \alpha \le 1$), the expected number of items in a bucket is:
$$1 + \frac{e^{2\alpha} - 1 - 2\alpha}{4}$$
add(pair), remove(key) and contains(key) complete in expected O(1).
resize() completes in O(n), n being the number of elements in the hash map.
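To get a feel for the numbers, here is a small helper that evaluates the expected bucket size, assuming the formula on this slide reads 1 + (e^(2α) - 1 - 2α) / 4 with α the table density. The function name is made up for this example.

```cpp
#include <cmath>

// Expected number of items per bucket at density alpha, assuming the
// formula 1 + (e^(2*alpha) - 1 - 2*alpha) / 4.
double expected_items_per_bucket(double alpha) {
    return 1.0 + (std::exp(2.0 * alpha) - 1.0 - 2.0 * alpha) / 4.0;
}
```

Under that assumption, an empty table gives exactly 1, a half-full table (α = 0.5) gives about 1.18, and even a full table (α = 1) stays near 2.1, which is what keeps the operations in expected constant time.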
A beginning of an explanation?
Conclusion
Always take the hardware into account when designing a data structure.
Rather weak progress guarantees can still lead to very good performance.
Simple is not always better.
A 300-line data structure implementation can bring a tremendous speedup to a program.
Questions?
Slides available at http://paul.cx/public/hopscotch.pdf
References
The paper itself: http://www.springerlink.com/content/u710121187m65436/
Obstruction-free introduction paper: Herlihy, M.; Luchangco, V.; Moir, M., Distributed Computing Systems, 2003.
Conference from Intel: http://www.gdcvault.com/play/1014645/-SPONSORED-Hotspots-FLOPS-and
Wikipedia pages: CPU cache, Locality of reference, Open addressing, Hash table.
Master's thesis: Sae-eung, Suntorn, "Analysis of False Cache Line Sharing Effects on Multicore CPUs" (2010). Master's Projects. Paper 2. http://scholarworks.sjsu.edu/etd_projects/2
The course's book