Hopscotch

This document describes Hopscotch hashing, a scalable and concurrent hash map implementation that can linearly scale up to 64 cores. It presents a simplified algorithm and discusses the real-world implementation, which uses pointers and linked lists instead of in-place displacement. Concurrency is handled through fine-grained locks on buckets and timestamps to handle reads and writes. Performance analysis shows Hopscotch hashing provides constant-time performance for common operations like contains and add due to low item density in buckets.

Hopscotch hashing, a scalable, concurrent, resizable hash map implementation

or, how to write a linearly scalable hash map up to 64 cores

Paul ADENOT

KTH, CSC Department

November 18, 2011

Hopscotch hashing

Table of contents:
- Introduction
- Hopscotch algorithm implementation
- A simplified algorithm
- The real implementation
- Concurrency study
- Performance analysis
- Conclusion


In a few words
Authors of the paper:
- Maurice Herlihy, Brown University, Providence, RI
- Nir Shavit, Sun Microsystems, Burlington, MA
- Moran Tzafrir, Tel-Aviv University, Tel-Aviv, Israel

Presented at DISC '08 (DIStributed Computing 2008, Arcachon, France).
First linearly scalable concurrent hash map, efficient in both single-threaded and multithreaded applications.

Hash map?
A data structure that associates a key with a value, thus implementing an associative array. It supports four operations:
- get(key): retrieve the data associated with the key
- contains(key): test whether a key is present in the hash map
- insert(key, value): insert a <key, value> pair
- remove(key): remove the <key, value> pair
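As a rough illustration, the four operations map directly onto Python's built-in dict, which is itself a hash map. The URL and the cached value below are made up for the example.

```python
# Python's built-in dict is a hash map; the cache contents are illustrative only.
cache = {}
url = "http://example.org/logo.png"   # hypothetical key

# insert(key, value)
cache[url] = b"<image bytes>"

# contains(key)
present = url in cache

# get(key); .get() returns None instead of raising when the key is absent
resource = cache.get(url)

# remove(key); the second argument avoids a KeyError if the key is absent
cache.pop(url, None)
```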

Widely used in software you use every day:
- Resource caching (key: URL, value: resource)
- Object implementation (Perl, Python, JavaScript, Ruby, etc.)
- CSS rule matching in browser rendering engines
- Database indexing

Main goals?
- The vast majority of calls are contains(key) and get(key)
- The goal is to fetch the value for a key as fast as possible
- Several angles of attack are possible:
  - Use a hash function suited to the particular dataset
  - Use better algorithms
  - Exploit hardware specifics
  - Use multithreading

State of the art

At the time the paper was published, several families of implementations existed:
- Chained hashing
- Linear probing
- Cuckoo hashing
- ...

Hopscotch hashing combines these techniques and avoids their limitations.

Hardware prerequisite
What is a cache line, and why does it matter?
- The minimum amount of data transferable from main memory to the cache (64 bytes on my Intel Core i7)
- False sharing problem (on multicore/multiprocessor CPUs)

Cache access speed

Register    1 cycle
L1 cache    4 cycles
L2 cache    10 cycles
L3 cache    40-75 cycles
Memory      60-100 ns
Disk        4 ms
Network     dozens of milliseconds

Chained Hashing
[Figure: chained hashing. index = hash(key) selects one of the buckets (0 to 6, at addresses 0x0000 to 0x0018); each bucket heads a linked chain of values.]

- Extensible
- Not cache friendly: closed addressing (items live in chains outside the table)
- Trivial to implement
- Pointer space overhead
- Needs to allocate memory on insertion (or to use a pool, ...)
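The properties listed for chained hashing can be seen in a minimal sketch. The class and method names are hypothetical, and a Python list stands in for each bucket's linked chain, so this illustrates the idea rather than any particular implementation.

```python
class ChainedHashMap:
    """Toy chained hash map: each bucket holds a Python list used as the chain."""

    def __init__(self, num_buckets=8):
        self.buckets = [[] for _ in range(num_buckets)]

    def _chain(self, key):
        # index = hash(key), reduced to the table size
        return self.buckets[hash(key) % len(self.buckets)]

    def insert(self, key, value):
        chain = self._chain(key)
        for i, (k, _) in enumerate(chain):
            if k == key:
                chain[i] = (key, value)   # overwrite an existing key
                return
        chain.append((key, value))        # the chain grows: memory allocation

    def get(self, key):
        for k, v in self._chain(key):     # walking the chain: pointer chasing
            if k == key:
                return v
        return None

    def contains(self, key):
        return any(k == key for k, _ in self._chain(key))

    def remove(self, key):
        chain = self._chain(key)
        for i, (k, _) in enumerate(chain):
            if k == key:
                del chain[i]
                return True
        return False
```

The table never fills up, because a chain can always grow, but every step along a chain may land on a different cache line.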

Linear probing

[Figure: linear probing. (1) The key hashes to a bucket that is already occupied; (2) the following buckets, each holding one <key, value> item, are probed linearly; (3) the item is inserted at the first free bucket.]

- Cache friendly: open addressing (items are stored in the table itself)
- Inefficient when the table is rather full (above 75% load, most implementations reallocate and rehash the table)
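A minimal sketch of linear probing, assuming a 75% load threshold that triggers a reallocation and rehash. The class name, the `max_load` parameter and the doubling policy are choices made for this illustration, and removal is left out for brevity.

```python
class LinearProbingMap:
    """Toy open-addressing map with linear probing (no removal, for brevity)."""

    def __init__(self, capacity=8, max_load=0.75):
        self.slots = [None] * capacity    # each slot: None or (key, value)
        self.count = 0
        self.max_load = max_load

    def _probe(self, key):
        # Start at the hashed slot and scan forward, wrapping around.
        i = hash(key) % len(self.slots)
        while self.slots[i] is not None and self.slots[i][0] != key:
            i = (i + 1) % len(self.slots)
        return i

    def insert(self, key, value):
        if (self.count + 1) / len(self.slots) > self.max_load:
            self._resize()
        i = self._probe(key)
        if self.slots[i] is None:
            self.count += 1
        self.slots[i] = (key, value)

    def get(self, key):
        slot = self.slots[self._probe(key)]
        return slot[1] if slot is not None else None

    def _resize(self):
        # Reallocate a table twice as large and rehash every item.
        old = [s for s in self.slots if s is not None]
        self.slots = [None] * (2 * len(self.slots))
        self.count = 0
        for k, v in old:
            self.insert(k, v)
```

Probing touches consecutive slots, which is what makes the scheme cache friendly. Keeping the load below the threshold guarantees that a probe always finds a free slot.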


Main characteristics of Hopscotch hash map

- Concurrent and highly scalable
- Cache friendly (unlike chained hashing)
- Good behavior when the hash table is full
- Resizable

General idea (not the real-world implementation) (1)

- Open addressing, like linear probing
- Items are hashed to a location using a single hash function
- Each entry has a bitmap of width H (H is equal to the size of a machine word)
- On insertion, if the location given by the hash is empty, the item is inserted at this location
- On insertion, if the location given by the hash is not empty, the <key, value> pair is placed in another close and empty location, and the displacement is recorded in the bitmap

General idea (not the real-world implementation) (2)

- If there is no close and empty location, we find an item y that hashes between the current entry (i) and the first empty location (j), but within H - 1 elements of j. We move y to j, which creates an empty location closer to i. This is repeated until the empty location is within H - 1 elements of i.
- If no solution exists, resize and rehash the table (unlikely, but possible: the probability is about 1/H!, that is 3.8 × 10^-36 for H = 32 and a good hash function)
- When searching, hash the key and look up the entry
- If the entry found does not match the key, follow the displacement sequence until the key is found
- If the key is not found by the end of the displacement sequence, return false
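The simplified algorithm can be sketched sequentially as follows. This is an illustration under stated assumptions, not the authors' code: H is set to 4 for readability instead of the machine word size, None marks an empty slot (so None cannot be used as a key), and all class and method names are hypothetical.

```python
H = 4  # neighbourhood width; tiny here for readability (the slides use a machine word)


class HopscotchMap:
    """Sequential toy version of the simplified algorithm; None marks an empty slot."""

    def __init__(self, capacity=16):
        self.keys = [None] * capacity
        self.vals = [None] * capacity
        # Bit d of hop[i] is set when slot i + d holds an item whose home bucket is i.
        self.hop = [0] * capacity

    def _find(self, key):
        """Return the slot holding key, or None; only the home neighbourhood is read."""
        n = len(self.keys)
        home = hash(key) % n
        for d in range(H):
            slot = (home + d) % n
            if (self.hop[home] >> d) & 1 and self.keys[slot] == key:
                return slot
        return None

    def contains(self, key):
        return self._find(key) is not None

    def get(self, key):
        slot = self._find(key)
        return None if slot is None else self.vals[slot]

    def insert(self, key, value):
        slot = self._find(key)
        if slot is not None:               # key already present: overwrite
            self.vals[slot] = value
            return
        n = len(self.keys)
        home = hash(key) % n
        # Linear probe for the first empty slot j = home + dist.
        dist = next((d for d in range(n) if self.keys[(home + d) % n] is None), None)
        # Hop the empty slot backwards until it lies within H - 1 of home.
        while dist is not None and dist >= H:
            dist = self._hop_closer(home, dist)
        if dist is None:                   # table full or no movable item: resize, retry
            self._resize()
            self.insert(key, value)
            return
        slot = (home + dist) % n
        self.keys[slot], self.vals[slot] = key, value
        self.hop[home] |= 1 << dist

    def _hop_closer(self, home, dist):
        """Move an item y into the empty slot j; return the new, smaller distance."""
        n = len(self.keys)
        j = (home + dist) % n
        for back in range(H - 1, 0, -1):   # candidate home buckets j-(H-1) .. j-1
            b = (j - back) % n
            for d in range(back):          # items of bucket b stored before j
                if (self.hop[b] >> d) & 1:
                    src = (b + d) % n
                    self.keys[j], self.vals[j] = self.keys[src], self.vals[src]
                    self.keys[src] = self.vals[src] = None
                    self.hop[b] ^= (1 << d) | (1 << back)   # clear old bit, set new
                    return dist - (back - d)
        return None

    def _resize(self):
        items = [(k, v) for k, v in zip(self.keys, self.vals) if k is not None]
        cap = 2 * len(self.keys)
        self.keys, self.vals, self.hop = [None] * cap, [None] * cap, [0] * cap
        for k, v in items:
            self.insert(k, v)
```

A lookup never reads more than H consecutive slots, which is the point of the scheme: with H equal to a machine word, the whole neighbourhood spans only a few cache lines.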


Example


How concurrency is handled

- The strategy used is rather simple
- Fine-grained locks protect the buckets from concurrent mutation
- A lock is therefore mapped to each bucket
- contains(key) is obstruction-free and relies on timestamps to handle concurrency

The real Hopscotch hashing implementation

- Instead of displacing items within the array, the real implementation uses regular pointers and a linked list, while keeping the locality constraint (i.e. linked-list nodes are required to lie within H = hop_range of the initial node)
- The remove(key) method tries to optimize for cache line alignment, so that an operation touches a minimal number of cache lines and avoids fetches from main memory
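One way to picture the linked list is as a chain of relative offsets (deltas) stored in the buckets themselves. The sketch below borrows the field names first_delta and next_delta from the contains(key) pseudocode. The use of None as the end-of-list marker, the Bucket layout and the hand-built table are assumptions made for this illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Bucket:
    key: Optional[str] = None
    value: Optional[int] = None
    first_delta: Optional[int] = None  # offset from this home bucket to its first item
    next_delta: Optional[int] = None   # offset from this item to the next item in the list


def lookup(table, home, key):
    """Follow the delta-linked list that starts at the home bucket."""
    index = home
    delta = table[home].first_delta
    while delta is not None:           # None is the assumed end-of-list marker
        index += delta
        if table[index].key == key:
            return table[index].value
        delta = table[index].next_delta
    return None


# Hand-built example: the list of home bucket 2 visits slots 2 -> 5 -> 4.
table = [Bucket() for _ in range(8)]
table[2] = Bucket("a", 1, first_delta=0, next_delta=3)
table[5] = Bucket("b", 2, next_delta=-1)
table[4] = Bucket("c", 3)
```

Because every delta is bounded by the hop range, all nodes of a list stay close to their home bucket even though they are reached through links.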


Handling concurrency, finding the hot spot

Analysis of how the data structure is used:
- Most of the operations are read-only (contains(key), get(key))
- Each group of buckets (called a Segment) has a timestamp field to easily handle concurrent reads and writes (i.e. contains(key) and remove(key))
- Each Segment also has a lock, to prevent concurrent mutation of the data structure
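The combination of one lock and one timestamp per Segment can be sketched as below. This is a simplified illustration: a plain list stands in for the segment's buckets, the decision to bump the timestamp on removal is an assumption, and the lock-free read relies on CPython's global interpreter lock where a native implementation would need atomic loads and memory barriers.

```python
import threading


class Segment:
    """A group of buckets guarded by one lock and stamped with one timestamp."""

    def __init__(self):
        self.lock = threading.Lock()   # serialises writers on this segment
        self.timestamp = 0             # bumped by writers that remove items (assumption)
        self.items = []                # stand-in for the segment's buckets

    def add(self, key, value):
        with self.lock:
            if any(k == key for k, _ in self.items):
                return False           # key already present
            self.items.append((key, value))
            return True

    def remove(self, key):
        with self.lock:
            for i, (k, _) in enumerate(self.items):
                if k == key:
                    del self.items[i]
                    self.timestamp += 1    # tell concurrent readers to re-check
                    return True
            return False

    def contains(self, key):
        # Lock-free read: retry the scan if a writer changed the segment meanwhile.
        while True:
            start = self.timestamp
            if any(k == key for k, _ in list(self.items)):
                return True
            if start == self.timestamp:
                return False
```

Readers never take the lock, so the common read-only operations do not contend with each other; they only repeat their scan when a writer interfered.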


Linearization points
- add(pair) and remove(key) use locks; they are deadlock-free, but not livelock-free. contains(key) is obstruction-free
- add(pair): linearized when it finds the key, if the key already exists; otherwise when the new bucket is added to the linked list (when the pointers are updated)
- remove(key): linearized when it fails to find the key (if the key does not in fact exist), or when the key's table entry is overwritten
- contains(key): linearized when it finds the key, or, for an unsuccessful contains(key), when it reaches the end of the list and the timestamp is unchanged

Pseudocode for the obstruction-free contains(key) :

    bool contains(KeyType key) {
        // Locate the segment and the home bucket of the key.
        int h = hash(key);
        Segment* segment = get_segment(h);
        Bucket* start_bucket = get_bucket(segment, h);
        int start_timestamp;
        do {
            // Remember the timestamp: a change means a writer modified the segment.
            start_timestamp = segment->timestamp;
            // Restart every attempt from the home bucket.
            Bucket* current_bucket = start_bucket;
            short next_delta = current_bucket->first_delta;
            // Follow the chain of deltas; NULL_DELTA marks the end of the list.
            while (NULL_DELTA != next_delta) {
                current_bucket += next_delta;
                if (key == current_bucket->key) {
                    return true;
                }
                next_delta = current_bucket->next_delta;
            }
            // Retry if the segment changed during the traversal.
        } while (start_timestamp != segment->timestamp);
        return false;
    }

Performance analysis
- Most important property: expected constant-time performance
- In the common case, there are very few items in a bucket. With $\alpha$ the density of the table ($0 \le \alpha \le 1$), the expected number of items in a bucket is:

  $$1 + \frac{e^{2\alpha} - 1 - 2\alpha}{4}$$

- add(pair), remove(key) and contains(key) complete in expected O(1)
- resize() completes in O(n) (n being the number of elements in the hash map)
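To get a feel for the numbers, the short script below evaluates the expected bucket size, taking it to be 1 + (e^(2α) - 1 - 2α) / 4 with α the table density. The function name and the sampled densities are choices made for this example.

```python
import math


def expected_items_per_bucket(alpha):
    """1 + (e^(2*alpha) - 1 - 2*alpha) / 4, for a table density alpha."""
    return 1 + (math.exp(2 * alpha) - 1 - 2 * alpha) / 4


for alpha in (0.25, 0.5, 0.75, 1.0):
    print(f"alpha = {alpha:.2f}: {expected_items_per_bucket(alpha):.3f} items")
```

Even for a completely full table (α = 1) the expression stays close to 2.1 items, which is why the operations are expected to run in constant time.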

Performance, mainly contains(key)

Performance, various operations

A beginning of an explanation?

Conclusion
- Always take the hardware into account when designing a data structure
- Rather weak concurrency guarantees can lead to very good performance
- Simple is not always better
- A 300-line data structure implementation can bring a tremendous speedup to a program

Questions?
Slides available at http://paul.cx/public/hopscotch.pdf

References
- The paper itself: http://www.springerlink.com/content/u710121187m65436/
- Obstruction-free introduction paper: Herlihy, M.; Luchangco, V.; Moir, M., Distributed Computing Systems, 2003
- Conference from Intel: http://www.gdcvault.com/play/1014645/-SPONSORED-Hotspots-FLOPS-and
- Wikipedia pages: CPU cache, Locality of reference, Open addressing, Hash table
- Master's thesis: Sae-eung, Suntorn, "Analysis of False Cache Line Sharing Effects on Multicore CPUs" (2010). Master's Projects. Paper 2. http://scholarworks.sjsu.edu/etd_projects/2
- The course book
